Re: reading csv file, operation on column or columns

Mich Talebzadeh Sun, 20 Mar 2016 15:38:59 -0700

Pretty straightforward, as pointed out by Ted.

--read csv file into a df
val df = sqlContext.read.format("com.databricks.spark.csv").
  option("inferSchema", "true").
  option("header", "true").
  load("/data/stg/table2")


scala> df.printSchema
root
 |-- Invoice Number: string (nullable = true)
 |-- Payment date: string (nullable = true)
 |-- Net: string (nullable = true)
 |-- VAT: string (nullable = true)
 |-- Total: string (nullable = true)
--
--rename the first column as InvoiceNumber getting rid of space
--
scala> val df_1 = df.withColumnRenamed("Invoice Number","InvoiceNumber")
df_1: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment
date: string, Net: string, VAT: string, Total: string]
--
--drop column Total
--
scala> val df_2 = df_1.drop("Total")
df_2: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment
date: string, Net: string, VAT: string]
--
-- Change InvoiceNumber from String to Integer
-- (convertColumn is a user-defined helper, not part of the Spark API)
--
scala> val df_3 = convertColumn(df_2, "InvoiceNumber","Integer")
df_3: org.apache.spark.sql.DataFrame = [Payment date: string, Net: string,
VAT: string, InvoiceNumber: int]
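
The definition of convertColumn is not shown in this message. A minimal sketch of what such a helper might look like follows, assuming it parks the original column under a temporary name, re-adds it with a cast, and drops the temporary column. That would explain why InvoiceNumber ends up last in df_3. The temporary name "swap", the new name "PaymentDate" and the output path are placeholders of mine, not from the original. The last two lines cover the Parquet step from the original question; as far as I know the Parquet writer rejects column names containing spaces, hence the extra rename.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical reconstruction of convertColumn: cast one column to a new type.
// Assumes the DataFrame has no existing column called "swap".
def convertColumn(df: DataFrame, name: String, newType: String): DataFrame = {
  // Park the original column under a temporary name
  val swapped = df.withColumnRenamed(name, "swap")
  // Re-add it under its own name with the new type, then drop the temporary
  swapped.withColumn(name, swapped.col("swap").cast(newType)).drop("swap")
}

val df_3 = convertColumn(df_2, "InvoiceNumber", "Integer")

// Rename the remaining column that contains a space, then save as Parquet
// ("PaymentDate" and the output path are placeholders)
val df_4 = df_3.withColumnRenamed("Payment date", "PaymentDate")
df_4.write.parquet("/data/stg/table2_parquet")
```

If you do not need a reusable helper, calling withColumn with a cast directly on the column, as Ted suggested, does the same job for a single column.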


HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 20 March 2016 at 22:15, Ted Yu <[email protected]> wrote:

> Please refer to the following methods of DataFrame:
>
>   def withColumn(colName: String, col: Column): DataFrame = {
>
>   def drop(colName: String): DataFrame = {
>
> On Sun, Mar 20, 2016 at 2:47 PM, Ashok Kumar <[email protected]> wrote:
>
>> Gurus,
>>
>> I would like to read a csv file into a Data Frame but able to rename the
>> column name, change a column type from String to Integer or drop the column
>> from further analysis before saving data as parquet file?
>>
>> Thanks
>>
>
>
