Re: reading csv file, operation on column or columns

Ted Yu Sun, 20 Mar 2016 15:49:02 -0700

Mich:
Looks like convertColumn() is a method of your own; I don't see it in the
Spark code base.
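
For anyone following along, a helper like that could be sketched with
withColumn() and Column.cast() alone. This is only a guess at the idea, not
Mich's actual convertColumn(): the name castColumn, the DataType parameter,
the new column name PaymentDate and the output path are all made up for
illustration. It is meant to be pasted into spark-shell after df_2 from the
quoted session is defined.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, IntegerType}

// Hypothetical helper (not part of Spark): replace a column with a cast copy.
// withColumn() with an existing column name replaces that column.
def castColumn(df: DataFrame, colName: String, newType: DataType): DataFrame =
  df.withColumn(colName, df(colName).cast(newType))

// Assumes df_2 from the quoted session below.
// Strings that cannot be parsed as integers become null after the cast.
val df_3 = castColumn(df_2, "InvoiceNumber", IntegerType)

// Save as parquet. The output path is an assumption. As far as I know the
// Parquet writer rejects column names containing spaces, so "Payment date"
// is renamed first.
df_3.withColumnRenamed("Payment date", "PaymentDate")
  .write.parquet("/data/stg/table2_parquet")
```

Passing a DataType rather than a type name string avoids depending on which
type names the string form of cast() accepts in a given Spark version.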


On Sun, Mar 20, 2016 at 3:38 PM, Mich Talebzadeh <[email protected]>
wrote:

> Pretty straightforward, as pointed out by Ted.
>
> --read csv file into a df
> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("inferSchema",
> "true").option("header", "true").load("/data/stg/table2")
>
> scala> df.printSchema
> root
>  |-- Invoice Number: string (nullable = true)
>  |-- Payment date: string (nullable = true)
>  |-- Net: string (nullable = true)
>  |-- VAT: string (nullable = true)
>  |-- Total: string (nullable = true)
> --
> --rename the first column as InvoiceNumber getting rid of space
> --
> scala> val df_1 = df.withColumnRenamed("Invoice Number","InvoiceNumber")
> df_1: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment
> date: string, Net: string, VAT: string, Total: string]
> --
> --drop column Total
> --
> scala> val df_2 = df_1.drop("Total")
> df_2: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment
> date: string, Net: string, VAT: string]
> --
> -- Change InvoiceNumber from String to Integer
> --
> scala> val df_3 = convertColumn(df_2, "InvoiceNumber","Integer")
> df_3: org.apache.spark.sql.DataFrame = [Payment date: string, Net: string,
> VAT: string, InvoiceNumber: int]
>
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 20 March 2016 at 22:15, Ted Yu <[email protected]> wrote:
>
>> Please refer to the following methods of DataFrame:
>>
>>   def withColumn(colName: String, col: Column): DataFrame
>>
>>   def drop(colName: String): DataFrame
>>
>> On Sun, Mar 20, 2016 at 2:47 PM, Ashok Kumar <
>> [email protected]> wrote:
>>
>>> Gurus,
>>>
>>> I would like to read a CSV file into a DataFrame and be able to rename a
>>> column, change a column type from String to Integer, or drop a column
>>> from further analysis before saving the data as a Parquet file.
>>>
>>> Thanks
>>>
>>
>>
>
