Apologies, good point.
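def convertColumn(df: org.apache.spark.sql.DataFrame, name: String,
    newType: String): org.apache.spark.sql.DataFrame = {
  // Move the original column aside under a temporary name
  val df_1 = df.withColumnRenamed(name, "ConvertColumn")
  // Re-create it under the original name with the new type, then drop the temporary copy
  df_1.withColumn(name, df_1.col("ConvertColumn").cast(newType)).drop("ConvertColumn")
}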
scala> val df_3 = convertColumn(df_2, "InvoiceNumber", "Integer")
df_3: org.apache.spark.sql.DataFrame = [Payment date: string, Net: string, VAT: string, InvoiceNumber: int]
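Note that the converted column ends up last in df_3. A possible variation, sketched here assuming Spark 2.x or later with a SparkSession (rather than the 1.x sqlContext used in this thread): withColumn replaces an existing column that has the same name, so casting in place skips the temporary column and keeps InvoiceNumber where it was. The helper name castInPlace and the sample row are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastInPlaceSketch {
  // Hypothetical helper: casts column `name` to `newType` without moving it.
  // withColumn replaces an existing column of the same name, so no temporary
  // column (and no drop) is needed.
  def castInPlace(df: DataFrame, name: String, newType: String): DataFrame =
    df.withColumn(name, col(name).cast(newType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("cast-in-place").getOrCreate()
    import spark.implicits._

    // Made-up row standing in for the CSV data discussed in the thread
    val df2 = Seq(("1001", "01/03/2016", "100.00", "20.00"))
      .toDF("InvoiceNumber", "Payment date", "Net", "VAT")

    val df3 = castInPlace(df2, "InvoiceNumber", "int")
    df3.printSchema() // InvoiceNumber is now int and is still the first column
    spark.stop()
  }
}
```

Depending on the Spark version and whether ANSI mode is enabled, strings that do not parse as integers either become null or raise an error, so checking the data before the cast may be worthwhile.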
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 20 March 2016 at 22:48, Ted Yu <[email protected]> wrote:
> Mich:
> Looks like convertColumn() is a method of your own; I don't see it in the
> Spark code base.
>
> On Sun, Mar 20, 2016 at 3:38 PM, Mich Talebzadeh <
> [email protected]> wrote:
>
>> Pretty straightforward, as pointed out by Ted.
>>
>> -- read the CSV file into a DataFrame
>> val df = sqlContext.read.format("com.databricks.spark.csv").
>>   option("inferSchema", "true").
>>   option("header", "true").
>>   load("/data/stg/table2")
>>
>> scala> df.printSchema
>> root
>> |-- Invoice Number: string (nullable = true)
>> |-- Payment date: string (nullable = true)
>> |-- Net: string (nullable = true)
>> |-- VAT: string (nullable = true)
>> |-- Total: string (nullable = true)
>> --
>> -- rename the first column to InvoiceNumber, removing the space
>> --
>> scala> val df_1 = df.withColumnRenamed("Invoice Number","InvoiceNumber")
>> df_1: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment date: string, Net: string, VAT: string, Total: string]
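When several headers contain spaces (here Payment date as well as Invoice Number), renaming them one at a time gets tedious. A minimal sketch, assuming Spark 2.x or later: toDF accepts the complete list of new column names in order, so every space can be stripped in one pass. The helper name stripSpaces and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StripSpacesSketch {
  // Hypothetical helper: removes every space from every column name,
  // e.g. "Invoice Number" -> "InvoiceNumber", "Payment date" -> "Paymentdate".
  def stripSpaces(df: DataFrame): DataFrame =
    df.toDF(df.columns.map(_.replace(" ", "")): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("strip-spaces").getOrCreate()
    import spark.implicits._

    // Made-up row with the same headers as the CSV file in the thread
    val df = Seq(("1001", "01/03/2016")).toDF("Invoice Number", "Payment date")
    println(stripSpaces(df).columns.mkString(", ")) // InvoiceNumber, Paymentdate
    spark.stop()
  }
}
```

Some Spark versions also refuse to write Parquet files whose column names contain spaces, which is another reason to normalise the names before saving.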
>> --
>> --drop column Total
>> --
>> scala> val df_2 = df_1.drop("Total")
>> df_2: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment date: string, Net: string, VAT: string]
>> --
>> -- Change InvoiceNumber from String to Integer
>> --
>> scala> val df_3 = convertColumn(df_2, "InvoiceNumber","Integer")
>> df_3: org.apache.spark.sql.DataFrame = [Payment date: string, Net: string, VAT: string, InvoiceNumber: int]
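The original question also asks about saving the result as Parquet, which the steps above stop short of. A minimal end-to-end sketch, assuming Spark 2.0 or later (built-in csv reader and SparkSession instead of the com.databricks.spark.csv package and sqlContext), chains the same rename, drop and cast steps and then writes Parquet. The object name, temporary paths and sample row are made up; Payment date is renamed too, because some Spark versions reject spaces in Parquet column names.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CsvToParquetSketch {
  // Rename, drop and cast, as in the thread; "Payment date" is also renamed
  // because some Spark versions reject spaces in Parquet column names.
  def clean(df: DataFrame): DataFrame =
    df.withColumnRenamed("Invoice Number", "InvoiceNumber")
      .withColumnRenamed("Payment date", "PaymentDate")
      .drop("Total")
      .withColumn("InvoiceNumber", col("InvoiceNumber").cast("int"))

  // Writes a made-up CSV file, cleans it, saves it as Parquet and reads it back.
  def run(spark: SparkSession): DataFrame = {
    val dir = Files.createTempDirectory("csv2parquet")
    val csvPath = dir.resolve("table2.csv")
    val rows = "Invoice Number,Payment date,Net,VAT,Total\n" +
      "1001,01/03/2016,100.00,20.00,120.00\n"
    Files.write(csvPath, rows.getBytes(StandardCharsets.UTF_8))

    // Built-in CSV reader; inferSchema is left off, so every column is a string
    val raw = spark.read.option("header", "true").csv(csvPath.toString)

    val parquetPath = dir.resolve("table2.parquet").toString
    clean(raw).write.mode("overwrite").parquet(parquetPath)
    spark.read.parquet(parquetPath)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("csv-to-parquet").getOrCreate()
    run(spark).printSchema()
    spark.stop()
  }
}
```

Reading the Parquet output back is only there to confirm that the int type and the column order survived the write.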
>>
>>
>> HTH
>>
>> On 20 March 2016 at 22:15, Ted Yu <[email protected]> wrote:
>>
>>> Please refer to the following methods of DataFrame:
>>>
>>> def withColumn(colName: String, col: Column): DataFrame
>>>
>>> def drop(colName: String): DataFrame
>>>
>>> On Sun, Mar 20, 2016 at 2:47 PM, Ashok Kumar <
>>> [email protected]> wrote:
>>>
>>>> Gurus,
>>>>
>>>> I would like to read a CSV file into a DataFrame and be able to rename
>>>> a column, change a column's type from String to Integer, or drop a
>>>> column from further analysis before saving the data as a Parquet file.
>>>> How can I do this?
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>