Pretty straightforward, as pointed out by Ted.

// read csv file into a df
val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("/data/stg/table2")

scala> df.printSchema
root
 |-- Invoice Number: string (nullable = true)
 |-- Payment date: string (nullable = true)
 |-- Net: string (nullable = true)
 |-- VAT: string (nullable = true)
 |-- Total: string (nullable = true)

// rename the first column as InvoiceNumber, getting rid of the space
scala> val df_1 = df.withColumnRenamed("Invoice Number","InvoiceNumber")
df_1: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment date: string, Net: string, VAT: string, Total: string]

// drop column Total
scala> val df_2 = df_1.drop("Total")
df_2: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment date: string, Net: string, VAT: string]

// change InvoiceNumber from String to Integer
// (convertColumn is a user-defined helper function, not a DataFrame method)
scala> val df_3 = convertColumn(df_2, "InvoiceNumber","Integer")
df_3: org.apache.spark.sql.DataFrame = [Payment date: string, Net: string, VAT: string, InvoiceNumber: int]

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 20 March 2016 at 22:15, Ted Yu <yuzhih...@gmail.com> wrote:

> Please refer to the following methods of DataFrame:
>
>   def withColumn(colName: String, col: Column): DataFrame = {
>
>   def drop(colName: String): DataFrame = {
>
> On Sun, Mar 20, 2016 at 2:47 PM, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote:
>
>> Gurus,
>>
>> I would like to read a csv file into a Data Frame but able to rename the
>> column name, change a column type from String to Integer or drop the column
>> from further analysis before saving data as parquet file?
>>
>> Thanks
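
The reply above calls convertColumn without showing its definition. A minimal sketch of what such a helper might look like follows. It is a reconstruction under assumptions, not the author's code: the type-name mapping and the "_tmp" suffix are hypothetical. It uses only withColumnRenamed, withColumn, drop and Column.cast, and it reproduces the column order seen in df_3, where the converted column moves to the end.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, LongType, StringType}

// Hypothetical reconstruction: the thread does not show the real convertColumn.
def convertColumn(df: DataFrame, name: String, newType: String): DataFrame = {
  // Map a few type names to Spark SQL types (this mapping is an assumption).
  val target: DataType = newType.toLowerCase match {
    case "integer" | "int" => IntegerType
    case "long"            => LongType
    case "double"          => DoubleType
    case "string"          => StringType
    case other =>
      throw new IllegalArgumentException(s"Unsupported type name: $other")
  }
  // Temporary column name, assumed not to clash with an existing column.
  val tmp = name + "_tmp"
  val renamed = df.withColumnRenamed(name, tmp)
  // Re-create the column under its old name with the new type, then drop the
  // temporary copy. withColumn appends, so the column ends up last.
  renamed.withColumn(name, renamed.col(tmp).cast(target)).drop(tmp)
}

// Usage, following the session above (the output path is hypothetical):
//   val df_3 = convertColumn(df_2, "InvoiceNumber", "Integer")
//   val df_4 = df_3.withColumnRenamed("Payment date", "PaymentDate")
//   df_4.write.parquet("/data/stg/table2_parquet")
```

Values that cannot be parsed as integers become null after the cast, so it may be worth checking for nulls in the converted column. The extra rename of "Payment date" in the usage lines is there because the Parquet writer in Spark rejects column names containing spaces.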