Mich: Looks like convertColumn() is a method of your own - I don't see it in the Spark code base.
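
For what it's worth, here is a minimal sketch of the whole flow using only built-in DataFrame methods. It assumes Spark 1.4 or later, a spark-shell started with the spark-csv package, and the same input path as in the session quoted below. castColumn is a hypothetical helper name of my own (not a Spark API), and the output path is made up for illustration.

```scala
// Run in spark-shell, where sqlContext is already defined.
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark: replaces column `name`
// with a copy of itself cast to `newType` (e.g. "int").
def castColumn(df: DataFrame, name: String, newType: String): DataFrame =
  df.withColumn(name, df(name).cast(newType))

// Read the CSV file into a DataFrame (same options as in the quoted session).
val df = sqlContext.read.format("com.databricks.spark.csv").
  option("inferSchema", "true").
  option("header", "true").
  load("/data/stg/table2")

// Rename the columns that contain spaces and drop the one not needed.
val renamed = df.
  withColumnRenamed("Invoice Number", "InvoiceNumber").
  withColumnRenamed("Payment date", "PaymentDate").
  drop("Total")

// Change InvoiceNumber from String to Integer.
val cleaned = castColumn(renamed, "InvoiceNumber", "int")
cleaned.printSchema

// Save as Parquet; the output path here is an assumption.
cleaned.write.parquet("/data/stg/table2_parquet")
```

Two things to watch: values that cannot be parsed as integers should come back as null after the cast, so check the result before saving; and "Payment date" is renamed as well because the Parquet writer rejects column names that contain spaces.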

On Sun, Mar 20, 2016 at 3:38 PM, Mich Talebzadeh <[email protected]> wrote:

> Pretty straight forward as pointed out by Ted.
>
> --read csv file into a df
> val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("/data/stg/table2")
>
> scala> df.printSchema
> root
>  |-- Invoice Number: string (nullable = true)
>  |-- Payment date: string (nullable = true)
>  |-- Net: string (nullable = true)
>  |-- VAT: string (nullable = true)
>  |-- Total: string (nullable = true)
> --
> --rename the first column as InvoiceNumber getting rid of space
> --
> scala> val df_1 = df.withColumnRenamed("Invoice Number","InvoiceNumber")
> df_1: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment date: string, Net: string, VAT: string, Total: string]
> --
> --drop column Total
> --
> scala> val df_2 = df_1.drop("Total")
> df_2: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment date: string, Net: string, VAT: string]
> --
> -- Change InvoiceNumber from String to Integer
> --
> scala> val df_3 = convertColumn(df_2, "InvoiceNumber","Integer")
> df_3: org.apache.spark.sql.DataFrame = [Payment date: string, Net: string, VAT: string, InvoiceNumber: int]
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 20 March 2016 at 22:15, Ted Yu <[email protected]> wrote:
>
>> Please refer to the following methods of DataFrame:
>>
>> def withColumn(colName: String, col: Column): DataFrame = {
>>
>> def drop(colName: String): DataFrame = {
>>
>> On Sun, Mar 20, 2016 at 2:47 PM, Ashok Kumar <[email protected]> wrote:
>>
>>> Gurus,
>>>
>>> I would like to read a csv file into a Data Frame but able to rename the
>>> column name, change a column type from String to Integer or drop the column
>>> from further analysis before saving data as parquet file?
>>>
>>> Thanks
