Re: reading csv file, operation on column or columns

2016-03-20 Thread Mich Talebzadeh
Apologies. Good point.

def convertColumn(df: org.apache.spark.sql.DataFrame, name:String, newType:String) = {
  val df_1 = df.withColumnRenamed(name, "ConvertColumn")
  df_1.withColumn(name, df_1.col("ConvertColumn").cast(newType)).drop("ConvertColumn")
}

val df_3 =
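For comparison, the same cast can probably be done without the temporary rename. The sketch below assumes a Spark version in which withColumn replaces an existing column of the same name; that behaviour is worth checking against the release in use. The helper name castColumn is hypothetical and is not part of the thread or of Spark.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: cast the column `name` to `newType` in place.
// Assumes withColumn overwrites an existing column that has the same name,
// so no rename to a temporary column and no drop are needed.
def castColumn(df: DataFrame, name: String, newType: String): DataFrame =
  df.withColumn(name, df.col(name).cast(newType))
```

If withColumn does not overwrite on the version in use, the rename, cast and drop sequence in convertColumn remains the safer route.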

Re: reading csv file, operation on column or columns

2016-03-20 Thread Ted Yu
Mich: It looks like convertColumn() is a method of your own; I don't see it in the Spark code base.

On Sun, Mar 20, 2016 at 3:38 PM, Mich Talebzadeh wrote:
> Pretty straight forward as pointed out by Ted.
>
> --read csv file into a df
> val df =

Re: reading csv file, operation on column or columns

2016-03-20 Thread Mich Talebzadeh
Pretty straightforward, as pointed out by Ted.

--read csv file into a df
val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("/data/stg/table2")

scala> df.printSchema
root
 |-- Invoice Number: string (nullable = true)
 |--

Re: reading csv file, operation on column or columns

2016-03-20 Thread Ted Yu
Please refer to the following methods of DataFrame:

  def withColumn(colName: String, col: Column): DataFrame = {
  def drop(colName: String): DataFrame = {

On Sun, Mar 20, 2016 at 2:47 PM, Ashok Kumar wrote:
> Gurus,
>
> I would like to read a csv file into a
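As a rough illustration of how the DataFrame methods named above might be combined for the three operations asked about (rename, change type, drop), here is a minimal sketch. Only "Invoice Number" comes from the schema shown in this thread. The columns "amount" and "notes", the new name "invoice_number", the helper name prepare and the output path in the usage line are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper; all column names except "Invoice Number" are placeholders.
def prepare(df: DataFrame): DataFrame =
  df.withColumnRenamed("Invoice Number", "invoice_number") // rename a column
    .withColumn("amount", col("amount").cast("int"))       // String -> Integer
    .drop("notes")                                          // exclude from further analysis
```

The result could then be saved with something like prepare(df).write.parquet("/path/to/output"), where the path is a placeholder.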

reading csv file, operation on column or columns

2016-03-20 Thread Ashok Kumar
Gurus,

I would like to read a CSV file into a DataFrame and then be able to rename a column, change a column's type from String to Integer, or drop a column from further analysis, before saving the data as a Parquet file. How can I do this?

Thanks