Pretty straightforward, as pointed out by Ted.
-- Read the csv file into a DataFrame
scala> val df = sqlContext.read.format("com.databricks.spark.csv").
  option("inferSchema", "true").option("header", "true").
  load("/data/stg/table2")
scala> df.printSchema
root
|-- Invoice Number: string (nullable = true)
|-- Payment date: string (nullable = true)
|-- Net: string (nullable = true)
|-- VAT: string (nullable = true)
|-- Total: string (nullable = true)
--
-- Rename the first column to InvoiceNumber, removing the space
--
scala> val df_1 = df.withColumnRenamed("Invoice Number","InvoiceNumber")
df_1: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment date: string, Net: string, VAT: string, Total: string]
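The rename above only touches the first column; `Payment date` still contains a space. A minimal sketch, assuming you want the spaces removed from every column name in one go (the names `ColumnNames` and `stripSpaces` are hypothetical), uses `toDF`, which relabels the columns by position:

```scala
import org.apache.spark.sql.DataFrame

object ColumnNames {
  // Hypothetical helper: remove every space from every column name,
  // e.g. "Invoice Number" -> "InvoiceNumber", "Payment date" -> "Paymentdate".
  def stripSpaces(df: DataFrame): DataFrame =
    // toDF(colNames: String*) keeps the data and assigns the new names
    // to the existing columns in order
    df.toDF(df.columns.map(_.replace(" ", "")): _*)
}
```

For example, `ColumnNames.stripSpaces(df)` would give the columns InvoiceNumber, Paymentdate, Net, VAT and Total.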
--
-- Drop the Total column
--
scala> val df_2 = df_1.drop("Total")
df_2: org.apache.spark.sql.DataFrame = [InvoiceNumber: string, Payment date: string, Net: string, VAT: string]
--
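-- Change InvoiceNumber from String to Integer
-- (convertColumn is a user-defined helper, not part of the DataFrame API; its definition is not shown here)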
--
scala> val df_3 = convertColumn(df_2, "InvoiceNumber","Integer")
df_3: org.apache.spark.sql.DataFrame = [Payment date: string, Net: string, VAT: string, InvoiceNumber: int]
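Since `convertColumn` is not a Spark method, here is a minimal sketch of what such a helper might look like, assuming a plain cast is all that is needed. The names `CsvHelpers`, `convertColumn` and `saveAsParquet` are hypothetical. It uses the `withColumn` method Ted pointed to together with `Column.cast`, and adds the Parquet save the original question asked about. Depending on the Spark version and its ANSI setting, values that cannot be parsed as integers either become null or raise an error.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object CsvHelpers {
  // Hypothetical stand-in for convertColumn (not a Spark API).
  // newType is a type name accepted by Column.cast(String), e.g. "int".
  // withColumn with an existing name replaces that column, so the
  // converted column keeps its original position.
  def convertColumn(df: DataFrame, name: String, newType: String): DataFrame =
    df.withColumn(name, df(name).cast(newType))

  // Write the DataFrame as Parquet; outPath is a placeholder, and
  // SaveMode.Overwrite replaces anything already at that location.
  def saveAsParquet(df: DataFrame, outPath: String): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(outPath)
}
```

Because this version replaces the column in place, InvoiceNumber keeps its position, whereas the helper used above evidently appended it at the end. Some Spark versions also reject column names containing spaces when writing Parquet, so `Payment date` may need renaming before the save.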
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 20 March 2016 at 22:15, Ted Yu <[email protected]> wrote:
> Please refer to the following methods of DataFrame:
>
> def withColumn(colName: String, col: Column): DataFrame = {
>
> def drop(colName: String): DataFrame = {
>
> On Sun, Mar 20, 2016 at 2:47 PM, Ashok Kumar <[email protected]> wrote:
>
>> Gurus,
>>
>> I would like to read a csv file into a DataFrame and be able to rename a
>> column, change a column type from String to Integer, or drop a column from
>> further analysis before saving the data as a Parquet file. How can I do that?
>>
>> Thanks
>>