Re: How to specify column type when saving DataFrame as parquet file?

2015-08-14 Thread Francis Lau
Jyun-Fan,

Here is how I have been doing it. I found that I needed to define the schema first, when loading the JSON file.

Francis

import datetime
from pyspark.sql.types import *

# Define schema
upSchema = StructType([
    StructField("field 1", StringType(), True),
    StructField("field 2", LongType(

Re: How to specify column type when saving DataFrame as parquet file?

2015-08-14 Thread Raghavendra Pandey
I think you can try the DataFrame creation API that takes an RDD[Row] and a StructType...

On Aug 11, 2015 4:28 PM, "Jyun-Fan Tsai" wrote:
> Hi all,
> I'm using Spark 1.4.1. I create a DataFrame from json file. There is
> a column C that all values are null in the json file. I found that
> the datatype o

How to specify column type when saving DataFrame as parquet file?

2015-08-11 Thread Jyun-Fan Tsai
Hi all, I'm using Spark 1.4.1. I create a DataFrame from a JSON file. The file has a column C whose values are all null. I found that the datatype of column C in the created DataFrame is string. However, I would like to specify the column as Long when saving it as a Parquet file. What
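One option that is not proposed in the replies listed here is to cast the column after loading and before writing. The sketch below is an assumption about how that could be done, not something taken from the thread; the helper name cast_to_long is hypothetical, and it relies on withColumn replacing an existing column of the same name, which is the behaviour on recent Spark versions.

```python
def cast_to_long(df, name="C"):
    """Return a copy of df in which column `name` is cast to a long."""
    # Column.cast accepts a type name string; "long" maps to LongType.
    return df.withColumn(name, df[name].cast("long"))
```

The result could then be saved with something like cast_to_long(df).write.parquet("out.parquet"), where the output path is a placeholder.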