Re: AvroParquetWriter equivalent in Spark 1.3 sqlContext Save or createDataFrame Interfaces?

Cheng Lian Tue, 19 May 2015 04:00:08 -0700

That's right. Also, Spark SQL can automatically infer the schema from JSON datasets. You don't need to specify an Avro schema:


   sqlContext.jsonFile("json/path").saveAsParquetFile("parquet/path")


or with the new reader/writer API introduced in 1.4-SNAPSHOT:

   sqlContext.read.json("json/path").write.parquet("parquet/path")

Cheng
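
As a rough illustration of what inferring a schema from JSON records involves, here is a minimal Python sketch that uses only the standard library. It is not Spark's implementation: the function name infer_schema, the type names it returns, and the fallback rules are all assumptions made for this example. It scans newline-delimited JSON objects and records, for each top-level field, a type name and whether the field was ever null or missing.

```python
import json

# Hypothetical mapping from Python JSON value types to illustrative type names.
_TYPE_NAMES = {bool: "boolean", int: "long", float: "double", str: "string"}


def infer_schema(lines):
    """Infer {field: (type_name, nullable)} from newline-delimited JSON objects.

    Toy sketch: only flat objects with primitive values are handled, and
    conflicting types fall back to "string".
    """
    records = [json.loads(line) for line in lines if line.strip()]
    types = {}         # field -> type name seen so far
    seen_null = set()  # fields that were null in at least one record
    counts = {}        # field -> number of records containing the field
    for record in records:
        for name, value in record.items():
            counts[name] = counts.get(name, 0) + 1
            if value is None:
                seen_null.add(name)
                continue
            type_name = _TYPE_NAMES.get(type(value), "string")
            previous = types.get(name)
            if previous is None or previous == type_name:
                types[name] = type_name
            elif {previous, type_name} == {"long", "double"}:
                types[name] = "double"  # widen integers to doubles
            else:
                types[name] = "string"  # incompatible types: fall back
    schema = {}
    for name in counts:
        # A field is nullable if it was null or absent in any record.
        nullable = name in seen_null or counts[name] < len(records)
        schema[name] = (types.get(name, "string"), nullable)
    return schema
```

The point of the sketch is only that no Avro schema is needed up front: the field names and types come from the data itself.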

On 5/19/15 6:07 PM, Ewan Leith wrote:

Thanks Cheng, that makes sense.
So for new dataframe creation (not conversion from Avro but from JSON or CSV inputs) in Spark we shouldn’t worry about using Avro at all, just use the Spark SQL StructType when building new DataFrames? If so, that will be a lot simpler!
Thanks,

Ewan

*From:*Cheng Lian [mailto:[email protected]]
*Sent:* 19 May 2015 11:01
*To:* Ewan Leith; [email protected]
*Subject:* Re: AvroParquetWriter equivalent in Spark 1.3 sqlContext Save or createDataFrame Interfaces?

Hi Ewan,
Unlike AvroParquetWriter, Spark SQL uses StructType as the intermediate schema format. So when converting Avro files to Parquet files, we internally convert the Avro schema to a Spark SQL StructType first, and then convert the StructType to a Parquet schema.
Cheng
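
The two-step conversion described above (Avro schema to StructType, then StructType to Parquet schema) can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's code: the function names avro_to_struct and struct_to_parquet are hypothetical, the intermediate form is a simple list of tuples standing in for a StructType, and only flat records with primitive fields and null-unions are handled.

```python
# Avro primitive type -> Spark SQL type name.
AVRO_TO_SQL = {
    "boolean": "BooleanType", "int": "IntegerType", "long": "LongType",
    "float": "FloatType", "double": "DoubleType",
    "bytes": "BinaryType", "string": "StringType",
}

# Spark SQL type name -> (Parquet primitive type, optional annotation).
SQL_TO_PARQUET = {
    "BooleanType": ("boolean", None), "IntegerType": ("int32", None),
    "LongType": ("int64", None), "FloatType": ("float", None),
    "DoubleType": ("double", None), "BinaryType": ("binary", None),
    "StringType": ("binary", "UTF8"),
}


def avro_to_struct(avro_schema):
    """Step 1: Avro record schema (a dict) -> list of (name, sql_type, nullable)."""
    fields = []
    for field in avro_schema["fields"]:
        avro_type = field["type"]
        nullable = False
        # An Avro union such as ["null", "string"] marks an optional field.
        if isinstance(avro_type, list):
            non_null = [t for t in avro_type if t != "null"]
            if len(non_null) != 1:
                raise ValueError("only unions of null and one type are handled")
            nullable = "null" in avro_type
            avro_type = non_null[0]
        fields.append((field["name"], AVRO_TO_SQL[avro_type], nullable))
    return fields


def struct_to_parquet(message_name, fields):
    """Step 2: intermediate fields -> Parquet message type rendered as text."""
    lines = ["message %s {" % message_name]
    for field_name, sql_type, nullable in fields:
        repetition = "optional" if nullable else "required"
        primitive, annotation = SQL_TO_PARQUET[sql_type]
        suffix = " (%s)" % annotation if annotation else ""
        lines.append("  %s %s %s%s;" % (repetition, primitive, field_name, suffix))
    lines.append("}")
    return "\n".join(lines)
```

Calling struct_to_parquet("User", avro_to_struct(schema)) on an Avro record schema chains the two steps. Because the intermediate form does not depend on Avro, the same second step would serve fields that came from JSON or CSV inputs instead.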

On 5/19/15 4:42 PM, Ewan Leith wrote:

    Hi all,

    I might be missing something, but does the new Spark 1.3
    sqlContext save interface support using Avro as the schema
    structure when writing Parquet files, in a similar way to
    AvroParquetWriter (which I’ve got working)?

    I've seen how you can load an Avro file and save it as Parquet
    from
    https://databricks.com/blog/2015/03/24/spark-sql-graduates-from-alpha-in-spark-1-3.html,
    but not using the 2 together.

    Thanks, and apologies if I've missed something obvious!

    Ewan
