Re: Writing wide parquet file in Spark SQL

2015-03-15 Thread Cheng Lian
This article by Ryan Blue should help you understand the problem: http://ingest.tips/2015/01/31/parquet-row-group-size/ The TL;DR is that you can decrease parquet.block.size to reduce memory consumption. That said, 100K columns is a really heavy burden for Parquet, but I guess your data should
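A minimal sketch of the suggestion above, assuming a Spark 1.x SQLContext setup; the paths and the DataFrame name (df) are placeholders, and 16 MB is just an illustrative value, not a recommendation from the thread:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Parquet buffers one full row group per open file in memory before flushing,
// so with very many columns the default 128 MB block size can exhaust the heap.
// Lowering parquet.block.size shrinks that buffer at some cost in scan efficiency.
val sc = new SparkContext(/* conf */ null)
val sqlContext = new SQLContext(sc)

sc.hadoopConfiguration.setInt("parquet.block.size", 16 * 1024 * 1024) // 16 MB

val df = sqlContext.parquetFile("/path/to/input")   // placeholder input
df.saveAsParquetFile("/path/to/output")             // writes with the smaller row groups
```

The setting is read from the Hadoop configuration by the Parquet output format, so it applies to every Parquet write made through that SparkContext.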

Re: Writing wide parquet file in Spark SQL

2015-03-11 Thread Ravindra
Even I am keen to learn the answer to this, but as an alternative you can use Hive to create a table stored as Parquet and then use it in Spark. On Wed, Mar 11, 2015 at 1:44 AM kpeng1 kpe...@gmail.com wrote: Hi All, I am currently trying to write a very wide file into parquet using spark sql.
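A sketch of the workaround Ravindra describes, assuming Spark was built with Hive support and using a HiveContext; the table and column names are illustrative assumptions, not from the thread:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(/* conf */ null)
val hiveContext = new HiveContext(sc)

// Define a Parquet-backed table in the Hive metastore. For a very wide schema
// the column list would be generated rather than written by hand.
hiveContext.sql("""
  CREATE TABLE IF NOT EXISTS wide_table (c0 STRING, c1 STRING)
  STORED AS PARQUET
""")

// Populate it through Hive (e.g. INSERT ... SELECT), letting Hive's Parquet
// writer handle the wide rows, then read it back from Spark as usual:
val df = hiveContext.sql("SELECT * FROM wide_table")
```

The idea is to delegate the problematic wide Parquet write to Hive while keeping the downstream processing in Spark SQL.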