This article by Ryan Blue should be helpful for understanding the problem:
http://ingest.tips/2015/01/31/parquet-row-group-size/
The TL;DR is that you can decrease parquet.block.size to reduce memory
consumption. Anyway, 100K columns is a really big burden for Parquet,
but I guess your data should
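For reference, parquet.block.size is a Hadoop configuration key, so in Spark
it can be set on the SparkContext's Hadoop configuration before writing. A
minimal sketch, assuming an existing SparkContext named sc; the 64 MB value
is only an illustration, not a recommendation from the thread:

    // parquet.block.size is in bytes; parquet-mr's default is 128 MB.
    // A smaller row group means less data is buffered in memory per
    // open column writer before a flush to disk.
    sc.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024 * 1024)

    // Subsequent Parquet writes through Spark SQL pick the value up
    // from this Hadoop configuration.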
I am also keen to learn the answer to this, but as an alternative you can use
Hive to create a table stored as Parquet and then use it in Spark.
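A rough sketch of that alternative, assuming Spark is built with Hive
support; the table name, columns, and source table below are placeholders,
not taken from the thread:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc) // sc: an existing SparkContext

    // Create a Parquet-backed Hive table (placeholder schema).
    hiveContext.sql(
      "CREATE TABLE IF NOT EXISTS wide_table (c1 STRING, c2 STRING) " +
      "STORED AS PARQUET")

    // Populate it from an existing source table, then query it from Spark.
    hiveContext.sql("INSERT OVERWRITE TABLE wide_table SELECT c1, c2 FROM src")
    val result = hiveContext.sql("SELECT * FROM wide_table")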
On Wed, Mar 11, 2015 at 1:44 AM, kpeng1 <kpe...@gmail.com> wrote:
Hi All,
I am currently trying to write a very wide file into Parquet using Spark
SQL.
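For context, here is a minimal sketch of the kind of write being described,
using the Spark 1.3-era DataFrame API; the column count, dummy values, and
output path are all illustrative assumptions, not details from the thread:

    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val sqlContext = new SQLContext(sc) // sc: an existing SparkContext

    // Build a very wide schema programmatically (1,000 columns here as
    // a stand-in for the ~100K-column case).
    val nCols = 1000
    val schema = StructType((1 to nCols).map(i => StructField(s"c$i", StringType)))
    val rows = sc.parallelize(Seq(Row.fromSeq(Seq.fill(nCols)("x"))))

    val df = sqlContext.createDataFrame(rows, schema)
    df.saveAsParquetFile("/tmp/wide.parquet") // path is illustrative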