Re: Writing wide parquet file in Spark SQL

2015-03-15 Thread Cheng Lian
This article by Ryan Blue should be helpful for understanding the problem:
http://ingest.tips/2015/01/31/parquet-row-group-size/


The TL;DR is that you can decrease parquet.block.size to reduce memory 
consumption. That said, 100K columns is a really heavy burden for Parquet, 
but I guess your data is probably pretty sparse.
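For example, a minimal sketch with the Spark 1.x API used in this thread 
(the 32 MB value is only an illustration, not a tuned recommendation, and 
the output path is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("wide-parquet"))
val sqlContext = new SQLContext(sc)

// parquet.block.size is the Parquet row group size in bytes (the default
// is on the order of 128 MB). The writer buffers one column chunk per
// column in memory until the row group is flushed, so with ~100K columns
// a smaller row group caps heap usage.
sc.hadoopConfiguration.setInt("parquet.block.size", 32 * 1024 * 1024)

// wideRDD stands in for the 100K-column SchemaRDD from the question.
// wideRDD.saveAsParquetFile("hdfs:///tmp/wide-output")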


Cheng

On 3/11/15 4:13 AM, kpeng1 wrote:


Hi All,

I am currently trying to write a very wide file into Parquet using Spark
SQL.  I have records with 100K columns that I am trying to write out, but of
course I am running into space issues (out of memory: heap space).  I was
wondering if there are any tweaks or workarounds for this.

I am basically calling saveAsParquetFile on the schemaRDD.







Re: Writing wide parquet file in Spark SQL

2015-03-11 Thread Ravindra
I am also keen to learn the answer to this, but as an alternative you can use
Hive to create a table stored as Parquet and then use it in Spark.
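
Something along these lines (a rough sketch using a Spark 1.x HiveContext;
the table name, column list, and temp table name are all placeholders):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Parquet-backed Hive table; the real DDL would list all the columns.
hiveContext.sql(
  "CREATE TABLE IF NOT EXISTS wide_table (c0 STRING, c1 STRING) STORED AS PARQUET")

// Register the wide SchemaRDD, then let Hive perform the Parquet write:
// wideRDD.registerTempTable("wide_input")
// hiveContext.sql("INSERT OVERWRITE TABLE wide_table SELECT * FROM wide_input")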

On Wed, Mar 11, 2015 at 1:44 AM kpeng1 kpe...@gmail.com wrote:

 Hi All,

 I am currently trying to write a very wide file into Parquet using Spark
 SQL.  I have records with 100K columns that I am trying to write out, but of
 course I am running into space issues (out of memory: heap space).  I was
 wondering if there are any tweaks or workarounds for this.

 I am basically calling saveAsParquetFile on the schemaRDD.







Writing wide parquet file in Spark SQL

2015-03-10 Thread kpeng1
Hi All,

I am currently trying to write a very wide file into Parquet using Spark
SQL.  I have records with 100K columns that I am trying to write out, but of
course I am running into space issues (out of memory: heap space).  I was
wondering if there are any tweaks or workarounds for this.

I am basically calling saveAsParquetFile on the schemaRDD.
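
Roughly, a toy version of what I am doing (Spark 1.x SchemaRDD API; three
columns stand in for the real ~100K-column schema, and the output path is a
placeholder):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("wide-parquet"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

// Three columns stand in for the real, very wide schema.
case class Rec(a: String, b: String, c: String)
val schemaRDD = sc.parallelize(Seq(Rec("1", "2", "3")))
schemaRDD.saveAsParquetFile("hdfs:///tmp/wide-output")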




