For what it's worth, my data set has around 85 columns in Parquet format as
well. I have tried bumping the permgen up to 512m, but I'm still getting
errors in the driver thread.
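(For reference, a minimal sketch of how that PermGen bump gets passed in via
spark-submit on YARN; the jar name is a placeholder, 512m is just the value
mentioned above, and the executor line matters too since Anders reports the
problem on both sides further down the thread:)

  spark-submit \
    --master yarn-cluster \
    --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=512m" \
    --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512m" \
    my-job.jar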
On Wed, Jul 22, 2015 at 1:20 PM, Jerry Lam wrote:
Hi guys,
I noticed that too. Anders, can you confirm that it works on the Spark 1.5
snapshot? That is what I ended up trying. It seems to be a 1.4 issue.
Best Regards,
Jerry
On Wed, Jul 22, 2015 at 11:46 AM, Anders Arpteg wrote:
No, never really resolved the problem, except by increasing the permgen
space, which only partially solved it. Still have to restart the job
multiple times to make the whole job complete (it stores intermediate
results).
The parquet data sources have about 70 columns, and yes Cheng, it works
fine with a small portion of the original dataset.
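(As a rough illustration of that restart-and-resume pattern, not the actual
job code: skip any source whose intermediate output already exists, so a
rerun picks up where the previous attempt died. sourcePaths, intermediateDir
and the usual sc/sqlContext are assumed to be in scope.)

  import org.apache.hadoop.fs.{FileSystem, Path}

  val fs = FileSystem.get(sc.hadoopConfiguration)
  sourcePaths.foreach { src =>
    val out = s"$intermediateDir/${new Path(src).getName}"
    // Only recompute sources that have no intermediate output yet.
    if (!fs.exists(new Path(out))) {
      sqlContext.read.parquet(src).write.parquet(out)
    }
  }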
How many columns are there in these Parquet files? Could you load a
small portion of the original large dataset successfully?
Cheng
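(Something along these lines would be a quick way to check; the path below is
a placeholder for one of the smaller sources:)

  // Load a single source and poke at it to see whether a small read succeeds.
  val sample = sqlContext.read.parquet("/data/sources/source_0001")
  sample.printSchema()
  println(sample.limit(1000).count())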
Hi Anders,
Did you ever get to the bottom of this issue? I'm encountering it too, but
only in "yarn-cluster" mode running on spark 1.4.0. I was thinking of
trying 1.4.1 today.
Michael
On Thu, Jun 25, 2015 at 5:52 AM, Anders Arpteg wrote:
Yes, both the driver and the executors. Works a little bit better with more
space, but still a leak that will cause failure after a number of reads.
There are about 700 different data sources that need to be loaded, lots of
data...
On Thu, Jun 25, 2015 at 08:02, Sabarish Sasidharan wrote:
Did you try increasing the perm gen for the driver?
Regards
Sab
On 24-Jun-2015 4:40 pm, "Anders Arpteg" wrote:
When reading large (and many) datasets with the Spark 1.4.0 DataFrames
parquet reader (the org.apache.spark.sql.parquet format), the following
exceptions are thrown:
Exception in thread "task-result-getter-0"
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread
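(For completeness, the reads in question boil down to something like this; the
path is a placeholder. Repeated over many sources, this is the pattern that
eventually hits the OutOfMemoryError above.)

  val df = sqlContext.read
    .format("org.apache.spark.sql.parquet")
    .load("/data/sources/source_0001")
  // Shorthand form, which should go through the same parquet data source:
  val df2 = sqlContext.read.parquet("/data/sources/source_0001")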