bq. This was a big help!

The email with your config (maybe it was only addressed to you) didn't come through with your latest reply.
Do you mind sharing it? Thanks

On Fri, Apr 1, 2016 at 11:37 AM, ludflu <lud...@gmail.com> wrote:
> This was a big help! For the benefit of my fellow travelers running Spark
> on EMR:
>
> I made a JSON file with the following:
>
> [
>   {
>     "Classification": "yarn-site",
>     "Properties": {
>       "yarn.nodemanager.pmem-check-enabled": "false",
>       "yarn.nodemanager.vmem-check-enabled": "false"
>     }
>   }
> ]
>
> and then I created my cluster like so:
>
> aws emr create-cluster --configurations
>     file:///Users/jsnavely/project/frick/spark_config/nomem.json
>     ...
>
> The other thing I noticed was that one of the dataframes I was joining
> against was actually coming from a gzip'd JSON file. gzip files are NOT
> splittable, so the read wasn't properly parallelized, which meant the
> join was causing a lot of memory pressure. I recompressed it with bzip2
> and my job has been running with no errors.
>
> Thanks again!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemory-with-wide-289-column-dataframe-tp26651p26660.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
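For context, the `create-cluster` command in the quoted email is truncated. A fuller invocation might look like the sketch below; the cluster name, release label, instance sizing, and role flag are illustrative assumptions, not details from the thread:

```shell
# Hypothetical fuller invocation -- only --configurations and the JSON path
# come from the original email; everything else is an illustrative guess.
aws emr create-cluster \
  --name "spark-job" \
  --release-label emr-4.5.0 \
  --applications Name=Spark \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations file:///Users/jsnavely/project/frick/spark_config/nomem.json
```

The `--configurations` file is applied at cluster creation time, which is why the yarn-site overrides take effect on every node without manual edits.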
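The splittability point can be illustrated with a small stdlib-only Python sketch (the sample data here is made up): a gzip stream is a single DEFLATE stream, so bytes taken from the middle cannot be decompressed without all the history that precedes them. That is why Spark/Hadoop must hand a whole gzip file to one task, while bzip2's independent compression blocks allow a reader to start mid-file.

```python
import gzip
import zlib

# A stand-in for a large newline-delimited JSON file (illustrative data).
data = b'{"col1": 1, "col2": "x"}\n' * 5000
blob = gzip.compress(data)

# Reading the whole stream from the start works fine.
assert gzip.decompress(blob) == data

# But a slice from the middle of the stream is undecodable on its own:
# DEFLATE back-references point into bytes we no longer have, so there is
# no valid place for a second reader to "jump in".
try:
    zlib.decompress(blob[len(blob) // 2:])
    mid_stream_readable = True
except zlib.error:
    mid_stream_readable = False

assert not mid_stream_readable
```

This is the mechanism behind the fix in the email: with bzip2 (or an uncompressed/splittable format), the file can be split across many tasks, spreading out the memory pressure that was previously concentrated in a single reader.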