bq. This was a big help!

The email with your config (maybe it was only addressed to you) didn't come through with your latest reply.
Do you mind sharing it? Thanks

On Fri, Apr 1, 2016 at 11:37 AM, ludflu <lud...@gmail.com> wrote:
> This was a big help! For the benefit of my fellow travelers running Spark
> on EMR:
>
> I made a JSON file with the following:
>
> [
>   {
>     "Classification": "yarn-site",
>     "Properties": {
>       "yarn.nodemanager.pmem-check-enabled": "false",
>       "yarn.nodemanager.vmem-check-enabled": "false"
>     }
>   }
> ]
>
> and then I created my cluster like so:
>
> aws emr create-cluster --configurations
>     file:///Users/jsnavely/project/frick/spark_config/nomem.json
>     ...
>
> The other thing I noticed was that one of the dataframes I was joining
> against was actually coming from a gzip'd JSON file. gzip files are NOT
> splittable, so the read wasn't properly parallelized, which meant the
> join was causing a lot of memory pressure. I recompressed it with bzip2
> and my job has been running with no errors.
>
> Thanks again!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemory-with-wide-289-column-dataframe-tp26651p26660.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
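For context, the `create-cluster` command in the quoted email is truncated. A fuller invocation might look like the sketch below; the cluster name, release label, instance sizing, and role flag are illustrative assumptions, not details from the thread:

```shell
# Hypothetical fuller invocation -- only --configurations and the JSON path
# come from the original email; everything else is an illustrative guess.
aws emr create-cluster \
  --name "spark-job" \
  --release-label emr-4.5.0 \
  --applications Name=Spark \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations file:///Users/jsnavely/project/frick/spark_config/nomem.json
```

The `--configurations` file is applied at cluster creation time, which is why the yarn-site overrides take effect on every node without manual edits.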
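The splittability point can be illustrated with a small stdlib-only Python sketch (the sample data here is made up): a gzip stream is a single DEFLATE stream, so bytes taken from the middle cannot be decompressed without all the history that precedes them. That is why Spark/Hadoop must hand a whole gzip file to one task, while bzip2's independent compression blocks allow a reader to start mid-file.

```python
import gzip
import zlib

# A stand-in for a large newline-delimited JSON file (illustrative data).
data = b'{"col1": 1, "col2": "x"}\n' * 5000
blob = gzip.compress(data)

# Reading the whole stream from the start works fine.
assert gzip.decompress(blob) == data

# But a slice from the middle of the stream is undecodable on its own:
# DEFLATE back-references point into bytes we no longer have, so there is
# no valid place for a second reader to "jump in".
try:
    zlib.decompress(blob[len(blob) // 2:])
    mid_stream_readable = True
except zlib.error:
    mid_stream_readable = False

assert not mid_stream_readable
```

This is the mechanism behind the fix in the email: with bzip2 (or an uncompressed/splittable format), the file can be split across many tasks, spreading out the memory pressure that was previously concentrated in a single reader.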