I would second the suggestion that one of the Spark committers weigh in. The repartition() command often fails for me, no matter how many times I run it.
This is more of a 0.x behavior than a 1.0.2 behavior. Anyone? Dale.

On 10/8/14, 1:06 AM, "Paul Wais" <pw...@yelp.com> wrote:

>Looks like an OOM issue? Have you tried persisting your RDDs to allow
>disk writes?
>
>I've seen a lot of similar crashes in a Spark app that reads from HDFS
>and does joins, i.e. non-deterministic crashes such as
>"java.io.IOException: Filesystem closed," "Executor lost,"
>"FetchFailed," etc. I've tried persisting RDDs, tuning other params,
>and verifying that the Executor JVMs don't come close to their max
>allocated memory during operation.
>
>Looking through user@ tonight, there are a ton of email threads with
>similar crashes and no answers. It looks like a lot of people are
>struggling with OOMs.
>
>Could one of the Spark committers please comment on this thread, or on
>one of the other unanswered threads with similar crashes? Is this
>simply how Spark behaves if Executors OOM? What can the user do other
>than increase memory or reduce RDD size? (And how can one deduce how
>much of either is needed?)
>
>One general workaround for OOMs could be to programmatically break the
>job input (i.e. input from HDFS or from #parallelize()) into chunks,
>and only create/process the RDDs for one chunk at a time. However,
>this approach has the limitations of Spark Streaming without the
>formal library support. It might also be nice if, when tasks fail,
>Spark could try to repartition in order to avoid OOMs.
>
>On Fri, Oct 3, 2014 at 2:55 AM, jamborta <jambo...@gmail.com> wrote:
>> I have two nodes with 96G RAM and 16 cores each; my setup is as
>> follows:
>>
>> conf = (SparkConf()
>>         .setMaster("yarn-cluster")
>>         .set("spark.executor.memory", "30G")
>>         .set("spark.cores.max", 32)
>>         .set("spark.executor.instances", 2)
>>         .set("spark.executor.cores", 8)
>>         .set("spark.akka.timeout", 10000)
>>         .set("spark.akka.askTimeout", 100)
>>         .set("spark.akka.frameSize", 500)
>>         .set("spark.cleaner.ttl", 86400)
>>         .set("spark.task.maxFailures", 16)
>>         .set("spark.worker.timeout", 150))
>>
>> thanks a lot,
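
To make Paul's persist suggestion concrete, here is a minimal PySpark
sketch. The app name, input path, and the per-key count job are
hypothetical placeholders; the point is StorageLevel.MEMORY_AND_DISK,
which lets partitions that don't fit in memory spill to local disk
instead of being dropped or OOMing:

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="persist-example")  # hypothetical app name

    # Hypothetical input path and record format.
    records = sc.textFile("hdfs:///data/input")

    # MEMORY_AND_DISK spills partitions that do not fit in memory to
    # local disk rather than recomputing (or losing) them.
    records.persist(StorageLevel.MEMORY_AND_DISK)

    # Downstream shuffles now reuse the persisted partitions.
    counts = records.map(lambda line: (line.split(",")[0], 1)) \
                    .reduceByKey(lambda a, b: a + b)
    counts.saveAsTextFile("hdfs:///data/output")  # hypothetical path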
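And a sketch of the chunking workaround Paul describes, again with
hypothetical paths and a hypothetical per-key count as the job. Each
chunk is aggregated and materialized before the next one is read, so
only one chunk's worth of input is live at a time, and the final merge
runs over the much smaller per-chunk results:

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="chunked-input")  # hypothetical app name

    # Hypothetical list of per-chunk input paths; any partitioning of
    # the input works, the point is to bound the size of each RDD.
    chunk_paths = ["hdfs:///data/part-%04d" % i for i in range(10)]

    partials = []
    for path in chunk_paths:
        chunk = sc.textFile(path)
        # Aggregate each chunk independently so only one chunk's worth
        # of data is shuffled at a time.
        partial = chunk.map(lambda line: (line.split(",")[0], 1)) \
                       .reduceByKey(lambda a, b: a + b)
        partial.persist(StorageLevel.MEMORY_AND_DISK)
        partial.count()  # force materialization before the next chunk
        partials.append(partial)

    # Merge the per-chunk results in a final, much smaller reduce.
    total = sc.union(partials).reduceByKey(lambda a, b: a + b)
    total.saveAsTextFile("hdfs:///data/output")  # hypothetical path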