Re: Any issues with repartition?

2014-10-09 Thread Akhil Das
After a bit of research, I figured out that one of the workers was hung
on GC cleanup, and the connection usually times out since the default is
60 seconds, so I set it to a higher number and that eliminated this issue. You
may want to try this:

sc.set("spark.core.connection.ack.wait.timeout", "600")
sc.set("spark.akka.frameSize", "50")
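In PySpark that would look something like this (a rough, untested sketch; the
app name is just a placeholder and the values are the ones above):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("my-app")  # placeholder
        # give GC-stalled workers longer before the connection ack times out
        .set("spark.core.connection.ack.wait.timeout", "600")
        # Akka frame size in MB
        .set("spark.akka.frameSize", "50"))
sc = SparkContext(conf=conf)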


Thanks
Best Regards

On Wed, Oct 8, 2014 at 6:06 PM, jamborta jambo...@gmail.com wrote:

 I am still puzzled by this. In my case the data is allowed to spill to
 disk, and I usually get different errors when it runs out of memory.

 My guess is that Akka kills the executors for some reason.







Re: Any issues with repartition?

2014-10-08 Thread Paul Wais
Looks like an OOM issue?  Have you tried persisting your RDDs to allow
disk writes?
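E.g. something roughly like this (an untested PySpark sketch; the input path
is just a placeholder):

from pyspark import StorageLevel

rdd = sc.textFile("hdfs:///path/to/input")   # placeholder input
# MEMORY_AND_DISK lets partitions that don't fit in memory spill to disk
# instead of failing or being recomputed
rdd.persist(StorageLevel.MEMORY_AND_DISK)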

I've seen a lot of similar crashes in a Spark app that reads from HDFS
and does joins.  I.e. I've seen "java.io.IOException: Filesystem
closed", "Executor lost", "FetchFailed", and so on, all as
non-deterministic crashes.  I've tried persisting RDDs, tuning other
params, and verifying that the Executor JVMs don't come close to their
max allocated memory during operation.

Looking through user@ tonight, there are a ton of email threads with
similar crashes and no answers.  It looks like a lot of people are
struggling with OOMs.

Could one of the Spark committers please comment on this thread, or
one of the other unanswered threads with similar crashes?  Is this
simply how Spark behaves if Executors OOM?  What can the user do other
than increase memory or reduce RDD size?  (And how can one deduce how
much of either is needed?)

One general workaround for OOMs could be to programmatically break the
job input (i.e. input from HDFS or from #parallelize()) into chunks,
and only create/process the RDDs for one chunk at a time.  However,
this approach has the limitations of Spark Streaming but without formal
library support.  What might be nice is if, when tasks fail, Spark
could try to re-partition in order to avoid OOMs.
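Something along these lines (an untested PySpark sketch; the file list and the
per-chunk work are placeholders):

# hypothetical list of HDFS inputs, processed a couple of files at a time
chunks = [["hdfs:///data/part-00000", "hdfs:///data/part-00001"],
          ["hdfs:///data/part-00002", "hdfs:///data/part-00003"]]

totals = []
for chunk in chunks:
    # textFile accepts a comma-separated list of paths, so each RDD
    # only covers one chunk of the overall input
    rdd = sc.textFile(",".join(chunk))
    totals.append(rdd.map(lambda line: len(line)).sum())   # placeholder per-chunk job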



On Fri, Oct 3, 2014 at 2:55 AM, jamborta jambo...@gmail.com wrote:
 I have two nodes with 96G RAM and 16 cores; my setup is as follows:

 conf = (SparkConf()
 .setMaster("yarn-cluster")
 .set("spark.executor.memory", "30G")
 .set("spark.cores.max", "32")
 .set("spark.executor.instances", "2")
 .set("spark.executor.cores", "8")
 .set("spark.akka.timeout", "1")
 .set("spark.akka.askTimeout", "100")
 .set("spark.akka.frameSize", "500")
 .set("spark.cleaner.ttl", "86400")
 .set("spark.task.maxFailures", "16")
 .set("spark.worker.timeout", "150"))

 thanks a lot,







Re: Any issues with repartition?

2014-10-03 Thread Akhil Das
What is your cluster setup, and how much memory are you allocating to the
executor?

Thanks
Best Regards

On Fri, Oct 3, 2014 at 7:52 AM, jamborta jambo...@gmail.com wrote:

 Hi Arun,

 Have you found a solution? Seems that I have the same problem.

 thanks,







Re: Any issues with repartition?

2014-10-03 Thread jamborta
I have two nodes with 96G RAM and 16 cores; my setup is as follows:

conf = (SparkConf()
.setMaster("yarn-cluster")
.set("spark.executor.memory", "30G")
.set("spark.cores.max", "32")
.set("spark.executor.instances", "2")
.set("spark.executor.cores", "8")
.set("spark.akka.timeout", "1")
.set("spark.akka.askTimeout", "100")
.set("spark.akka.frameSize", "500")
.set("spark.cleaner.ttl", "86400")
.set("spark.task.maxFailures", "16")
.set("spark.worker.timeout", "150"))

thanks a lot,







Re: Any issues with repartition?

2014-10-02 Thread jamborta
Hi Arun,

Have you found a solution? Seems that I have the same problem.

thanks,


