Thanks for the link. Unfortunately, turning on RDD compression changed nothing, and switching from netty to nio made no difference either :(
On Thu, Feb 26, 2015 at 2:01 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> Not many that I know of, but I bumped into this one:
> https://issues.apache.org/jira/browse/SPARK-4516
>
> Thanks
> Best Regards
>
> On Thu, Feb 26, 2015 at 3:26 PM, Victor Tso-Guillen <v...@paxata.com> wrote:
>
>> Could there be a problem, going from 1.1.1 to 1.2.1, with shuffle
>> dependencies that produce no data?
>>
>> On Thu, Feb 26, 2015 at 1:56 AM, Victor Tso-Guillen <v...@paxata.com> wrote:
>>
>>> The data is small. The job is composed of many small stages.
>>>
>>> * I found that the problem occurs with fewer than 222. What would be
>>>   gained by going higher?
>>> * Pushing up the parallelism only pushes up the point at which the
>>>   system appears to hang. I'm worried about some sort of message loss or
>>>   inconsistency.
>>> * Yes, we are using Kryo.
>>> * I'll try that, but again I'm a little confused about why you're
>>>   recommending it. I'm stumped, so I might as well.
>>>
>>> On Wed, Feb 25, 2015 at 11:13 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>>
>>>> What operation are you trying to do, and how big is the data you are
>>>> operating on?
>>>>
>>>> Here are a few things you can try:
>>>>
>>>> - Repartition the RDD to more than 222 partitions
>>>> - Specify the master as local[*] or local[10]
>>>> - Use the Kryo serializer (.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
>>>> - Enable RDD compression (.set("spark.rdd.compress", "true"))
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Thu, Feb 26, 2015 at 10:15 AM, Victor Tso-Guillen <v...@paxata.com> wrote:
>>>>
>>>>> I can reproduce this very reliably on Spark 1.2.1. I'm in local mode
>>>>> with parallelism set to 8. I have 222 tasks and never seem to get much
>>>>> past 40; usually it just hangs somewhere in the 20s to 30s. The last
>>>>> log output is below, along with a screenshot of the UI.
>>>>>
>>>>> 2015-02-25 20:39:55.779 GMT-0800 INFO [task-result-getter-3] TaskSetManager - Finished task 3.0 in stage 16.0 (TID 22) in 612 ms on localhost (1/5)
>>>>> 2015-02-25 20:39:55.825 GMT-0800 INFO [Executor task launch worker-10] Executor - Finished task 1.0 in stage 16.0 (TID 20). 2492 bytes result sent to driver
>>>>> 2015-02-25 20:39:55.825 GMT-0800 INFO [Executor task launch worker-8] Executor - Finished task 2.0 in stage 16.0 (TID 21). 2492 bytes result sent to driver
>>>>> 2015-02-25 20:39:55.831 GMT-0800 INFO [task-result-getter-0] TaskSetManager - Finished task 1.0 in stage 16.0 (TID 20) in 670 ms on localhost (2/5)
>>>>> 2015-02-25 20:39:55.836 GMT-0800 INFO [task-result-getter-1] TaskSetManager - Finished task 2.0 in stage 16.0 (TID 21) in 674 ms on localhost (3/5)
>>>>> 2015-02-25 20:39:55.891 GMT-0800 INFO [Executor task launch worker-9] Executor - Finished task 0.0 in stage 16.0 (TID 19). 2492 bytes result sent to driver
>>>>> 2015-02-25 20:39:55.896 GMT-0800 INFO [task-result-getter-2] TaskSetManager - Finished task 0.0 in stage 16.0 (TID 19) in 740 ms on localhost (4/5)
>>>>>
>>>>> [image: screenshot of the Spark UI]
>>>>>
>>>>> What should I make of this? Where do I start?
>>>>>
>>>>> Thanks,
>>>>> Victor
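Akhil's suggestions all tune configuration; another place to start is the log itself. Each `TaskSetManager - Finished task` line carries an `(n/total)` progress counter, and the last one in the excerpt reads (4/5), so one task in stage 16 never reported back. Below is a minimal, hypothetical stdlib-only Python sketch of that check (the helper name `unfinished_tasks` and its regex are illustrative, not part of Spark). It assumes each log entry sits on a single line, not wrapped as in an email, and that the task indices in a task set run from 0 to the total shown in the progress counter.

```python
import re
from collections import defaultdict

# Hypothetical pattern for driver-side completion lines in the format above, e.g.
# "TaskSetManager - Finished task 3.0 in stage 16.0 (TID 22) in 612 ms on localhost (1/5)".
# Captured groups: task index, stage id, stage attempt, total tasks in the set.
FINISHED = re.compile(
    r"TaskSetManager - Finished task (\d+)\.\d+ in stage (\d+)\.(\d+) "
    r"\(TID \d+\) in \d+ ms on \S+ \(\d+/(\d+)\)"
)


def unfinished_tasks(log_lines):
    """Map (stage, stage_attempt) -> sorted task indices with no "Finished" line.

    Assumption: a task set's indices run from 0 to total - 1, where total is
    the denominator of the "(n/total)" progress counter.
    """
    finished = defaultdict(set)
    totals = {}
    for line in log_lines:
        match = FINISHED.search(line)
        if match is None:
            continue  # Executor-side lines and any other output are skipped.
        index, stage, attempt, total = (int(g) for g in match.groups())
        finished[(stage, attempt)].add(index)
        totals[(stage, attempt)] = total
    # Report only task sets where fewer tasks finished than the counter expects.
    return {
        key: sorted(set(range(total)) - finished[key])
        for key, total in totals.items()
        if len(finished[key]) < total
    }


# The log excerpt from the report, one entry per line.
log = [
    "2015-02-25 20:39:55.779 GMT-0800 INFO [task-result-getter-3] TaskSetManager - Finished task 3.0 in stage 16.0 (TID 22) in 612 ms on localhost (1/5)",
    "2015-02-25 20:39:55.825 GMT-0800 INFO [Executor task launch worker-10] Executor - Finished task 1.0 in stage 16.0 (TID 20). 2492 bytes result sent to driver",
    "2015-02-25 20:39:55.831 GMT-0800 INFO [task-result-getter-0] TaskSetManager - Finished task 1.0 in stage 16.0 (TID 20) in 670 ms on localhost (2/5)",
    "2015-02-25 20:39:55.836 GMT-0800 INFO [task-result-getter-1] TaskSetManager - Finished task 2.0 in stage 16.0 (TID 21) in 674 ms on localhost (3/5)",
    "2015-02-25 20:39:55.896 GMT-0800 INFO [task-result-getter-2] TaskSetManager - Finished task 0.0 in stage 16.0 (TID 19) in 740 ms on localhost (4/5)",
]

print(unfinished_tasks(log))  # {(16, 0): [4]}
```

On this excerpt the sketch reports task 4.0 of stage 16.0 as the only task that never finished, which narrows the hang to a single task rather than the whole job.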