Just wanted to point out: raising the memory overhead (as I saw in the logs)
was the fix for this issue, and I have not seen dying executors since this
value was increased.
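For anyone hitting the same thing, here is a minimal sketch of raising that
overhead per application. The app name, executor memory, and 1024 MB figure
are illustrative; in Spark 1.x on YARN the setting is
spark.yarn.executor.memoryOverhead, specified in megabytes:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical values; the overhead gives each executor extra off-heap
    // headroom beyond spark.executor.memory so YARN does not kill the container.
    val conf = new SparkConf()
      .setAppName("large-job")
      .set("spark.executor.memory", "8g")
      .set("spark.yarn.executor.memoryOverhead", "1024") // ~1 GB, in MB
    val sc = new SparkContext(conf)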
On Tue, Feb 24, 2015 at 3:52 AM, Anders Arpteg wrote:
> If you're thinking of the YARN memory overhead, then yes, I have increased
>
If you're thinking of the YARN memory overhead, then yes, I have increased
that as well. However, I'm glad to say that my job finally finished
successfully. Besides the timeout and memory settings, performing
repartitioning (with shuffling) at the right time seems to be the key to
making this large job work.
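To make that concrete, a small sketch of forcing an explicit repartition
(which always shuffles) right before the heavy stages. The input path,
partition count, and app name below are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    object RepartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("repartition-sketch"))
        // Hypothetical input path and partition count, purely illustrative.
        val lines = sc.textFile("hdfs:///tmp/large-input")
        val repartitioned = lines.repartition(2000) // forces a shuffle into 2000 partitions
        // ...the wide, memory-hungry transformations would follow here...
        println(repartitioned.count())
        sc.stop()
      }
    }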
I *think* this may have been related to the default memory overhead setting
being too low. I raised the value to 1G and tried my job again, but I had
to leave the office before it finished. It did get further, but I'm not
exactly sure if that's just because I raised the memory. I'll see tomorrow.
I've got the opposite problem with regard to partitioning. I've got over
6000 partitions for some of these RDDs, which immediately blows the heap
somehow; I'm still not exactly sure how. If I coalesce them down to about
600-800 partitions, I get the problems where the executors are dying
without an
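For reference, a quick sketch of the two ways that shrink can be done; the
RDD name and path are hypothetical, and the shuffle flag is the knob that
changes the behavior:

    // Assumes an existing SparkContext `sc` (e.g. in spark-shell); the path is hypothetical.
    val wideRdd = sc.textFile("hdfs:///tmp/wide-input")        // imagine ~6000 partitions here

    val narrowed   = wideRdd.coalesce(800)                     // no shuffle; merges partitions in place
    val reshuffled = wideRdd.coalesce(800, shuffle = true)     // full shuffle; same as repartition(800)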
Sounds very similar to what I experienced, Corey. Something that seems to at
least help with my problems is to have more partitions. I am already fighting
between ending up with too many partitions in the end and having too few in
the beginning. By coalescing as late as possible and avoiding too few i
I'm looking at my YARN container logs for some of the executors which appear
to be failing (with the missing shuffle files). I see exceptions that say
"client.TransportClientFactory: Found inactive connection to host/ip:port,
closing it."
Right after that I see "shuffle.RetryingBlockFetcher: Excepti
No, unfortunately we're not making use of dynamic allocation or the
external shuffle service. We're hoping we can reconfigure our cluster to
make use of it, but since it requires changes to the cluster itself (and
not just the Spark app), it could take some time.
Unsure if task 450 was acting as
Do you guys have dynamic allocation turned on for YARN?
Anders, was Task 450 in your job acting like a Reducer and fetching the Map
spill output data from a different node?
If a Reducer task can't read the remote data it needs, that could cause the
stage to fail. Sometimes this forces the previou
Could you try to turn on the external shuffle service?
spark.shuffle.service.enabled=true
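A minimal sketch of the application-side settings, with illustrative values;
note that the external shuffle service itself also has to be registered as a
NodeManager auxiliary service on the cluster, which is the part that needs a
cluster-level change:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("shuffle-service-sketch")
      .set("spark.shuffle.service.enabled", "true")
      // Dynamic allocation is optional but depends on the shuffle service;
      // the executor bounds below are hypothetical.
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "50")
    val sc = new SparkContext(conf)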
On 21.2.2015. 17:50, Corey Nolet wrote:
I'm experiencing the same issue. Upon closer inspection I'm noticing
that executors are being lost as well. Thing is, I can't figure out
how they are dying. I'm u
I'm experiencing the same issue. Upon closer inspection I'm noticing that
executors are being lost as well. Thing is, I can't figure out how they are
dying. I'm using MEMORY_AND_DISK_SER and I've got over 1.3TB of memory
allocated for the application. I was thinking perhaps it was possible that
a s
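For context, a small sketch of that persistence level in use; the
SparkContext `sc`, input path, and transformation are hypothetical:

    import org.apache.spark.storage.StorageLevel

    // Assumes an existing SparkContext `sc` (e.g. in spark-shell).
    val cached = sc.textFile("hdfs:///tmp/events")       // hypothetical input
      .map(_.split(","))
      .persist(StorageLevel.MEMORY_AND_DISK_SER)         // serialized in memory, spilling to disk
    cached.count()                                        // materializes the cache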