Re: Spark worker threads waiting

2014-03-24 Thread sparrow
ork traffic during that period to see performance. > Regards > Mayur > > Mayur Rustagi > Ph: +1 (760) 203 3257 > http://www.sigmoidanalytics.com > @mayur_rustagi <https://twitter.com/mayur_rustagi> > > > > On Fri, Mar 21, 2014 at 8:33 AM, sparrow <[hidden

Re: Spark worker threads waiting

2014-03-21 Thread sparrow
le. > Regards > Mayur > > On Thu, Mar 20, 2014 at 9:55 AM, sparrow <[hidden email]

Re: Spark worker threads waiting

2014-03-20 Thread sparrow
This is what the web UI looks like: [image: Inline image 1] I also tailed all the worker logs, and these are the last entries before the waiting begins: 14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, minRequest: 10066329 14/03/20 13:29:10 INFO Blo

Re: Large shuffle RDD

2014-03-14 Thread sparrow
found out what the problem was. It turned out that Spark was consuming too much memory and not enough was left for the OS. When doing large shuffle writes, performance is greatly reduced if there is not enough memory left for the OS buffer cache. We have changed our configuration that Spark on workers on
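
The configuration change itself is cut off in this archive snippet, so the following is only a minimal sketch of the general idea: cap the memory given to Spark on each worker so that some RAM stays free for the OS buffer cache. The helper names, the 25% reserve fraction, and the 1024 MB floor are hypothetical values chosen for illustration, not the settings used in the thread. `SPARK_WORKER_MEMORY` is the standalone-mode variable set in `conf/spark-env.sh`.

```python
def worker_memory_mb(total_ram_mb, os_reserve_fraction=0.25, min_os_reserve_mb=1024):
    """Return a memory budget in MB for Spark on one worker.

    Hypothetical helper: the reserve fraction and floor are illustrative
    assumptions, not values taken from the original message.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    if not 0 <= os_reserve_fraction < 1:
        raise ValueError("os_reserve_fraction must be in [0, 1)")

    # Keep whichever is larger: a fraction of RAM or a fixed floor,
    # so small machines still leave room for the OS buffer cache.
    os_reserve_mb = max(int(total_ram_mb * os_reserve_fraction), min_os_reserve_mb)

    budget_mb = total_ram_mb - os_reserve_mb
    if budget_mb <= 0:
        raise ValueError("not enough RAM left for Spark after the OS reserve")
    return budget_mb


def spark_env_line(total_ram_mb, **kwargs):
    """Format the budget as a line suitable for conf/spark-env.sh."""
    return "SPARK_WORKER_MEMORY=%dm" % worker_memory_mb(total_ram_mb, **kwargs)


if __name__ == "__main__":
    # Example: a worker with 64 GB of RAM.
    print(spark_env_line(64 * 1024))
```

The right reserve depends on the shuffle volume and the rest of the workload on the machine, so treat the numbers as a starting point to measure against rather than a recommendation.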

Re: Out of memory on large RDDs

2014-03-11 Thread sparrow
I don't understand exactly how that will help. There are no persisted RDDs in storage. Our input data is ~100 GB, but the output of the flatMap is ~40 MB. The small RDD is then persisted. Memory configuration should not affect shuffle data, if I understand you correctly? On Tue, Mar 11, 2014 at 4:5