> network traffic during that period to gauge performance.
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
> On Fri, Mar 21, 2014 at 8:33 AM, sparrow <[hidden email]> wrote:
le.
> Regards
> Mayur
>
> On Thu, Mar 20, 2014 at 9:55 AM, sparrow <[hidden email]> wrote:
This is what the web UI looks like:
[image: Inline image 1]
I also tailed all the worker logs, and these are the last entries before the
waiting begins:
14/03/20 13:29:10 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
maxBytesInFlight: 50331648, minRequest: 10066329
14/03/20 13:29:10 INFO Blo
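Those two figures in the log are linked by one setting: 50331648 bytes is
48 MB, which matches the default of spark.reducer.maxMbInFlight in Spark of
this vintage, and minRequest is maxBytesInFlight / 5. If reducers were
starved for fetch bandwidth, one could raise that budget; a minimal sketch,
assuming a Scala driver and an illustrative value:

import org.apache.spark.{SparkConf, SparkContext}

// maxBytesInFlight in the log is spark.reducer.maxMbInFlight * 1024 * 1024,
// and minRequest is maxBytesInFlight / 5. Raising the budget only helps if
// fetch throughput is actually the bottleneck (the value here is illustrative).
val conf = new SparkConf()
  .setAppName("shuffle-fetch-tuning")
  .set("spark.reducer.maxMbInFlight", "96") // default is 48
val sc = new SparkContext(conf)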
We found out what the problem was. It turned out that Spark was consuming too
much memory and not enough was left for the OS. When doing large shuffle writes,
performance is greatly reduced if there is not enough memory left for the OS
buffer cache.
We have changed our configuration so that Spark on the workers on
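The exact settings are cut off above, but the kind of change described is
capping Spark's share of each worker's RAM so the OS page cache keeps enough
room for shuffle writes. A sketch with hypothetical sizes:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sizing: on a 32 GB worker, giving executors 20 GB leaves
// roughly 12 GB for the OS buffer cache that large shuffle writes rely on,
// instead of letting the JVM heap claim nearly everything.
val conf = new SparkConf()
  .setAppName("leave-room-for-os-cache")
  .set("spark.executor.memory", "20g")
val sc = new SparkContext(conf)

In standalone mode the same cap can be applied cluster-wide with
SPARK_WORKER_MEMORY in conf/spark-env.sh.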
I don't understand how exactly that will help. There are no persisted RDDs
in storage. Our input data is ~100 GB, but the output of the flatMap is only
~40 MB. The small RDD is then persisted.
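For concreteness, the pipeline described here has roughly this shape (a
sketch only: the path and the parse function are hypothetical, and sc is the
usual SparkContext):

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical record extractor: most lines yield nothing, which is why
// the flatMap output is tiny relative to the input.
def parse(line: String): Seq[String] =
  if (line.contains("INTERESTING")) Seq(line) else Seq.empty

val input = sc.textFile("hdfs:///data/input")   // ~100 GB of text
val small: RDD[String] = input.flatMap(parse)   // ~40 MB of output
small.persist(StorageLevel.MEMORY_ONLY)         // only the small RDD is cached
println(small.count())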
If I understand you correctly, memory configuration should not affect shuffle
data?
On Tue, Mar 11, 2014 at 4:5