Re: oome from blockmanager

2013-12-01 Thread Stephen Haberman
The short-term solutions have already been discussed: decrease the number of reducers (and mappers, if you need them to be tied), or potentially turn off compression if Snappy is holding too much buffer space. Just to follow up on this (sorry for the delay; I was busy/out for Thanksgiving),
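A minimal Scala sketch of those two short-term knobs, assuming a Spark 0.8-era setup where configuration is passed as system properties; the input path and the reducer count of 500 are placeholders, not values from this thread:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object ShuffleTuningSketch {
      def main(args: Array[String]) {
        // Turn off shuffle compression so open shuffle blocks do not each
        // pin a Snappy buffer (trades memory for extra I/O).
        System.setProperty("spark.shuffle.compress", "false")

        val sc = new SparkContext("local[4]", "shuffle-tuning-sketch")

        // Placeholder input; the point is the explicit, smaller reducer count.
        val pairs = sc.textFile("hdfs:///path/to/input").map(line => (line, 1))

        // The second argument caps the number of reduce partitions (M), which
        // in turn caps the N * M shuffle blocks the BlockManager must track.
        val counts = pairs.reduceByKey(_ + _, 500)
        println(counts.count())
        sc.stop()
      }
    }

On later versions the same setting can go on a SparkConf instead of a system property.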

RE: oome from blockmanager

2013-11-22 Thread Shao, Saisai
partition size, but a small partition size will lead to more partitions, which also consumes lots of memory. Thanks Jerry From: Aaron Davidson [mailto:ilike...@gmail.com] Sent: Friday, November 22, 2013 2:54 PM To: user@spark.incubator.apache.org Subject: Re: oome from blockmanager Thanks
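A back-of-envelope illustration of that tradeoff, pasteable into a Scala REPL; the 500 GB / 64 MB / 600 MB figures are borrowed from the October messages further down this page, and both sides of the shuffle are assumed to use the same partition count:

    // Rough block-count arithmetic; shrinking partitions grows both the map
    // side (N) and the reduce side (M), so block count grows quadratically.
    val dataSize = 500L * 1024 * 1024 * 1024                // ~500 GB of input

    def numPartitions(partitionSize: Long): Long = dataSize / partitionSize

    val coarse = numPartitions(600L * 1024 * 1024)          // ~850 partitions
    val fine   = numPartitions(64L * 1024 * 1024)           // 8,000 partitions

    println(coarse * coarse)   // ~730,000 shuffle blocks to track
    println(fine * fine)       //  64,000,000 shuffle blocks to track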

Re: oome from blockmanager

2013-11-22 Thread Stephen Haberman
Hi Aaron, "Clearly either of the latter two solutions [no compression or 1 executor/node] will produce a significant slowdown." Just curious, but why would turning off compression lead to a significant slowdown? Just more IO, I guess? FWIW, the job we'd been discussing with 18k partitions, I

Re: oome from blockmanager

2013-11-22 Thread Aaron Davidson
of memory. Thanks Jerry *From:* Aaron Davidson [mailto:ilike...@gmail.com] *Sent:* Friday, November 22, 2013 2:54 PM *To:* user@spark.incubator.apache.org *Subject:* Re: oome from blockmanager Thanks for your feedback; I think this is a very important issue on the usability front. One

Re: oome from blockmanager

2013-11-22 Thread 邵赛赛
:* Friday, November 22, 2013 2:54 PM *To:* user@spark.incubator.apache.org *Subject:* Re: oome from blockmanager Thanks for your feedback; I think this is a very important issue on the usability front. One thing to consider is that at some data size, one simply needs larger or more nodes. m1

Re: oome from blockmanager

2013-11-21 Thread Stephen Haberman
Hi guys, When you do a shuffle from N map partitions to M reduce partitions, there are N * M output blocks created and each one is tracked. That's why all these per-block overheads are causing you to OOM. So, I'm peering into a heap dump from an 18,000-partition shuffle and thought I would
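To make the N * M figure concrete, here is a small sketch that counts the entries a single shuffle creates; the 18,000 is the partition count mentioned above, both sides are assumed equal (the message does not say), and the naming pattern in the comment mirrors Spark's shuffle block ids of the form shuffle_<shuffleId>_<mapId>_<reduceId>:

    val n = 18000L   // map partitions, from the heap dump discussed here
    val m = 18000L   // reduce partitions, assumed equal for illustration

    // One tracked entry per (map, reduce) pair, named along the lines of
    // "shuffle_0_0_0" ... "shuffle_0_17999_17999".
    println(n * m)   // 324,000,000 blocks for a single shuffle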

Re: oome from blockmanager

2013-11-21 Thread Stephen Haberman
Hi Patrick/Aaron, Sorry to revive this thread, but we're seeing some OOME errors again (running with master a few commits after Aaron's optimizations). I can tweak our job, but I just wanted to ask for some clarifications. Another way to fix it is to modify your job to create fewer partitions.

Re: oome from blockmanager

2013-11-05 Thread Aaron Davidson
As a followup on this, the memory footprint of all shuffle metadata has been greatly reduced. For your original workload with 7k mappers, 7k reducers, and 5 machines, the total metadata size should have decreased from ~3.3 GB to ~80 MB. On Tue, Oct 29, 2013 at 9:07 AM, Aaron Davidson
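A quick check of those figures: 7,000 mappers times 7,000 reducers is 49 million tracked blocks, so ~3.3 GB comes to roughly 70 bytes of metadata per block before the optimizations, and ~80 MB to under 2 bytes per block after. The arithmetic, runnable in a Scala REPL:

    val mappers  = 7000L
    val reducers = 7000L
    val blocks   = mappers * reducers               // 49,000,000 blocks

    val before = 3.3 * 1024 * 1024 * 1024           // ~3.3 GB, per the message
    val after  = 80.0 * 1024 * 1024                 // ~80 MB, per the message

    println(before / blocks)   // ~72 bytes of metadata per block before
    println(after / blocks)    // ~1.7 bytes per block after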

Re: oome from blockmanager

2013-10-28 Thread Stephen Haberman
Hey guys, As a follow up, I raised our target partition size to 600mb (up from 64mb), which split this report's 500gb of tiny S3 files into ~700 partitions, and everything ran much smoother. In retrospect, this was the same issue we'd run into before, having too many partitions, and had
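A sketch of that coarser-partitioning approach, assuming coalesce was the mechanism (the message only says the target partition size was raised); the bucket path and master URL are placeholders:

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local[4]", "coarse-partitions-sketch")

    // Lots of tiny S3 files means lots of tiny input partitions by default;
    // coalesce (no shuffle) folds them into ~700 larger ones, i.e. roughly
    // 600-700 MB per partition for ~500 GB of input.
    val raw    = sc.textFile("s3n://some-bucket/report-input/*")
    val coarse = raw.coalesce(700)

    println(coarse.partitions.length)   // ~700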

Re: oome from blockmanager

2013-10-26 Thread Patrick Wendell
Hey Stephen, The issue is this. When you do a shuffle from N map partitions to M reduce partitions, there are N * M output blocks created and each one is tracked. That's why all these per-block overheads are causing you to OOM. One way to fix this is to reduce the per-block overhead, which Josh

Re: oome from blockmanager

2013-10-26 Thread Patrick Wendell
Also just to clarify my answer. We definitely need to do the mentioned optimizations because it can definitely happen that you legitimately need thousands of mappers and reduce splits on each machine. I was just suggesting to also play around with the partitions in this case, since the dataset