Thanks again for the great detail! Your setup sounds correct, and all the numbers you're seeing suggest that no foul play (i.e., a bug) is at work. I think this is simply a case of the blockInfo and blockToFileSegmentMap maps being poorly optimized, especially the latter. I will look into reducing the memory footprint of blockToFileSegmentMap using techniques similar to the ones Josh mentioned for the blockInfo map. I've created a JIRA (SPARK-946 <https://spark-project.atlassian.net/browse/SPARK-946>) to track this issue.
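
To give a rough sense of why that map gets so big at your scale: with the hash-based shuffle, every map task writes one block per reduce partition, so 7,000 x 7,000 partitions means roughly 49 million blockToFileSegmentMap entries. Here is a quick back-of-envelope sketch; the per-entry byte cost is just my guess at an order of magnitude, not a measured number:

    // Back-of-envelope estimate of shuffle-block metadata, not a measurement.
    // Assumptions: one shuffle block per (map task, reduce partition) pair, and
    // roughly 150 bytes of JVM heap per blockToFileSegmentMap entry (key plus a
    // small value holding file, offset, and length). The 150 is a guess.
    object ShuffleMetadataEstimate {
      def main(args: Array[String]): Unit = {
        val mapTasks         = 7000L  // map-side partitions
        val reducePartitions = 7000L  // reduce-side partitions
        val bytesPerEntry    = 150L   // assumed per-entry heap cost

        val blocks = mapTasks * reducePartitions      // 49,000,000 shuffle blocks
        val estimatedBytes = blocks * bytesPerEntry   // roughly 7 GB of metadata alone

        println(f"shuffle blocks:     $blocks%,d")
        println(f"estimated metadata: ${estimatedBytes / 1e9}%.1f GB")
      }
    }

If those assumptions are even roughly right, the tracking metadata alone ends up on the same order as your 4GB heap, which would explain things bogging down after a few thousand ShuffleMapTasks.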
In the meantime, 4GB of memory does seem really low, so adding more machines, or machines with more memory, is probably the only route to getting your job running successfully right now.

On Sat, Oct 26, 2013 at 1:13 PM, Stephen Haberman <[email protected]> wrote:

> Hi Patrick,
>
> > Just wondering, how many reducers are you using in this shuffle? By
> > 7,000 partitions, I'm assuming you mean the map side of the shuffle.
> > What about the reduce side?
>
> 7,000 on that side as well.
>
> We're loading about a month's worth of data in one RDD, with ~7,000
> partitions, and cogrouping it with another RDD with 50 partitions, and
> the resulting RDD also has 7,000 partitions.
>
> (And, since we don't have spark.default.parallelism set, the
> defaultPartitioner logic chooses the max of [50, 7,000] to be the next
> partition size.)
>
> I believe that is what you're asking by number of reducers? The number
> of partitions in the post-cogroup ShuffledRDD?
>
> Also, AFAICT, I don't believe we get to the reduce/ShuffledRDD side of
> this cogroup--after 2,000-3,000 ShuffleMapTasks on the map side is when
> it bogs down.
>
> Thanks,
> Stephen
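
P.S. For anyone else following along, below is a minimal local sketch of the defaultPartitioner behavior Stephen describes. The toy data and local master are placeholders, and the HashPartitioner counts just stand in for the real 7,000- and 50-partition RDDs; the point is that with spark.default.parallelism unset, the cogroup result takes the larger input's partition count.

    import org.apache.spark.{HashPartitioner, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits on older Spark versions

    object CogroupPartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[2]", "cogroup-partition-sketch")

        // Tiny pair RDDs standing in for the real data sets; only the
        // partitioner sizes matter for this illustration.
        val monthOfData = sc.parallelize(Seq((1, "a"), (2, "b")))
          .partitionBy(new HashPartitioner(7000))
        val smallRdd = sc.parallelize(Seq((1, "x"), (2, "y")))
          .partitionBy(new HashPartitioner(50))

        // With spark.default.parallelism unset, defaultPartitioner reuses the
        // input partitioner with the most partitions, so this prints 7000.
        println(monthOfData.cogroup(smallRdd).partitions.length)

        sc.stop()
      }
    }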
