Hi, also, if you are using Spark 3.2.x, please see the documentation on handling skew via Spark settings.
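For reference, the adaptive-execution skew settings in Spark 3.x look like the sketch below (values shown are the documented defaults; note that AQE skew handling targets shuffle joins, so it may not help a pure groupBy aggregation):

```properties
# Enable adaptive query execution (on by default since Spark 3.2)
spark.sql.adaptive.enabled=true
# Let AQE split skewed shuffle partitions during sort-merge joins
spark.sql.adaptive.skewJoin.enabled=true
# A partition is "skewed" if it is this many times larger than the median...
spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
# ...and also larger than this absolute threshold
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=256MB
```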
Regards,
Gourav Sengupta

On Tue, Dec 14, 2021 at 6:01 PM David Diebold <davidjdieb...@gmail.com> wrote:

> Hello all,
>
> I was wondering if it is possible to encounter out-of-memory exceptions on
> Spark executors when doing some aggregation on a skewed dataset.
> Let's say we have a dataset with two columns:
> - key : int
> - value : float
> and I want to aggregate values by key.
> Let's say we have tons of rows whose key equals 0.
>
> Is it possible to encounter an out-of-memory exception after the shuffle?
> My expectation would be that the executor responsible for aggregating the
> '0' partition could indeed hit an OOM exception if it tried to load all
> the files of this partition into memory before processing them.
> But why would it need to hold them in memory when doing an aggregation? It
> looks to me that aggregation can be performed in a streaming fashion, so I
> would not expect any OOM at all.
>
> Thank you in advance for your lights :)
> David
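To illustrate David's intuition, here is a minimal sketch in plain Python (not Spark internals) of why a sum-by-key aggregation can indeed run in a streaming fashion: memory grows with the number of distinct keys, not with the number of rows, so a single hot key does not by itself need more than one accumulator. (In practice Spark's hash aggregation can still spill to disk when the accumulator map itself grows too large.)

```python
from collections import defaultdict

def streaming_sum_by_key(rows):
    """Aggregate (key, value) pairs one at a time.

    Memory is O(number of distinct keys), not O(number of rows):
    even if every row carries key 0, we keep a single accumulator for it.
    """
    acc = defaultdict(float)
    for key, value in rows:
        acc[key] += value  # fold each row into its key's accumulator
    return dict(acc)

# A heavily skewed input: one million rows for key 0, two rows for key 1.
skewed = [(0, 1.0)] * 1_000_000 + [(1, 2.0), (1, 3.0)]
result = streaming_sum_by_key(skewed)
# Only two accumulators exist despite a million rows sharing key 0.
```

The skew problem for a plain sum is therefore less about heap usage on the aggregating executor and more about that one task doing almost all the work (a straggler); aggregations that must buffer per-key state, such as collect_list, are a different story.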