GitHub user ericl opened a pull request:
https://github.com/apache/spark/pull/15016
[SPARK-16525] RowBasedKeyValueBatch should use default page size to prevent OOMs
## What changes were proposed in this pull request?
Before this change, we would always allocate 64MB per aggregation task, even
in low-memory situations such as local mode. This patch makes
RowBasedKeyValueBatch use the memory manager's default page size instead,
which is automatically reduced below 64MB in these situations.
cc @ooq @JoshRosen
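
For reviewers who want intuition for why a default page size helps here, below is a minimal standalone sketch of how a memory manager might scale the page size down when memory per core is small. It is not the code touched by this patch: the class name `PageSizeSketch`, the 1MB floor, and the safety factor of 16 are assumptions chosen for illustration, and only the 64MB cap comes from the description above.

```java
// Hypothetical illustration only: not Spark's actual implementation.
final class PageSizeSketch {
    // Assumed floor so pages never become uselessly small.
    static final long MIN_PAGE_SIZE = 1L << 20;   // 1MB
    // Cap matching the 64MB figure mentioned in the description.
    static final long MAX_PAGE_SIZE = 64L << 20;  // 64MB
    // Assumed headroom so a single task cannot claim all memory of a core.
    static final int SAFETY_FACTOR = 16;

    // Rounds n (n >= 1) up to the nearest power of two.
    static long nextPowerOf2(long n) {
        long highest = Long.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    // Derives a page size from the memory available per core,
    // clamped to the range [MIN_PAGE_SIZE, MAX_PAGE_SIZE].
    static long defaultPageSize(long executionMemoryBytes, int cores) {
        long perCore = executionMemoryBytes / cores / SAFETY_FACTOR;
        long size = nextPowerOf2(Math.max(perCore, 1L));
        return Math.min(MAX_PAGE_SIZE, Math.max(MIN_PAGE_SIZE, size));
    }

    public static void main(String[] args) {
        // Small heap shared by 32 local threads: page size drops to the floor.
        System.out.println(defaultPageSize(512L << 20, 32)); // 1048576
        // Large heap with few cores: page size stays at the 64MB cap.
        System.out.println(defaultPageSize(64L << 30, 4));   // 67108864
    }
}
```

Under these assumed constants, a `local[32]` shell with 512MB of execution memory would get 1MB pages rather than 64MB ones, which is the kind of reduction this change relies on.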
## How was this patch tested?
Tested manually by running `bin/spark-shell --master=local[32]` and verifying
that
`(1 to math.pow(10, 3).toInt).toDF("n").withColumn("m", 'n % 2).groupBy('m).agg(sum('n)).show`
does not crash.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ericl/spark sc-4483
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15016.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15016
----
commit 5e642aeb60e41fbc8e09789c3693ebd76ba78324
Author: Eric Liang <[email protected]>
Date: 2016-09-08T20:53:22Z
use default page size
commit 1d6a8456a23e3f582855997d3f0b0a9dbbd3018a
Author: Eric Liang <[email protected]>
Date: 2016-09-08T21:08:43Z
Thu Sep 8 14:08:43 PDT 2016
----