GitHub user rdblue opened a pull request:
https://github.com/apache/spark/pull/19394
SPARK-22170: Reduce memory consumption in broadcast joins.
This updates the broadcast join code path to lazily decompress pages and
iterate through UnsafeRows to prevent all rows from being held in memory
while the broadcast table is being built.
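
To illustrate the idea in the description, here is a minimal sketch in Python. It is not the Spark code from this patch: `compress_page`, `iter_rows`, `build_hash_table`, and the `(key, payload)` row shape are hypothetical names chosen for the example, and `zlib` plus `pickle` stand in for whatever page encoding the real code path uses. The sketch only shows the general pattern of decompressing one page at a time and feeding rows to the table builder through an iterator, so that no full intermediate list of decompressed rows has to exist.

```python
import pickle
import zlib
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape for this sketch: (join key, payload).
Row = Tuple[int, str]


def compress_page(rows: List[Row]) -> bytes:
    """Serialize and compress one page of rows (stand-in for a collected page)."""
    return zlib.compress(pickle.dumps(rows))


def iter_rows(pages: Iterable[bytes]) -> Iterator[Row]:
    """Lazily decompress pages and yield rows one at a time.

    A page is decompressed only when the consumer asks for its first row,
    so at most one decompressed page is alive at any moment.
    """
    for page in pages:
        for row in pickle.loads(zlib.decompress(page)):
            yield row


def build_hash_table(rows: Iterator[Row]) -> Dict[int, List[str]]:
    """Build the lookup table by consuming the iterator row by row."""
    table: Dict[int, List[str]] = {}
    for key, payload in rows:
        table.setdefault(key, []).append(payload)
    return table


# Two compressed pages; the rows are never gathered into one big list.
pages = [compress_page([(1, "a"), (2, "b")]), compress_page([(1, "c")])]
table = build_hash_table(iter_rows(pages))
```

The eager alternative would decompress every page into a single list before building the table, which keeps the compressed pages, the full row list, and the growing table in memory at the same time.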
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rdblue/spark broadcast-driver-memory
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19394.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19394
----
commit d82ffca19c84589dfcb87a81996fb3ce2e31e91a
Author: Ryan Blue <[email protected]>
Date: 2017-08-11T00:41:56Z
SPARK-22170: Reduce memory consumption in broadcast joins.
This updates the broadcast join code path to lazily decompress pages and
iterate through UnsafeRows to prevent all rows from being held in memory
while the broadcast table is being built.
----