GitHub user glentakahashi opened a pull request:
https://github.com/apache/spark/pull/20372
Improved block merging logic for partitions
## What changes were proposed in this pull request?
Change DataSourceScanExec so that, when grouping blocks into partitions,
it also checks the end of the sorted list of splits in order to fill
partitions more efficiently.
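
To illustrate the idea, here is a minimal sketch in Python, not the actual Scala change in DataSourceScanExec. It assumes splits are sorted by size in descending order and that each partition has a byte limit. The function names, the `max_bytes` parameter, and the sizes in the usage note are all hypothetical, and per-file open costs are ignored for simplicity.

```python
from collections import deque


def pack_front_only(sizes, max_bytes):
    """Baseline: walk the sorted splits once, closing a partition
    whenever the next split would push it over max_bytes."""
    partitions, current, total = [], [], 0
    for size in sorted(sizes, reverse=True):
        if current and total + size > max_bytes:
            partitions.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        partitions.append(current)
    return partitions


def pack_two_ended(sizes, max_bytes):
    """Sketch of the proposed idea: after filling from the front of the
    sorted list, also pull the smallest splits from the end while they fit."""
    remaining = deque(sorted(sizes, reverse=True))
    partitions = []
    while remaining:
        # Every partition starts with the largest remaining split.
        current = [remaining.popleft()]
        total = current[0]
        # Keep taking from the front while the next-largest split fits.
        while remaining and total + remaining[0] <= max_bytes:
            total += remaining[0]
            current.append(remaining.popleft())
        # Then top up with the smallest splits from the end of the list.
        while remaining and total + remaining[-1] <= max_bytes:
            total += remaining[-1]
            current.append(remaining.pop())
        partitions.append(current)
    return partitions
```

With the hypothetical input `[8, 7, 6, 2, 2, 2]` and `max_bytes=10`, `pack_front_only` produces four partitions (`[8]`, `[7]`, `[6, 2, 2]`, `[2]`), while `pack_two_ended` produces three (`[8, 2]`, `[7, 2]`, `[6, 2]`). These numbers are made up for illustration and are not the data used in the updated test.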
## How was this patch tested?
Updated the existing test to reflect the new logic, which reduces the
number of partitions from 4 to 3.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/glentakahashi/spark feature/improved-block-merging
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20372.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20372
----
commit c575977a5952bf50b605be8079c9be1e30f3bd36
Author: Glen Takahashi <gtakahashi@...>
Date: 2018-01-23T23:22:34Z
Improved block merging logic for partitions
----