GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/8825
[SPARK-10704] Consolidate HashShuffleReader and ShuffleReader; refactor ShuffleManager.getReader()
The current shuffle code has an interface named ShuffleReader with only one
implementation, HashShuffleReader. This naming is confusing, since the same
read path code is used for both sort- and hash-based shuffle. This patch
consolidates these into a single class named ShuffleReader.
In addition, parts of ShuffleManager.getReader()'s API are hard to justify:
it lets callers request a contiguous range of shuffle partitions, but no
ShuffleReader implementation supports this and nothing in the existing code
uses it. If we later want to support fetching multiple partitions, we will
likely want to fetch sets of partitions with non-contiguous ids, which a
range-based interface cannot express. As a result, this patch also changes
this API to accept a single partition id.
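The API argument above can be illustrated with a small sketch. This is not
Spark code (Spark's shuffle code is Scala); it is a hypothetical in-memory
model written in Python. The names SHUFFLE_OUTPUT, get_reader_range,
get_reader and read_partitions are invented for illustration, and the
half-open range semantics are an assumption. It shows that a
single-partition reader can be composed to serve any set of partition ids,
while a (start, end) range can only describe contiguous ones.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]

# Hypothetical stand-in for shuffle output: partition id -> records.
SHUFFLE_OUTPUT: Dict[int, List[Record]] = {
    0: [("a", 1)],
    1: [("b", 2)],
    2: [("c", 3)],
    3: [("d", 4)],
}


def get_reader_range(start_partition: int, end_partition: int) -> Iterator[Record]:
    """Range-style API (assumed half-open): reads [start_partition, end_partition)."""
    for pid in range(start_partition, end_partition):
        yield from SHUFFLE_OUTPUT[pid]


def get_reader(partition: int) -> Iterator[Record]:
    """Single-partition API, the shape this pull request proposes."""
    yield from SHUFFLE_OUTPUT[partition]


def read_partitions(partitions: Iterable[int]) -> Iterator[Record]:
    """Hypothetical helper: read any set of partitions, contiguous or not,
    by composing single-partition readers in ascending id order."""
    for pid in sorted(set(partitions)):
        yield from get_reader(pid)


if __name__ == "__main__":
    # A contiguous range and a single-partition read agree when the range has length 1.
    print(list(get_reader_range(1, 2)) == list(get_reader(1)))
    # Partitions 0 and 3 are non-contiguous: no single (start, end) pair selects
    # exactly these two, but composing single-partition readers does.
    print(list(read_partitions([0, 3])))
```

In this model, the range form adds nothing that the single-partition form
cannot already express, which matches the motivation given for the change.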
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark shuffle-reader-cleanup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8825.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8825
----
commit ad532e49b0093c8f6529ede09d794c975fb1ce69
Author: Josh Rosen <[email protected]>
Date: 2015-09-18T17:44:50Z
Rename HashShuffleReader to ShuffleReader.
commit 90e01c09ea79c7c9c0006878434a32c3d568bc6b
Author: Josh Rosen <[email protected]>
Date: 2015-09-18T18:09:30Z
Change getReader() to only accept a single partition:
There was no ShuffleReader implementation that supported the ability to
fetch a range of partitions. If we do want to implement support for fetching
multiple partitions then we'll likely want to support the ability to fetch
sets of partitions with non-contiguous ids, which this interface doesn't
support.
commit a0e7fc6aad69fe22602700d9f6f74716b15d7d86
Author: Josh Rosen <[email protected]>
Date: 2015-09-18T18:48:07Z
Consolidate getReader() implementations.
----