Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/11105#discussion_r86383782
--- Diff: core/src/main/scala/org/apache/spark/rdd/ShuffledRDD.scala ---
@@ -104,10 +105,26 @@ class ShuffledRDD[K: ClassTag, V: ClassTag, C: ClassTag](
}
override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] = {
+ // Use -1 for our Shuffle ID since we are on the read side of the shuffle.
+ val shuffleWriteId = -1
+ // If our task has data property accumulators we need to keep track of which partitions
+ // we are processing.
+ if (context.taskMetrics.hasDataPropertyAccumulators()) {
+ context.setRDDPartitionInfo(id, shuffleWriteId, split.index)
+ }
val dep = dependencies.head.asInstanceOf[ShuffleDependency[K, V, C]]
- SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1, context)
+ val itr = SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1,
+ context)
--- End diff --
I am looking closely at the combiner code to try to confirm this. I think
I believe it, but I don't think it's *guaranteed* to stay true in the future.
E.g., right now the combiners do an `insertAll` into the `ExternalAppendOnlyMap`
before reading from it. But there is no reason Spark couldn't change so that it
instead inserts only the *next* key from all incoming streams into the
`ExternalAppendOnlyMap` and then feeds that one key to the downstream
iterators.
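To make the ordering difference concrete, here is a minimal sketch in plain Scala, without Spark. `eagerCombine` and `streamingCombine` are hypothetical stand-ins (a `LinkedHashMap` plays the role of `ExternalAppendOnlyMap`), and the streaming variant assumes its input is already sorted by key. The log records when each record reaches the combine logic and when each result is emitted downstream.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; not Spark's ExternalAppendOnlyMap or Aggregator.
object CombineOrderSketch {
  // Eager: read every input record (like insertAll) before emitting anything.
  def eagerCombine(input: Iterator[(String, Int)],
                   log: mutable.Buffer[String]): Iterator[(String, Int)] = {
    val acc = mutable.LinkedHashMap.empty[String, Int]
    input.foreach { case (k, v) =>
      log += s"read $k"
      acc(k) = acc.getOrElse(k, 0) + v
    }
    // Emission is logged lazily, as downstream pulls each result.
    acc.iterator.map { case kv @ (k, _) => log += s"emit $k"; kv }
  }

  // Streaming: input is assumed sorted by key; a key is emitted as soon as its run ends.
  def streamingCombine(input: Iterator[(String, Int)],
                       log: mutable.Buffer[String]): Iterator[(String, Int)] = {
    val buffered = input.buffered
    new Iterator[(String, Int)] {
      def hasNext: Boolean = buffered.hasNext
      def next(): (String, Int) = {
        val (k, v0) = buffered.next()
        log += s"read $k"
        var sum = v0
        while (buffered.hasNext && buffered.head._1 == k) {
          sum += buffered.next()._2
          log += s"read $k"
        }
        log += s"emit $k"
        (k, sum)
      }
    }
  }
}
```

With input `("a", 1), ("a", 2), ("b", 3)` both produce `(a,3), (b,3)`, but the eager version logs all three reads before any emit, while the streaming version interleaves them (read a, read a, emit a, read b, emit b). Any assumption that all upstream work is finished before downstream code runs holds for the first and silently breaks under the second.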
At the very least, we need a test to ensure this doesn't break if that
internal implementation were to change. (Does a test like that already exist?)
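One way such a test could pin down the invariant itself, rather than the current code path, is sketched below. It is hypothetical (`drainsInputBeforeFirstOutput` is not an existing Spark test helper): it wraps the input in a counting iterator and checks whether the combine step has consumed every record by the time it produces its first output. `eager` mimics the current insertAll-then-iterate behavior, and `passThrough` stands in for any implementation that interleaves reads and emits.

```scala
import scala.collection.mutable

// Hypothetical invariant check; not an existing Spark test utility.
object CombineInvariantCheck {
  type Combine = Iterator[(String, Int)] => Iterator[(String, Int)]

  /** True if `combine` has consumed all of `input` by the time its first output is produced. */
  def drainsInputBeforeFirstOutput(combine: Combine, input: Seq[(String, Int)]): Boolean = {
    var consumed = 0
    // Count records as the combine step pulls them from upstream.
    val tracked = input.iterator.map { kv => consumed += 1; kv }
    val out = combine(tracked)
    if (out.hasNext) out.next()
    consumed == input.size
  }

  // Mimics the current behavior: insert everything, then iterate.
  val eager: Combine = in => {
    val acc = mutable.LinkedHashMap.empty[String, Int]
    in.foreach { case (k, v) => acc(k) = acc.getOrElse(k, 0) + v }
    acc.iterator
  }

  // Stand-in for an implementation that emits as soon as it reads.
  val passThrough: Combine = in => in
}
```

Because the check targets the invariant, it would fail loudly if the combine step were ever changed to interleave reads and emits, instead of passing because it only exercises today's code path.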
Again, I'm still mulling over whether there is even a good enough use case to
justify supporting this at all...