venkata91 commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r675024572
##########
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##########
@@ -122,6 +119,18 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C:
ClassTag](
*/
private[this] var _shuffleMergedFinalized: Boolean = false
+ /**
+ * shuffleSequenceId is used to give temporal ordering to the executions of
a ShuffleDependency.
+ * This is required in order to handle indeterministic stage retries for
push-based shuffle.
+ */
+ private[this] var _shuffleSequenceId: Int = -1
Review comment:
Yes, you're correct. Currently for determinate stage we don't increment
the `shuffleSequenceId`. Currently we disable merge in the case when shuffle is
finalized. But I can think of one case where we can do in the future which is
in a retry attempt of a stage for the same shuffle ID we can generate new set
of files and have both the older and newer attempt blocks to be read on the
reducer side effectively increasing the overall merged blocks read. Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]