[GitHub] [spark] venkata91 commented on a change in pull request #33034: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

GitBox Thu, 22 Jul 2021 10:41:29 -0700


venkata91 commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r675024572




##########
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##########
@@ -122,6 +119,18 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: 
ClassTag](
    */
   private[this] var _shuffleMergedFinalized: Boolean = false
 
+  /**
+   * shuffleSequenceId is used to give temporal ordering to the executions of 
a ShuffleDependency.
+   * This is required in order to handle indeterministic stage retries for 
push-based shuffle.
+   */
+  private[this] var _shuffleSequenceId: Int = -1

Review comment:
       Yes, you're correct. Currently for determinate stage we don't increment 
the `shuffleSequenceId`. Currently we disable merge in the case when shuffle is 
finalized. But I can think of one case where we can do in the future which is 
in a retry attempt of a stage for the same shuffle ID we can generate new set 
of files and have both the older and newer attempt blocks to be read on the 
reducer side effectively increasing the overall merged blocks read. Thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] venkata91 commented on a change in pull request #33034: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

Reply via email to