Ngone51 commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r675315413



##########
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##########
@@ -122,6 +119,18 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](
    */
   private[this] var _shuffleMergedFinalized: Boolean = false
 
+  /**
+   * shuffleSequenceId is used to give temporal ordering to the executions of a ShuffleDependency.
+   * This is required in order to handle indeterministic stage retries for push-based shuffle.
+   */
+  private[this] var _shuffleSequenceId: Int = -1

Review comment:
       This idea sounds like the opposite of the current solution for the 
indeterminate stage (where an older fetch/push always fails), so I imagine it 
would introduce many inconsistent changes for the determinate stage, which 
could be troublesome.
   
   Besides, if we want to read data from older and newer attempts together, 
IIUC, resetting `ShuffleDependency._shuffleMergedFinalized` to false would allow 
data from newer attempts to be merged into the same merged files as the older 
attempts, so we could also read them together. That sounds easier to do.
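
   To make the second suggestion concrete, here is a minimal toy sketch. It is
not Spark's actual code: `ToyShuffleDependency`, `ToyMergedFile`, and
`resetShuffleMergeFinalized` are hypothetical names used only for illustration,
and the real merge happens in the external shuffle service rather than in a
local buffer. It only shows the effect of flipping the finalized flag back to
false on a retry.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for ShuffleDependency; only the finalized flag is modeled.
final class ToyShuffleDependency {
  // Mirrors the _shuffleMergedFinalized field shown in the diff above.
  private[this] var _shuffleMergedFinalized: Boolean = false

  def shuffleMergedFinalized: Boolean = _shuffleMergedFinalized

  def markShuffleMergeFinalized(): Unit = {
    _shuffleMergedFinalized = true
  }

  // Hypothetical helper: reopen the merge when the stage is retried.
  def resetShuffleMergeFinalized(): Unit = {
    _shuffleMergedFinalized = false
  }
}

// Toy stand-in for a merged shuffle file: a list of (stage attempt, block).
final class ToyMergedFile {
  private val blocks = ArrayBuffer.empty[(Int, String)]

  // A pushed block is accepted only while the merge is not finalized.
  def push(dep: ToyShuffleDependency, attempt: Int, block: String): Boolean =
    if (dep.shuffleMergedFinalized) {
      false
    } else {
      blocks += ((attempt, block))
      true
    }

  def contents: Seq[(Int, String)] = blocks.toSeq
}
```

   In this toy model, a push from a newer attempt is rejected after
`markShuffleMergeFinalized()`, and accepted into the same file again once
`resetShuffleMergeFinalized()` has been called, so blocks from both attempts
end up readable together.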




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


