ivoson commented on code in PR #50230:
URL: https://github.com/apache/spark/pull/50230#discussion_r2315181368


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -1650,6 +1650,21 @@ package object config {
         s"The buffer size must be greater than 0 and less than or equal to 
${Int.MaxValue}.")
       .createWithDefault(4096)
 
+  private[spark] val SHUFFLE_ORDER_INDEPENDENT_CHECKSUM_ENABLED =
+    ConfigBuilder("spark.shuffle.orderIndependentChecksum.enabled")
+      .doc("Whether to calculate order independent checksum for the shuffle 
data or not. If " +
+        "enabled, Spark will calculate a checksum that is independent of the 
input row order for " +
+        "each mapper and returns the checksums from executors to driver. 
Different from the above" +
+        "checksum, the order independent remains the same even if the shuffle 
row order changes. " +
+        "While the above checksum is sensitive to shuffle data ordering to 
detect file " +

Review Comment:
   Actually, it's supposed to refer to the checksum value computed when `spark.shuffle.checksum.enabled` is true. Updated the doc.
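
   To make the "order independent" part concrete, here is a rough sketch of the idea (not the actual implementation in this PR, and all names below are made up): checksum each row on its own, then combine the per-row values with a commutative, associative operation, so the final value does not depend on the order in which rows are written.

```scala
import java.util.zip.CRC32

object OrderIndependentChecksumSketch {

  // Checksum of a single serialized row (CRC32 is used here only for illustration).
  private def rowChecksum(row: Array[Byte]): Long = {
    val crc = new CRC32()
    crc.update(row)
    crc.getValue
  }

  // Combining per-row checksums with addition (commutative and associative) makes the
  // result insensitive to row order, unlike feeding the concatenated bytes into one CRC.
  def checksum(rows: Iterator[Array[Byte]]): Long =
    rows.foldLeft(0L)((acc, row) => acc + rowChecksum(row))
}
```

   Any commutative, associative combine (XOR, modular addition, etc.) gives the same order-independence property; the concrete hash and combine step used by this PR may differ.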



##########
core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala:
##########
@@ -58,6 +58,12 @@ private[spark] sealed trait MapStatus extends ShuffleOutputStatus {
    * partitionId of the task or taskContext.taskAttemptId is used.
    */
   def mapId: Long
+
+  /**
+   * The checksum value of this shuffle map task, which can be used to evaluate whether the
+   * output data have changed across different map task retries.

Review Comment:
   thanks, done.
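
   For future readers, the intended use is roughly the following (a sketch with hypothetical names, not the scheduler change in this PR): the driver remembers the checksum reported by the first successful attempt of each map task and compares it against the value reported by any retry; a mismatch indicates the retry produced different shuffle output.

```scala
import scala.collection.mutable

// Hypothetical driver-side helper, not Spark's MapOutputTracker/scheduler code.
class MapOutputChecksumTracker {

  // Checksum of the first successful attempt, keyed by (shuffleId, mapId).
  private val recorded = mutable.Map.empty[(Int, Long), Long]

  /**
   * Records the checksum reported by a completed map task and returns true when it matches
   * the previously recorded value (or when this is the first attempt seen for that task),
   * i.e. the retry is believed to have produced the same output data.
   */
  def register(shuffleId: Int, mapId: Long, checksum: Long): Boolean =
    recorded.get((shuffleId, mapId)) match {
      case Some(previous) => previous == checksum
      case None =>
        recorded((shuffleId, mapId)) = checksum
        true
    }
}
```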



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

