Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668027402
##########
File path: core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java
##########
@@ -0,0 +1,66 @@
+package org.apache.spark.shuffle.checksum;
+
+import java.util.Locale;
+import java.util.zip.Adler32;
+import java.util.zip.CRC32;
+import java.util.zip.Checksum;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkException;
+import org.apache.spark.internal.config.package$;
+import org.apache.spark.storage.ShuffleChecksumBlockId;
+
+public class ShuffleChecksumHelper {
+
+  public static boolean isShuffleChecksumEnabled(SparkConf conf) {
+    return (boolean) conf.get(package$.MODULE$.SHUFFLE_CHECKSUM_ENABLED());
+  }
+
+  public static Checksum[] createPartitionChecksumsIfEnabled(int numPartitions, SparkConf conf)
+      throws SparkException {
+    Checksum[] partitionChecksums;
+
+    if (!isShuffleChecksumEnabled(conf)) {
+      partitionChecksums = new Checksum[0];
+      return partitionChecksums;
+    }
+
+    String checksumAlgo = shuffleChecksumAlgorithm(conf).toLowerCase(Locale.ROOT);
+    switch (checksumAlgo) {
+      case "adler32":
+        partitionChecksums = new Adler32[numPartitions];
+        for (int i = 0; i < numPartitions; i ++) {
+          partitionChecksums[i] = new Adler32();
+        }
+        return partitionChecksums;
+
+      case "crc32":
+        partitionChecksums = new CRC32[numPartitions];
+        for (int i = 0; i < numPartitions; i ++) {
+          partitionChecksums[i] = new CRC32();
+        }
+        return partitionChecksums;
+
+      default:
+        throw new SparkException("Unsupported shuffle checksum algorithm: " + checksumAlgo);
+    }
+  }
+
+  public static long[] getChecksumValues(Checksum[] partitionChecksums) {
Review comment:
   It already returns an empty long array when `partitionChecksums` is empty, doesn't it? (It won't go through the for loop when `partitionChecksums` is empty.)
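
   For context, here is a minimal Java sketch (not the PR's actual implementation, which is truncated in the excerpt above) of the conversion this comment refers to. With an empty `Checksum[]` the loop body never executes, so the method already returns an empty `long[]` and no special-case check is needed:

   import java.util.zip.CRC32;
   import java.util.zip.Checksum;

   class GetChecksumValuesSketch {
     // Hypothetical stand-in for ShuffleChecksumHelper.getChecksumValues: copies each
     // checksum's current value into a long[]. An empty input yields an empty output.
     static long[] getChecksumValues(Checksum[] partitionChecksums) {
       long[] checksumValues = new long[partitionChecksums.length];
       for (int i = 0; i < partitionChecksums.length; i++) {
         checksumValues[i] = partitionChecksums[i].getValue();
       }
       return checksumValues;
     }

     public static void main(String[] args) {
       // Empty case: a zero-length array comes back, no special handling required.
       System.out.println(getChecksumValues(new Checksum[0]).length);  // prints 0

       // Non-empty case: one CRC32 checksum fed a few bytes.
       CRC32 crc = new CRC32();
       crc.update(new byte[] {1, 2, 3});
       System.out.println(getChecksumValues(new Checksum[] {crc})[0]);
     }
   }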
##########
File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##########
@@ -360,13 +389,41 @@ private[spark] class IndexShuffleBlockResolver(
        if (dataTmp != null && dataTmp.exists() && !dataTmp.renameTo(dataFile)) {
          throw new IOException("fail to rename file " + dataTmp + " to " + dataFile)
        }
+
+       // write the checksum file
+       checksumTmpOpt.zip(checksumFileOpt).foreach { case (checksumTmp, checksumFile) =>
+         val out = new DataOutputStream(
+           new BufferedOutputStream(
+             new FileOutputStream(checksumTmp)
+           )
+         )
+         Utils.tryWithSafeFinally {
+           checksums.foreach(out.writeLong)
+         } {
+           out.close()
+         }
+
+         if (checksumFile.exists()) {
+           checksumFile.delete()
+         }
+         if (!checksumTmp.renameTo(checksumFile)) {
+           // It's not worthwhile to fail here after index file and data file are already
+           // successfully stored due to checksum is only used for the corner error case.
+           logWarning("fail to rename file " + checksumTmp + " to " + checksumFile)
Review comment:
   So if you look at the comment, my concern is: it's not worthwhile to fail here after the index file and data file have already been successfully generated, since the checksum is only a best-effort aid in case of data corruption.
   WDYT?
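
   To make the best-effort idea concrete, here is a small hypothetical Java sketch (the PR itself does this in Scala inside IndexShuffleBlockResolver; the class and method names below are illustrative only). A failure to move the checksum temp file into place is logged rather than thrown, because the data and index files are already committed and the checksum only helps diagnose corruption after the fact:

   import java.io.File;
   import java.util.logging.Logger;

   class ChecksumCommitSketch {
     private static final Logger LOG = Logger.getLogger(ChecksumCommitSketch.class.getName());

     // Best-effort commit of the checksum file: warn and continue on failure instead of
     // throwing, since the shuffle output (data + index files) is already usable without it.
     static void commitChecksumFile(File checksumTmp, File checksumFile) {
       if (checksumFile.exists() && !checksumFile.delete()) {
         LOG.warning("failed to delete existing checksum file " + checksumFile);
       }
       if (!checksumTmp.renameTo(checksumFile)) {
         // Deliberately not an exception: the checksum is only a diagnostic aid.
         LOG.warning("fail to rename file " + checksumTmp + " to " + checksumFile);
       }
     }
   }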