mridulm commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r668356459
##########
File path:
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##########
@@ -360,13 +389,41 @@ private[spark] class IndexShuffleBlockResolver(
if (dataTmp != null && dataTmp.exists() && !dataTmp.renameTo(dataFile)) {
throw new IOException("fail to rename file " + dataTmp + " to " + dataFile)
}
+
+ // write the checksum file
+ checksumTmpOpt.zip(checksumFileOpt).foreach { case (checksumTmp, checksumFile) =>
+ val out = new DataOutputStream(
+ new BufferedOutputStream(
+ new FileOutputStream(checksumTmp)
+ )
+ )
+ Utils.tryWithSafeFinally {
+ checksums.foreach(out.writeLong)
+ } {
+ out.close()
+ }
+
+ if (checksumFile.exists()) {
+ checksumFile.delete()
+ }
+ if (!checksumTmp.renameTo(checksumFile)) {
+ // It's not worthwhile to fail here after index file and data file are already
+ // successfully stored due to checksum is only used for the corner error case.
+ logWarning("fail to rename file " + checksumTmp + " to " + checksumFile)
Review comment:
I agree, I am fine with this behavior here.
I was wondering whether we should make the path
[above](https://github.com/apache/spark/pull/32401/files#diff-9e9749da596e4dd6c02722f91cd62afc28a44f00c7cebb927ccdeae1629e98a1R354)
behave the same way.
That is, if the index/data files exist but the checksum file does not, do we
want to rewrite index/data just to populate the checksum?
Or should we simply skip writing the checksum when it is missing and behave as
we do here?
Thoughts?
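For illustration, here is a minimal sketch of one reading of the second option: leave the existing index/data files alone and treat the checksum as best-effort, warning instead of throwing when the rename fails. `ChecksumSketch`, `writeChecksumBestEffort`, and `backfillIfMissing` are hypothetical names, not code from this PR, and plain `try`/`finally` plus `System.err` stand in for Spark's `Utils.tryWithSafeFinally` and `logWarning`.

```scala
import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream}

// Hypothetical sketch, not the PR's implementation.
object ChecksumSketch {

  // Write the checksums to a temp file, then move it into place.
  // Returns true if the rename succeeded; a failed rename only logs a warning.
  def writeChecksumBestEffort(
      checksums: Array[Long],
      checksumTmp: File,
      checksumFile: File): Boolean = {
    val out = new DataOutputStream(
      new BufferedOutputStream(new FileOutputStream(checksumTmp)))
    try checksums.foreach(out.writeLong) finally out.close()

    if (checksumFile.exists()) {
      checksumFile.delete()
    }
    val renamed = checksumTmp.renameTo(checksumFile)
    if (!renamed) {
      // Stand-in for logWarning: index and data are already committed,
      // so a missing checksum is tolerated rather than treated as fatal.
      System.err.println(s"fail to rename file $checksumTmp to $checksumFile")
    }
    renamed
  }

  // Assumed handling for the "index/data already exist" path: never rewrite
  // index/data; only try to fill in a missing checksum file.
  def backfillIfMissing(
      checksums: Array[Long],
      checksumTmp: File,
      checksumFile: File): Boolean = {
    if (checksumFile.exists()) true
    else writeChecksumBestEffort(checksums, checksumTmp, checksumFile)
  }
}
```

With this shape, the existing-output path and the fresh-commit path would share the same failure semantics for the checksum file; whether that is what we want is the open question.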
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]