otterc commented on a change in pull request #30433:
URL: https://github.com/apache/spark/pull/30433#discussion_r528504439



##########
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -827,13 +833,16 @@ void resetChunkTracker() {
     void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
       long idxStartPos = -1;
       try {
-        // update the chunk tracker to meta file before index file
-        writeChunkTracker(mapIndex);
         idxStartPos = indexFile.getFilePointer();
         logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}",
           appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset,
           chunkOffset);
-        indexFile.writeLong(chunkOffset);
+        indexFile.write(Longs.toByteArray(chunkOffset));
+        // Chunk bitmap should be written to the meta file after the index file because if there are
+        // any exceptions during writing the offset to the index file, meta file should not be
+        // updated. If the update to the index file is successful but the update to meta file isn't
+        // then the index file position is reset in the catch clause.
+        writeChunkTracker(mapIndex);

Review comment:
      > I don't think we should assume that.
   We could have taken an approach where we give up on writing the merged shuffle data file as well.
   
   Here the shuffle service is writing to a local file. It's not a network filesystem/HDFS, where the network can cause transient exceptions. If the shuffle service sees an IOException while updating the local index/meta file, and then another IOException while seeking back to a position, the chances that further updates to the same file will also fail are very high. If the index/meta files are not successfully updated, then those merged blocks cannot be served to the executors when they read/fetch them.
   If `IOExceptions` while writing to the local filesystem are, most of the time, only caused by issues like file permissions, disk full, etc., I don't see the advantage of still trying to keep merging blocks into the shuffle partition.
   It might be better for the server to reply to the executor to stop pushing in this case. Otherwise, executors keep pushing more blocks, and in most cases those pushes will fail as well.
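   To illustrate the write ordering the patch comment describes, here is a minimal, hedged sketch (not the actual `RemoteBlockPushResolver` code): the index offset is written first, the meta update follows, and a failed meta update rewinds the index file so the two files stay consistent. The class name `ChunkInfoSketch`, the `metaFile` parameter, and the `chunkBitmap` byte array are hypothetical stand-ins for the real fields; the real code writes the bitmap via `writeChunkTracker(mapIndex)`.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hedged sketch of the index-first, meta-second write ordering.
public class ChunkInfoSketch {
  static void updateChunkInfo(RandomAccessFile indexFile, RandomAccessFile metaFile,
      long chunkOffset, byte[] chunkBitmap) throws IOException {
    long idxStartPos = -1;
    try {
      // Remember where the index write begins so it can be undone on failure.
      idxStartPos = indexFile.getFilePointer();
      indexFile.writeLong(chunkOffset);
      // The meta file is updated only after the index write succeeded; if this
      // throws, the catch block below rolls the index file back.
      metaFile.write(chunkBitmap);
    } catch (IOException e) {
      if (idxStartPos != -1) {
        indexFile.setLength(idxStartPos);  // discard the partial index update
        indexFile.seek(idxStartPos);
      }
      throw e;
    }
  }

  public static void main(String[] args) throws IOException {
    File idx = File.createTempFile("idx", ".tmp");
    File meta = File.createTempFile("meta", ".tmp");
    try (RandomAccessFile indexFile = new RandomAccessFile(idx, "rw");
         RandomAccessFile metaFile = new RandomAccessFile(meta, "rw")) {
      updateChunkInfo(indexFile, metaFile, 4096L, new byte[] {1, 2, 3});
      System.out.println(indexFile.length()); // one 8-byte long offset
      System.out.println(metaFile.length());  // 3 bytes of bitmap
    }
    idx.delete();
    meta.delete();
  }
}
```

   With this ordering, an exception during either write leaves the index file at its pre-update position, so a reader never sees an index entry whose corresponding meta entry is missing.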




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
