otterc commented on a change in pull request #30433:
URL: https://github.com/apache/spark/pull/30433#discussion_r528519186



##########
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -827,13 +833,16 @@ void resetChunkTracker() {
     void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
       long idxStartPos = -1;
       try {
-        // update the chunk tracker to meta file before index file
-        writeChunkTracker(mapIndex);
         idxStartPos = indexFile.getFilePointer();
        logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}",
          appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset,
          chunkOffset);
-        indexFile.writeLong(chunkOffset);
+        indexFile.write(Longs.toByteArray(chunkOffset));
+        // Chunk bitmap should be written to the meta file after the index file because if there are
+        // any exceptions during writing the offset to the index file, meta file should not be
+        // updated. If the update to the index file is successful but the update to meta file isn't
+        // then the index file position is reset in the catch clause.
+        writeChunkTracker(mapIndex);
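
The write ordering that the added comment describes can be sketched in simplified, standalone form. This is not Spark's actual `RemoteBlockPushResolver`; the class and method names here (`IndexMetaWriter`, `writeMeta`, the `metaWriteFails` flag) are illustrative stand-ins, and a plain `RandomAccessFile` plays the role of the index file:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hedged sketch of the index-before-meta ordering: write the chunk offset to
// the index file first, then update the meta file; if the meta update throws,
// rewind the index file so the two files stay consistent.
public class IndexMetaWriter {
  public static void updateChunkInfo(RandomAccessFile indexFile, long chunkOffset,
      boolean metaWriteFails) throws IOException {
    long idxStartPos = indexFile.getFilePointer();
    try {
      indexFile.writeLong(chunkOffset);  // 1) index file first
      writeMeta(metaWriteFails);         // 2) meta file second
    } catch (IOException e) {
      // Meta update failed after a successful index write: reset the index
      // file position so the dangling offset is overwritten on the next try.
      indexFile.seek(idxStartPos);
      throw e;
    }
  }

  // Stand-in for writing the chunk bitmap to the meta file.
  private static void writeMeta(boolean fail) throws IOException {
    if (fail) throw new IOException("simulated meta write failure");
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("index", ".bin");
    f.deleteOnExit();
    try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
      updateChunkInfo(raf, 100L, false);        // succeeds: pointer advances 8 bytes
      long afterOk = raf.getFilePointer();
      try {
        updateChunkInfo(raf, 200L, true);       // meta fails: index is rewound
      } catch (IOException ignored) { }
      System.out.println(afterOk == 8 && raf.getFilePointer() == 8);  // prints "true"
    }
  }
}
```

The point of the ordering is that a failure at either step leaves the index file position where the next successful write will overwrite any partial data, so the meta file never refers to an offset the index file does not durably contain.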

Review comment:
       > I think we should catch the chunk level IOExceptions inside 
onComplete, setting certain flags to make the onFailure handling logic know 
about whether we encountered a block level failure or a chunk level failure.
   For a chunk level failure, we shouldn't overwrite the previous block, we 
should effectively only delay the closure of the chunk.
   
   The closure of a chunk now depends on the minimum chunk size. If we keep 
growing the chunk, it will eventually become very large, and the executors 
fetching it will get killed because they exceed the physical memory limit. 
This was the main reason to serve data in manageable-sized chunks.
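
To make the size-based closure concrete, here is a minimal sketch of the idea, assuming a hypothetical tracker class and threshold constant (neither is Spark's actual API, and the real minimum chunk size is a configurable property):

```java
// Hedged sketch: close the current chunk once it reaches a minimum size so
// that served chunks stay small enough for fetching executors to hold in memory.
public class ChunkTracker {
  // Hypothetical threshold; the real value would come from configuration.
  static final long MIN_CHUNK_SIZE = 2L * 1024 * 1024;

  private long currentChunkSize = 0;

  // Called after a block is appended to the merged file. Returns true when the
  // current chunk should be closed and a new chunk started.
  public boolean onBlockAppended(long blockSize) {
    currentChunkSize += blockSize;
    if (currentChunkSize >= MIN_CHUNK_SIZE) {
      currentChunkSize = 0;  // start a new chunk
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    ChunkTracker tracker = new ChunkTracker();
    System.out.println(tracker.onBlockAppended(1024));              // prints "false"
    System.out.println(tracker.onBlockAppended(2L * 1024 * 1024));  // prints "true"
  }
}
```

Delaying closure indefinitely (as the quoted suggestion would imply for repeated chunk-level failures) defeats this bound, which is the objection being raised.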
   
   > Stopping merging blocks only after 2 consecutive IOExceptions seems to be 
giving up too early.
   
   With local file systems, the IOExceptions I can think of would be due to 
disk failures, file corruption, or permission issues, and these will not go 
away on retry. Are there any other scenarios that I am missing?
   
   > If there's a need to stop early due to too many disk failures, I think 
that logic should be on the client side inside ErrorHandler.
   
   Sure; however, the server here can respond with a runtime exception 
indicating that it can't update the merge file, similar to the `TooLate` 
runtime exception, and the error handler on the push side can then stop 
pushing the data.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
