gaoyajun02 commented on PR #46934:
URL: https://github.com/apache/spark/pull/46934#issuecomment-2176119908

   Service nodes with disk issues (e.g. `No space left on device`, `Read-only
file system`) emit a large number of logs stating `IOExceptions exceeded the
threshold when merging shufflePush`, along with the following INFO and WARN
logs:
   ```
   INFO application_xxx attempt 1 shuffle 0 shuffleMerge 0: finalize shuffle merge
   WARN Application application_xxxx shuffleId 0 shuffleMergeId 0 reduceId 133 update to index/meta failed
   ```
   The failing call path is `updateChunkInfo` -> `writeChunkTracker`, which
hits an `IOException` while serializing `chunkTracker` to disk.
   
   From this we can conclude that, while finalizing the partition, writing
the serialized chunk to the metaFile failed but the truncate operation
succeeded. As a result, the bitmap of the last chunk was removed from the
metaFile, while the mapTracker had already recorded the mapId once the block
data was processed, leaving the two out of sync.
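
The sequence above can be illustrated with a toy model. This is only a sketch, not Spark's code: the class `ToyMergedPartition`, its methods, the comma-separated text format of the meta file, and the partial write before the failure are all assumptions made for illustration. It shows how a failed chunk write followed by a successful truncate leaves a mapId in the map tracker that no chunk record in the meta file covers.

```python
import os
import tempfile


class ToyMergedPartition:
    """Toy stand-in for one merged reduce partition; not Spark's actual class."""

    def __init__(self, meta_path):
        self.meta_path = meta_path
        open(meta_path, "wb").close()   # start with an empty meta file
        self.meta_pos = 0               # length of the last fully written state
        self.chunk_tracker = set()      # mapIds in the chunk currently being built
        self.map_tracker = set()        # mapIds whose block data has been merged

    def on_block_merged(self, map_id):
        # Recorded once block data is processed, independent of meta writes.
        self.map_tracker.add(map_id)
        self.chunk_tracker.add(map_id)

    def write_chunk_tracker(self, disk_broken=False):
        # Assumed text format: one line of comma-separated mapIds per chunk.
        ids = sorted(self.chunk_tracker)
        record = (",".join(str(m) for m in ids) + "\n").encode()
        with open(self.meta_path, "ab") as meta:
            if disk_broken:
                # Assumption: part of the record lands on disk before the error.
                meta.write(record[: len(record) // 2])
                raise OSError(28, "No space left on device")
            meta.write(record)
        self.meta_pos += len(record)
        self.chunk_tracker.clear()

    def finalize(self, disk_broken=False):
        try:
            if self.chunk_tracker:
                self.write_chunk_tracker(disk_broken)
        except OSError:
            pass  # stands in for the "update to index/meta failed" WARN
        # The truncate still succeeds and drops the partially written last chunk.
        os.truncate(self.meta_path, self.meta_pos)


def read_meta_map_ids(meta_path):
    """Return the union of mapIds covered by chunk records in the toy meta file."""
    ids = set()
    with open(meta_path, "rb") as meta:
        for line in meta.read().decode().splitlines():
            ids.update(int(tok) for tok in line.split(",") if tok)
    return ids


def simulate_failed_finalize():
    with tempfile.TemporaryDirectory() as tmp:
        part = ToyMergedPartition(os.path.join(tmp, "toy.meta"))
        part.on_block_merged(0)
        part.on_block_merged(1)
        part.write_chunk_tracker()        # first chunk persisted normally
        part.on_block_merged(2)
        part.finalize(disk_broken=True)   # last chunk write fails, truncate succeeds
        return read_meta_map_ids(part.meta_path), set(part.map_tracker)
```

In this toy run, `simulate_failed_finalize()` returns the mapIds found in the meta file and the mapIds held by the map tracker; their difference is the mapId whose chunk record was truncated away.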
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

