Victsm commented on a change in pull request #30433:
URL: https://github.com/apache/spark/pull/30433#discussion_r528512614
##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -827,13 +833,16 @@ void resetChunkTracker() {
void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
  long idxStartPos = -1;
  try {
-   // update the chunk tracker to meta file before index file
-   writeChunkTracker(mapIndex);
    idxStartPos = indexFile.getFilePointer();
    logger.trace("{} shuffleId {} reduceId {} updated index current {} updated {}",
      appShuffleId.appId, appShuffleId.shuffleId, reduceId, this.lastChunkOffset,
      chunkOffset);
-   indexFile.writeLong(chunkOffset);
+   indexFile.write(Longs.toByteArray(chunkOffset));
+   // Chunk bitmap should be written to the meta file after the index file because if there are
+   // any exceptions during writing the offset to the index file, meta file should not be
+   // updated. If the update to the index file is successful but the update to meta file isn't
+   // then the index file position is reset in the catch clause.
+   writeChunkTracker(mapIndex);
Review comment:
Stopping merging blocks after only 2 consecutive IOExceptions seems to be giving up too early.
If there's a need to stop early due to too many disk failures, I think that logic should be on the client side inside `ErrorHandler`.
For the server side, it should just handle the received merge requests as best as it can.
Also, index/meta file write failures should be recoverable, just as data file write failures are.
We just need to handle the data file and index/meta file IOExceptions separately.
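
For illustration, a minimal Java sketch of that separate-handling idea follows. The class name `IndexMetaWriter`, the `RandomAccessFile` fields, and the body of `writeChunkTracker` are assumptions made for the example and are not the actual `RemoteBlockPushResolver` code; only the ordering (index write first, meta write second, index position rolled back on failure) mirrors the hunk above.

import com.google.common.primitives.Longs;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch only: names mirror the hunk, but this class and the
// meta-file format are assumptions, not the actual RemoteBlockPushResolver.
class IndexMetaWriter {
  private final RandomAccessFile indexFile;
  private final RandomAccessFile metaFile;
  private long lastChunkOffset;

  IndexMetaWriter(RandomAccessFile indexFile, RandomAccessFile metaFile) {
    this.indexFile = indexFile;
    this.metaFile = metaFile;
  }

  void updateChunkInfo(long chunkOffset, int mapIndex) throws IOException {
    // Remember where this index write starts so it can be undone.
    long idxStartPos = indexFile.getFilePointer();
    try {
      indexFile.write(Longs.toByteArray(chunkOffset));
      // Touch the meta file only after the index offset has been written.
      writeChunkTracker(mapIndex);
      lastChunkOffset = chunkOffset;
    } catch (IOException ioe) {
      // Index or meta write failed part-way: rewind the index file so the
      // two files stay consistent, then rethrow. The caller can treat
      // index/meta failures as recoverable, separately from data-file
      // failures, instead of giving up on merging altogether.
      indexFile.seek(idxStartPos);
      throw ioe;
    }
  }

  private void writeChunkTracker(int mapIndex) throws IOException {
    // Placeholder: the real method serializes the chunk bitmap to the meta file.
    metaFile.writeInt(mapIndex);
  }
}

With index/meta failures handled this way on the server, any "stop after N failures" policy can live in the client-side `ErrorHandler`, while the server keeps serving merge requests as best it can.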
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]