Ngone51 commented on code in PR #37603:
URL: https://github.com/apache/spark/pull/37603#discussion_r963779224
##########
core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:
##########
@@ -125,21 +126,25 @@ private[storage] class BlockManagerDecommissioner(
logDebug(s"Migrated sub-block $blockId")
}
}
+ numMigratedShuffles.incrementAndGet()
logInfo(s"Migrated $shuffleBlockInfo to $peer")
} catch {
- case e: IOException =>
+ case e @ ( _ : IOException | _ : SparkException) =>
// If a block got deleted before netty opened the file handle, then trying to
// load the blocks now will fail. This is most likely to occur if we start
// migrating blocks and then the shuffle TTL cleaner kicks in. However this
// could also happen with manually managed shuffles or a GC event on the
// driver a no longer referenced RDD with shuffle files.
if (bm.migratableResolver.getMigrationBlocks(shuffleBlockInfo).size < blocks.size) {
logWarning(s"Skipping block $shuffleBlockInfo, block deleted.")
+ numDeletedShuffles.incrementAndGet()
Review Comment:
Is it still possible to reuse `numMigratedShuffles`? Just reusing the current
framework and only adding one more case (i.e., `SparkException`) seems enough
to me.
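
For illustration only (not part of the PR): a minimal, self-contained sketch of the Scala multi-type catch pattern the widened branch relies on. `MultiCatchExample` and `migrate` are hypothetical names, and `org.apache.spark.SparkException` is assumed to be available on the classpath via spark-core.

```scala
import java.io.IOException

import org.apache.spark.SparkException

// Hypothetical example: one catch branch handles both exception types via an
// alternative pattern, so the existing handler only needs its pattern widened
// rather than a second branch or a separate counter.
object MultiCatchExample {
  // Stand-in for a migration step that may fail once the block is deleted.
  private def migrate(fail: Boolean): Unit =
    if (fail) throw new SparkException("block removed during migration")

  def main(args: Array[String]): Unit = {
    try {
      migrate(fail = true)
    } catch {
      case e @ (_: IOException | _: SparkException) =>
        // Same recovery path for both failure modes, mirroring the diff above.
        println(s"Skipping block, block deleted: ${e.getMessage}")
    }
  }
}
```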