agrawaldevesh commented on a change in pull request #29211:
URL: https://github.com/apache/spark/pull/29211#discussion_r463860185
##########
File path:
core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
##########
@@ -327,4 +354,28 @@ private[storage] class BlockManagerDecommissioner(
}
logInfo("Stopped storage decommissioner")
}
+
+ /*
+ * Returns the last migration time and a boolean for if all blocks have
been migrated.
+ * If there are any tasks running since that time the boolean may be
incorrect.
+ */
+ private[storage] def lastMigrationInfo(): (Long, Boolean) = {
+ if (stopped || (stoppedRDD && stoppedShuffle)) {
+ (System.nanoTime(), true)
+ } else {
+ // Chose the min of the running times.
+ val lastMigrationTime = if (
+ conf.get(config.STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED) &&
+ conf.get(config.STORAGE_DECOMMISSION_RDD_BLOCKS_ENABLED)) {
+ Math.min(lastRDDMigrationTime, lastShuffleMigrationTime)
Review comment:
> @agrawaldevesh: what is your opinion/idea?
I confess that even I am finding the logic in the lastMigrationInfo a bit
hard to follow. And as far as I can see, this logic is the key to getting the
shutdown thread to exit cleanly. If the information returned by
lastMigrationInfo is wrong then the shutdown thread may exit prematurely or
never. Both of which should be avoided, particularly because the shutdown
thread has no timeout on the amount of time it can hang around.
This complexity is intrinsic. The logic is indeed complex but I think what
we can add some more documentation here, perhaps explaining the intent of the
code using a couple of examples ? Even better would be to please add a unit
test for this. I think it should be easy to unit test just the
lastMigrationInfo. This is just testing that the function is behaving as
expected by stubbing out other parts. This test would also help with providing
the necessary guardrails as we change this function in the future. I expect
this function to change as we production harden the cache/shuffle block
migration in the near term.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]