agrawaldevesh commented on pull request #29226: URL: https://github.com/apache/spark/pull/29226#issuecomment-664146969
> If we would need this block to be saved then there would be a much simpler fix by checking when the block appears in the `blocksUpdated` collection. ~But this goes against the `migrateDuring` original intention (see the first comment):~

I considered this approach (among others) but decided against it. Please let me explain my reasoning:

- First, while I could always poll for the block to show up in `blocksUpdated`, you would agree that it is a lot of machinery, particularly since we know that `blocksUpdated` is populated by the `onBlockUpdated` listener callback. So I decided to just trigger the decommissioning directly in the listener.
- The second question is whether the decommissioning should be done in `onBlockUpdated` or `onTaskEnd`. I chose `onTaskEnd` because that is the point of no return for the block to be "live". Consider what happens when a result task fails after the block is updated but before the task can be considered "done": the block it wrote would get garbage collected, and that garbage collection would race with the decommissioning. It is possible that the migration starts after the GC and does not find the block to replicate, which would fail the assertion below.
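For illustration, the `onTaskEnd`-based approach could be sketched roughly as below. This is a hedged sketch, not the actual test code from the PR: `sc` is assumed to be an active `SparkContext`, and `decommissionExecutor` is a hypothetical stand-in for whatever helper the suite uses to start decommissioning.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // onTaskEnd fires once the task's result is committed, so the block it
    // wrote is "live" and can no longer be garbage collected by a failed
    // result task. Triggering decommissioning here avoids the race where
    // the migration starts only after the block has already been GC'd.
    // `decommissionExecutor` is a hypothetical helper, not a Spark API.
    decommissionExecutor(taskEnd.taskInfo.executorId)
  }
})
```

Anchoring on `onBlockUpdated` instead would start the migration while a task failure could still trigger garbage collection of the block, reintroducing the race described above.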
