GitHub user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17325
> 1. Refreshing peer cache in the block manager before trying to
pro-actively replicate. This way the probability of replicating to a failed
executor is eliminated.
> 2. Explicitly stopping the block manager in the tests. This shuts down
the RPC endpoint used by the block manager. This way, even if a block manager
tries to replicate using a stale reference, the replication logic should take
care of refreshing the list of peers after failure.
Why do we need fix 2 once fix 1 is in place?