Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
We enabled RM and NM recovery.
Assume there are 2 containers running on this NM. After 10 minutes, the RM
detects the failure of the NM and relaunches the 2 lost containers on other
NMs. This is OK.
But if we then restart the RM, the lost containers on that NM are
**reported to the RM as lost again** because of recovery. We relaunch 2 more
containers on other NMs and end up with 2 more executors than we expected.
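
The double counting described above can be illustrated with a small toy model.
This is only a sketch under stated assumptions: `ToyAllocator`, `on_lost`, and
`simulate` are hypothetical names invented for illustration, not Spark or YARN
APIs, and skipping already-handled container IDs is just one possible guard,
not necessarily what this pull request does.

```python
class ToyAllocator:
    """Hypothetical stand-in for an executor allocator (not Spark's code)."""

    def __init__(self, target, dedupe):
        self.target = target        # number of executors we want
        self.dedupe = dedupe        # ignore repeated "lost" reports if True
        self.counted_running = 0    # how many executors the allocator believes are running
        self.live = set()           # containers that are actually alive
        self.handled_lost = set()   # container IDs already processed as lost
        self._next_id = 0
        self._allocate()

    def _allocate(self):
        # Launch containers until the believed count reaches the target.
        while self.counted_running < self.target:
            self.live.add(f"container_{self._next_id}")
            self._next_id += 1
            self.counted_running += 1

    def on_lost(self, container_ids):
        # Process a batch of "container lost" reports from the RM.
        for cid in container_ids:
            if self.dedupe and cid in self.handled_lost:
                continue  # already replaced this container once
            self.handled_lost.add(cid)
            self.live.discard(cid)
            self.counted_running -= 1
        self._allocate()


def simulate(dedupe):
    """Replay the scenario: NM failure, then the same report after RM restart."""
    alloc = ToyAllocator(target=2, dedupe=dedupe)
    lost = sorted(alloc.live)   # the 2 containers on the failed NM
    alloc.on_lost(lost)         # RM detects the NM failure
    alloc.on_lost(lost)         # RM restart: same containers reported again
    return len(alloc.live)


if __name__ == "__main__":
    print("without dedupe:", simulate(dedupe=False))
    print("with dedupe:   ", simulate(dedupe=True))
```

In this model, `simulate(dedupe=False)` ends with 4 live containers (2 more
than the target of 2), while `simulate(dedupe=True)` stays at 2.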