LantaoJin commented on issue #25971: [SPARK-29298][CORE] Separate block manager heartbeat endpoint from driver endpoint
URL: https://github.com/apache/spark/pull/25971#issuecomment-551008777

@cloud-fan the primary goal is to fix the second problem: `BlockManagerMaster` is busy with other events most of the time, which causes frequent heartbeat timeouts. On a heavily loaded driver, frequent executor heartbeat timeouts cause executors to be lost, and eventually the driver crashes. We ran a test:

Before: with 40 JDBC clients sending small SQL queries concurrently, the driver crashed within 2 hours.
After: with 80 JDBC clients sending small SQL queries concurrently, the driver kept running for more than 3 days.

(The test was run on our concurrency-optimized driver.)
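To illustrate the queueing effect described above, here is a toy model in Python. It is not Spark code: the `Message` type, the `serve_fifo` helper, the workload numbers, and the `TIMEOUT` value are all hypothetical and chosen only to make the effect visible. It assumes each endpoint handles its messages one at a time in arrival order, and compares the worst heartbeat delay when heartbeats share an endpoint with slower block events against the delay when heartbeats have an endpoint of their own.

```python
from collections import namedtuple

# Hypothetical message type: arrival time, handling cost, and kind.
Message = namedtuple("Message", ["arrival", "cost", "kind"])


def serve_fifo(messages):
    """Model a single-threaded endpoint that handles messages in arrival order.

    Returns a list of (message, completion_time) pairs.
    """
    clock = 0.0
    done = []
    for msg in sorted(messages, key=lambda m: m.arrival):
        # The endpoint starts a message once it is free and the message has arrived.
        clock = max(clock, msg.arrival) + msg.cost
        done.append((msg, clock))
    return done


def max_heartbeat_delay(done):
    """Worst-case time between a heartbeat's arrival and its completion."""
    return max(t - m.arrival for m, t in done if m.kind == "heartbeat")


# Hypothetical workload, in arbitrary time units: block events arrive faster
# than the endpoint can handle them, heartbeats are rare and cheap.
block_events = [Message(i * 1.0, 1.5, "block") for i in range(100)]
heartbeats = [Message(i * 10.0, 0.01, "heartbeat") for i in range(10)]

TIMEOUT = 30.0  # hypothetical heartbeat timeout, not a Spark default

# One endpoint for everything: heartbeats wait behind the block-event backlog.
shared = max_heartbeat_delay(serve_fifo(block_events + heartbeats))
# Dedicated heartbeat endpoint: heartbeats only wait behind other heartbeats.
separate = max_heartbeat_delay(serve_fifo(heartbeats))

print(f"shared endpoint, worst heartbeat delay:   {shared:.2f}")
print(f"separate endpoint, worst heartbeat delay: {separate:.2f}")
print("shared endpoint exceeds timeout:", shared > TIMEOUT)
print("separate endpoint exceeds timeout:", separate > TIMEOUT)
```

In this model the heartbeat delay on the shared endpoint grows with the backlog of block events, while on the separate endpoint it stays near the cost of handling a single heartbeat. The real driver is more complex than this, so treat it only as a sketch of why isolating the heartbeat path can help.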
