LantaoJin commented on issue #25971: [SPARK-29298][CORE] Separate block manager 
heartbeat endpoint from driver endpoint
URL: https://github.com/apache/spark/pull/25971#issuecomment-551008777
 
 
   @cloud-fan the primary goal is to fix the second problem: 
`BlockManagerMaster` is usually busy handling other events, which causes 
frequent heartbeat timeouts. On a heavily loaded driver, frequent executor 
heartbeat timeouts cause executors to be lost, and eventually the driver 
crashes.
   We ran a test:
   Before: with 40 JDBC clients sending small SQL queries concurrently, the 
driver crashed within 2 hours.
   After: with 80 JDBC clients sending small SQL queries concurrently, the 
driver kept running for over 3 days.
   (The test was run on our concurrency-optimized driver.)
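
   To illustrate why sharing one endpoint can delay heartbeats, here is a 
toy model in Python. It is only a sketch under stated assumptions, not 
Spark's RPC code: each endpoint is modeled as a single-threaded FIFO queue, 
and the names `Message`, `heartbeat_delays`, `count_timeouts` and 
`make_workload`, as well as every number in the workload (heartbeat 
interval, burst size, service times, timeout), are hypothetical and 
unrelated to the test results above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    arrival: float  # time the message reaches the endpoint, in seconds
    service: float  # time the endpoint needs to handle it, in seconds
    kind: str       # "heartbeat" or "other"


def heartbeat_delays(messages):
    """Serve messages FIFO on one single-threaded endpoint.

    Returns, for each heartbeat, the time between its arrival and the
    moment the endpoint finishes handling it.
    """
    clock = 0.0
    delays = []
    # sorted() is stable, so messages with equal arrival keep input order.
    for msg in sorted(messages, key=lambda m: m.arrival):
        start = max(clock, msg.arrival)  # wait until the endpoint is free
        clock = start + msg.service
        if msg.kind == "heartbeat":
            delays.append(clock - msg.arrival)
    return delays


def count_timeouts(messages, timeout):
    """Count heartbeats whose handling finished later than `timeout`."""
    return sum(1 for delay in heartbeat_delays(messages) if delay > timeout)


def make_workload():
    """Hypothetical workload: six heartbeats plus a burst of slow events."""
    heartbeats = [
        Message(arrival=10.0 * i, service=0.001, kind="heartbeat")
        for i in range(1, 7)
    ]
    # A burst of 200 other events arriving at t=5s, each taking 0.5s.
    others = [Message(arrival=5.0, service=0.5, kind="other") for _ in range(200)]
    return heartbeats, others


if __name__ == "__main__":
    heartbeats, others = make_workload()
    timeout = 30.0  # hypothetical heartbeat timeout, in seconds
    # Shared endpoint: heartbeats queue behind the burst of other events.
    print("shared endpoint timeouts:", count_timeouts(heartbeats + others, timeout))
    # Separate endpoint: heartbeats have a queue of their own.
    print("separate endpoint timeouts:", count_timeouts(heartbeats, timeout))
```

   In this model the heartbeats on the shared endpoint wait behind the 
burst of other events, while the same heartbeats on their own endpoint are 
handled almost immediately. That is the effect this change is aiming for.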

