Single point of failure with Driver host crashing

Mich Talebzadeh Thu, 11 Aug 2016 12:41:29 -0700

Hi,

Spark is fault tolerant when worker nodes go down. In the example below an
executor is lost and the job carries on:


FROM tmp
[Stage 1:===========>                                           (20 + 10) / 100]
16/08/11 20:21:34 ERROR TaskSchedulerImpl: Lost executor 3 on xx.xxx.197.216: worker lost
[Stage 1:========================>                               (44 + 8) / 100]

However, when the node (the host) that the app was started on goes down,
the job fails because the driver disappears as well. Is there a way to avoid
this single point of failure, assuming what I have stated is correct?


Thanks



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
