Hi, I'm sure cluster deploy mode would solve this well. It would then be the cluster manager's responsibility to re-execute the driver.
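With the Spark standalone cluster manager, cluster deploy mode plus `--supervise` does exactly this: the driver runs on a worker inside the cluster, and the master restarts it if it fails. A minimal sketch (the master host/port, class name, and jar path below are placeholders, not from the thread):

```shell
# Run the driver inside the cluster rather than on the launching host;
# --supervise asks the standalone master to restart the driver if it
# exits with a non-zero code or the worker hosting it goes down.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  /path/to/my-app.jar
```

The launching machine can then disappear after submission without killing the job, which removes the single point of failure Mich describes.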
Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Thu, Aug 11, 2016 at 12:40 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> Spark is fault tolerant when worker nodes go down, as below:
>
> FROM tmp
> [Stage 1:===========>                (20 + 10) / 100]
> 16/08/11 20:21:34 ERROR TaskSchedulerImpl: Lost executor 3 on xx.xxx.197.216: worker lost
> [Stage 1:========================>   (44 + 8) / 100]
>
> and it can carry on.
>
> However, when the node (the host) on which the app was started goes down, the job fails because the driver disappears as well. Is there a way to avoid this single point of failure, assuming what I am stating is valid?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org