Re: Single point of failure with Driver host crashing

2016-08-12 Thread Jacek Laskowski
Hi,

I'm sure that cluster deploy mode would solve it. It would then be the
cluster's job to re-execute the driver, wouldn't it?
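
A minimal sketch of what such a submission could look like, assuming a
Standalone cluster. It only assembles the spark-submit command line. The
helper name, master URL, jar path and class name are hypothetical
placeholders, not values from this thread. The --deploy-mode cluster and
--supervise options are the documented spark-submit flags; --supervise asks
a Standalone cluster to restart a driver that exits with a failure.

```python
import shlex


def build_submit_command(master_url, app_jar, main_class, supervise=True):
    """Build a spark-submit argument list that runs the driver inside the cluster."""
    cmd = [
        "spark-submit",
        "--master", master_url,
        # The driver runs on a cluster worker, not on the submitting host.
        "--deploy-mode", "cluster",
    ]
    if supervise:
        # Ask the cluster to restart the driver if it fails.
        cmd.append("--supervise")
    cmd += ["--class", main_class, app_jar]
    return cmd


# Hypothetical values, for illustration only.
command = build_submit_command(
    master_url="spark://master-host:7077",
    app_jar="hdfs:///apps/my-app.jar",
    main_class="com.example.MyApp",
)
print(shlex.join(command))
```

In cluster mode the application jar has to be reachable from the worker
nodes (for example on HDFS), since the driver no longer starts on the edge
node that ran spark-submit.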

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Thu, Aug 11, 2016 at 12:40 PM, Mich Talebzadeh wrote:
>
> Hi,
>
> Spark is fault tolerant when worker nodes go down, as shown below:
>
> FROM tmp
> [Stage 1:===>   (20 + 10) / 100]
> 16/08/11 20:21:34 ERROR TaskSchedulerImpl: Lost executor 3 on xx.xxx.197.216: worker lost
> [Stage 1:>   (44 + 8) / 100]
>
> The job can carry on.
>
> However, when the node (the host) that the app was started on goes down, the
> job fails because the driver disappears as well. Is there a way to avoid this
> single point of failure, assuming what I am stating is valid?
>
>
> Thanks
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
>
>




Re: Single point of failure with Driver host crashing

2016-08-11 Thread Mich Talebzadeh
Thanks Ted,

In this case we were using Standalone mode, with the Standalone master
started on another node.

The app was started on a node other than the master node, and the master
node was not affected. The node in question was the edge node (the one
running spark-submit).

From the link I was not sure whether this matter would have been resolved.

Cheers








Re: Single point of failure with Driver host crashing

2016-08-11 Thread Ted Yu
Have you read https://spark.apache.org/docs/latest/spark-standalone.html#high-availability ?

FYI
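
For reference, a minimal sketch of the ZooKeeper-based standby master
settings that the linked section describes. It only builds the value one
would place in SPARK_DAEMON_JAVA_OPTS in spark-env.sh. The helper name and
the ZooKeeper host names are hypothetical; the three spark.deploy.*
property names are the documented ones. Note that these settings protect
the Standalone master, not the driver process.

```python
def standalone_ha_java_opts(zk_url, zk_dir="/spark"):
    """Return a SPARK_DAEMON_JAVA_OPTS value enabling ZooKeeper master recovery."""
    props = {
        "spark.deploy.recoveryMode": "ZOOKEEPER",
        "spark.deploy.zookeeper.url": zk_url,
        "spark.deploy.zookeeper.dir": zk_dir,
    }
    # Each property becomes a -Dkey=value JVM option.
    return " ".join(f"-D{key}={value}" for key, value in props.items())


# Hypothetical ZooKeeper quorum, for illustration only.
opts = standalone_ha_java_opts("zk1:2181,zk2:2181,zk3:2181")
print(f'SPARK_DAEMON_JAVA_OPTS="{opts}"')
```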



Single point of failure with Driver host crashing

2016-08-11 Thread Mich Talebzadeh
Hi,

Spark is fault tolerant when worker nodes go down, as shown below:

FROM tmp
[Stage 1:===>   (20 + 10) / 100]
16/08/11 20:21:34 ERROR TaskSchedulerImpl: Lost executor 3 on xx.xxx.197.216: worker lost
[Stage 1:>   (44 + 8) / 100]

The job can carry on.

However, when the node (the host) that the app was started on goes down,
the job fails because the driver disappears as well. Is there a way to avoid
this single point of failure, assuming what I am stating is valid?


Thanks


