Re: Notification on Spark Streaming job failure

2015-10-07 Thread Adrian Tanase
We’re deploying using YARN in cluster mode, to take advantage of the automatic 
restart of long-running streaming apps. We’ve also done a POC on top of 
Mesos + Marathon, so that’s always an option.

For monitoring / alerting, we’re using a combination of:

  *   Spark REST API queried from OpsView via Nagios-style checks
     *   Here we have thresholds on things like the number of successful jobs/tasks, total execution time, etc.
  *   Custom business/operational metrics logged manually from the streaming app to OpenTSDB
     *   We’re using a combination of Spark accumulators and custom RDDs – after summarizing some counters we push them to OpenTSDB via its REST API (a minimal sketch follows this list)
     *   We’re using dashboards built with Grafana that poll OpenTSDB – nicer looking, same functionality
     *   We have a custom OpsView check that queries OpenTSDB and looks for some successful number of events processed by the job over a period of time
     *   This is coupled with a stable stream of data from a canary instance
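
As promised above, a minimal sketch of the accumulator-to-OpenTSDB push (Scala). 
OpenTSDB's HTTP /api/put endpoint and its JSON shape are standard; the host, port, 
metric name, and tags below are hypothetical, and a production version would add 
batching and retries:

    import java.io.OutputStreamWriter
    import java.net.{HttpURLConnection, URL}

    // Hypothetical helper: POST one data point to OpenTSDB's /api/put endpoint.
    // OpenTSDB answers HTTP 204 on success.
    object TsdbPush {
      def push(tsdbUrl: String, metric: String, value: Long, tags: Map[String, String]): Unit = {
        val tagsJson = tags.map { case (k, v) => "\"" + k + "\": \"" + v + "\"" }.mkString(", ")
        val body = s"""{"metric": "$metric", "timestamp": ${System.currentTimeMillis / 1000}, """ +
          s""""value": $value, "tags": {$tagsJson}}"""
        val conn = new URL(s"$tsdbUrl/api/put").openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setRequestProperty("Content-Type", "application/json")
        conn.setDoOutput(true)
        val out = new OutputStreamWriter(conn.getOutputStream)
        try { out.write(body); out.flush() } finally out.close()
        val rc = conn.getResponseCode // 204 on success
        if (rc >= 300) System.err.println(s"OpenTSDB push failed: HTTP $rc") // a real check would log and retry
        conn.disconnect()
      }
    }

    // Driver-side usage per micro-batch (names are illustrative):
    //   dstream.foreachRDD { rdd =>
    //     TsdbPush.push("http://tsdb:4242", "stream.events.processed", rdd.count(),
    //                   Map("app" -> "my-streaming-job"))
    //   }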

Hope this helps – feel free to Google around for all the above buzzwords :). I 
can go into more detail on request.

-adrian




Re: Notification on Spark Streaming job failure

2015-10-07 Thread Steve Loughran

On 7 Oct 2015, at 06:28, Krzysztof Zarzycki <k.zarzy...@gmail.com> wrote:

Hi Vikram, so you gave up on using the yarn-cluster mode of launching Spark 
jobs, is that right? AFAIK, when using yarn-cluster mode, the launch process 
(spark-submit) monitors the job running on YARN, but if the launch process is 
killed or dies, it just stops printing the state (usually RUNNING), without 
affecting the monitored job. So you cannot use Monit features on the launch 
process (like restart on failure, etc.).

One more thing: Monit depends on pidfiles, and spark-submit (in yarn-client 
mode) does not create them. Do you create them on your own?

Thanks!
Krzysiek


You know, there's nothing to stop anyone adding a little monitoring tool: just 
poll the YARN RM for application reports and fail if the application transitions 
to the FAILED/KILLED states. If you do this, do test what happens during an AM 
restart; you probably want to send a notification, but it is not as serious as a 
full application failure.
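
For reference, a minimal sketch of such a poller (Scala) against the YARN client 
API; YarnClient, getApplicationReport, and YarnApplicationState are the real 
classes and calls, while the polling interval, the alert stub, and the argument 
handling are assumptions:

    import org.apache.hadoop.yarn.api.records.YarnApplicationState
    import org.apache.hadoop.yarn.client.api.YarnClient
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.util.ConverterUtils

    // Hypothetical watcher: poll the RM for one application's state and
    // alert on terminal failure. Pass the application id as args(0).
    object YarnAppWatcher {
      def main(args: Array[String]): Unit = {
        val appId = ConverterUtils.toApplicationId(args(0))
        val yarn = YarnClient.createYarnClient()
        yarn.init(new YarnConfiguration()) // reads yarn-site.xml from the classpath
        yarn.start()
        try {
          var done = false
          while (!done) {
            yarn.getApplicationReport(appId).getYarnApplicationState match {
              case YarnApplicationState.FAILED | YarnApplicationState.KILLED =>
                System.err.println(s"ALERT: $appId failed or was killed") // send email here
                done = true
              case YarnApplicationState.FINISHED =>
                done = true // a streaming job that "finishes" may deserve an alert too
              case _ =>
                Thread.sleep(30000) // still submitted/accepted/running; test the AM-restart path
            }
          }
        } finally yarn.stop()
      }
    }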








Re: Notification on Spark Streaming job failure

2015-10-06 Thread Krzysztof Zarzycki
Hi Vikram, so you gave up on using the yarn-cluster mode of launching Spark
jobs, is that right? AFAIK, when using yarn-cluster mode, the launch process
(spark-submit) monitors the job running on YARN, but if the launch process is
killed or dies, it just stops printing the state (usually RUNNING), without
affecting the monitored job. So you cannot use Monit features on the launch
process (like restart on failure, etc.).

One more thing: Monit depends on pidfiles, and spark-submit (in yarn-client
mode) does not create them. Do you create them on your own?

Thanks!
Krzysiek
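
On the pidfile question: one common workaround (an assumption on my part, not 
something confirmed in this thread) is a wrapper script that backgrounds 
spark-submit and writes the PID itself, with Monit pointed at that pidfile. All 
paths and names below are hypothetical:

    # Hypothetical Monit stanza (e.g. /etc/monit.d/spark-stream).
    # The start script must write the pidfile itself, since spark-submit
    # won't; e.g. inside start-stream.sh:
    #   spark-submit --master yarn-client ... &
    #   echo $! > /var/run/spark-stream.pid
    check process spark-stream with pidfile /var/run/spark-stream.pid
      start program = "/opt/jobs/start-stream.sh"
      stop program = "/opt/jobs/stop-stream.sh"
      if 3 restarts within 5 cycles then alert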





Re: Notification on Spark Streaming job failure

2015-10-06 Thread Vikram Kone
We are using Monit to kick off Spark Streaming jobs and it seems to work fine.



Re: Notification on Spark Streaming job failure

2015-09-28 Thread Chen Song
I am also interested, specifically in monitoring and alerting on Spark
Streaming jobs. It would be helpful to get some general guidelines or advice
on this from people who have implemented anything in this area.



-- 
Chen Song


Notification on Spark Streaming job failure

2015-09-17 Thread Krzysztof Zarzycki
Hi there Spark Community,
I would like to ask you for advice: I'm running Spark Streaming jobs in
production. Sometimes these jobs fail and I would like to get an email
notification about it. Do you know how I can set up Spark to notify me by
email if my job fails, or do I have to use an external monitoring tool?
I'm thinking of the following options:
1. As I'm running those jobs on YARN, monitor the YARN applications somehow. I
looked for this as well but couldn't find any YARN feature to do it.
2. Run the Spark Streaming job under some scheduler, like Oozie, Azkaban, or
Luigi. Those are designed for batch jobs rather than streaming, but could
work. Has anyone tried that?
3. Run the job driver under the "monit" tool, catch the failure, and send an
email about it. Currently I'm deploying in yarn-cluster mode and I would need
to give that up to run under Monit.
4. Deploy a monitoring tool (like Graphite, Ganglia, or Prometheus) and use
Spark metrics, then implement alerting there. Can I get information about
failed jobs from Spark metrics?
5. As in 4, but implement my own custom job metrics and monitor them.

What's your opinion on these options? How do you solve this problem?
Anything Spark-specific?
I'll be grateful for any advice on this subject.
Thanks!
Krzysiek
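
For options 4 and 5, one possible sketch of the custom route (Scala): a 
SparkListener registered in the driver that fires when a job fails. 
SparkListener, SparkListenerJobEnd, and JobFailed are real Spark APIs; the alert 
stub is an assumption. Note this only catches failed jobs inside a live driver; 
it won't fire if the driver itself dies, which is what YARN-level monitoring 
(option 1) covers:

    import org.apache.spark.scheduler.{JobFailed, SparkListener, SparkListenerJobEnd}

    // Minimal sketch: alert from the driver whenever a Spark job fails.
    class FailureNotifier extends SparkListener {
      override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = jobEnd.jobResult match {
        case JobFailed(exception) =>
          sendAlert(s"Spark job ${jobEnd.jobId} failed: ${exception.getMessage}")
        case _ => () // JobSucceeded: nothing to do
      }

      // Hypothetical stub: replace with SMTP (javax.mail), a webhook, etc.
      private def sendAlert(message: String): Unit = System.err.println(s"ALERT: $message")
    }

    // Registration, right after creating the context:
    //   val sc = new SparkContext(conf)
    //   sc.addSparkListener(new FailureNotifier)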