Hi Vikram,
So you gave up on launching Spark jobs in yarn-cluster mode, is that right?
AFAIK in yarn-cluster mode the launch process (spark-submit) only monitors
the job running on YARN; if it is killed or dies, it simply stops printing
the state (usually RUNNING) without affecting the monitored job. So you
cannot use monit features (like restart on failure) on the launch process.
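If you stay with yarn-cluster mode, a watchdog would have to ask YARN about
the application's state directly instead of watching the launcher. Here is a
rough sketch in Scala of polling the ResourceManager REST API; the host:port,
object name and the alerting comment are just placeholders, not something
Spark provides:

import scala.io.Source

object YarnAppStatus {
  // Ask the YARN ResourceManager REST API for an application's current state.
  // rmHostPort (e.g. "rm-host:8088") and appId are placeholders.
  def fetchState(rmHostPort: String, appId: String): String = {
    val url = s"http://$rmHostPort/ws/v1/cluster/apps/$appId"
    val json = Source.fromURL(url).mkString
    // Crude extraction of the "state" field; a real version would parse the
    // JSON properly (e.g. with json4s).
    "\"state\"\\s*:\\s*\"([A-Z_]+)\"".r
      .findFirstMatchIn(json).map(_.group(1)).getOrElse("UNKNOWN")
  }
}

// e.g. run this from cron/monit and alert when the application is gone:
//   val state = YarnAppStatus.fetchState("rm-host:8088", "application_1234567890123_0001")
//   if (state == "FAILED" || state == "KILLED") { /* send the alert email here */ }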

One more thing: Monit depends on pidfiles, and spark-submit (in yarn-client
mode) does not create one. Do you create them on your own?
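If not, a simple workaround is to have the driver write its own pidfile right
at startup, so monit has something to watch. A minimal sketch, assuming
yarn-client mode (the driver JVM runs on the local machine); the path and
object name are made up:

import java.lang.management.ManagementFactory
import java.nio.file.{Files, Paths, StandardOpenOption}

object PidFileWriter {
  // Write this JVM's pid to a file that a monit "check process ... with pidfile"
  // entry can point at. On HotSpot JVMs the runtime MXBean name is "<pid>@<host>".
  def write(path: String = "/var/run/my-streaming-driver.pid"): Unit = {
    val pid = ManagementFactory.getRuntimeMXBean.getName.split("@").head
    Files.write(Paths.get(path), (pid + "\n").getBytes("UTF-8"),
      StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)
  }
}

// Call PidFileWriter.write() at the top of the driver's main(), before the
// StreamingContext is started.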

Thanks!
Krzysiek



2015-10-07 6:37 GMT+02:00 Vikram Kone <vikramk...@gmail.com>:

> We are using Monit to kick off Spark Streaming jobs and it seems to work fine.
>
>
> On Monday, September 28, 2015, Chen Song <chen.song...@gmail.com> wrote:
>
>> I am also interested, specifically in monitoring and alerting on Spark
>> Streaming jobs. It would be helpful to get some general guidelines or
>> advice from people who have implemented something in this area.
>>
>> On Fri, Sep 18, 2015 at 2:35 AM, Krzysztof Zarzycki <k.zarzy...@gmail.com>
>> wrote:
>>
>>> Hi there Spark Community,
>>> I would like to ask you for advice: I'm running Spark Streaming jobs in
>>> production. Sometimes these jobs fail, and I would like to get an email
>>> notification when that happens. Do you know how I can set up Spark to
>>> notify me by email if my job fails? Or do I have to use an external
>>> monitoring tool?
>>> I'm thinking of the following options:
>>> 1. Since I'm running these jobs on YARN, monitor the YARN applications
>>> somehow. I looked for this but couldn't find any YARN feature that does
>>> it.
>>> 2. Run the Spark Streaming job under a scheduler like Oozie, Azkaban, or
>>> Luigi. Those are designed for batch jobs rather than streaming, but they
>>> could work. Has anyone tried that?
>>> 3. Run the job driver under the "monit" tool, catch the failure, and
>>> send an email about it. Currently I'm deploying in yarn-cluster mode and
>>> I would have to give that up to run under monit....
>>> 4. Deploy a monitoring tool (like Graphite, Ganglia, Prometheus) and use
>>> Spark metrics, then implement alerting there. Can I get information
>>> about failed jobs from Spark metrics?
>>> 5. As in 4, but implement my own custom job metrics and monitor those.
>>>
>>> What's your opinion of these options? How do you solve this problem? Is
>>> there anything Spark-specific?
>>> I'll be grateful for any advice on this subject.
>>> Thanks!
>>> Krzysiek
>>>
>>>
>>
>>
>> --
>> Chen Song
>>
>>
