Hi Vikram,
So you gave up on yarn-cluster mode for launching the Spark jobs, is that right? AFAIK, in yarn-cluster mode the launcher process (spark-submit) only monitors the job running on YARN; if it is killed or dies, it simply stops printing the state (usually RUNNING) without affecting the monitored job. So you cannot use Monit features (restart on failure, etc.) on the launcher process.
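In yarn-client mode I guess you could wrap spark-submit in a small script that launches it in the background and records its PID, so Monit has both a process and a pidfile to watch. A rough, untested sketch of what I have in mind (the script path, class, jar, log/pidfile locations and mail address are all made up):

    #!/usr/bin/env bash
    # start-my-streaming-job.sh -- hypothetical wrapper; all paths and names are examples.
    # Launch the driver in yarn-client mode in the background and record its PID
    # so Monit has a pidfile to watch.
    nohup spark-submit \
      --master yarn-client \
      --class com.example.MyStreamingJob \
      /opt/jobs/my-streaming-job.jar \
      > /var/log/my-streaming-job.log 2>&1 &
    echo $! > /var/run/my-streaming-job.pid

    # /etc/monit/conf.d/my-streaming-job -- supervise the driver via that pidfile
    check process my-streaming-job with pidfile /var/run/my-streaming-job.pid
      start program = "/usr/local/bin/start-my-streaming-job.sh"
      stop program  = "/usr/bin/pkill -F /var/run/my-streaming-job.pid"
      alert ops@example.com

In theory Monit should then restart the driver when the process disappears and mail an alert about it, but I haven't tried this myself.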
One more thing: Monit depends on pidfiles, and spark-submit (in yarn-client mode) does not create them. Do you create them on your own?
Thanks!
Krzysiek

2015-10-07 6:37 GMT+02:00 Vikram Kone <vikramk...@gmail.com>:

> We are using Monit to kick off Spark streaming jobs and it seems to work fine.
>
> On Monday, September 28, 2015, Chen Song <chen.song...@gmail.com> wrote:
>
>> I am also interested, specifically in monitoring and alerting on Spark
>> streaming jobs. It would be helpful to get some general guidelines or
>> advice on this from people who have implemented anything along these lines.
>>
>> On Fri, Sep 18, 2015 at 2:35 AM, Krzysztof Zarzycki <k.zarzy...@gmail.com>
>> wrote:
>>
>>> Hi there Spark Community,
>>> I would like to ask you for advice: I'm running Spark Streaming jobs in
>>> production. Sometimes these jobs fail, and I would like to get an email
>>> notification about it. Do you know how I can set up Spark to notify me by
>>> email if my job fails? Or do I have to use an external monitoring tool?
>>> I'm thinking of the following options:
>>> 1. As I'm running those jobs on YARN, monitor the YARN jobs somehow. I
>>> looked for this as well but couldn't find any YARN feature to do it.
>>> 2. Run the Spark Streaming job under a scheduler like Oozie, Azkaban or
>>> Luigi. Those are built rather for batch jobs, not streaming, but it could
>>> work. Has anyone tried that?
>>> 3. Run the job driver under the "monit" tool, catch the failure and send
>>> an email about it. Currently I'm deploying in yarn-cluster mode and I
>>> would need to give that up to run under monit....
>>> 4. Set up a monitoring tool (like Graphite, Ganglia, Prometheus), feed it
>>> Spark metrics and implement alerting there. Can I get information about
>>> failed jobs from Spark metrics?
>>> 5. As in 4, but implement my own custom job metrics and monitor them.
>>>
>>> What's your opinion on these options? How do you solve this problem?
>>> Anything Spark specific?
>>> I'll be grateful for any advice on this subject.
>>> Thanks!
>>> Krzysiek
>>
>> --
>> Chen Song