Yes, you can submit jobs remotely.
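
For example, a minimal Airflow DAG that shells out to spark-submit against a
remote master might look like the sketch below (the master URL, class name,
and jar path are placeholders, and the BashOperator import path can differ
between Airflow versions):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('spark_remote_submit',
              start_date=datetime(2015, 11, 1),
              schedule_interval='@daily')

    # spark-submit runs locally on the Airflow worker but targets a
    # remote master, so the job itself executes on the cluster.
    submit = BashOperator(
        task_id='submit_spark_job',
        bash_command='spark-submit '
                     '--master spark://your-master:7077 '
                     '--deploy-mode cluster '
                     '--class com.example.YourJob '
                     '/path/to/your-job.jar',
        dag=dag)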


> On Nov 19, 2015, at 10:10 AM, Vikram Kone <vikramk...@gmail.com> wrote:
> 
> Hi Feng,
> Does Airflow allow remote submission of Spark jobs via spark-submit?
> 
> On Wed, Nov 18, 2015 at 6:01 PM, Fengdong Yu <fengdo...@everstring.com> wrote:
> Hi,
> 
> We use Airflow as our job workflow scheduler.
> 
> 
>> On Nov 19, 2015, at 9:47 AM, Vikram Kone <vikramk...@gmail.com> wrote:
>> 
>> Hi Nick,
>> Quick question about the spark-submit command executed from Azkaban with the
>> command job type.
>> I see that when I press kill in the Azkaban portal on a spark-submit job, it
>> doesn't actually kill the application on the Spark master, and it continues
>> to run even though Azkaban thinks it has been killed.
>> How do you get around this? Is there a way to kill spark-submit jobs from
>> the Azkaban portal?
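>> 
>> One idea is a wrapper script for the command job type that forwards
>> Azkaban's kill signal to the spark-submit child process. A rough sketch,
>> assuming Azkaban sends SIGTERM on kill and the driver runs in client mode
>> (in cluster mode the driver outlives spark-submit, so you would also need
>> something like 'yarn application -kill <appId>'):
>> 
>>     #!/usr/bin/env python
>>     # Launch spark-submit in its own process group and forward SIGTERM,
>>     # so that killing the wrapper also kills the Spark driver.
>>     import os
>>     import signal
>>     import subprocess
>>     import sys
>> 
>>     proc = subprocess.Popen(['spark-submit'] + sys.argv[1:],
>>                             preexec_fn=os.setsid)  # new process group
>> 
>>     def forward(signum, frame):
>>         os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
>> 
>>     signal.signal(signal.SIGTERM, forward)
>>     sys.exit(proc.wait())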
>> 
>> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>> Hi Vikram,
>> 
>> We use Azkaban (2.5.0) for our production workflow scheduling. We just use a
>> local-mode deployment, and it is fairly easy to set up. It is pretty easy to
>> use and has a nice scheduling and logging interface, as well as SLAs (e.g.,
>> kill the job and notify if it doesn't complete within 3 hours).
>> 
>> However, Spark support is not present directly - we run everything with
>> shell scripts and spark-submit. There is a plugin interface where one could
>> create a Spark plugin, but I found it very cumbersome when I investigated,
>> and I didn't have the time to work through developing one.
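>> 
>> For reference, a command-type job is just a small properties file, roughly
>> like this (the file name, class, and paths are made up):
>> 
>>     # myjob.job
>>     type=command
>>     command=spark-submit --master spark://master:7077 \
>>       --class com.example.Job /jobs/job.jar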
>> 
>> It has some quirks, and while there is actually a REST API for adding jobs
>> and scheduling them dynamically, it is not documented anywhere, so you kind
>> of have to figure it out for yourself. But in terms of ease of use I found
>> it way better than Oozie. I haven't tried Chronos, which seemed quite
>> involved to set up, and I haven't tried Luigi either.
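>> 
>> To give a flavour, the calls we pieced together look roughly like the Python
>> sketch below (the host is made up, and the endpoint and parameter names are
>> my best recollection, so verify them against your version):
>> 
>>     import requests
>> 
>>     base = 'https://azkaban-host:8443'
>> 
>>     # Log in to obtain a session id.
>>     r = requests.post(base,
>>                       data={'action': 'login',
>>                             'username': 'azkaban',
>>                             'password': '***'},
>>                       verify=False)
>>     session_id = r.json()['session.id']
>> 
>>     # Trigger an execution of a flow in a project.
>>     requests.get(base + '/executor',
>>                  params={'ajax': 'executeFlow',
>>                          'session.id': session_id,
>>                          'project': 'myproject',
>>                          'flow': 'myflow'})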
>> 
>> Spark Job Server is good, but as you say it lacks some features like
>> scheduling and DAG-style workflows (independent of Spark-defined job flows).
>> 
>> 
>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>> Also check out Falcon in combination with Oozie.
>> 
>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid> wrote:
>> Looks like Oozie can satisfy most of your requirements. 
>> 
>> 
>> 
>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com> wrote:
>> Hi,
>> I'm looking for open-source workflow tools/engines that allow us to schedule
>> Spark jobs on a DataStax Cassandra cluster. Since there are tonnes of
>> alternatives out there, like Oozie, Azkaban, Luigi, Chronos, etc., I wanted
>> to check with people here to see what they are using today.
>> 
>> Some of the requirements of the workflow engine I'm looking for are:
>> 
>> 1. First-class support for submitting Spark jobs on Cassandra, not some
>> wrapper Java code to submit tasks.
>> 2. Active open-source community support, and well tested at production scale.
>> 3. Should be dead easy to write job dependencies using XML or a web
>> interface. E.g., job A depends on jobs B and C, so run job A after B and C
>> have finished (see the sketch after this list). We shouldn't need to write
>> full-blown Java applications to specify job parameters and dependencies. It
>> should be very simple to use.
>> 4. Time-based recurring scheduling. Run the Spark jobs at a given time every
>> hour, day, week, or month.
>> 5. Job monitoring, alerting on failures, and email notifications on a daily
>> basis.
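>> 
>> For point 3, something at the level of this sketch (Azkaban-style job files
>> with made-up names) is what I have in mind:
>> 
>>     # jobA.job: runs only after jobB and jobC succeed
>>     type=command
>>     command=spark-submit --class com.example.JobA /jobs/a.jar
>>     dependencies=jobB,jobC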
>> 
>> I have looked at Ooyala's Spark Job Server, which seems to be geared towards
>> making Spark jobs run faster by sharing contexts between jobs, but it isn't
>> a full-blown workflow engine per se. A combination of Spark Job Server and a
>> workflow engine would be ideal.
>> 
>> Thanks for the input.
>> 