Thanks, Himanish and Felix!

On Sun, Mar 1, 2015 at 7:50 PM, Himanish Kushary <[email protected]> wrote:
> We are running our Spark jobs on Amazon AWS and are using AWS Data Pipeline
> for orchestration of the different Spark jobs. AWS Data Pipeline provides
> automatic EMR cluster provisioning, retry on failure, SNS notification, etc.
> out of the box and works well for us.
>
> On Sun, Mar 1, 2015 at 7:02 PM, Felix C <[email protected]> wrote:
>
>> We use Oozie as well, and it has worked well.
>> The catch is that each action in Oozie is separate, and one cannot retain a
>> SparkContext or RDD, or leverage caching or a temp table, going into another
>> Oozie action. You could either save output to a file or put all Spark
>> processing into one Oozie action.
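To illustrate the hand-off Felix describes (one Spark application writes its
intermediate output to HDFS, and the next application reads it back), here is
a minimal sketch of two standalone applications that could each run as a
separate Oozie action. The paths and object names are hypothetical; each
object would be packaged and launched by the orchestrator through its own
spark-submit call.

// Hypothetical sketch: two separate Spark applications chained by an
// orchestrator, handing data off through HDFS because the SparkContext
// and any cached RDDs do not survive across Oozie actions.
import org.apache.spark.{SparkConf, SparkContext}

// Action 1: clean the raw input and persist an intermediate result to HDFS.
object StepOneApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("step-one"))
    val cleaned = sc.textFile("hdfs:///data/raw/events").filter(_.nonEmpty)
    cleaned.saveAsTextFile("hdfs:///data/intermediate/events-cleaned")
    sc.stop()
  }
}

// Action 2: a separate application (and a new SparkContext) reads the
// intermediate output back from HDFS and continues the workflow.
object StepTwoApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("step-two"))
    val errors = sc.textFile("hdfs:///data/intermediate/events-cleaned")
      .filter(_.contains("ERROR"))
    errors.saveAsTextFile("hdfs:///data/output/error-events")
    sc.stop()
  }
}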
>> --- Original Message ---
>>
>> From: "Mayur Rustagi" <[email protected]>
>> Sent: February 28, 2015 7:07 PM
>> To: "Qiang Cao" <[email protected]>
>> Cc: "Ted Yu" <[email protected]>, "Ashish Nigam" <[email protected]>, "user" <[email protected]>
>> Subject: Re: Tools to manage workflows on Spark
>>
>> Sorry, not really. Spork is a way to migrate your existing Pig scripts
>> to Spark, or to write new Pig jobs that can then execute on Spark.
>> For orchestration you are better off using Oozie, especially if you are
>> using other execution engines/systems besides Spark.
>>
>> Regards,
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>> @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>>
>> On Sat, Feb 28, 2015 at 6:59 PM, Qiang Cao <[email protected]> wrote:
>>
>> Thanks, Mayur! I'm looking for something that would allow me to easily
>> describe and manage a workflow on Spark. A workflow in my context is a
>> composition of Spark applications that may depend on one another based on
>> HDFS inputs/outputs. Is Spork a good fit? The orchestration I want is at
>> the app level.
>>
>> On Sat, Feb 28, 2015 at 9:38 PM, Mayur Rustagi <[email protected]> wrote:
>>
>> We do maintain it, but in the Apache repo itself. However, Pig cannot do
>> orchestration for you. I am not sure what you are looking for from Pig in
>> this context.
>>
>> Regards,
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>> @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>>
>> On Sat, Feb 28, 2015 at 6:36 PM, Ted Yu <[email protected]> wrote:
>>
>> Here was the latest modification in the Spork repo:
>> Mon Dec 1 10:08:19 2014
>>
>> Not sure if it is being actively maintained.
>>
>> On Sat, Feb 28, 2015 at 6:26 PM, Qiang Cao <[email protected]> wrote:
>>
>> Thanks for the pointer, Ashish! I was also looking at Spork
>> (https://github.com/sigmoidanalytics/spork, Pig-on-Spark), but wasn't sure
>> if that's the right direction.
>>
>> On Sat, Feb 28, 2015 at 6:36 PM, Ashish Nigam <[email protected]> wrote:
>>
>> You have to call spark-submit from Oozie.
>> I used this link to get the idea for my implementation -
>>
>> http://mail-archives.apache.org/mod_mbox/oozie-user/201404.mbox/%3CCAHCsPn-0Grq1rSXrAZu35yy_i4T=fvovdox2ugpcuhkwmjp...@mail.gmail.com%3E
>>
>> On Feb 28, 2015, at 3:25 PM, Qiang Cao <[email protected]> wrote:
>>
>> Thanks, Ashish! Is Oozie integrated with Spark? I knew it could
>> accommodate some Hadoop jobs.
>>
>> On Sat, Feb 28, 2015 at 6:07 PM, Ashish Nigam <[email protected]> wrote:
>>
>> Qiang,
>> Did you look at Oozie?
>> We use Oozie to run Spark jobs in production.
>>
>> On Feb 28, 2015, at 2:45 PM, Qiang Cao <[email protected]> wrote:
>>
>> Hi Everyone,
>>
>> We need to deal with workflows on Spark. In our scenario, each workflow
>> consists of multiple processing steps. Among different steps, there could
>> be dependencies. I'm wondering if there are tools available that can
>> help us schedule and manage workflows on Spark. I'm looking for something
>> like Pig on Hadoop, but it should fully function on Spark.
>>
>> Any suggestions?
>>
>> Thanks in advance!
>>
>> Qiang
>
> --
> Thanks & Regards
> Himanish
