Thanks, everyone, for sharing your ideas. Very useful; I appreciate it.

On Fri, Apr 7, 2017 at 10:40 AM, Sam Elamin <hussam.ela...@gmail.com> wrote:
> Definitely agree with Gourav there. I wouldn't want Jenkins to run my
> workflow. It seems to me that you would only be using Jenkins for its
> scheduling capabilities.
>
> Yes, you can run tests, but you wouldn't want it to run your orchestration
> of jobs.
>
> What happens if Jenkins goes down for any particular reason? How do you
> have the conversation with your stakeholders that your pipeline is not
> working and they don't have data because the build server is going through
> an upgrade?
>
> However, to be fair, I understand what you are saying, Steve. If someone is
> in a place where they only have access to Jenkins and have to go through
> hoops to set up or get access to new instances, then engineers will do what
> they always do: find ways to game the system to get their work done.
>
> On Fri, 7 Apr 2017 at 16:17, Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
>
>> Hi Steve,
>>
>> Why would you ever do that? You are suggesting the use of a CI tool as a
>> workflow and orchestration engine.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Fri, Apr 7, 2017 at 4:07 PM, Steve Loughran <ste...@hortonworks.com>
>> wrote:
>>
>>> If you have Jenkins set up for some CI workflow, it can do scheduled
>>> builds and tests. That works well if you can do some build testing before
>>> even submitting the job to a remote cluster.
>>>
>>> On 7 Apr 2017, at 10:15, Sam Elamin <hussam.ela...@gmail.com> wrote:
>>>
>>> Hi Shyla,
>>>
>>> You have multiple options really, some of which have already been
>>> listed, but let me try and clarify.
>>>
>>> Assuming you have a Spark application in a jar, you have a variety of
>>> options.
>>>
>>> You have to have an existing Spark cluster that is either running on EMR
>>> or somewhere else.
>>>
>>> *Super simple / hacky*
>>> A cron job on EC2 that calls a simple shell script that does a
>>> spark-submit to a Spark cluster, OR creates or adds a step to an EMR
>>> cluster.
>>>
>>> *More elegant*
>>> Airflow/Luigi/AWS Data Pipeline (which is just cron with a UI) that will
>>> do the above step but with scheduling and potentially backfilling and
>>> error handling (retries, alerts, etc.).
>>>
>>> AWS is coming out with Glue <https://aws.amazon.com/glue/> soon, which
>>> runs some Spark jobs, but I do not think it is available worldwide just
>>> yet.
>>>
>>> Hope I cleared things up.
>>>
>>> Regards,
>>> Sam
>>>
>>> On Fri, Apr 7, 2017 at 6:05 AM, Gourav Sengupta <
>>> gourav.sengu...@gmail.com> wrote:
>>>
>>>> Hi Shyla,
>>>>
>>>> Why would you want to schedule a Spark job in EC2 instead of EMR?
>>>>
>>>> Regards,
>>>> Gourav
>>>>
>>>> On Fri, Apr 7, 2017 at 1:04 AM, shyla deshpande <
>>>> deshpandesh...@gmail.com> wrote:
>>>>
>>>>> I want to run a Spark batch job, maybe hourly, on AWS EC2. What is
>>>>> the easiest way to do this? Thanks.
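The "super simple" cron option above can be sketched as a small wrapper that cron invokes hourly. Everything here is an illustrative assumption, not a detail from the thread: the jar path, class name, master URL, and log locations are all hypothetical.

```python
# Hypothetical cron wrapper; paths and names are illustrative only.
# A crontab entry to run it hourly might look like:
#   0 * * * * /usr/bin/python3 /opt/jobs/run_batch.py >> /var/log/batch.log 2>&1
import shutil
import subprocess

def build_submit_command(jar_path, main_class, master="yarn"):
    """Assemble the spark-submit invocation for a batch jar."""
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--class", main_class,
        jar_path,
    ]

if __name__ == "__main__":
    cmd = build_submit_command("/opt/jobs/my-batch-job.jar",
                               "com.example.BatchJob")
    if shutil.which("spark-submit"):
        # check=True surfaces failures in cron's exit status and logs
        subprocess.run(cmd, check=True)
    else:
        print("spark-submit not on PATH; would run:", " ".join(cmd))
```

As the thread notes, this gets no retries, alerting, or backfill; it is only as reliable as the single EC2 box running cron.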
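The "add step to an EMR cluster" variant can be done programmatically. A sketch of the step payload EMR expects (it runs spark-submit through `command-runner.jar`); the step name, S3 jar location, and class are hypothetical, and the actual submission via boto3 is shown only in a comment since it needs credentials and a live cluster:

```python
# Sketch of an EMR step definition; bucket/class names are assumptions.
def build_spark_step(name, jar_s3_path, main_class):
    """EMR step that invokes spark-submit via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit", "--deploy-mode", "cluster",
                "--class", main_class, jar_s3_path,
            ],
        },
    }

if __name__ == "__main__":
    step = build_spark_step("hourly-batch",
                            "s3://my-bucket/my-batch-job.jar",
                            "com.example.BatchJob")
    print(step)
    # With boto3 and AWS credentials configured, you would submit it with:
    #   import boto3
    #   boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXX",
    #                                          Steps=[step])
```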
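For the "more elegant" Airflow route, a minimal DAG definition might look like the following. This is a configuration sketch only, written against the 2017-era Airflow 1.x API; the DAG id, schedule, retry policy, and spark-submit arguments are all assumptions, not recommendations from the thread:

```python
# Minimal Airflow 1.x DAG sketch; all names and paths are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="hourly_spark_batch",
    start_date=datetime(2017, 4, 1),
    schedule_interval="@hourly",
    # retries/alerting are exactly what plain cron lacks
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)

submit = BashOperator(
    task_id="spark_submit",
    bash_command=(
        "spark-submit --master yarn --deploy-mode cluster "
        "--class com.example.BatchJob /opt/jobs/my-batch-job.jar"
    ),
    dag=dag,
)
```

Dropped into Airflow's `dags/` folder, this gives the scheduling, retries, and backfill behaviour Sam describes, with failures visible in the Airflow UI rather than a silent cron log.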