+1 to the two jobs idea. Not only do you get the benefits that Stephan mentioned, but Aurora already assumes that a job is a pool of like tasks. This assumption is used right now by the maintenance tooling (where it tries to keep 95% of the instances up at a given time), but it can be used elsewhere in the future. By going the two-jobs route you also avoid the case where Aurora reschedules shard 0 and disrupts all of the work done.
It is possible to make shard 0 the coordinator/driver task, and I have seen production systems that do this. However, I advise against it for the reasons mentioned. If you do find that creating 2 jobs is more difficult to orchestrate, could you please detail why?

On Tue, Dec 22, 2015 at 9:38 AM, Erb, Stephan <[email protected]> wrote:

> Hi Chris,
>
> we are running an internal batch processing framework on Aurora,
> consisting of a single master and multiple workers.
>
> We opted for the 2 jobs idea. The main advantage I see with this approach
> is that you actually keep separate things separate, without having to
> teach all external systems (service discovery, load balancer, your
> monitoring solution, etc.) that the first instance is different.
>
> Best Regards,
> Stephan
>
> ------------------------------
> *From:* Chris Bannister <[email protected]>
> *Sent:* Tuesday, December 22, 2015 2:47 PM
> *To:* [email protected]
> *Subject:* Launching master/slave jobs in Aurora
>
> Hi, I'm doing some work to get Apache Spark running in Aurora, and it
> seems to work reasonably well without many changes to Spark. The only
> issue I'm running into is launching it over many instances.
>
> Spark runs in a driver/executor model, where the driver coordinates work
> on the executors. The problem I have is that I want to launch the
> executors and the driver independently, i.e. I want to have 10 executors
> and 1 driver. I can accomplish this by having 2 jobs, a driver job and an
> executor job, but launching this seems a bit complicated to orchestrate.
> Another option would be to declare the job with 2 tasks, have the driver
> run on shard 0 and the executors on the rest.
>
> Has anyone had any experience with running similar systems in Aurora? I
> imagine Heron must have to do something similar, launching the topology
> master and workers.
>
> Chris

--
Zameer Manji
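[Editor's note: the two-jobs layout discussed in this thread could be sketched in an Aurora config file roughly as below. The job names, command lines, and resource sizes are hypothetical illustrations, not taken from the thread; `Process`, `SequentialTask`, `Resources`, and `Job` are the standard objects the Aurora client injects into `.aurora` files.]

```python
# Hypothetical sketch: one single-instance driver job plus a separate
# executor job scaled to 10 instances, instead of overloading shard 0.

driver_proc = Process(
    name = 'spark_driver',
    cmdline = 'run-driver.sh')      # hypothetical command

executor_proc = Process(
    name = 'spark_executor',
    cmdline = 'run-executor.sh')    # hypothetical command

driver_task = SequentialTask(
    processes = [driver_proc],
    resources = Resources(cpu = 2.0, ram = 4*GB, disk = 8*GB))

executor_task = SequentialTask(
    processes = [executor_proc],
    resources = Resources(cpu = 1.0, ram = 2*GB, disk = 4*GB))

jobs = [
  # One driver instance...
  Job(cluster = 'devcluster', role = 'spark', environment = 'prod',
      name = 'spark_driver', instances = 1, task = driver_task),
  # ...and ten executor instances, kept as a separate job so that
  # Aurora's "a job is a pool of like tasks" assumption holds for each.
  Job(cluster = 'devcluster', role = 'spark', environment = 'prod',
      name = 'spark_executors', instances = 10, task = executor_task),
]
```

Keeping the driver and executors in separate jobs also means external systems (service discovery, load balancers, monitoring) see two homogeneous pools rather than one pool with a special first instance.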
