[What are the integration points of Apache Spark and Mesos? What are the true advantages of running Spark on Mesos?]
Spark runs on Mesos by acting as a framework (scheduler), and out of the box Spark provides both a coarse-grained and a fine-grained Mesos scheduler. In coarse-grained mode Spark launches one long-running executor per node for the lifetime of the application; in fine-grained mode each Spark task runs as a separate Mesos task. I think the advantage of running Spark on Mesos is that it's easy to express Spark-specific scheduling needs through the Mesos scheduler APIs, which opens up more optimization opportunities. Also, by running on Mesos you can share the cluster with many other frameworks, and we're adding a lot more support to make the multi-framework experience much nicer.

[Can Mesos autoscale the cluster based on some signals/events coming out of Spark runtime or Spark consumers, then cause the consumers to run on the updated cluster, or signal to the consumers to restart themselves into an updated cluster?]

Mesos won't autoscale out of the box, but Spark has dynamic allocation, which can scale the number of executors down and back up based on Spark metrics such as the backlog of pending tasks. Potentially more can be done on the Spark scheduler side to signal more events and scaling opportunities, so if you have ideas about Spark, please feel free to email the dev@spark list or create JIRAs, or we can chat on IRC/email for further discussion.

Tim

On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <[email protected]> wrote:

> A Mesos noob here. Could someone point me at the doc or summary for the
> cluster autoscaling capabilities in Mesos?
>
> Is there a way to feed it events and have it detect the need to bring in
> more machines or decommission machines? Is there a way to receive events
> back that notify you that machines have been allocated or decommissioned?
>
> Would this work within a certain set of
> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and
> grab machines from the cloud?
>
> What are the integration points of Apache Spark and Mesos? What are the
> true advantages of running Spark on Mesos?
>
> Can Mesos autoscale the cluster based on some signals/events coming out of
> Spark runtime or Spark consumers, then cause the consumers to run on the
> updated cluster, or signal to the consumers to restart themselves into an
> updated cluster?
>
> Thanks.
>
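The dynamic allocation point in the reply can be illustrated with a toy model. This is a simplified, hypothetical simulation, not Spark's implementation: it assumes one-second ticks, executors that start instantly, and a fixed number of concurrent tasks per executor. It loosely follows the policy described in Spark's documentation, where executors are requested in rounds that double in size (1, 2, 4, 8, ...) while tasks stay backlogged, and surplus executors are released after sitting idle past a timeout. The function and parameter names are illustrative; they only loosely correspond to `spark.dynamicAllocation.schedulerBacklogTimeout`, `spark.dynamicAllocation.executorIdleTimeout`, `spark.dynamicAllocation.minExecutors` and `spark.dynamicAllocation.maxExecutors`.

```python
import math


def simulate_dynamic_allocation(load, tasks_per_executor=4, min_executors=0,
                                max_executors=16, backlog_timeout=1,
                                idle_timeout=60):
    """Toy model of Spark-style dynamic allocation (hypothetical, not Spark code).

    load[t] is the number of tasks the job wants to run during second t.
    Returns the executor count at the end of each second.
    """
    executors = min_executors
    backlog_secs = 0  # seconds that demand has exceeded supply
    round_size = 1    # executors to request in the next round
    idle_secs = 0     # seconds that surplus executors have been idle
    history = []
    for tasks in load:
        # Executors needed to run every task at once, capped at the maximum.
        needed = min(max_executors, math.ceil(tasks / tasks_per_executor))
        if needed > executors:
            idle_secs = 0
            backlog_secs += 1
            if backlog_secs >= backlog_timeout:
                # Request in rounds that double (1, 2, 4, ...), never past need.
                added = min(round_size, needed - executors)
                executors += added
                # Keep doubling only while a full round was granted.
                round_size = round_size * 2 if added == round_size else 1
                backlog_secs = 0
        else:
            backlog_secs = 0
            round_size = 1
            floor = max(needed, min_executors)
            if executors > floor:
                idle_secs += 1
                if idle_secs >= idle_timeout:
                    # Release surplus executors once they exceed the idle timeout.
                    executors = floor
                    idle_secs = 0
            else:
                idle_secs = 0
        history.append(executors)
    return history
```

For example, with 40 tasks for six seconds followed by five quiet seconds and the idle timeout shortened to 3 seconds, `simulate_dynamic_allocation([40] * 6 + [0] * 5, idle_timeout=3)` returns `[1, 3, 7, 10, 10, 10, 10, 10, 0, 0, 0]`: the count ramps up in doubling rounds, holds while busy, then drops once the executors have idled long enough. Real Spark tracks idle time per executor and, on Mesos, relies on the external shuffle service (in coarse-grained mode) so shuffle files survive executor removal; the model ignores both.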

