You may want to take a look at https://issues.apache.org/jira/browse/SPARK-3174.
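For reference, SPARK-3174 covers dynamic executor allocation in YARN mode. As a rough
sketch of what enabling it from application code can look like, assuming a Spark
release that ships the feature and an external shuffle service running on each
NodeManager (the app name and executor bounds below are purely illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // Let YARN grow and shrink the executor set with the workload,
    // instead of holding a fixed allocation for the life of the app.
    val conf = new SparkConf()
      .setAppName("shared-yarn-app")                     // hypothetical app name
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")      // shuffle data must outlive removed executors
      .set("spark.dynamicAllocation.minExecutors", "2")  // illustrative bounds, tune per cluster
      .set("spark.dynamicAllocation.maxExecutors", "50")

    val sc = new SparkContext(conf)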
On Thu, Oct 23, 2014 at 2:56 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
> Upvote for the multitenancy requirement.
>
> I'm also building a data analytics platform, and there will be multiple
> users running queries and computations simultaneously. One of the pain
> points is controlling resource size. Users don't really know how many
> nodes they need, so they always use as many as possible... The result is
> a lot of wasted resources in our YARN cluster.
>
> A way to 1) allow multiple Spark contexts to share the same resources, or
> 2) add dynamic resource management for YARN mode, is very much wanted.
>
> Jianshi
>
> On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
>> <ashwinshanka...@gmail.com> wrote:
>> >> That's not something you might want to do usually. In general, a
>> >> SparkContext maps to a user application
>> >
>> > My question was basically this. In this page in the official doc, under
>> > the "Scheduling within an application" section, it talks about
>> > multi-user and fair sharing within an app. How does multi-user within
>> > an application work (how do users connect to an app and run their
>> > stuff)? When would I want to use this?
>>
>> I see. The way I read that page is that Spark supports all those
>> scheduling options, but Spark doesn't give you the means to actually
>> submit jobs from different users to a running SparkContext hosted in a
>> different process. For that, you'll need something like the job server
>> that I referenced before, or to write your own framework for supporting
>> that.
>>
>> Personally, I'd use the information on that page when dealing with
>> concurrent jobs in the same SparkContext, but still restricted to the
>> same user. I'd avoid trying to create any application where a single
>> SparkContext is shared by multiple users in any way.
>>
>> >> As far as I understand, this will cause executors to be killed, which
>> >> means that Spark will start retrying tasks to rebuild the data that
>> >> was held by those executors when needed.
>> >
>> > I basically wanted to find out if there were any "gotchas" related to
>> > preemption on Spark. For example, if half of an application's executors
>> > got preempted, say while doing a reduceByKey, will the application
>> > progress with the remaining resources/fair share?
>>
>> Jobs should still make progress as long as at least one executor is
>> available. The gotcha would be the one I mentioned, where Spark will
>> fail your job after "x" executor failures, which might be a common
>> occurrence when preemption is enabled. That being said, it's a
>> configurable option, so you can set "x" to a very large value and your
>> job should keep on chugging along.
>>
>> The options you'd want to take a look at are: spark.task.maxFailures
>> and spark.yarn.max.executor.failures
>>
>> --
>> Marcelo
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/

--
Marcelo
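To make the two options Marcelo mentions concrete, here is a rough sketch of setting
them from application code, together with a FAIR scheduler pool for running concurrent
jobs inside a single SparkContext. The threshold values, pool name, and input path are
illustrative assumptions, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("preemption-tolerant-app")
      // Raise the failure thresholds so preempted executors/tasks are retried
      // instead of failing the whole job (values are illustrative).
      .set("spark.task.maxFailures", "20")
      .set("spark.yarn.max.executor.failures", "200")
      // Fair sharing between concurrent jobs submitted to this one context.
      .set("spark.scheduler.mode", "FAIR")

    val sc = new SparkContext(conf)

    // Jobs submitted from this thread go to a named pool; "reports" and the
    // input path are hypothetical examples.
    sc.setLocalProperty("spark.scheduler.pool", "reports")
    sc.textFile("hdfs:///data/events").count()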