We are relatively new to Spark. So far, during development, we have been manually submitting single ML-training jobs one at a time with spark-submit. Each job accepts a small user-submitted data set and compares it to every data set in our HDFS corpus, which changes only incrementally on a daily basis. (That detail is relevant to question 3 below.)
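To give a sense of the shape of things, here is a simplified sketch of what each job does today. This is not our real code; the class name, paths, and similarity function are all placeholders.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CorpusCompareJob {

    // Stand-in for our real comparison logic.
    private static double similarity(String record, String userData) {
        return record.equals(userData) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("corpus-compare");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // The whole corpus is re-read from HDFS on every submission.
        JavaRDD<String> corpus = sc.textFile("hdfs:///data/corpus/*");

        final String userData = args[0]; // the small user-submitted data set
        JavaRDD<Double> scores =
            corpus.map(record -> similarity(record, userData));

        System.out.println("max score: " + scores.reduce((a, b) -> Math.max(a, b)));
        sc.stop();
    }
}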
Now we are ready to start building out the front-end, which will allow a team of data scientists to submit their problems to the system via a web front-end (the web tier will be Java). Users could of course be submitting jobs more or less simultaneously, and we want to make sure we understand how best to structure this.

Questions:

1 - Does a new SparkContext get created in the web tier for each new processing request?

2 - If so, how long should we expect context setup to take? Our goal is to return a response to users in under 10 seconds, but if creating a new context or otherwise setting up the job takes many seconds, we need to adjust our expectations of what is possible. From using spark-shell one might conclude that startup takes more than 10 seconds, but it's not clear how much of that is context creation versus other things.

3 - (This last question perhaps deserves a post of its own.) Since every job compares some small data structure to the same HDFS corpus, what is the best pattern for caching the RDDs built from HDFS so they don't have to be reconstituted from disk every time? That is, how can RDDs be "shared" from the context of one job to the context of subsequent jobs? Or does something like memcached have to be used? (The P.S. below sketches the kind of arrangement we are imagining.)

Thanks!
David
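P.S. To make questions 1 and 3 concrete, this is roughly the arrangement we are picturing: a single long-lived context owned by the web tier, with the corpus cached once and reused by every request's job. Again, this is only an illustration of the question, not working code; the class name, master URL, paths, and matching logic are all made up.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class SharedSpark {

    private static final JavaSparkContext SC;
    private static final JavaRDD<String> CORPUS;

    static {
        SparkConf conf = new SparkConf()
                .setAppName("ml-web-controller")
                .setMaster("spark://master:7077"); // assumption: standalone cluster
        SC = new JavaSparkContext(conf);

        // Pay the HDFS read once; later jobs reuse the in-memory copy.
        CORPUS = SC.textFile("hdfs:///data/corpus/*").cache();
        CORPUS.count(); // force materialization at startup, not on first request
    }

    private SharedSpark() {}

    // Called from a servlet/controller per request; each call runs as a
    // new job on the shared, already-warm context and cached corpus.
    public static long matches(final String userData) {
        return CORPUS.filter(record -> record.contains(userData)).count();
    }
}

If something along these lines is sound, we assume the daily incremental corpus change could be handled by unpersisting and re-caching the RDD, but we are not sure whether that is the right pattern either.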