Re: Reuse Executor JVM across different JobContext
Can you give me more details on Spark's jobserver?

Regards,
Praveen

On 18 Jan 2016 03:30, "Jia" wrote:
> I guess all jobs submitted through JobServer are executed in the same JVM,
> so RDDs cached by one job can be visible to all other jobs executed later.
Re: Reuse Executor JVM across different JobContext
Hi, Praveen, have you checked out this presentation, which might have the details you need?
https://spark-summit.org/2014/wp-content/uploads/2014/07/Spark-Job-Server-Easy-Spark-Job-Management-Chan-Chu.pdf

Best Regards,
Jia

On Jan 19, 2016, at 7:28 AM, praveen S wrote:
> Can you give me more details on Spark's jobserver?
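For readers who want to see what the Spark JobServer programming model looks like, here is a minimal sketch of a job class. The server owns the SparkContext and passes it to each job, so many jobs submitted over time run inside one long-lived application. The package and trait names (spark.jobserver.SparkJob, validate, runJob) are recalled from the project's examples of that era and should be treated as assumptions to verify against the version you deploy.

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

// A minimal JobServer job: the server hands in its shared SparkContext,
// so this job runs inside the server's long-lived application and executors.
object WordCountJob extends SparkJob {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.string")) SparkJobValid
    else SparkJobInvalid("missing input.string")

  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split("\\s+").toSeq)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap
}

Jobs like this are packaged as a jar, uploaded to the server, and submitted over its REST API against a pre-created context, which is what keeps the executor JVMs alive between submissions.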
Re: Reuse Executor JVM across different JobContext
Yes, you can share RDDs with Tachyon while keeping the data in memory. Spark jobs can write to a Tachyon path (tachyon://host:port/path/), and other jobs can read from the same path. Here is a presentation that includes that use case:
http://www.slideshare.net/TachyonNexus/tachyon-presentation-at-ampcamp-6-november-2015

Thanks,
Gene

On Sun, Jan 17, 2016 at 1:56 PM, Mark Hamstra wrote:
> Yes, that is one of the basic reasons to use a jobserver/shared-SparkContext.
> Otherwise, in order to share the data in an RDD you have to use an external
> storage system, such as a distributed filesystem or Tachyon.
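To make the Tachyon option concrete, here is a sketch of two separate Spark applications sharing data through a Tachyon path. The host, port, and paths are placeholders, and it assumes the Tachyon client is on the classpath so that tachyon:// URIs resolve like any other Hadoop-compatible filesystem.

import org.apache.spark.{SparkConf, SparkContext}

// Application A: materialize an RDD to Tachyon, then exit.
val scA = new SparkContext(new SparkConf().setAppName("tachyon-writer"))
scA.textFile("hdfs:///data/events")                              // placeholder input path
   .filter(_.contains("ERROR"))
   .saveAsTextFile("tachyon://tachyon-master:19998/shared/errors")
scA.stop()

// Application B: started later, in a different JVM, reads the same data back
// from Tachyon's memory tier instead of recomputing it.
val scB = new SparkContext(new SparkConf().setAppName("tachyon-reader"))
val errors = scB.textFile("tachyon://tachyon-master:19998/shared/errors")
println(errors.count())
scB.stop()

In practice the two halves would be two separate programs submitted at different times; they are shown back to back here only for brevity.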
Re: Reuse Executor JVM across different JobContext
-dev

What do you mean by JobContext? That is a Hadoop MapReduce concept, not Spark.

On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou wrote:
> Dear all,
>
> Is there a way to reuse executor JVM across different JobContexts? Thanks.
>
> Best Regards,
> Jia
Re: Reuse Executor JVM across different JobContext
Yes, that is one of the basic reasons to use a jobserver/shared-SparkContext. Otherwise, in order to share the data in an RDD you have to use an external storage system, such as a distributed filesystem or Tachyon.

On Sun, Jan 17, 2016 at 1:52 PM, Jia wrote:
> Thanks, Mark. Then, I guess JobServer can fundamentally solve my problem,
> so that jobs can be submitted at different times and still share RDDs.
Re: Reuse Executor JVM across different JobContext
Hi, Mark, sorry for the confusion.

Let me clarify: when an application is submitted, the master will tell each Spark worker to spawn an executor JVM process. All the task sets of the application will be executed by that executor, and after the application runs to completion, the executor process will be killed. But I hope that all applications submitted can run in the same executor; can JobServer do that? If so, it's really good news!

Best Regards,
Jia

On Jan 17, 2016, at 3:09 PM, Mark Hamstra wrote:
> You've still got me confused. The SparkContext exists at the Driver, not on
> an Executor.
>
> Many Jobs can be run by a SparkContext -- it is a common pattern to use
> something like the Spark Jobserver where all Jobs are run through a shared
> SparkContext.
Re: Reuse Executor JVM across different JobContext
There is a 1-to-1 relationship between Spark Applications and SparkContexts -- fundamentally, a Spark Application is a program that creates and uses a SparkContext, and that SparkContext is destroyed when the Application ends. A jobserver generically, and the Spark JobServer specifically, is an Application that keeps a SparkContext open for a long time and allows many Jobs to be submitted and run using that shared SparkContext.

More than one Application/SparkContext unavoidably implies more than one JVM process per Worker -- Applications/SparkContexts cannot share JVM processes.

On Sun, Jan 17, 2016 at 1:15 PM, Jia wrote:
> Hi, Mark, sorry for the confusion.
>
> Let me clarify: when an application is submitted, the master will tell each
> Spark worker to spawn an executor JVM process. All the task sets of the
> application will be executed by that executor, and after the application runs
> to completion, the executor process will be killed.
> But I hope that all applications submitted can run in the same executor;
> can JobServer do that? If so, it's really good news!
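The generic jobserver pattern described above can be reduced to a few lines: one long-running driver program creates a single SparkContext, and every piece of work it accepts later runs through that context and therefore reuses the same executor JVMs and any cached RDDs. A minimal sketch, in which the input path and the request-handling mechanism are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}

object SharedContextServer {
  def main(args: Array[String]): Unit = {
    // One Application = one SparkContext = one set of executor JVMs.
    val sc = new SparkContext(new SparkConf().setAppName("shared-context-server"))

    // Data cached here stays in the executors for the lifetime of this application.
    val events = sc.textFile("hdfs:///data/events").cache()   // placeholder path
    events.count()                                             // materialize the cache

    // Work arriving later (over REST, a queue, stdin, ...) runs against the same
    // SparkContext, so it reuses the same executors and the cached RDD.
    def handleRequest(keyword: String): Long =
      events.filter(_.contains(keyword)).count()

    println(handleRequest("ERROR"))
    println(handleRequest("WARN"))

    sc.stop()   // executor JVMs are torn down only when the application ends
  }
}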
Re: Reuse Executor JVM across different JobContext
Thanks, Mark. Then, I guess JobServer can fundamentally solve my problem, so that jobs can be submitted at different times and still share RDDs.

Best Regards,
Jia

On Jan 17, 2016, at 3:44 PM, Mark Hamstra wrote:
> There is a 1-to-1 relationship between Spark Applications and SparkContexts
> -- fundamentally, a Spark Application is a program that creates and uses a
> SparkContext, and that SparkContext is destroyed when the Application ends.
> A jobserver generically, and the Spark JobServer specifically, is an
> Application that keeps a SparkContext open for a long time and allows many
> Jobs to be submitted and run using that shared SparkContext.
>
> More than one Application/SparkContext unavoidably implies more than one JVM
> process per Worker -- Applications/SparkContexts cannot share JVM processes.
Re: Reuse Executor JVM across different JobContext
Hi, Mark, sorry, I meant SparkContext.
I want to change Spark so that it runs all submitted jobs (SparkContexts) in one executor JVM.

Best Regards,
Jia

On Sun, Jan 17, 2016 at 2:21 PM, Mark Hamstra wrote:
> -dev
>
> What do you mean by JobContext? That is a Hadoop MapReduce concept, not
> Spark.
Re: Reuse Executor JVM across different JobContext
You've still got me confused. The SparkContext exists at the Driver, not on an Executor.

Many Jobs can be run by a SparkContext -- it is a common pattern to use something like the Spark Jobserver where all Jobs are run through a shared SparkContext.

On Sun, Jan 17, 2016 at 12:57 PM, Jia Zou wrote:
> Hi, Mark, sorry, I meant SparkContext.
> I want to change Spark so that it runs all submitted jobs (SparkContexts) in
> one executor JVM.
Re: Reuse Executor JVM across different JobContext
I guess all jobs submitted through JobServer are executed in the same JVM, so RDDs cached by one job can be visible to all other jobs executed later.

On Jan 17, 2016, at 3:56 PM, Mark Hamstra wrote:
> Yes, that is one of the basic reasons to use a jobserver/shared-SparkContext.
> Otherwise, in order to share the data in an RDD you have to use an external
> storage system, such as a distributed filesystem or Tachyon.
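The jobs do run in the same long-lived JVMs, but a later job still needs a driver-side handle to the cached RDD; the project's named-RDD support is the usual way to pass that handle between jobs. The sketch below assumes the NamedRddSupport trait and namedRdds accessor as described in the spark-jobserver documentation of that era; treat the exact names and signatures as assumptions to verify.

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

// Job 1: build an RDD and register it under a well-known name in the shared context.
object CacheEventsJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val events = sc.textFile(config.getString("input.path"))   // hypothetical config key
    namedRdds.update("events", events)                          // cache and publish by name
    events.count()
  }
}

// Job 2: submitted minutes or hours later to the same context; looks the RDD up by name.
object CountErrorsJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any =
    namedRdds.get[String]("events") match {
      case Some(events) => events.filter(_.contains("ERROR")).count()
      case None         => 0L
    }
}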
Reuse Executor JVM across different JobContext
Dear all,

Is there a way to reuse executor JVM across different JobContexts? Thanks.

Best Regards,
Jia