+user@ An executor is specific to an application, but an application can be executing many jobs at once. So, as I understand it, tasks from many jobs can be executing at once on an executor.
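A minimal sketch of that point, assuming an existing SparkContext named sc and purely illustrative job bodies: jobs submitted from separate threads of one application share that application's executors, so a multi-core executor can be running tasks from both jobs at the same time.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    // Two jobs submitted concurrently from one application (one SparkContext).
    // Their tasks are scheduled onto the same executors, so an executor with
    // several cores can interleave tasks from both jobs at once.
    val jobA = Future { sc.parallelize(1 to 1000, 100).map(_ * 2).count() }
    val jobB = Future { sc.parallelize(1 to 1000, 100).map(_ + 1).count() }

    Await.result(jobA, Duration.Inf)
    Await.result(jobB, Duration.Inf)

Whether the two jobs' tasks actually run side by side still depends on how many task slots the executors advertise, which is what the rest of this thread is about.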
You may not use your full 80-way parallelism if, for example, your data set doesn't have 80 partitions. I also believe Spark will not necessarily spread the load over executors, instead preferring to respect data and rack locality if possible. Those are two reasons you might see only 4 executors active. If you mean only 4 executors exist at all, is it possible the other 4 can't provide the memory you're asking for?

On Tue, Sep 2, 2014 at 5:56 PM, Victor Tso-Guillen <v...@paxata.com> wrote:
> Actually one more question, since in preliminary runs I wasn't sure if I understood what's going on. Are the cores allocated to an executor able to execute tasks for different jobs simultaneously, or just for one job at a time? I have 10 workers with 8 cores each, and it appeared that one job got four executors at once, then four more later on. The system wasn't anywhere near saturation of 80 cores so I would've expected all 8 cores to be running simultaneously.
>
> If there's value to these questions, please reply back to the list.
>
> On Tue, Sep 2, 2014 at 6:58 AM, Victor Tso-Guillen <v...@paxata.com> wrote:
>> Thank you for the help, guys. So as I expected, I didn't fully understand the options. I had SPARK_WORKER_CORES set to 1 because I did not realize that by setting it to > 1 an executor could operate on multiple tasks simultaneously. I just thought it was a hint to Spark that that executor could be expected to use that many threads, but otherwise I had not understood that it affected the scheduler that way. Thanks!
>>
>> On Sun, Aug 31, 2014 at 9:28 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>> Hey Victor,
>>>
>>> As Sean said, executors actually execute multiple tasks at a time. The only reasons they wouldn't are either (1) if you launched an executor with just 1 core (you can configure how many cores the executors will use when you set up your Worker, or it will look at your system by default) or (2) if your tasks are acquiring some kind of global lock, so only one can run at a time.
>>>
>>> To test this, do the following:
>>> - Launch your standalone cluster (you can do it on just one machine by adding just "localhost" in the slaves file)
>>> - Go to http://:4040 and look at the worker list. Do you see workers with more than 1 core? If not, you need to launch the workers by hand or set SPARK_WORKER_CORES in conf/spark-env.sh.
>>> - Run your application. Make sure it has enough pending tasks for your cores in the driver web UI (http://:4040), and if so, jstack one of the CoarseGrainedExecutor processes on a worker to see what the threads are doing. (Look for threads that contain TaskRunner.run in them.)
>>>
>>> You can also try a simple CPU-bound job that launches lots of tasks like this to see that all cores are being used:
>>>
>>> sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()
>>>
>>> Each task here takes 1-2 seconds to execute and there are 1000 of them, so it should fill up your cluster.
>>>
>>> Matei
>>>
>>> On August 31, 2014 at 9:18:02 PM, Victor Tso-Guillen (v...@paxata.com) wrote:
>>>> I'm pretty sure my terminology matches that doc except the doc makes no explicit mention of machines. In standalone mode, you can spawn multiple workers on a single machine and each will babysit one executor (per application).
>>>> In my observation as well, each executor can be assigned many tasks but operates on one at a time. If there's a way to have it execute multiple tasks simultaneously in a single VM, can you please show me how? Maybe I'm missing the requisite configuration options, no matter how common or trivial...
>>>>
>>>> On Sunday, August 31, 2014, Sean Owen wrote:
>>>>> The confusion may be your use of 'worker', which isn't matching what 'worker' means in Spark. Have a look at https://spark.apache.org/docs/latest/cluster-overview.html Of course one VM can run many tasks at once; that's already how Spark works.
>>>>>
>>>>> On Sun, Aug 31, 2014 at 4:52 AM, Victor Tso-Guillen wrote:
>>>>>> I might not be making myself clear, so sorry about that. I understand that a machine can have as many Spark workers as you'd like, for example one per core. A worker may be assigned to a pool for one or more applications, but for a single application let's just say a single worker will have at most a single executor. An executor can be assigned multiple tasks in its queue, but will work on one task at a time only.
>>>>>>
>>>>>> In local mode, you can specify the number of executors you want and they will all reside in the same VM. Those executors will each be able to operate on a single task at a time, though they may also have an arbitrary number of tasks in their queue. From the standpoint of a VM, however, a VM can therefore operate on multiple tasks simultaneously in local mode.
>>>>>>
>>>>>> What I want is something similar in standalone mode (or Mesos or YARN if that's the only way to do it) whereby I can have a single executor VM handle many tasks concurrently. Is this possible? Is my problem statement clear? If there's a misconception on my part about the deployment of a Spark cluster I'd like to know it, but currently what we have deployed is as described in my first paragraph.
>>>>>>
>>>>>> On Sat, Aug 30, 2014 at 1:58 AM, Sean Owen wrote:
>>>>>>> A machine should have one worker, and many executors per worker (one per app). An executor runs many tasks. This is how it works for me in standalone mode at least!
>>>>>>>
>>>>>>> On Aug 30, 2014 3:08 AM, "Victor Tso-Guillen" wrote:
>>>>>>>> A machine has many workers and a worker has an executor. I want the executor to handle many tasks at once, like in local mode.
>>>>>>>>
>>>>>>>> On Fri, Aug 29, 2014 at 5:51 PM, Sean Owen wrote:
>>>>>>>>> Hm, do you mean worker? Spark certainly works on many tasks per machine at once.
>>>>>>>>>
>>>>>>>>> On Aug 29, 2014 8:11 PM, "Victor Tso-Guillen" wrote:
>>>>>>>>>> Standalone. I'd love to tell it that my one executor can simultaneously serve, say, 16 tasks at once for an arbitrary number of distinct jobs.
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 29, 2014 at 11:29 AM, Matei Zaharia wrote:
>>>>>>>>>>> Yes, executors run one task per core of your machine by default.
>>>>>>>>>>> You can also manually launch them with more worker threads than you have cores. What cluster manager are you on?
>>>>>>>>>>>
>>>>>>>>>>> Matei
>>>>>>>>>>>
>>>>>>>>>>> On August 29, 2014 at 11:24:33 AM, Victor Tso-Guillen (v...@paxata.com) wrote:
>>>>>>>>>>>> I'm thinking of local mode where multiple virtual executors occupy the same VM. Can we have the same configuration in Spark standalone cluster mode?
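To tie the thread together, here is a rough sketch of the two setups being compared, reusing the CPU-bound check Matei suggested above. The app name, master URL, and core counts are illustrative placeholders, not values from this thread; SPARK_WORKER_CORES and the test job are taken from the messages above.

    import org.apache.spark.SparkContext

    // Local mode: "local[16]" gives 16 task slots inside a single JVM, so up
    // to 16 tasks run concurrently in that one process.
    val sc = new SparkContext("local[16]", "concurrency-check")

    // The CPU-bound check from Matei's message: 1000 tasks of roughly 1-2
    // seconds each, enough pending work to keep every slot busy.
    sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()

    sc.stop()

    // Standalone mode: set SPARK_WORKER_CORES (e.g. SPARK_WORKER_CORES=8 in
    // conf/spark-env.sh) so the worker advertises 8 cores; the executor it
    // launches for an application can then run up to 8 tasks simultaneously.
    // Point the SparkContext at the master instead of "local[16]", e.g.
    // new SparkContext("spark://<master-host>:7077", "concurrency-check").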