I'm pretty sure the issue was an interaction with another subsystem. Thanks for your patience with me!
On Tue, Sep 2, 2014 at 10:05 AM, Sean Owen <so...@cloudera.com> wrote:

> +user@
>
> An executor is specific to an application, but an application can be executing many jobs at once. So, as I understand it, many jobs' tasks can be executing at once on an executor.
>
> You may not use your full 80-way parallelism if, for example, your data set doesn't have 80 partitions. I also believe Spark will not necessarily spread the load over executors, instead preferring to respect data and rack locality if possible. Those are two reasons you might see only 4 executors active. If you mean only 4 executors exist at all, is it possible the other 4 can't provide the memory you're asking for?
>
> On Tue, Sep 2, 2014 at 5:56 PM, Victor Tso-Guillen <v...@paxata.com> wrote:
>> Actually, one more question, since in preliminary runs I wasn't sure I understood what was going on. Are the cores allocated to an executor able to execute tasks for different jobs simultaneously, or just for one job at a time? I have 10 workers with 8 cores each, and it appeared that one job got four executors at once, then four more later on. The system wasn't anywhere near saturation of 80 cores, so I would've expected all 8 cores to be running simultaneously.
>>
>> If there's value to these questions, please reply back to the list.
>>
>> On Tue, Sep 2, 2014 at 6:58 AM, Victor Tso-Guillen <v...@paxata.com> wrote:
>>> Thank you for the help, guys. So, as I expected, I didn't fully understand the options. I had SPARK_WORKER_CORES set to 1 because I did not realize that setting it to > 1 would mean an executor could operate on multiple tasks simultaneously. I just thought it was a hint to Spark that the executor could be expected to use that many threads; I had not understood that it affected the scheduler that way. Thanks!
>>>
>>> On Sun, Aug 31, 2014 at 9:28 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>> Hey Victor,
>>>>
>>>> As Sean said, executors actually execute multiple tasks at a time. The only reasons they wouldn't are either (1) you launched an executor with just 1 core (you can configure how many cores the executors will use when you set up your Worker, or it will look at your system by default), or (2) your tasks are acquiring some kind of global lock, so only one can run at a time.
>>>>
>>>> To test this, do the following:
>>>> - Launch your standalone cluster (you can do it on just one machine by adding just "localhost" in the slaves file)
>>>> - Go to http://<host>:4040 and look at the worker list. Do you see workers with more than 1 core? If not, you need to launch the workers by hand or set SPARK_WORKER_CORES in conf/spark-env.sh.
>>>> - Run your application. Make sure it has enough pending tasks for your cores in the driver web UI (http://<host>:4040), and if so, jstack one of the CoarseGrainedExecutor processes on a worker to see what the threads are doing. (Look for threads that contain TaskRunner.run in them.)
>>>>
>>>> You can also try a simple CPU-bound job that launches lots of tasks, like this, to see that all cores are being used:
>>>>
>>>>   sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()
>>>>
>>>> Each task here takes 1-2 seconds to execute and there are 1000 of them, so it should fill up your cluster.
>>>>
>>>> Matei
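For anyone who wants to run Matei's check as a submitted application rather than from the shell, here is a minimal sketch. It assumes a cluster like Victor's (10 workers x 8 cores); the master URL, app name, and spark.cores.max value are placeholders added for illustration, not details from this thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelismProbe {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("parallelism-probe")                // illustrative name
          .setMaster("spark://<master-host>:7077")        // placeholder standalone master URL
          .set("spark.cores.max", "80")                   // assumed cap: 10 workers x 8 cores
        val sc = new SparkContext(conf)

        // The same CPU-bound job Matei suggests: 1000 small tasks of 1-2 seconds each.
        // If each worker advertises 8 cores (SPARK_WORKER_CORES=8), its executor should
        // run 8 of these tasks concurrently and the whole cluster should saturate.
        sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()

        sc.stop()
      }
    }

While it runs, the stages page of the driver UI at port 4040 should show roughly 80 tasks running at any moment if scheduling is behaving as described above.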
>>>> On August 31, 2014 at 9:18:02 PM, Victor Tso-Guillen (v...@paxata.com) wrote:
>>>>> I'm pretty sure my terminology matches that doc, except the doc makes no explicit mention of machines. In standalone mode, you can spawn multiple workers on a single machine and each will babysit one executor (per application). In my observation as well, each executor can be assigned many tasks but operates on one at a time. If there's a way to have it execute multiple tasks simultaneously in a single VM, can you please show me how? Maybe I'm missing the requisite configuration options, no matter how common or trivial...
>>>>>
>>>>> On Sunday, August 31, 2014, Sean Owen wrote:
>>>>>> The confusion may be your use of 'worker', which isn't matching what 'worker' means in Spark. Have a look at https://spark.apache.org/docs/latest/cluster-overview.html. Of course one VM can run many tasks at once; that's already how Spark works.
>>>>>>
>>>>>> On Sun, Aug 31, 2014 at 4:52 AM, Victor Tso-Guillen wrote:
>>>>>>> I might not be making myself clear, so sorry about that. I understand that a machine can have as many Spark workers as you'd like, for example one per core. A worker may be assigned to a pool for one or more applications, but for a single application let's just say a single worker will have at most a single executor. An executor can be assigned multiple tasks in its queue, but will work on only one task at a time.
>>>>>>>
>>>>>>> In local mode, you can specify the number of executors you want and they will all reside in the same VM. Those executors will each be able to operate on a single task at a time, though they may also have an arbitrary number of tasks in their queue. From the standpoint of a VM, however, a VM can therefore operate on multiple tasks simultaneously in local mode.
>>>>>>>
>>>>>>> What I want is something similar in standalone mode (or Mesos or YARN, if that's the only way to do it) whereby I can have a single executor VM handle many tasks concurrently. Is this possible? Is my problem statement clear? If there's a misconception on my part about the deployment of a Spark cluster I'd like to know it, but what we currently have deployed is as in my first paragraph.
>>>>>>>
>>>>>>> On Sat, Aug 30, 2014 at 1:58 AM, Sean Owen wrote:
>>>>>>>> A machine should have one worker, and many executors per worker (one per app). An executor runs many tasks. This is how it works for me in standalone mode, at least!
>>>>>>>>
>>>>>>>> On Aug 30, 2014 3:08 AM, "Victor Tso-Guillen" wrote:
>>>>>>>>> A machine has many workers and a worker has an executor. I want the executor to handle many tasks at once, like in local mode.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 29, 2014 at 5:51 PM, Sean Owen wrote:
>>>>>>>>>> Hm, do you mean worker? Spark certainly works on many tasks per machine at once.
>>>>>>>>>>
>>>>>>>>>> On Aug 29, 2014 8:11 PM, "Victor Tso-Guillen" wrote:
>>>>>>>>>>> Standalone. I'd love to tell it that my one executor can simultaneously serve, say, 16 tasks at once for an arbitrary number of distinct jobs.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 29, 2014 at 11:29 AM, Matei Zaharia wrote:
>>>>>>>>>>>> Yes, executors run one task per core of your machine by default. You can also manually launch them with more worker threads than you have cores. What cluster manager are you on?
>>>>>>>>>>>>
>>>>>>>>>>>> Matei
>>>>>>>>>>>>
>>>>>>>>>>>> On August 29, 2014 at 11:24:33 AM, Victor Tso-Guillen (v...@paxata.com) wrote:
>>>>>>>>>>>>> I'm thinking of local mode where multiple virtual executors occupy the same VM. Can we have the same configuration in Spark standalone cluster mode?
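To put the earlier part of the thread in code: the per-executor concurrency Victor was looking for comes from the number of cores a worker offers, not from running extra executors. Below is a minimal sketch of the two setups being contrasted; the 16-slot figure is Victor's example, and the master host and app name are placeholders, not details from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object SlotDemo {
      def main(args: Array[String]): Unit = {
        // Local mode: one JVM runs the scheduler plus a 16-thread task pool,
        // so up to 16 tasks execute concurrently in a single VM.
        val localConf = new SparkConf().setAppName("slot-demo").setMaster("local[16]")

        // Standalone mode (placeholder master URL): the single executor launched for this
        // app on each worker gets one task slot per core the worker advertises via
        // SPARK_WORKER_CORES, so one executor JVM likewise runs many tasks at once.
        // val clusterConf = new SparkConf().setAppName("slot-demo").setMaster("spark://<master-host>:7077")

        val sc = new SparkContext(localConf)   // only one SparkContext may be active per JVM
        println(s"task slots (default parallelism): ${sc.defaultParallelism}")
        sc.stop()
      }
    }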