+user@ An executor is specific to an application, but an application can be executing many jobs at once. So, as I understand it, tasks from many jobs can be executing at once on an executor.
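A minimal sketch of that point, assuming an existing SparkContext named sc and purely illustrative job bodies: jobs submitted from separate threads of one application share that application's executors, so a multi-core executor can be running tasks from both jobs at the same time.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    // Two jobs submitted concurrently from one application (one SparkContext).
    // Their tasks are scheduled onto the same executors, so an executor with
    // several cores can interleave tasks from both jobs at once.
    val jobA = Future { sc.parallelize(1 to 1000, 100).map(_ * 2).count() }
    val jobB = Future { sc.parallelize(1 to 1000, 100).map(_ + 1).count() }

    Await.result(jobA, Duration.Inf)
    Await.result(jobB, Duration.Inf)

Whether the two jobs' tasks actually run side by side still depends on how many task slots the executors advertise, which is what the rest of this thread is about.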
You may not use your full 80-way parallelism if, for example, your data set doesn't have 80 partitions. I also believe Spark will not necessarily spread the load over executors, instead preferring to respect data and rack locality if possible. Those are two reasons you might see only 4 executors active. If you mean only 4 executors exist at all, is it possible the other 4 can't provide the memory you're asking for?

On Tue, Sep 2, 2014 at 5:56 PM, Victor Tso-Guillen <v...@paxata.com> wrote:
> Actually one more question, since in preliminary runs I wasn't sure if I understood what's going on. Are the cores allocated to an executor able to execute tasks for different jobs simultaneously, or just for one job at a time? I have 10 workers with 8 cores each, and it appeared that one job got four executors at once, then four more later on. The system wasn't anywhere near saturation of 80 cores so I would've expected all 8 cores to be running simultaneously.
>
> If there's value to these questions, please reply back to the list.
>
> On Tue, Sep 2, 2014 at 6:58 AM, Victor Tso-Guillen <v...@paxata.com> wrote:
>> Thank you for the help, guys. So as I expected, I didn't fully understand the options. I had SPARK_WORKER_CORES set to 1 because I did not realize that by setting it to > 1 an executor could operate on multiple tasks simultaneously. I just thought it was a hint to Spark that that executor could be expected to use that many threads, but otherwise I had not understood that it affected the scheduler that way. Thanks!
>>
>> On Sun, Aug 31, 2014 at 9:28 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>> Hey Victor,
>>>
>>> As Sean said, executors actually execute multiple tasks at a time. The only reasons they wouldn't are either (1) if you launched an executor with just 1 core (you can configure how many cores the executors will use when you set up your Worker, or it will look at your system by default) or (2) if your tasks are acquiring some kind of global lock, so only one can run at a time.
>>>
>>> To test this, do the following:
>>> - Launch your standalone cluster (you can do it on just one machine by adding just "localhost" in the slaves file)
>>> - Go to http://:4040 and look at the worker list. Do you see workers with more than 1 core? If not, you need to launch the workers by hand or set SPARK_WORKER_CORES in conf/spark-env.sh.
>>> - Run your application. Make sure it has enough pending tasks for your cores in the driver web UI (http://:4040), and if so, jstack one of the CoarseGrainedExecutor processes on a worker to see what the threads are doing. (Look for threads that contain TaskRunner.run in them.)
>>>
>>> You can also try a simple CPU-bound job that launches lots of tasks like this to see that all cores are being used:
>>>
>>> sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()
>>>
>>> Each task here takes 1-2 seconds to execute and there are 1000 of them, so it should fill up your cluster.
>>>
>>> Matei
>>>
>>> On August 31, 2014 at 9:18:02 PM, Victor Tso-Guillen (v...@paxata.com) wrote:
>>>> I'm pretty sure my terminology matches that doc except the doc makes no explicit mention of machines. In standalone mode, you can spawn multiple workers on a single machine and each will babysit one executor (per application).
>>>> In my observation as well, each executor can be assigned many tasks but operates on one at a time. If there's a way to have it execute multiple tasks simultaneously in a single VM, can you please show me how? Maybe I'm missing the requisite configuration options, no matter how common or trivial...
>>>>
>>>> On Sunday, August 31, 2014, Sean Owen wrote:
>>>>> The confusion may be your use of 'worker', which isn't matching what 'worker' means in Spark. Have a look at https://spark.apache.org/docs/latest/cluster-overview.html Of course one VM can run many tasks at once; that's already how Spark works.
>>>>>
>>>>> On Sun, Aug 31, 2014 at 4:52 AM, Victor Tso-Guillen wrote:
>>>>>> I might not be making myself clear, so sorry about that. I understand that a machine can have as many Spark workers as you'd like, for example one per core. A worker may be assigned to a pool for one or more applications, but for a single application let's just say a single worker will have at most a single executor. An executor can be assigned multiple tasks in its queue, but will work on one task at a time only.
>>>>>>
>>>>>> In local mode, you can specify the number of executors you want and they will all reside in the same VM. Those executors will each be able to operate on a single task at a time, though they may also have an arbitrary number of tasks in their queue. From the standpoint of a VM, however, a VM can therefore operate on multiple tasks simultaneously in local mode.
>>>>>>
>>>>>> What I want is something similar in standalone mode (or Mesos or YARN if that's the only way to do it) whereby I can have a single executor VM handle many tasks concurrently. Is this possible? Is my problem statement clear? If there's a misconception on my part about the deployment of a Spark cluster I'd like to know it, but currently what we have deployed is as described in my first paragraph.
>>>>>>
>>>>>> On Sat, Aug 30, 2014 at 1:58 AM, Sean Owen wrote:
>>>>>>> A machine should have one worker, and many executors per worker (one per app). An executor runs many tasks. This is how it works for me in standalone mode at least!
>>>>>>>
>>>>>>> On Aug 30, 2014 3:08 AM, "Victor Tso-Guillen" wrote:
>>>>>>>> A machine has many workers and a worker has an executor. I want the executor to handle many tasks at once, like in local mode.
>>>>>>>>
>>>>>>>> On Fri, Aug 29, 2014 at 5:51 PM, Sean Owen wrote:
>>>>>>>>> Hm, do you mean worker? Spark certainly works on many tasks per machine at once.
>>>>>>>>>
>>>>>>>>> On Aug 29, 2014 8:11 PM, "Victor Tso-Guillen" wrote:
>>>>>>>>>> Standalone. I'd love to tell it that my one executor can simultaneously serve, say, 16 tasks at once for an arbitrary number of distinct jobs.
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 29, 2014 at 11:29 AM, Matei Zaharia wrote:
>>>>>>>>>>> Yes, executors run one task per core of your machine by default.
>>>>>>>>>>> You can also manually launch them with more worker threads than you have cores. What cluster manager are you on?
>>>>>>>>>>>
>>>>>>>>>>> Matei
>>>>>>>>>>>
>>>>>>>>>>> On August 29, 2014 at 11:24:33 AM, Victor Tso-Guillen (v...@paxata.com) wrote:
>>>>>>>>>>>> I'm thinking of local mode where multiple virtual executors occupy the same VM. Can we have the same configuration in Spark standalone cluster mode?
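To tie the thread together, here is a rough sketch of the two setups being compared, reusing the CPU-bound check Matei suggested above. The app name, master URL, and core counts are illustrative placeholders, not values from this thread; SPARK_WORKER_CORES and the test job are taken from the messages above.

    import org.apache.spark.SparkContext

    // Local mode: "local[16]" gives 16 task slots inside a single JVM, so up
    // to 16 tasks run concurrently in that one process.
    val sc = new SparkContext("local[16]", "concurrency-check")

    // The CPU-bound check from Matei's message: 1000 tasks of roughly 1-2
    // seconds each, enough pending work to keep every slot busy.
    sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()

    sc.stop()

    // Standalone mode: set SPARK_WORKER_CORES (e.g. SPARK_WORKER_CORES=8 in
    // conf/spark-env.sh) so the worker advertises 8 cores; the executor it
    // launches for an application can then run up to 8 tasks simultaneously.
    // Point the SparkContext at the master instead of "local[16]", e.g.
    // new SparkContext("spark://<master-host>:7077", "concurrency-check").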