I'm pretty sure the issue was an interaction with another subsystem. Thanks for your patience with me!
On Tue, Sep 2, 2014 at 10:05 AM, Sean Owen <so...@cloudera.com> wrote:

> +user@
>
> An executor is specific to an application, but an application can be executing many jobs at once. So, as I understand it, many jobs' tasks can be executing at once on an executor.
>
> You may not use your full 80-way parallelism if, for example, your data set doesn't have 80 partitions. I also believe Spark will not necessarily spread the load over executors, instead preferring to respect data and rack locality if possible. Those are two reasons you might see only 4 executors active. If you mean only 4 executors exist at all, is it possible the other 4 can't provide the memory you're asking for?
>
> On Tue, Sep 2, 2014 at 5:56 PM, Victor Tso-Guillen <v...@paxata.com> wrote:
>> Actually, one more question, since in preliminary runs I wasn't sure I understood what was going on. Are the cores allocated to an executor able to execute tasks for different jobs simultaneously, or just for one job at a time? I have 10 workers with 8 cores each, and it appeared that one job got four executors at once, then four more later on. The system wasn't anywhere near saturation of 80 cores, so I would've expected all 8 cores to be running simultaneously.
>>
>> If there's value to these questions, please reply back to the list.
>>
>> On Tue, Sep 2, 2014 at 6:58 AM, Victor Tso-Guillen <v...@paxata.com> wrote:
>>> Thank you for the help, guys. So, as I expected, I didn't fully understand the options. I had SPARK_WORKER_CORES set to 1 because I did not realize that setting it to > 1 would mean an executor could operate on multiple tasks simultaneously. I just thought it was a hint to Spark that the executor could be expected to use that many threads; I had not understood that it affected the scheduler that way. Thanks!
>>>
>>> On Sun, Aug 31, 2014 at 9:28 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>> Hey Victor,
>>>>
>>>> As Sean said, executors actually execute multiple tasks at a time. The only reasons they wouldn't are either (1) you launched an executor with just 1 core (you can configure how many cores the executors will use when you set up your Worker, or it will look at your system by default), or (2) your tasks are acquiring some kind of global lock, so only one can run at a time.
>>>>
>>>> To test this, do the following:
>>>> - Launch your standalone cluster (you can do it on just one machine by adding just "localhost" in the slaves file)
>>>> - Go to http://<host>:4040 and look at the worker list. Do you see workers with more than 1 core? If not, you need to launch the workers by hand or set SPARK_WORKER_CORES in conf/spark-env.sh.
>>>> - Run your application. Make sure it has enough pending tasks for your cores in the driver web UI (http://<host>:4040), and if so, jstack one of the CoarseGrainedExecutor processes on a worker to see what the threads are doing. (Look for threads that contain TaskRunner.run in them.)
>>>>
>>>> You can also try a simple CPU-bound job that launches lots of tasks, like this, to see that all cores are being used:
>>>>
>>>>   sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()
>>>>
>>>> Each task here takes 1-2 seconds to execute and there are 1000 of them, so it should fill up your cluster.
>>>>
>>>> Matei
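For anyone who wants to run Matei's check as a submitted application rather than from the shell, here is a minimal sketch. It assumes a cluster like Victor's (10 workers x 8 cores); the master URL, app name, and spark.cores.max value are placeholders added for illustration, not details from this thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelismProbe {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("parallelism-probe")                // illustrative name
          .setMaster("spark://<master-host>:7077")        // placeholder standalone master URL
          .set("spark.cores.max", "80")                   // assumed cap: 10 workers x 8 cores
        val sc = new SparkContext(conf)

        // The same CPU-bound job Matei suggests: 1000 small tasks of 1-2 seconds each.
        // If each worker advertises 8 cores (SPARK_WORKER_CORES=8), its executor should
        // run 8 of these tasks concurrently and the whole cluster should saturate.
        sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()

        sc.stop()
      }
    }

While it runs, the stages page of the driver UI at port 4040 should show roughly 80 tasks running at any moment if scheduling is behaving as described above.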
>>>> On August 31, 2014 at 9:18:02 PM, Victor Tso-Guillen (v...@paxata.com) wrote:
>>>>> I'm pretty sure my terminology matches that doc, except the doc makes no explicit mention of machines. In standalone mode, you can spawn multiple workers on a single machine and each will babysit one executor (per application). In my observation as well, each executor can be assigned many tasks but operates on one at a time. If there's a way to have it execute multiple tasks simultaneously in a single VM, can you please show me how? Maybe I'm missing the requisite configuration options, no matter how common or trivial...
>>>>>
>>>>> On Sunday, August 31, 2014, Sean Owen wrote:
>>>>>> The confusion may be your use of 'worker', which isn't matching what 'worker' means in Spark. Have a look at https://spark.apache.org/docs/latest/cluster-overview.html. Of course one VM can run many tasks at once; that's already how Spark works.
>>>>>>
>>>>>> On Sun, Aug 31, 2014 at 4:52 AM, Victor Tso-Guillen wrote:
>>>>>>> I might not be making myself clear, so sorry about that. I understand that a machine can have as many Spark workers as you'd like, for example one per core. A worker may be assigned to a pool for one or more applications, but for a single application let's just say a single worker will have at most a single executor. An executor can be assigned multiple tasks in its queue, but will work on only one task at a time.
>>>>>>>
>>>>>>> In local mode, you can specify the number of executors you want and they will all reside in the same VM. Those executors will each be able to operate on a single task at a time, though they may also have an arbitrary number of tasks in their queue. From the standpoint of a VM, however, a VM can therefore operate on multiple tasks simultaneously in local mode.
>>>>>>>
>>>>>>> What I want is something similar in standalone mode (or Mesos or YARN, if that's the only way to do it) whereby I can have a single executor VM handle many tasks concurrently. Is this possible? Is my problem statement clear? If there's a misconception on my part about the deployment of a Spark cluster I'd like to know it, but what we currently have deployed is as in my first paragraph.
>>>>>>>
>>>>>>> On Sat, Aug 30, 2014 at 1:58 AM, Sean Owen wrote:
>>>>>>>> A machine should have one worker, and many executors per worker (one per app). An executor runs many tasks. This is how it works for me in standalone mode, at least!
>>>>>>>>
>>>>>>>> On Aug 30, 2014 3:08 AM, "Victor Tso-Guillen" wrote:
>>>>>>>>> A machine has many workers and a worker has an executor. I want the executor to handle many tasks at once, like in local mode.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 29, 2014 at 5:51 PM, Sean Owen wrote:
>>>>>>>>>> Hm, do you mean worker? Spark certainly works on many tasks per machine at once.
>>>>>>>>>>
>>>>>>>>>> On Aug 29, 2014 8:11 PM, "Victor Tso-Guillen" wrote:
>>>>>>>>>>> Standalone. I'd love to tell it that my one executor can simultaneously serve, say, 16 tasks at once for an arbitrary number of distinct jobs.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 29, 2014 at 11:29 AM, Matei Zaharia wrote:
>>>>>>>>>>>> Yes, executors run one task per core of your machine by default. You can also manually launch them with more worker threads than you have cores. What cluster manager are you on?
>>>>>>>>>>>>
>>>>>>>>>>>> Matei
>>>>>>>>>>>>
>>>>>>>>>>>> On August 29, 2014 at 11:24:33 AM, Victor Tso-Guillen (v...@paxata.com) wrote:
>>>>>>>>>>>>> I'm thinking of local mode where multiple virtual executors occupy the same VM. Can we have the same configuration in Spark standalone cluster mode?
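To put the earlier part of the thread in code: the per-executor concurrency Victor was looking for comes from the number of cores a worker offers, not from running extra executors. Below is a minimal sketch of the two setups being contrasted; the 16-slot figure is Victor's example, and the master host and app name are placeholders, not details from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object SlotDemo {
      def main(args: Array[String]): Unit = {
        // Local mode: one JVM runs the scheduler plus a 16-thread task pool,
        // so up to 16 tasks execute concurrently in a single VM.
        val localConf = new SparkConf().setAppName("slot-demo").setMaster("local[16]")

        // Standalone mode (placeholder master URL): the single executor launched for this
        // app on each worker gets one task slot per core the worker advertises via
        // SPARK_WORKER_CORES, so one executor JVM likewise runs many tasks at once.
        // val clusterConf = new SparkConf().setAppName("slot-demo").setMaster("spark://<master-host>:7077")

        val sc = new SparkContext(localConf)   // only one SparkContext may be active per JVM
        println(s"task slots (default parallelism): ${sc.defaultParallelism}")
        sc.stop()
      }
    }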