When you say that tasks do not start, do you mean that the worker process itself is not starting?
On Thu, Sep 3, 2015 at 5:20 PM, John Yost <[email protected]> wrote:

> Hi Nick,
>
> What do the nimbus and supervisor logs say? One or both may contain clues
> as to why your workers are not starting up.
>
> --John
>
> On Thu, Sep 3, 2015 at 4:44 AM, Matthias J. Sax <[email protected]> wrote:
>
>> I am currently working with version 0.11.0-SNAPSHOT and cannot observe
>> the behavior you describe. If I submit a sample topology with 1 spout
>> (dop=1) and 1 bolt (dop=10) connected via shuffle grouping, and have 12
>> supervisors available (each with 12 worker slots), each of the 11
>> executors is running on a single worker of a single supervisor (host).
>>
>> I have no idea why you observe a different behavior...
>>
>> -Matthias
>>
>> On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
>>> When I say co-locate, what I have seen in my experiments is the
>>> following:
>>>
>>> If the number of executors can be served by the workers of one node,
>>> the scheduler spawns all the executors in the workers of that node. I
>>> have also seen that the default scheduler tries to fill up one node
>>> before provisioning an additional one for the topology.
>>>
>>> Going back to your sentence "and the executors should be evenly
>>> distributed over all available workers": I have to say that I do not
>>> see that often in my experiments. Actually, I often come across
>>> workers handling 2-3 executors/tasks while others do nothing. Am I
>>> missing something? Is it just a coincidence that this happened in my
>>> experiments?
>>>
>>> Thank you,
>>> Nick
>>>
>>> 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <[email protected]>:
>>>
>>>> I agree. The load is not high.
>>>>
>>>> About the higher latencies: how many ackers did you configure? As a
>>>> rule of thumb, there should be one acker per executor. If you have
>>>> fewer ackers and an increasing number of executors, this might cause
>>>> the increased latency, as the ackers could become a bottleneck.
>>>>
>>>> What do you mean by "trying to co-locate tasks and executors as much
>>>> as possible"? Tasks are logical units of work that are processed by
>>>> executors (which are threads). Furthermore (as far as I know), the
>>>> default scheduler assigns tasks and executors evenly over the
>>>> available workers. In your case, as you set the number of tasks equal
>>>> to the number of executors, each executor processes a single task,
>>>> and the executors should be evenly distributed over all available
>>>> workers.
>>>>
>>>> However, you are right: intra-worker channels are "cheaper" than
>>>> inter-worker channels. To exploit this, you should use
>>>> local-or-shuffle grouping instead of shuffle. The disadvantage of
>>>> local-or-shuffle is that it may lose load balancing; shuffle always
>>>> ensures good load balancing.
>>>>
>>>> -Matthias
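(For concreteness, a minimal sketch of these parallelism knobs against the
0.11-era backtype.storm API; MySpout and MyBolt are hypothetical stand-ins
for user-defined components, and all numbers are illustrative:)

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class ParallelismSketch {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // One spout executor (dop=1), as in the test topology above.
            // MySpout/MyBolt are placeholder user components.
            builder.setSpout("spout", new MySpout(), 1);

            // Ten bolt executors; setNumTasks(10) pins the task count to the
            // executor count, so each executor thread runs exactly one task.
            builder.setBolt("bolt", new MyBolt(), 10)
                   .setNumTasks(10)
                   // shuffleGrouping balances tuples across all ten executors,
                   // regardless of which worker hosts them.
                   .shuffleGrouping("spout");
            // To prefer the cheaper intra-worker channels (possibly at the
            // cost of balance), use .localOrShuffleGrouping("spout") instead.

            Config conf = new Config();
            conf.setNumWorkers(8);  // worker JVMs requested for the topology
            conf.setNumAckers(10);  // rule of thumb above: one acker per executor

            StormSubmitter.submitTopology("parallelism-sketch", conf,
                    builder.createTopology());
        }
    }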
>>>> On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
>>>>> Well, my input load is 4 streams at 4,000 tuples per second, and
>>>>> each tuple is about 128 bytes long. Therefore, I do not think my
>>>>> load is too much for my hardware.
>>>>>
>>>>> No, I am running only this topology in my cluster.
>>>>>
>>>>> For some reason, when I set the task-to-executor ratio to 1, my
>>>>> topology does not hang at all. The strange thing now is that I see
>>>>> higher latency with more executors, and I am trying to figure this
>>>>> out. Also, I see that the default scheduler tries to co-locate tasks
>>>>> and executors as much as possible. Is this true? If yes, is it
>>>>> because the intra-worker latencies are much lower than the
>>>>> inter-worker latencies?
>>>>>
>>>>> Thanks,
>>>>> Nick
>>>>>
>>>>> 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <[email protected]>:
>>>>>
>>>>>> So (for each node) you have 4 cores available for 1 supervisor JVM
>>>>>> and 2 worker JVMs that execute up to 5 threads each (if 40
>>>>>> executors are distributed evenly over all workers). Thus, about 12
>>>>>> threads for 4 cores. Of course, Storm starts a few more threads
>>>>>> within each worker/supervisor.
>>>>>>
>>>>>> If your load is not huge, this might be sufficient. However, with a
>>>>>> high data rate, it might be problematic.
>>>>>>
>>>>>> One more question: do you run a single topology in your cluster, or
>>>>>> multiple? Storm isolates topologies for fault-tolerance reasons;
>>>>>> thus, a single worker cannot process executors from different
>>>>>> topologies. If you run out of workers, a topology might not start
>>>>>> up completely.
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
>>>>>>> Hello Matthias, and thank you for your reply. See my answers
>>>>>>> below:
>>>>>>>
>>>>>>> - I have 4 supervisor nodes in my AWS cluster of m4.xlarge
>>>>>>>   instances (4 cores per node). On top of that, I have 3 more
>>>>>>>   nodes for ZooKeeper and Nimbus.
>>>>>>> - 2 worker slots per supervisor node.
>>>>>>> - The task number for each bolt ranges from 1 to 4, and I use a
>>>>>>>   1:1 task-to-executor assignment.
>>>>>>> - The total number of executors for the topology ranges from 14
>>>>>>>   to 41.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nick
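(A quick back-of-envelope of the thread-to-core arithmetic above, using the
figures as quoted; a sketch, not a measurement:)

    public class CapacitySketch {
        public static void main(String[] args) {
            int supervisorNodes = 4;       // m4.xlarge instances
            int coresPerNode = 4;
            int workerSlotsPerNode = 2;
            int executors = 40;            // upper end of the 14-41 range

            int totalWorkers = supervisorNodes * workerSlotsPerNode;      // 8 worker JVMs
            int executorsPerWorker = executors / totalWorkers;            // 5 threads each
            int threadsPerNode = executorsPerWorker * workerSlotsPerNode; // 10 executor threads

            // Plus the supervisor JVM and Storm's own internal threads:
            // roughly 12+ threads competing for each node's 4 cores.
            System.out.printf("%d workers, %d executors/worker, ~%d executor threads per %d-core node%n",
                    totalWorkers, executorsPerWorker, threadsPerNode, coresPerNode);
        }
    }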
>>>>>>> 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <[email protected]>:
>>>>>>>
>>>>>>>> Without any exception/error message, it is hard to tell.
>>>>>>>>
>>>>>>>> What is your cluster setup?
>>>>>>>> - Hardware, i.e., number of cores per node?
>>>>>>>> - How many nodes/supervisors are available?
>>>>>>>> - Configured number of workers for the topology?
>>>>>>>> - What is the number of tasks for each spout/bolt?
>>>>>>>> - What is the number of executors for each spout/bolt?
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> I am working on a project in which I submit a topology to my
>>>>>>>>> Storm cluster, but for some reason some of my tasks do not start
>>>>>>>>> executing.
>>>>>>>>>
>>>>>>>>> I can see that this is happening because every bolt I have needs
>>>>>>>>> to connect to an external server and register with a service.
>>>>>>>>> However, some of the bolts do not seem to connect.
>>>>>>>>>
>>>>>>>>> I have to say that the number of tasks I have is larger than the
>>>>>>>>> number of workers in my cluster. Also, when I check my worker
>>>>>>>>> log files, I see that the workers that do not register are also
>>>>>>>>> not writing some initialization messages I have them print at
>>>>>>>>> the beginning.
>>>>>>>>>
>>>>>>>>> Any idea why this is happening? Can it be because my resources
>>>>>>>>> are not enough to start all of the tasks?
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Nikolaos Romanos Katsipoulakis,
>>>>>>>>> University of Pittsburgh, PhD candidate
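(For reference, a sketch of the kind of bolt initialization described
above: log first, then register with the external service in prepare(), so
the worker log shows whether the task ever started. ExternalRegistrar and
the host/port are hypothetical placeholders:)

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public class RegisteringBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map stormConf, TopologyContext context,
                            OutputCollector collector) {
            this.collector = collector;
            // Log before connecting: if this line never shows up in the
            // worker log, the task was never started at all.
            System.out.println("task " + context.getThisTaskId() + " initializing");
            // Hypothetical external registration; a failure here surfaces
            // in the worker log instead of the bolt silently never running.
            ExternalRegistrar.register("service-host", 9999, context.getThisTaskId());
        }

        @Override
        public void execute(Tuple input) {
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }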
--
Regards,
Abhishek Agarwal