I agree, the load is not high.

About the higher latencies: how many ackers did you configure? As a rule of thumb, there should be one acker per executor. If you have fewer ackers while the number of executors increases, the ackers can become a bottleneck and cause the increased latency.
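Just to make that knob concrete, here is a minimal sketch of where the acker count is set (the class name, topology name, and the value 8 are placeholders, not values from your setup):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class AckerExample {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // ... declare your spouts and bolts here ...

            Config conf = new Config();
            conf.setNumWorkers(8);   // worker JVMs for the topology
            // Acker executors (Config.TOPOLOGY_ACKER_EXECUTORS); if unset,
            // Storm starts one acker executor per worker.
            conf.setNumAckers(8);

            StormSubmitter.submitTopology("example-topology", conf,
                    builder.createTopology());
        }
    }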
What do you mean by "trying to co-locate tasks and executors as much as possible"? Tasks are logical units of work that are processed by executors (which are threads). Furthermore (as far as I know), the default scheduler distributes tasks and executors evenly over the available workers. In your case, as you set the number of tasks equal to the number of executors, each executor processes a single task, and the executors should be spread evenly over all available workers.

However, you are right: intra-worker channels are "cheaper" than inter-worker channels. To exploit this, you should use local-or-shuffle grouping (localOrShuffleGrouping) instead of shuffle grouping; see the code sketch at the very end of this mail. The disadvantage of local-or-shuffle might be missing load balancing, whereas shuffle always ensures good load balancing.

-Matthias

On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
> Well, my input load is 4 streams at 4000 tuples per second, and each
> tuple is about 128 bytes long. Therefore, I do not think my load is
> too much for my hardware.
>
> No, I am running only this topology in my cluster.
>
> For some reason, when I set the task to executor ratio to 1, my
> topology does not hang at all. The strange thing now is that I see
> higher latency with more executors and I am trying to figure this
> out. Also, I see that the default scheduler is trying to co-locate
> tasks and executors as much as possible. Is this true? If yes, is it
> because the intra-worker latencies are much lower than the
> inter-worker latencies?
>
> Thanks,
> Nick
>
> 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <[email protected]
> <mailto:[email protected]>>:
>
> So (for each node) you have 4 cores available for 1 supervisor JVM, 2
> worker JVMs that execute up to 5 thread each (if 40 executors are
> distributed evenly over all workers. Thus, about 12 threads for 4
> cores. Or course, Storm starts a few more threads within each
> worker/supervisor.
>
> If your load is not huge, this might be sufficient. However, having
> high data rate, it might be problematic.
>
> One more question: do you run a single topology in your cluster or
> multiple? Storm isolates topologies for fault-tolerance reasons.
> Thus, a single worker cannot process executors from different
> topologies. If you run out of workers, a topology might not start up
> completely.
>
> -Matthias
>
>
>
> On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
> > Hello Matthias and thank you for your reply. See my answers below:
> >
> > - I have a 4 supervisor nodes in my AWS cluster of m4.xlarge
> > instances (4 cores per node). On top of that I have 3 more nodes
> > for zookeeper and nimbus.
> > - 2 worker nodes per supervisor node
> > - The task number for each bolt ranges from 1 to 4 and I use 1:1
> > task to executor assignment.
> > - The number of executors in total for the topology ranges from 14
> > to 41
> >
> > Thanks,
> > Nick
> >
> > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <[email protected]
> > <mailto:[email protected]> <mailto:[email protected]
> > <mailto:[email protected]>>>:
> >
> > Without any exception/error message it is hard to tell.
> >
> > What is your cluster setup
> > - Hardware, ie, number of cores per node?
> > - How many node/supervisor are available?
> > - Configured number of workers for the topology?
> > - What is the number of task for each spout/bolt?
> > - What is the number of executors for each spout/bolt?
> >
> > -Matthias
> >
> > On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
> > > Hello all,
> > >
> > > I am working on a project in which I submit a topology to my
> > > Storm cluster, but for some reason, some of my tasks do not
> > > start executing.
> > >
> > > I can see that the above is happening because every bolt I have
> > > needs to connect to an external server and do a registration to
> > > a service. However, some of the bolts do not seem to connect.
> > >
> > > I have to say that the number of tasks I have is larger than the
> > > number of workers of my cluster. Also, I check my worker log
> > > files, and I see that the workers that do not register, are also
> > > not writing some initialization messages I have them print in
> > > the beginning.
> > >
> > > Any idea why this is happening? Can it be because my resources
> > > are not enough to start off all of the tasks?
> > >
> > > Thank you,
> > > Nick
> > >
> > >
> >
> > --
> > Nikolaos Romanos Katsipoulakis,
> > University of Pittsburgh, PhD candidate
>
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate
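PS: To make the "local-or-shuffle instead of shuffle" point concrete, here is a minimal sketch; MySpout, MyBolt, the component names, and the parallelism of 4 are placeholders, not your actual topology:

    import backtype.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new MySpout(), 4);  // MySpout is a placeholder

    // shuffleGrouping: tuples are load-balanced across all downstream
    // tasks, no matter which worker they live in (good load balancing,
    // but tuples may cross worker boundaries).
    builder.setBolt("shuffle-bolt", new MyBolt(), 4)
           .setNumTasks(4)                  // 1:1 task-to-executor ratio
           .shuffleGrouping("spout");

    // localOrShuffleGrouping: prefers downstream tasks in the same
    // worker (cheap intra-worker channel) and only shuffles across
    // workers if there is no local task; load balancing may suffer.
    builder.setBolt("local-bolt", new MyBolt(), 4)
           .setNumTasks(4)
           .localOrShuffleGrouping("spout");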
