Re: Tasks are not starting

Matthias J. Sax Thu, 03 Sep 2015 01:45:02 -0700

I am currently working with version 0.11.0-SNAPSHOT and cannot observe
the behavior you describe. If I submit a sample topology with 1 spout
(dop=1) and 1 bolt (dop=10) connected via shuffle grouping and have 12
supervisor available (each with 12 worker slots), each of the 11
executors is running on a single worker of a single supervisor (host).


I am not idea why you observe a different behavior...

-Matthias

On 09/03/2015 12:20 AM, Nick R. Katsipoulakis wrote:
> When I say co-locate, what I have seen in my experiments is the following:
> 
> If the executor's number can be served by workers on one node, the
> scheduler spawns all the executors in the workers of one node. I have
> also seen that behavior in that the default scheduler tries to fill up
> one node before provisioning an additional one for the topology.
> 
> Going back to your following sentence "and the executors should be
> evenly distributed over all available workers." I have to say that I do
> not see that often in my experiments. Actually, I often come across with
> workers handling 2 - 3 executors/tasks, and other doing nothing. Am I
> missing something? Is it just a coincidence that happened in my experiments?
> 
> Thank you,
> Nick
> 
> 
> 
> 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <[email protected]
> <mailto:[email protected]>>:
> 
>     I agree. The load is not high.
> 
>     About higher latencies. How many ackers did you configure? As a rule of
>     thumb there should be one acker per executor. If you have less ackers,
>     and an increasing number of executors, this might cause the increased
>     latency as the ackers could become a bottleneck.
> 
>     What do you mean by "trying to co-locate tasks and executors as much as
>     possible"? Tasks a logical units of works that are processed by
>     executors (which are threads). Furthermore (as far as I know), the
>     default scheduler does a evenly distributed assignment for tasks and
>     executor to the available workers. In you case, as you set the number of
>     task equal to the number of executors, each executors processes a single
>     task, and the executors should be evenly distributed over all available
>     workers.
> 
>     However, you are right: intra-worker channels are "cheaper" than
>     inter-worker channels. In order to exploit this, you should use
>     shuffle-or-local grouping instead of shuffle. The disadvantage of
>     shuffle-or-local might be missing load-balancing. Shuffle always ensures
>     good load balancing.
> 
> 
>     -Matthias
> 
> 
> 
>     On 09/02/2015 10:31 PM, Nick R. Katsipoulakis wrote:
>     > Well, my input load is 4 streams at 4000 tuples per second, and each
>     > tuple is about 128 bytes long. Therefore, I do not think my load is too
>     > much for my hardware.
>     >
>     > No, I am running only this topology in my cluster.
>     >
>     > For some reason, when I set the task to executor ratio to 1, my topology
>     > does not hang at all. The strange thing now is that I see higher latency
>     > with more executors and I am trying to figure this out. Also, I see that
>     > the default scheduler is trying to co-locate tasks and executors as much
>     > as possible. Is this true? If yes, is it because the intra-worker
>     > latencies are much lower than the inter-worker latencies?
>     >
>     > Thanks,
>     > Nick
>     >
>     > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <[email protected] 
> <mailto:[email protected]>
>     > <mailto:[email protected] <mailto:[email protected]>>>:
>     >
>     >     So (for each node) you have 4 cores available for 1 supervisor JVM, 
> 2
>     >     worker JVMs that execute up to 5 thread each (if 40 executors are
>     >     distributed evenly over all workers. Thus, about 12 threads for 4 
> cores.
>     >     Or course, Storm starts a few more threads within each
>     >     worker/supervisor.
>     >
>     >     If your load is not huge, this might be sufficient. However, having 
> high
>     >     data rate, it might be problematic.
>     >
>     >     One more question: do you run a single topology in your cluster or
>     >     multiple? Storm isolates topologies for fault-tolerance reasons. 
> Thus, a
>     >     single worker cannot process executors from different topologies. 
> If you
>     >     run out of workers, a topology might not start up completely.
>     >
>     >     -Matthias
>     >
>     >
>     >
>     >     On 09/02/2015 09:54 PM, Nick R. Katsipoulakis wrote:
>     >     > Hello Matthias and thank you for your reply. See my answers below:
>     >     >
>     >     > - I have a 4 supervisor nodes in my AWS cluster of m4.xlarge 
> instances
>     >     > (4 cores per node). On top of that I have 3 more nodes for 
> zookeeper and
>     >     > nimbus.
>     >     > - 2 worker nodes per supervisor node
>     >     > - The task number for each bolt ranges from 1 to 4 and I use 1:1 
> task to
>     >     > executor assignment.
>     >     > - The number of executors in total for the topology ranges from 
> 14 to 41
>     >     >
>     >     > Thanks,
>     >     > Nick
>     >     >
>     >     > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <[email protected] 
> <mailto:[email protected]> <mailto:[email protected]
>     <mailto:[email protected]>>
>     >     > <mailto:[email protected] <mailto:[email protected]>
>     <mailto:[email protected] <mailto:[email protected]>>>>:
>     >     >
>     >     >     Without any exception/error message it is hard to tell.
>     >     >
>     >     >     What is your cluster setup
>     >     >       - Hardware, ie, number of cores per node?
>     >     >       - How many node/supervisor are available?
>     >     >       - Configured number of workers for the topology?
>     >     >       - What is the number of task for each spout/bolt?
>     >     >       - What is the number of executors for each spout/bolt?
>     >     >
>     >     >     -Matthias
>     >     >
>     >     >     On 09/02/2015 08:02 PM, Nick R. Katsipoulakis wrote:
>     >     >     > Hello all,
>     >     >     >
>     >     >     > I am working on a project in which I submit a topology
>     to my
>     >     Storm
>     >     >     > cluster, but for some reason, some of my tasks do not
>     start
>     >     executing.
>     >     >     >
>     >     >     > I can see that the above is happening because every
>     bolt I have
>     >     >     needs to
>     >     >     > connect to an external server and do a registration to a
>     >     service.
>     >     >     > However, some of the bolts do not seem to connect.
>     >     >     >
>     >     >     > I have to say that the number of tasks I have is
>     larger than the
>     >     >     number
>     >     >     > of workers of my cluster. Also, I check my worker log
>     files,
>     >     and I see
>     >     >     > that the workers that do not register, are also not
>     writing some
>     >     >     > initialization messages I have them print in the
>     beginning.
>     >     >     >
>     >     >     > Any idea why this is happening? Can it be because my
>     >     resources are not
>     >     >     > enough to start off all of the tasks?
>     >     >     >
>     >     >     > Thank you,
>     >     >     > Nick
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > --
>     >     > Nikolaos Romanos Katsipoulakis,
>     >     > University of Pittsburgh, PhD candidate
>     >
>     >
>     >
>     >
>     > --
>     > Nikolaos Romanos Katsipoulakis,
>     > University of Pittsburgh, PhD candidate
> 
> 
> 
> 
> -- 
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate

signature.asc
Description: OpenPGP digital signature

Re: Tasks are not starting

Reply via email to