Re: Understanding parallelism in Storm

Matthias J. Sax Mon, 16 May 2016 12:27:02 -0700

Answers inline.

I guess you are not aware, that a worker run other thread next to the
executors, too. For example, there are two threads (one for input; one
for output), that work as "dispatcher" for incoming messages. There is a
global input queue, and the dispatcher "forwards" incoming messages to
the individual tasks queues such that the executors can all work in
parallel. Same for output. Executors write into own output queues and a
single "output thread" reads the data from there and take care of
network transfer to downstream bolts.


-Matthias

On 05/16/2016 06:24 PM, Navin Ipe wrote:
> Err...guys....I appreciate the ongoing discussion, but the original
> question remains unanswered. The one I've asked at the very beginning of
> this conversation. Some help would be appreciated.
> Referring to the code I posted and as per Nathan's answer, you say that
> int *BoltParallelism* actually represents the tasks

No. *BoltParallslim* is the number of executor threads.

which are the number
> of instances of Bolts/Spouts? And BoltTaskParallelism is the number of
> executors (OS threads)?

No. This is the number of tasks.

> If that's the case, then execute() will get called only after the
> previous execute() call of a Bolt has completed. And nextTuple() will
> get called only after the previous nextTuple() of a Spout has completed.

For a single executor, yes.

> That's a bit reassuring, since now one does not have to cater to
> multithreading within a Spout/Bolt.

No. All executors run in parallel.

> 
> 
> On Mon, May 16, 2016 at 7:07 PM, Matthias J. Sax <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Hi,
> 
> 
>     So this is not correct:
>     > and
>     > the Bolt creates a task for processing each incoming Tuple.
> 
>     Storm create exactly *BoltTaskParallelism* tasks and assigns incoming
>     messages to tasks (according to the used connection pattern -- shuffle,
>     fieldsGrouping etc).
> 
>     Futhermore:
> 
>     > If there
>     > are not enough tasks, then the excess Tuples are made to wait in a
>     > queue of the executor.
> 
>     No. There is no thing as "not enough tasks". Each task has its own input
>     queue/buffer and tuple are stored there.
> 
>     The executor threads process one or multiple tasks. Thus, if a task is
>     currently "on hold", new tuples are just added to the task's input
>     queue. If an executor picks up on of its tasks for processing, the
>     buffered tuples of the task are processed.
> 
> 
>     -Matthias
> 
>     On 05/16/2016 09:07 AM, Adrien Carreira wrote:
>     > +1
>     >
>     > 2016-05-16 6:40 GMT+02:00 Navin Ipe <[email protected] 
> <mailto:[email protected]>
>     > <mailto:[email protected]
>     <mailto:[email protected]>>>:
>     >
>     >     Hi,
>     >
>     >     I've seen the explanations
>     >   
>      
> <http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/>,
>     >     but none of them explain it in terms of what I see in the code. This
>     >     is what I understood:
>     >
>     >     int BoltParallelism = 3;
>     >     int BoltTaskParallelism = 2;
>     >     builder.setBolt("bolt1", new BoltA(), *BoltParallelism*)
>     >                     .setNumTasks(*BoltTaskParallelism*)
>     >
>     >     BoltParallelism creates 3 instances of BoltA. These are the
>     executors.
>     >     BoltTaskParallelism allows Tuples to come into BoltA very
>     fast, and
>     >     the Bolt creates a task for processing each incoming Tuple. If
>     there
>     >     are not enough tasks, then the excess Tuples are made to wait in a
>     >     queue of the executor.
>     >
>     >     Strange thing is that the explanation says the tasks are run in a
>     >     single thread, so obviously I misunderstood something. Could you
>     >     help me understand it?
>     >
>     >     --
>     >     Regards,
>     >     Navin
>     >
>     >
> 
> 
> 
> 
> -- 
> Regards,
> Navin

signature.asc
Description: OpenPGP digital signature

Re: Understanding parallelism in Storm

Reply via email to