Hello again, I have a follow-up question to better understand the abstraction of a "stream" in Storm and what happens in a cyclic dataflow like the one I described.
Every time bolt-2 sends a tuple back to bolt-1, the tuple is placed in the output queue of the task that runs bolt-2's code (let's call it task-x). Similarly, another task, task-y, executes bolt-1's code. Consider two cases: (i) tasks x and y are threads running on the same node, or (ii) they are threads running on different machines. In the former case, the tuple produced by task-x is placed in a Disruptor queue; in the latter, it is placed in a Netty queue. My question is the following: will the tuple be sent directly to task-y, or will it have to start from the beginning of the stream? To be more precise, if the two tasks are on different machines and the tuple is placed in a Netty queue, will the tuple be sent directly to the Netty input queue of the machine that task-y runs on, or will it have to make intermediate hops? The same question applies when task-x and task-y run on the same machine.

Thank you,
Nick

On Thu, Nov 5, 2015 at 10:06 AM, Nick R. Katsipoulakis wrote:

> Hello all,
>
> Thank you very much for your replies. I actually used Matthias' suggestion
> and it worked.
>
> Thanks,
> Nick
>
> On Thu, Nov 5, 2015 at 4:47 AM, Matthias J. Sax wrote:
>
>> You need to specify a cyclic dataflow:
>>
>> > builder.setSpout("spout", ...);
>> > builder.setBolt("bolt1", ...).directGrouping("spout").directGrouping("bolt2");
>> > builder.setBolt("bolt2", ...).directGrouping("bolt1");
>>
>> You can use the default stream.
>>
>> -Matthias
>>
>> On 11/04/2015 09:06 PM, Nathan Leung wrote:
>> > You would need:
>> >
>> > Spout-1 --(direct-grouping)--> Bolt-1 --(direct-grouping)--> Bolt-2
>> > --(direct grouping, non-default stream)--> Bolt-1
>> >
>> > Task IDs are numbered from 0 (pretty sure it's 0, if not it's from 1)
>> > for each component. Therefore Spout-1 task IDs are 0 to (n-1), Bolt-1
>> > task IDs are 0 to (m-1), etc., where n = number of Spout-1 tasks and
>> > m = number of Bolt-1 tasks.
>> >
>> > If you want a cyclic graph in your topology, I believe you have to use
>> > a non-default stream.
>> >
>> > On Wed, Nov 4, 2015 at 2:22 PM, Nick R. Katsipoulakis wrote:
>> >
>> > Hello,
>> >
>> > I have a question regarding direct streaming and sending a tuple
>> > from a downstream node to its upstream node. To be more precise, let
>> > us assume we have the following topology:
>> >
>> > Spout-1 --(direct-grouping)--> Bolt-1 --(direct-grouping)--> Bolt-2
>> >
>> > Can Bolt-2 call emitDirect() and send a tuple back to Bolt-1 (by
>> > getting the task ID from tuple.getSourceTask())? If not, is it
>> > because of the general architecture of Storm?
>> >
>> > Thanks,
>> > Nick
>>
>
> --
> Nick R. Katsipoulakis,
> Department of Computer Science
> University of Pittsburgh
>

--
Nick R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh
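A toy Java model of the two cases in Nick's follow-up may help frame the question. It is a hypothetical sketch, not Storm's source code: the class `DirectRouteModel`, the method `route`, the queue labels, and the task placement are all invented for illustration. It assumes one task per executor and keys on the worker process (JVM) rather than the machine, loosely following the executor send/receive queues and per-worker transfer queue of 0.9/0.10-era Storm workers. Under those assumptions, a directly emitted tuple goes straight to the target task's buffers and never passes back through the spout or any other component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model (NOT Storm source code) listing the buffers a
 * directly emitted tuple passes through on its way from task-x to task-y.
 * Assumptions: one task per executor; taskToWorker maps each task ID to
 * the worker process (JVM) it runs in.
 */
public class DirectRouteModel {

    public static List<String> route(int srcTask, int dstTask, Map<Integer, String> taskToWorker) {
        String srcWorker = taskToWorker.get(srcTask);
        String dstWorker = taskToWorker.get(dstTask);
        if (srcWorker == null || dstWorker == null) {
            throw new IllegalArgumentException("unknown task id");
        }
        List<String> hops = new ArrayList<>();
        // In this model, every emit first lands in the emitting executor's outgoing queue.
        hops.add("send-queue[task-" + srcTask + "]");
        if (srcWorker.equals(dstWorker)) {
            // Same worker: handed straight to the target's incoming queue.
            hops.add("receive-queue[task-" + dstTask + "]");
        } else {
            // Different workers: the source worker's shared transfer queue,
            // one network transfer (Netty) to the target worker, then the
            // target's incoming queue.
            hops.add("transfer-queue[" + srcWorker + "]");
            hops.add("netty[" + srcWorker + " -> " + dstWorker + "]");
            hops.add("receive-queue[task-" + dstTask + "]");
        }
        // In neither branch does the tuple revisit the spout or other components.
        return hops;
    }

    public static void main(String[] args) {
        // Hypothetical placement: task 7 (bolt-2) and task 3 (bolt-1) share
        // worker-A; task 4 (bolt-1) runs in worker-B.
        Map<Integer, String> placement = Map.of(7, "worker-A", 3, "worker-A", 4, "worker-B");
        System.out.println(route(7, 3, placement));
        System.out.println(route(7, 4, placement));
    }
}
```

Running `main` prints a two-entry path for the same-worker case and a four-entry path for the cross-worker case. Keying on the worker rather than the machine is deliberate: in this model, two workers on the same machine would still go through the transfer-queue and Netty branch.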
