You can get the number of bolt instances from TopologyContext that is provided in Bolt.prepare()
Furthermore, you could put a loop into your topology, ie, a bolt reads
it's own output; if you broadcast (ie, allGrouping) this
feedback-loop-stream you can let bolt instances talk to each other.
builder.setBolt("DbBolt", new MyDBBolt())
.shuffleGrouping("spout")
.allGrouping("flush-stream", "DbBolt");
where "flush-stream" is a second output stream of MyDBBolt() sending a
notification tuple after it received the end-of-stream from spout;
furthermore, if a bolt received the signal via "flush-stream" from
**all** parallel bolt instances, it can flush to DB.
Or something like this... Be creative! :)
-Matthias
On 05/08/2016 02:26 PM, Navin Ipe wrote:
> @Matthias: I agree about the batch processor, but my superior took the
> decision to use Storm, and he visualizes more complexity later for which
> he needs Storm.
> I had considered the "end of stream" tuple earlier (my idea was to emit
> 10 consecutive nulls), but then the question was how do I know how many
> bolt instances have been created, and how do I notify all the bolts?
> Because it's only after the last bolt finishes writing to DB, that I
> have to shut down the topology.
>
> @Jason: Thanks. I had seen storm signals earlier (I think from one of
> your replies to someone else) and I had a look at the code too, but am a
> bit wary because it's no longer being maintained and because of the
> issues: https://github.com/ptgoetz/storm-signals/issues
>
> On Sun, May 8, 2016 at 5:40 AM, Jason Kusar <[email protected]
> <mailto:[email protected]>> wrote:
>
> You might want to check out Storm Signals.
> https://github.com/ptgoetz/storm-signals
>
> It might give you what you're looking for.
>
>
> On Sat, May 7, 2016, 11:59 AM Matthias J. Sax <[email protected]
> <mailto:[email protected]>> wrote:
>
> As you mentioned already: Storm is designed to run topologies
> forever ;)
> If you have finite data, why do you not use a batch processor???
>
> As a workaround, you can embed "control messages" in your stream
> (or use
> an additional stream for them).
>
> If you want a topology to shut down itself, you could use
>
> `NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);`
> in your spout/bolt code.
>
> Something like:
> - Spout emit all tuples
> - Spout emit special "end of stream" control tuple
> - Bolt1 processes everything
> - Bolt1 forward "end of stream" control tuple (when it received it)
> - Bolt2 processed everything
> - Bolt2 receives "end of stream" control tuple => flush to DB
> => kill
> topology
>
> But I guess, this is kinda weird pattern.
>
> -Matthias
>
> On 05/05/2016 06:13 AM, Navin Ipe wrote:
> > Hi,
> >
> > I know Storm is designed to run forever. I also know about
> Trident's
> > technique of aggregation. But shouldn't Storm have a way to
> let bolts
> > know that a certain bunch of processing has been completed?
> >
> > Consider this topology:
> > Spout------>Bolt-A------>Bolt-B
> > | |--->Bolt-B
> > | \--->Bolt-B
> > |--->Bolt-A------>Bolt-B
> > | |--->Bolt-B
> > | \--->Bolt-B
> > \--->Bolt-A------>Bolt-B
> > |--->Bolt-B
> > \--->Bolt-B
> >
> > * From Bolt-A to Bolt-B, it is a FieldsGrouping.
> > * Spout emits only a few tuples and then stops emitting.
> > * Bolt A takes those tuples and generates millions of tuples.
> >
> >
> > *Bolt-B accumulates tuples that Bolt A sends and needs to know
> when
> > Spout finished emitting. Only then can Bolt-B start writing to
> SQL.*
> >
> > *Questions:*
> > 1. How can all Bolts B be notified that it is time to write to
> SQL?
> > 2. After all Bolts B have written to SQL, how to know that all
> Bolts B
> > have completed writing?
> > 3. How to stop the topology? I know of
> > localCluster.killTopology("HelloStorm"), but shouldn't there
> be a way to
> > do it from the Bolt?
> >
> > --
> > Regards,
> > Navin
>
>
>
>
> --
> Regards,
> Navin
signature.asc
Description: OpenPGP digital signature
