"Software Dev",
In RollingCountBolt there are two *time*-related settings:

1. The size (duration) of the sliding window itself, in seconds.
2. The time interval at which the latest sliding window count is sent
   to downstream bolts, in seconds.

See details here:
https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/bolt/RollingCountBolt.java

I'm quoting from the code above:

"The bolt is configured by two parameters, the length of the sliding
window in seconds (which influences the output data of the bolt, i.e.
how it will count objects) and the emit frequency in seconds (which
influences how often the bolt will output the latest window counts).
For instance, if the window length is set to an equivalent of five
minutes and the emit frequency to one minute, then the bolt will output
the latest five-minute sliding window every minute."

> Does this mean that the rolling counts for the last 9 events are
> ranked and emitted every 2 seconds? 7 seconds?

The RollingCountBolt "thinks" in seconds, not in numbers of events.
However, behind the scenes RollingCountBolt uses SlidingWindowCounter
[1], which in turn is built upon SlotBasedCounter [2]. Neither
SlidingWindowCounter nor SlotBasedCounter knows anything about time or
durations (no seconds, minutes, and such). This is by design, as it
decouples the responsibility of counting (SlidingWindowCounter and
SlotBasedCounter) from the responsibility of tracking time
(RollingCountBolt).

The Apache Spark project has exactly the same notions of
emitFrequencyInSeconds and windowLengthInSeconds, which it calls
slideInterval and windowLength. See
https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html.
The "Window Operations" section there also has a diagram, similar to
the one I showed in [3], that explains the idea behind sliding windows.

Does that make sense?
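The split between counting and time tracking described above can be sketched in Java. This is a hypothetical, simplified illustration, not the storm-starter source: the names SlidingWindowSketch, SlotCounter, WindowCounter, emitAndAdvance and numSlots are made up for this example, and it assumes the number of slots is the window length divided by the emit frequency (so the five-minute window emitted every minute would use five slots). The counters only ever see slot indices; seconds appear solely in numSlots, which stands in for the bolt's role.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are taken from storm-starter.
public class SlidingWindowSketch {

    /** Counts objects per slot. A slot is just an array index, not a time range. */
    static final class SlotCounter<T> {
        private final Map<T, long[]> counts = new HashMap<>();
        private final int numSlots;

        SlotCounter(int numSlots) {
            if (numSlots <= 0) throw new IllegalArgumentException("numSlots must be positive");
            this.numSlots = numSlots;
        }

        void increment(T obj, int slot) {
            counts.computeIfAbsent(obj, k -> new long[numSlots])[slot]++;
        }

        /** Sums each object's counts across all slots. */
        Map<T, Long> totals() {
            Map<T, Long> result = new HashMap<>();
            counts.forEach((obj, perSlot) -> result.put(obj, Arrays.stream(perSlot).sum()));
            return result;
        }

        /** Zeroes one slot and forgets objects that no longer occur in any slot. */
        void wipeSlot(int slot) {
            for (long[] perSlot : counts.values()) perSlot[slot] = 0;
            counts.values().removeIf(perSlot -> Arrays.stream(perSlot).allMatch(c -> c == 0));
        }
    }

    /** Circular window of slots; the caller, not this class, decides when to advance it. */
    static final class WindowCounter<T> {
        private final SlotCounter<T> slots;
        private final int numSlots;
        private int head = 0; // slot that currently receives increments

        WindowCounter(int numSlots) {
            this.slots = new SlotCounter<>(numSlots);
            this.numSlots = numSlots;
        }

        void increment(T obj) {
            slots.increment(obj, head);
        }

        /** Returns counts over the whole window, then moves the head forward. */
        Map<T, Long> emitAndAdvance() {
            Map<T, Long> totals = slots.totals();
            head = (head + 1) % numSlots;
            slots.wipeSlot(head); // the new head slot held the oldest chunk, so clear it
            return totals;
        }
    }

    /** Time enters only here: e.g. a 300 s window emitted every 60 s gives 5 slots. */
    static int numSlots(int windowLengthInSeconds, int emitFrequencyInSeconds) {
        if (emitFrequencyInSeconds <= 0 || windowLengthInSeconds < emitFrequencyInSeconds) {
            throw new IllegalArgumentException("need 0 < emit frequency <= window length");
        }
        return windowLengthInSeconds / emitFrequencyInSeconds; // integer division
    }

    public static void main(String[] args) {
        // Hypothetical values: a 9 s window emitted every 3 s, i.e. 3 slots.
        WindowCounter<String> window = new WindowCounter<>(numSlots(9, 3));
        window.increment("storm");
        window.increment("storm");
        System.out.println(window.emitAndAdvance());
        window.increment("spark");
        System.out.println(window.emitAndAdvance());
        window.increment("storm");
        System.out.println(window.emitAndAdvance());
        // No new events: the oldest chunk (the first two "storm" counts) drops out.
        System.out.println(window.emitAndAdvance());
    }
}
```

In an actual bolt, something time-aware (for example, Storm's tick tuples) would call emitAndAdvance once per emit interval; the counting classes themselves stay unaware of how often that happens.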
Michael

[1] https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/tools/SlidingWindowCounter.java
[2] https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/tools/SlotBasedCounter.java
[3] http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/

On 01.04.2014 18:45, Software Dev wrote:
> In the article
> (http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/)
> and I was wondering what the rationale was for the emit frequencies
> and how they all relate to each other.
>
> In the example the RollingCountBolt emits every 3 seconds,
> IntermediateRankingBolt every 2 seconds and TotalRankingBolt every
> 2 seconds. Does this mean that the rolling counts for the last 9
> events are ranked and emitted every 2 seconds? 7 seconds? A little
> confused.
>
> Thanks
