So here is the smoking gun:

ObjectDetectorTopology$19@3f6c7c9c) - processed_count: 18425
ObjectDetectorTopology$19@3f6c7c9c) - processed_count: 18426
ObjectDetectorTopology$19@3f6c7c9c) - processed_count: 18427
ObjectDetectorTopology$19@1e8635) - processed_count: 1
ObjectDetectorTopology$19@1e8635) - processed_count: 2
ObjectDetectorTopology$19@1e8635) - processed_count: 3
ObjectDetectorTopology$19@1e8635) - processed_count: 4
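For reference, the bolt that produces those lines is shaped roughly like this. This is only a minimal sketch, not the actual code: the class and method names are made up, the log format is reconstructed from the output above, and I am assuming a pre-1.0 Storm release (backtype.storm packages).

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class CountingBoltSketch {

    // Returns an anonymous bolt whose only state is an in-memory counter.
    // "ObjectDetectorTopology$19@3f6c7c9c" is what the default Object.toString()
    // of such an anonymous instance looks like (enclosing class + "$19" + identity
    // hash), so a new hash in the log means a brand-new bolt object with a fresh
    // counter.
    public static BaseRichBolt newCountingBolt() {
        return new BaseRichBolt() {
            private transient OutputCollector collector;
            private transient long processedCount;

            @Override
            public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
                this.processedCount = 0; // runs again for every new instance
            }

            @Override
            public void execute(Tuple input) {
                processedCount++;
                // The real code presumably uses a logger; the format mirrors the lines above.
                System.out.println(this + ") - processed_count: " + processedCount);
                collector.ack(input);
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // no output fields needed for this sketch
            }
        };
    }
}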
Those log lines are from my topology, and the bolt in question is an anonymous inner class ($19). That bolt is configured with a parallelism of 1, so Storm is clearly freeing instance 3f6c7c9c and replacing it with instance 1e8635, and of course the in-memory state reinitializes to one. There is no clear pattern to when it decides to do this; multiple runs have produced different results. Is there a way to prevent it? I have already changed the code so the accumulations are stored in a common place and retrieved from there, but that is far from optimal. I would rather keep the state in memory.

Thank you

On Mon, Jul 27, 2015 at 7:59 PM, Richard Huber <[email protected]> wrote:

> Hello all -
>
> I have a simple topology with a fieldsGrouping on a bolt that has an
> internal accumulator. Simply:
>
> BoltA -> BoltB.fieldsGrouping( BoltA, "string for grouping" )
>
> I never had a problem with this code before; it worked perfectly until the
> time between accumulator emits exceeded an hour. What I came back to today
> was a topology that had frozen for one of the groups, because the
> accumulator had restarted from one, which I thought should have been
> impossible. Basically I use a Map internally, keyed by the 'string for
> grouping' above, with an Integer object as the accumulator. When the
> accumulator >= threshold, it emits a single message.
>
> So my question is, after reading much about how this could possibly
> happen: how does Storm decide to 'retire' or 'kill' a bolt that, for all
> intents and purposes, is working fine but has not emitted in a long
> period? Is there a timeout configuration for that, or some other regular
> behavior?
>
> The standard set size in production will take hours before the
> accumulator hits the threshold and emits to the next stage of processing.
>
> Thanks!
>
> Rich
>
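For completeness, the accumulator pattern described above maps roughly onto something like the following. Again, this is a sketch, not the real code: the field name "group", the threshold value, the reset-after-emit behavior, and the component names in the wiring comment are illustrative assumptions, and the packages again assume a pre-1.0 Storm release.

import java.util.HashMap;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Per-group accumulator: one counter per grouping key, kept purely in memory.
// If the bolt instance is replaced, this map (and every counter in it) is lost.
public class AccumulatorBolt extends BaseRichBolt {
    private static final int THRESHOLD = 10000; // illustrative value

    private transient OutputCollector collector;
    private transient Map<String, Integer> countsByGroup;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.countsByGroup = new HashMap<String, Integer>();
    }

    @Override
    public void execute(Tuple input) {
        String group = input.getStringByField("group"); // the 'string for grouping'
        int count = countsByGroup.containsKey(group) ? countsByGroup.get(group) + 1 : 1;
        if (count >= THRESHOLD) {
            collector.emit(new Values(group, count)); // single message per threshold crossing
            count = 0;                                // assumption: start accumulating again
        }
        countsByGroup.put(group, count);
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("group", "count"));
    }
}

// Wiring sketch (inside the topology-building code; "boltA" and the spout
// are assumed to be defined elsewhere):
//
//   TopologyBuilder builder = new TopologyBuilder();
//   builder.setBolt("boltB", new AccumulatorBolt(), 1)
//          .fieldsGrouping("boltA", new Fields("group"));

The fieldsGrouping is what guarantees that all tuples with the same "group" value land on the same AccumulatorBolt task; the problem in this thread is only that the task's in-memory map does not survive the bolt instance being replaced.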
