So here is the smoking gun:

ObjectDetectorTopology$19@3f6c7c9c) - processed_count: 18425
ObjectDetectorTopology$19@3f6c7c9c) - processed_count: 18426
ObjectDetectorTopology$19@3f6c7c9c) - processed_count: 18427
ObjectDetectorTopology$19@1e8635) - processed_count: 1
ObjectDetectorTopology$19@1e8635) - processed_count: 2
ObjectDetectorTopology$19@1e8635) - processed_count: 3
ObjectDetectorTopology$19@1e8635) - processed_count: 4
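For reference, the bolt that produces those lines is shaped roughly like this. This is only a minimal sketch, not the actual code: the class and method names are made up, the log format is reconstructed from the output above, and I am assuming a pre-1.0 Storm release (backtype.storm packages).

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class CountingBoltSketch {

    // Returns an anonymous bolt whose only state is an in-memory counter.
    // "ObjectDetectorTopology$19@3f6c7c9c" is what the default Object.toString()
    // of such an anonymous instance looks like (enclosing class + "$19" + identity
    // hash), so a new hash in the log means a brand-new bolt object with a fresh
    // counter.
    public static BaseRichBolt newCountingBolt() {
        return new BaseRichBolt() {
            private transient OutputCollector collector;
            private transient long processedCount;

            @Override
            public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
                this.processedCount = 0; // runs again for every new instance
            }

            @Override
            public void execute(Tuple input) {
                processedCount++;
                // The real code presumably uses a logger; the format mirrors the lines above.
                System.out.println(this + ") - processed_count: " + processedCount);
                collector.ack(input);
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // no output fields needed for this sketch
            }
        };
    }
}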
Those log lines are from my topology, and the bolt in question is an anonymous inner class ($19). That bolt is configured with a parallelism of 1, so Storm is clearly freeing instance 3f6c7c9c and replacing it with instance 1e8635, and of course the in-memory state reinitializes to one. There is no clear pattern to when it decides to do this; multiple runs have produced different results. Is there a way to prevent it? I have already changed the code so the accumulations are stored in a common place and retrieved from there, but that is far from optimal. I would rather keep the state in memory.

Thank you

On Mon, Jul 27, 2015 at 7:59 PM, Richard Huber <[email protected]> wrote:

> Hello all -
>
> I have a simple topology with a fieldsGrouping on a bolt that has an
> internal accumulator. Simply:
>
> BoltA -> BoltB.fieldsGrouping( BoltA, "string for grouping" )
>
> I never had a problem with this code before; it worked perfectly until the
> time between accumulator emits exceeded an hour. What I came back to today
> was a topology that had frozen for one of the groups, because the
> accumulator had restarted from one, which I thought should have been
> impossible. Basically I use a Map internally, keyed by the 'string for
> grouping' above, with an Integer object as the accumulator. When the
> accumulator >= threshold, it emits a single message.
>
> So my question is, after reading much about how this could possibly
> happen: how does Storm decide to 'retire' or 'kill' a bolt that, for all
> intents and purposes, is working fine but has not emitted in a long
> period? Is there a timeout configuration for that, or some other regular
> behavior?
>
> The standard set size in production will take hours before the
> accumulator hits the threshold and emits to the next stage of processing.
>
> Thanks!
>
> Rich
>
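For completeness, the accumulator pattern described above maps roughly onto something like the following. Again, this is a sketch, not the real code: the field name "group", the threshold value, the reset-after-emit behavior, and the component names in the wiring comment are illustrative assumptions, and the packages again assume a pre-1.0 Storm release.

import java.util.HashMap;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Per-group accumulator: one counter per grouping key, kept purely in memory.
// If the bolt instance is replaced, this map (and every counter in it) is lost.
public class AccumulatorBolt extends BaseRichBolt {
    private static final int THRESHOLD = 10000; // illustrative value

    private transient OutputCollector collector;
    private transient Map<String, Integer> countsByGroup;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.countsByGroup = new HashMap<String, Integer>();
    }

    @Override
    public void execute(Tuple input) {
        String group = input.getStringByField("group"); // the 'string for grouping'
        int count = countsByGroup.containsKey(group) ? countsByGroup.get(group) + 1 : 1;
        if (count >= THRESHOLD) {
            collector.emit(new Values(group, count)); // single message per threshold crossing
            count = 0;                                // assumption: start accumulating again
        }
        countsByGroup.put(group, count);
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("group", "count"));
    }
}

// Wiring sketch (inside the topology-building code; "boltA" and the spout
// are assumed to be defined elsewhere):
//
//   TopologyBuilder builder = new TopologyBuilder();
//   builder.setBolt("boltB", new AccumulatorBolt(), 1)
//          .fieldsGrouping("boltA", new Fields("group"));

The fieldsGrouping is what guarantees that all tuples with the same "group" value land on the same AccumulatorBolt task; the problem in this thread is only that the task's in-memory map does not survive the bolt instance being replaced.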
