Hello,

In our topology we use the metrics exposed by SystemBolt to record stats
from our different JVM workers.  We have it set up to send them to DataDog.

Recently we've been plagued by a bug that puts our topology into some
kind of loop, reprocessing the same data again and again.  While that's a
code problem on our side, the initial behavior that precedes it is a
slowdown in the rate of metrics we receive from the SystemBolt.

My question is: what kind of situation could cause the SystemBolt
metrics to be sent at a much slower rate than the one defined by
'topology.builtin.metrics.bucket.size.secs'?
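For reference, the relevant setup looks roughly like this (a sketch, assuming
a Storm 0.9.x-era package layout with backtype.storm; newer releases use
org.apache.storm, and our actual consumer class, which forwards to DataDog,
is stood in for here by Storm's built-in LoggingMetricsConsumer):

```java
import backtype.storm.Config;
import backtype.storm.metric.LoggingMetricsConsumer;

Config conf = new Config();

// Built-in metrics (including SystemBolt stats) are bucketed and
// flushed at this interval -- 20 seconds in our case.
conf.put(Config.TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS, 20);

// Register the metrics consumer with a parallelism hint of 1, so it
// lives in a single bolt instance (Worker 2 in the diagram below).
conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
```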

Consider the following setup:

[Worker 1 - (slowed down worker)] -> sends metrics to -> [Worker 2 - (where
my MetricsConsumer bolt lives)]

[Worker 3 - (all seems normal)] -> sends metrics to -> [Worker 2 - (where
my MetricsConsumer bolt lives)]

So my theory that something is wrong with Worker 1 is based on Worker 2
being the one single place MetricsConsumer lives, and on the fact that all
stats produced by Worker 2's SystemBolt and Worker 3's SystemBolt arrive
at the 20-second interval I've set.

But Worker 1's JVM stats start coming in at odd intervals ranging from 20
to 50 MINUTES!

Any help hypothesizing what could cause that inside a worker would be
greatly appreciated.

Thanks!
Daniel
