Hello All,
I wanted a time history of the internal Storm metrics (the ones displayed
on the Nimbus UI), so I created and registered a custom metrics consumer
that intercepts the data points and publishes them to a StatsD backend.
When my metrics consumer receives the data points, I create custom metrics
with the following naming scheme:
"{TaskName}+{TaskId}+{Supervisor-Host-IP}+{Supervisor-Host-Port}+{Internal_Storm_Metric_Name}+{Stream_Name}".
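To make the naming scheme concrete, here is a minimal sketch of how such names could be assembled. The class `MetricNames` and its methods are hypothetical helpers, not part of Storm or of my actual consumer, and the sketch assumes a per-stream metric arrives as a map from stream name to value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: builds StatsD metric names following the
// "{TaskName}+{TaskId}+{Host-IP}+{Host-Port}+{Metric}+{Stream}" scheme above.
final class MetricNames {
    private MetricNames() {}

    // Joins the six parts with '+', in the order given by the naming scheme.
    static String build(String taskName, int taskId, String host, int port,
                        String metricName, String streamName) {
        return String.join("+", taskName, Integer.toString(taskId), host,
                Integer.toString(port), metricName, streamName);
    }

    // Assumption: a per-stream metric is a map from stream name to value.
    // Returns one fully qualified name per stream, preserving iteration order.
    static Map<String, Number> flatten(String taskName, int taskId, String host, int port,
                                       String metricName,
                                       Map<String, ? extends Number> perStream) {
        Map<String, Number> out = new LinkedHashMap<>();
        for (Map.Entry<String, ? extends Number> e : perStream.entrySet()) {
            out.put(build(taskName, taskId, host, port, metricName, e.getKey()),
                    e.getValue());
        }
        return out;
    }
}
```

Under these assumptions, a call such as `MetricNames.build("myBolt", 7, "10.0.0.5", 6700, "emit-count", "default")` yields one name per task, metric and stream, which is what allows aggregating across tasks later in Graphite.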
The purpose of this is to identify the metrics from each task in the
topology and then aggregate them on the Graphite front-end to display a time
history for the metrics we see on the Nimbus UI. Currently the metrics I am
tracking are 'emit-count', 'execute-count', 'execute-latency',
'process-latency' and 'ack-count'.
For most of my bolts, the values are very close to what we see on the
Nimbus UI (with the same aggregation windows: 10 min, 3 hr and 1 day).
However, for certain bolts, I am seeing zero values.
These are metrics that correspond to the following scenarios:
1. The bolt responds only to a tick tuple. Upon every tick tuple, the bolt
makes a DB call, extracts some information, generates tuples from that
information and emits them on a stream. Nimbus UI correctly shows me the
execute latency and emit count for this action, but I see only 0 values
from the metrics stream.
2. Another bolt receives both tick and non-tick tuples. Upon receiving a
non-tick tuple, this bolt emits tuples of its own, and the metrics for this
stream and action are displayed accurately in the metrics stream; the
counts and latency match up pretty well with the Nimbus UI. Upon receiving
a tick tuple, though, this bolt makes a DB call and doesn't emit any tuples
of its own, and once again I am seeing 0 values in the metrics stream.
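For reference, both scenarios depend on telling tick tuples apart from regular ones. A minimal sketch of the usual check follows, written without Storm imports so it stands alone. `TickTuples` is a hypothetical helper; the two string values mirror Storm's `Constants.SYSTEM_COMPONENT_ID` and `Constants.SYSTEM_TICK_STREAM_ID`, and in a real bolt the arguments would come from `tuple.getSourceComponent()` and `tuple.getSourceStreamId()`.

```java
// Hypothetical helper: decides whether a tuple is a tick tuple from its source fields.
final class TickTuples {
    // Same values as Storm's Constants.SYSTEM_COMPONENT_ID and
    // Constants.SYSTEM_TICK_STREAM_ID, copied here to keep the sketch self-contained.
    static final String SYSTEM_COMPONENT_ID = "__system";
    static final String SYSTEM_TICK_STREAM_ID = "__tick";

    private TickTuples() {}

    // True only when the tuple came from the system component on the tick stream.
    static boolean isTick(String sourceComponent, String sourceStreamId) {
        return SYSTEM_COMPONENT_ID.equals(sourceComponent)
                && SYSTEM_TICK_STREAM_ID.equals(sourceStreamId);
    }
}
```

Because tick tuples arrive on a system stream rather than on one of the topology's own streams, it may be worth checking which stream name the zero-valued metrics are recorded under.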
Since I include stream name in my metrics, I can see in Graphite that ALL
bolts are publishing something to the metrics stream and this isn't a case
of Graphite showing 0 values because it received no data.
Also, I use the default Storm interval for metrics publishing, which is
60 seconds, and my StatsD flush interval is 10 seconds, so I don't think it
is a case of me missing data points.
Any insights into this matter are much appreciated.
Thanks,
Yash