Hi,
I'm seeing some strange values in the metrics logs of my topology. They all
have the following in common:
1. population=1
2. a very small arrival_rate_secs
3. a large sojourn_time_ms, in the thousands of milliseconds
4. all are reported under the __receive metric
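The pattern above can be checked mechanically when scanning a larger log. This is a minimal sketch. It assumes each entry's value map has already been pulled out of the log as one string in the `{key=value, ...}` form shown in the entries below. `parse_queue_metrics`, `is_suspicious` and the thresholds (arrival rate under 1 per second, sojourn time of at least 1000 ms) are hypothetical choices for illustration, not part of Storm:

```python
def parse_queue_metrics(text):
    """Parse a queue-metrics map such as '{population=1, capacity=1024}' into floats."""
    body = text.strip().strip("{}")
    metrics = {}
    for pair in body.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty maps or trailing commas
        key, _, value = pair.partition("=")
        metrics[key] = float(value)
    return metrics


def is_suspicious(metrics, max_rate=1.0, min_sojourn_ms=1000.0):
    """Hypothetical filter: population=1, low arrival rate, large sojourn time."""
    return (
        metrics["population"] == 1
        and metrics["arrival_rate_secs"] < max_rate
        and metrics["sojourn_time_ms"] >= min_sojourn_ms
    )


sample = ("{sojourn_time_ms=3197.6666666666665, write_pos=8117, read_pos=8116, "
          "arrival_rate_secs=0.3127280308558324, overflow=0, capacity=1024, population=1}")
m = parse_queue_metrics(sample)
print(is_suspicious(m))  # True
```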
I also frequently get a "subprocess heartbeat timeout" exception in my
topology (which uses Python ShellBolts). "topology.message.timeout.secs" is
set to 30.
Some of the log entries are below:
2017-03-30 15:59:46,466 11594452 1490860786 nj.xxxxxx.com:7806 255:writeES_events __receive
{sojourn_time_ms=3197.6666666666665, write_pos=8117, read_pos=8116, arrival_rate_secs=0.3127280308558324, overflow=0, capacity=1024, population=1}

2017-03-30 15:59:46,540 11594526 1490860786 nj.xxxxxx.com:7801 125:create_alarms __receive
{sojourn_time_ms=4818.5, write_pos=8168, read_pos=8167, arrival_rate_secs=0.20753346477119436, overflow=0, capacity=1024, population=1}

2017-03-30 15:59:46,603 11594589 1490860786 nj.xxxxxx.com:7806 -1:__system __receive
{sojourn_time_ms=9483.0, write_pos=192, read_pos=191, arrival_rate_secs=0.10545186122535062, overflow=0, capacity=1024, population=1}

2017-03-30 15:59:46,812 11594798 1490860786 nj.xxxxxx.com:7806 -1:__system __receive
{sojourn_time_ms=9540.000000000002, write_pos=192, read_pos=191, arrival_rate_secs=0.10482180293501048, overflow=0, capacity=1024, population=1}

2017-03-30 15:59:47,148 11595134 1490860787 nj.xxxxxx.com:7806 135:es-bolt-events __receive
{sojourn_time_ms=4637.0, write_pos=8092, read_pos=8091, arrival_rate_secs=0.21565667457407806, overflow=0, capacity=1024, population=1}

2017-03-30 16:00:37,344 11645330 1490860837 nj.xxxxxx.com:7806 142:es-bolt-events __receive
{sojourn_time_ms=9819.999999999998, write_pos=8182, read_pos=8181, arrival_rate_secs=0.10183299389002037, overflow=0, capacity=1024, population=1}
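For what it's worth, the sojourn_time_ms values above look as if they are derived from the other fields rather than measured per tuple: each one matches population / arrival_rate_secs × 1000, i.e. Little's law (W = L / λ) converted to milliseconds. That Storm computes the metric this way is my assumption from the numbers, not something confirmed here from the Storm source. If it holds, a queue holding a single item with roughly one arrival every ten seconds would report a sojourn time of about ten seconds even when nothing is actually stuck. The sketch below recomputes the values from the logged numbers:

```python
# (population, arrival_rate_secs, logged sojourn_time_ms), copied from the entries above
samples = [
    (1, 0.3127280308558324, 3197.6666666666665),
    (1, 0.20753346477119436, 4818.5),
    (1, 0.10545186122535062, 9483.0),
    (1, 0.10482180293501048, 9540.000000000002),
    (1, 0.21565667457407806, 4637.0),
    (1, 0.10183299389002037, 9819.999999999998),
]


def littles_law_sojourn_ms(population, arrival_rate_secs):
    """Little's law W = L / lambda, in milliseconds.

    Assumption: this mirrors how the logged sojourn_time_ms is derived.
    """
    return population / arrival_rate_secs * 1000.0


for pop, rate, logged in samples:
    estimate = littles_law_sojourn_ms(pop, rate)
    print(f"logged={logged:.1f} ms  estimate={estimate:.1f} ms")
```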
Does anyone have an idea why sojourn_time_ms is so large?
--
Thanks
Zhechao Ma