Hi,

I found some strange values in the metrics logs of my topology. They have
the following in common:
  1. population=1
  2. very small arrival_rate_secs
  3. large sojourn_time_ms, in the thousands of milliseconds
  4. all come from __receive

I also often get a "subprocess heartbeat timeout" exception in my topology
(which uses a Python ShellBolt). "topology.message.timeout.secs" is set to 30.

Here are some of the logs:

2017-03-30 15:59:46,466 11594452 1490860786     nj.xxxxxx.com:7806
255:writeES_events      __receive
{sojourn_time_ms=3197.6666666666665, write_pos=8117, read_pos=8116,
arrival_rate_secs=0.3127280308558324, overflow=0, capacity=1024,
population=1}
2017-03-30 15:59:46,540 11594526 1490860786      nj.xxxxxx.com:7801
125:create_alarms       __receive               {sojourn_time_ms=4818.5,
write_pos=8168, read_pos=8167, arrival_rate_secs=0.20753346477119436,
overflow=0, capacity=1024, population=1}
2017-03-30 15:59:46,603 11594589 1490860786     nj.xxxxxx.com:7806
-1:__system    __receive               {sojourn_time_ms=9483.0,
write_pos=192, read_pos=191, arrival_rate_secs=0.10545186122535062,
overflow=0, capacity=1024, population=1}
2017-03-30 15:59:46,812 11594798 1490860786     nj.xxxxxx.com:7806
-1:__system    __receive               {sojourn_time_ms=9540.000000000002,
write_pos=192, read_pos=191, arrival_rate_secs=0.10482180293501048,
overflow=0, capacity=1024, population=1}
2017-03-30 15:59:47,148 11595134 1490860787     nj.xxxxxx.com:7806
135:es-bolt-events      __receive               {sojourn_time_ms=4637.0,
write_pos=8092, read_pos=8091, arrival_rate_secs=0.21565667457407806,
overflow=0, capacity=1024, population=1}
2017-03-30 16:00:37,344 11645330 1490860837     nj.xxxxxx.com:7806
142:es-bolt-events      __receive
{sojourn_time_ms=9819.999999999998, write_pos=8182, read_pos=8181,
arrival_rate_secs=0.10183299389002037, overflow=0, capacity=1024,
population=1}
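
One thing I noticed: in every line above, sojourn_time_ms equals
population / arrival_rate_secs * 1000, so the value looks like it is derived
from the arrival rate rather than measured per tuple. Below is a minimal
Python sketch that checks this. The relation is my assumption from the
numbers (I have not verified it against the Storm source), and the function
names are my own, not part of any Storm API.

```python
import math

# (population, arrival_rate_secs, sojourn_time_ms) copied from the log lines above
samples = [
    (1, 0.3127280308558324, 3197.6666666666665),
    (1, 0.20753346477119436, 4818.5),
    (1, 0.10545186122535062, 9483.0),
    (1, 0.10482180293501048, 9540.000000000002),
    (1, 0.21565667457407806, 4637.0),
    (1, 0.10183299389002037, 9819.999999999998),
]


def estimated_sojourn_ms(population, arrival_rate_secs):
    """Assumed relation: time in queue = population / arrival rate (in ms)."""
    return population / arrival_rate_secs * 1000.0


def matches(sample, rel_tol=1e-6):
    """True if the logged sojourn time agrees with the assumed relation."""
    population, rate, logged_ms = sample
    estimate = estimated_sojourn_ms(population, rate)
    return math.isclose(estimate, logged_ms, rel_tol=rel_tol)


for sample in samples:
    population, rate, logged_ms = sample
    print(logged_ms, estimated_sojourn_ms(population, rate), matches(sample))
```

If that assumption is right, a low arrival rate alone would produce a large
sojourn_time_ms whenever population is 1, even if no tuple actually waited
that long.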

Does anyone have an idea why sojourn_time_ms is so large?

-- 
Thanks
Zhechao Ma
