Is there a good way to measure how contended a cluster is in terms of inbound/outbound queues?
I'm using Storm 1.0.2 and have noticed that tuples flowing through a topology sometimes slow down considerably. Load on each of the 5 nodes in the cluster is low, and the network doesn't appear to be a bottleneck. Sometimes, if I redeploy or rebalance the topology, throughput increases dramatically for a day or so.

I have topology.max.spout.pending set to 30, with 8 spouts feeding 40 "writer" bolts. The capacity metric for the busiest bolt is around 0.780, which seems to indicate that the bolts aren't the bottleneck. topology.message.timeout.secs is set to 120 seconds, but I'm not seeing any failures.

I'm also using tick tuples to flush each bolt's accumulated data to the database every 5 minutes. Between flushes, a bolt accumulates aggregated data and writes only when a cache miss occurs, but the cache hit rate is almost always 100%.

-Tom
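To put the numbers above in perspective, here is a back-of-envelope sketch in Java. It assumes two things: that the Storm UI computes capacity per executor as (tuples executed x average execute latency) / measurement window, with the window capped at 10 minutes, and that topology.max.spout.pending limits pending tuples per spout task. The execute count, the latencies, and the `TopologyMath` class are hypothetical illustrations, not measurements from this cluster.

```java
/**
 * Back-of-envelope helpers for reading Storm UI numbers.
 * All inputs in main() are hypothetical example values, not real measurements.
 */
final class TopologyMath {

    /** Assumed UI formula: capacity ~ (executed * avg execute latency) / window length, per executor. */
    static double capacity(long executed, double executeLatencyMs, long windowSecs) {
        return (executed * executeLatencyMs) / (windowSecs * 1000.0);
    }

    /** topology.max.spout.pending applies per spout task, so the in-flight cap scales with spout count. */
    static int maxInFlight(int spoutTasks, int maxSpoutPending) {
        return spoutTasks * maxSpoutPending;
    }

    /** Little's law: steady-state throughput <= in-flight tuple trees / complete latency. */
    static double maxTuplesPerSec(int inFlight, double completeLatencyMs) {
        return inFlight / (completeLatencyMs / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical: 468,000 executes at 1.0 ms each over a 600 s (10-minute) window.
        System.out.printf("capacity = %.3f%n", capacity(468_000, 1.0, 600));

        // 8 spout tasks x topology.max.spout.pending = 30.
        int inFlight = maxInFlight(8, 30);
        System.out.println("max in-flight tuple trees = " + inFlight);

        // Hypothetical complete latency of 50 ms.
        System.out.printf("throughput ceiling = %.0f tuples/s%n", maxTuplesPerSec(inFlight, 50.0));
    }
}
```

The last figure is only a steady-state ceiling: if complete latency grows, the same 240 in-flight slots support proportionally fewer tuples per second, regardless of how idle the nodes look.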
