Hi guys, I'm working on a Storm topology and trying to improve its throughput and reduce its latency. When the topology consumes 170k records from a single spout, it processes and acks every record within 1.5 min, with an average latency of 500 ms. When I triple the data (around 500k records) and use 3 spouts, with the records split evenly across them, about 5k of the records fail (the message timeout is left at the default of 30 sec). Average latency is then around 9 sec, yet the processing latencies within the bolts are always under 50 ms and the capacity of my bolt never goes beyond 0.7-0.8. I'm setting max spout pending to around 7k.

What is the explanation for the failures? I would have expected the capacity numbers on the bolt to go way up, but they don't, so I'm not sure what is causing the latencies. If I increase the max spout pending parameter, the number of failures increases; if I reduce it, the overall throughput is not acceptable. I have tried anywhere from 10 to 30 executors for my bolt and the results don't change much.

My guess is that the issue is with Netty, so I need to see what may be slowing down the communication across JVMs. Is there anything else I should look for? I'm on version 0.9.3.
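For what it's worth, here is a rough back-of-the-envelope check on the numbers above using Little's law (tuples in flight = throughput x average latency). It is only a sketch under assumptions: that the max spout pending limit applies per spout task, that the topology is at steady state, and that the spouts actually reach the pending limit in the 3-spout run. The function and variable names are mine, not anything from Storm.

```python
# Little's law: tuples in flight = throughput * average latency.
# All inputs are the figures quoted in the post; names are illustrative only.

def in_flight(throughput_per_sec, latency_sec):
    """Average number of un-acked tuples at steady state."""
    return throughput_per_sec * latency_sec

def throughput_at_cap(pending_cap, latency_sec):
    """Throughput implied when the pending cap is full at a given latency."""
    return pending_cap / latency_sec

# Single-spout run: 170k records acked in about 90 s, 500 ms average latency.
single_rate = 170_000 / 90.0                  # about 1889 tuples/s
single_pending = in_flight(single_rate, 0.5)  # about 944 tuples in flight

# Three-spout run: ASSUMES the 7k cap is per spout task and is reached.
pending_cap = 3 * 7_000                       # 21000 tuples in flight at most
rate_at_9s = throughput_at_cap(pending_cap, 9.0)  # about 2333 tuples/s

# At that same rate, how many pending tuples would push the AVERAGE
# latency up to the 30 s timeout? (The slowest tuples time out well before.)
cap_at_timeout = in_flight(rate_at_9s, 30.0)  # about 70000, ~23.3k per spout

print(round(single_rate), round(single_pending))
print(round(rate_at_9s), round(cap_at_timeout))
```

If those assumptions hold, the single-spout run never came close to the 7k limit, while in the 3-spout run the 9 sec latency is roughly what a full pending queue implies at about 2.3k tuples/s. That would also be consistent with failures going up when I raise max spout pending: more tuples in flight at the same throughput just means each one waits longer before it is acked.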
Thanks
