Hi,

We are building a real-time distributed computation using Storm, and the solution has only one problem: inter-bolt latency, whether on the same machine or across machines, ranges between 2 and 250 ms. I am not able to figure out why. Network latency is under 0.5 ms. By latency, I mean the time between an emit from one bolt/spout and the message arriving in execute() of the next bolt.
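One way to pin down where that time goes is to carry the emit timestamp inside the message and subtract it in the consumer. The sketch below is plain Java with no Storm dependency: a bounded ArrayBlockingQueue stands in for the queue between two executors, and the class and method names are hypothetical. In a real topology the timestamp would travel as an extra tuple field. System.nanoTime() values are only comparable within one JVM; for workers on different machines, System.currentTimeMillis() with synchronized clocks would be needed, and clock skew between VMs can exceed the 0.5 ms network latency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class EmitToExecuteLatency {

    // Payload plus the time it was "emitted" (hypothetical message type, not a Storm class).
    static final class Message {
        final int payload;
        final long emitNanos;

        Message(int payload, long emitNanos) {
            this.payload = payload;
            this.emitNanos = emitNanos;
        }
    }

    // Sentinel telling the consumer to stop; compared by reference.
    private static final Message POISON = new Message(-1, 0L);

    /** Pushes `count` messages through a bounded queue; returns emit-to-take latencies in microseconds. */
    static List<Long> measure(int count, int queueCapacity) {
        BlockingQueue<Message> queue = new ArrayBlockingQueue<>(queueCapacity);
        List<Long> latenciesMicros = Collections.synchronizedList(new ArrayList<>());

        // The consumer thread stands in for the downstream bolt's execute().
        Thread consumer = new Thread(() -> {
            try {
                for (Message m = queue.take(); m != POISON; m = queue.take()) {
                    latenciesMicros.add((System.nanoTime() - m.emitNanos) / 1_000);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // The calling thread stands in for the upstream spout/bolt calling emit().
            for (int i = 0; i < count; i++) {
                queue.put(new Message(i, System.nanoTime()));
            }
            queue.put(POISON);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
        return new ArrayList<>(latenciesMicros);
    }

    public static void main(String[] args) {
        List<Long> latencies = measure(1_000, 1024);
        Collections.sort(latencies);
        System.out.println("samples = " + latencies.size());
        System.out.println("median  = " + latencies.get(latencies.size() / 2) + " us");
        System.out.println("max     = " + latencies.get(latencies.size() - 1) + " us");
    }
}
```

Logging the median and maximum separately helps show whether the delay is constant or comes from occasional long waits in the queue.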
I have a topology like the one below:

A (spout) emits a number, say 1000.
-> B (bolt) receives this number and splits it into 10 emits of 100 each.
-> C (bolt) receives these emits and splits each one into 10 emits of 10.
-> D (bolt) does some computation on each number and emits one message.
-> E (bolt) aggregates all the data and confirms that all 1000 messages have been processed.

Every bolt takes under 3 msec to complete, so I estimated that end-to-end processing for 1000 would take no more than 50 msec, including any latencies.

*Observations*
1. The end-to-end time from spout A to bolt E is 200 msec to 3 seconds, compared with my estimate of under 50 msec.
2. Most of the time is spent between an emit from a spout/bolt and execute() of the consuming bolt.
3. Network latency is under 0.5 msec.

I am not able to figure out why it takes so long to get from one spout/bolt to the next bolt. I understand that the spout/bolt buffers the data into a queue and the next bolt consumes from there.

*Infrastructure*
1. 5 VMs, each with 4 CPUs and 8 GB RAM. Each worker has 1024 MB, and there are 20 workers overall.

*Test*
1. 25 messages were sent to the spout over a span of 5 seconds.

*Config values*

    Config config = new Config();
    config.put(Config.TOPOLOGY_WORKERS, 20);
    config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
    config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
    config.put(Config.TOPOLOGY_ACKER_EXECUTORS, 1);
    config.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
    config.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 64);

Please let me know if you have encountered similar issues and what steps you took to reduce the time between a spout/bolt and the next bolt.

Thanks,
Kashyap
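The fan-out in the topology above can be turned into a rough time budget. This is a back-of-envelope sketch under two assumptions the post does not state: every execute() call costs the full 3 msec upper bound, and a bolt's tuples are spread evenly over its executors. The parallelism of each bolt is not given, so it is a parameter here. FanOutBudget and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FanOutBudget {

    /** Number of execute() calls each bolt receives for one number emitted by spout A. */
    static Map<String, Integer> executeCalls() {
        int bIn = 1;         // A emits one tuple (e.g. 1000)
        int cIn = bIn * 10;  // B splits it into 10 tuples of 100
        int dIn = cIn * 10;  // C splits each of those into 10 tuples of 10
        int eIn = dIn;       // D emits one message per input tuple
        Map<String, Integer> calls = new LinkedHashMap<>();
        calls.put("B", bIn);
        calls.put("C", cIn);
        calls.put("D", dIn);
        calls.put("E", eIn);
        return calls;
    }

    /** Busy time (ms) of one bolt if its calls are spread evenly over `executors` executors. */
    static double busyMillis(int calls, int executors, double msPerExecute) {
        int callsPerExecutor = (calls + executors - 1) / executors; // ceiling division
        return callsPerExecutor * msPerExecute;
    }

    public static void main(String[] args) {
        double msPerExecute = 3.0; // assumption: every call hits the quoted 3 msec upper bound
        for (int executors : new int[] {1, 10}) {
            double total = 0;
            System.out.println("executors per bolt = " + executors);
            for (Map.Entry<String, Integer> e : executeCalls().entrySet()) {
                double ms = busyMillis(e.getValue(), executors, msPerExecute);
                total += ms;
                System.out.printf("  %s: %3d calls -> %6.1f ms%n", e.getKey(), e.getValue(), ms);
            }
            System.out.printf("  sum (no overlap between stages) = %.1f ms%n", total);
        }
    }
}
```

With one executor per bolt, D and E each receive 100 tuples per input number and could be busy for up to 300 msec each; with 10 executors per bolt that drops to 30 msec each. Summing the stages ignores the overlap between them, so the total is a pessimistic bound rather than a prediction.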
