Your latency observation is in line with https://hortonworks.com/blog/microbenchmarking-storm-1-0-performance/
You might be able to achieve that kind of latency with the upcoming Storm 2.0. Since it is not yet released, you could try building the bits from the master branch. Some of the settings have changed in 2.0, so take a look at Performance.md for details. Right, in single-worker mode there is no ser/deser. You can spin up multiple single-worker instances if needed. By the way, what is your application domain that needs such low latency?

Roshan

On Wednesday, April 4, 2018, 2:10 AM, Wijekoon, Manusha <[email protected]> wrote:

In my topology I see around 1-2 ms latency when transferring tuples from spouts to bolts or from bolts to bolts. I am calculating latency using nanosecond timestamps, since the whole topology runs inside a single worker. The topology runs on a cluster with production-capable hardware.
To my understanding, tuples need not be serialized/deserialized in this case, as everything is inside a single JVM. I have set the parallelism hint for most spouts and bolts to 5, and the spouts produce events at a rate of only 100 per second. I don't think the latency is due to queuing of events, because I don't see latency increase over time. No memory increase either. Log levels are set to ERROR. CPU usage is in the range of 200-300%. I have set the batch size to 1 and the batch timeout to 1 ms (the lowest possible). I am trying to achieve less than 50 µs for tuple transfer. Is there a way to tune Storm to get this kind of latency?
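For reference, the batch size and batch timeout mentioned above correspond to storm.yaml entries along these lines. The key names below are the Storm 1.x disruptor settings; treat them as an assumption and check Performance.md for the renamed 2.0 equivalents:

```yaml
# storm.yaml fragment (Storm 1.x key names -- verify against Performance.md
# for the Storm 2.0 equivalents before relying on them)
topology.disruptor.batch.size: 1              # flush each tuple immediately, no batching
topology.disruptor.batch.timeout.millis: 1    # lowest allowed flush timeout
```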

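The nanosecond-timestamp measurement described above is valid precisely because everything runs in one JVM, where System.nanoTime() values are comparable across threads. A minimal, Storm-free sketch of the same technique, with a plain BlockingQueue standing in for the spout-to-bolt handoff (all class and method names here are illustrative, not Storm APIs):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch: measure cross-thread handoff latency inside a single
// JVM by attaching a System.nanoTime() timestamp to the "tuple" on the
// producer side and computing the delta on the consumer side.
public class HandoffLatency {

    public static long measureOnceNanos() throws InterruptedException {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            // Attach the emit timestamp to the tuple (queue is empty, so offer succeeds).
            queue.offer(System.nanoTime());
        });
        producer.start();

        long emittedAt = queue.take();                // consumer side: blocks until the tuple arrives
        long latencyNs = System.nanoTime() - emittedAt;
        producer.join();
        return latencyNs;
    }

    public static void main(String[] args) throws InterruptedException {
        long ns = measureOnceNanos();
        System.out.println("handoff latency: " + (ns / 1000) + " us");
    }
}
```

Because nanoTime() is monotonic within a JVM, the delta is a valid elapsed time even across threads; it would not be valid across separate worker processes on different machines.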