Hi, I'm using storm 0.9.1 and kafka 0.8. I've used trident to write a topology that gets tuples from the six partitions of a kafka topic.
Every kafka message is about 2000 bytes on average, and that topic gets about 4,000 messages per second. In the topology I use the kafka spout provided by wurstmeister (https://github.com/wurstmeister/storm-kafka-0.8-plus) with a parallelismHint of six, so that each executor matches one partition. The version of the kafka spout is 0.5.0-SNAPSHOT (I updated it today to the latest commit on the master branch). I run the topology on a cluster of 4 machines: three of them run kafka (so each machine hosts 2 partitions) plus the storm supervisors, and the last one runs the nimbus and other services I need for my setup, unrelated to storm or kafka. The topology is configured to use 3 workers, and each supervisor has only one slot.

The problem is that I see high latency with this configuration, and I'm pretty sure it is related to kafka, because I've removed almost every line of code from my topology except the kafka spouts and the problem persists. Here are the stats from storm-ui: https://dl.dropboxusercontent.com/u/48250946/stormScreenshot.png The last spout (spout3) is the kafka spout, and as you can see it shows about 2 seconds of latency per tuple. The $mastercoord-bg3 also has a lot of latency, and when I click on it, its "Output Stats" shows high latency on a stream called "$batch".

I don't know whether the problem is that the throughput (4k msg/sec) is too high for this configuration, or maybe that the number of kafka partitions is too low. I would appreciate any information about what is causing this and any tips about kafka & storm performance :) I'm pasting a stripped-down sketch of how the topology is wired below, in case it helps. Thanks!
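For reference, this is roughly what the test topology looks like once everything except the kafka spout is removed (the zookeeper hosts, topic name and topology/class names here are placeholders, not my real ones):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.spout.SchemeAsMultiScheme;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;
    import storm.kafka.trident.OpaqueTridentKafkaSpout;
    import storm.kafka.trident.TridentKafkaConfig;
    import storm.trident.TridentTopology;

    public class KafkaLatencyTestTopology {
        public static void main(String[] args) throws Exception {
            // Point the spout at the zookeeper ensemble that kafka uses
            // (placeholder hosts and topic name).
            ZkHosts zkHosts = new ZkHosts("zkhost1:2181,zkhost2:2181,zkhost3:2181");
            TridentKafkaConfig spoutConfig = new TridentKafkaConfig(zkHosts, "my-topic");
            spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

            OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(spoutConfig);

            TridentTopology topology = new TridentTopology();
            // parallelismHint(6) so each spout executor maps to one of the six partitions.
            // In the real topology there are Trident functions after this stream, but I
            // removed them for this test and the latency is still there.
            topology.newStream("kafkaStream", kafkaSpout).parallelismHint(6);

            Config conf = new Config();
            // One worker per supervisor; each supervisor only exposes one slot.
            conf.setNumWorkers(3);

            StormSubmitter.submitTopology("kafka-latency-test", conf, topology.build());
        }
    }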
