The behaviour you describe sounds like your topology is first working through a small backlog of events that built up in each partition, and then catching up to real time, at which point events are only being published to one of the 10 partitions at a time. I will echo Harsha in suggesting that you verify you are actually publishing to all partitions (important: this is *not* the default behaviour).
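To make the "publish to all partitions" check concrete, here is a minimal Java sketch. It is a toy model, not Kafka's producer code. `partitionFor` is a hypothetical hash partitioner. The single-key case stands in for any producer that sends everything to one partition, for example because every message shares a key, or because some producer versions pin unkeyed messages to one partition between metadata refreshes. The sketch counts how many messages each of the 10 partitions receives and checks that none is left empty.

```java
import java.util.Arrays;

/**
 * Toy model (not Kafka's producer code) of how the producer's choice of
 * partition decides which partitions of a topic actually receive data.
 */
public class PartitionSpread {

    // Hypothetical hash partitioner: the same key always maps to the same
    // partition. floorMod keeps the result non-negative for negative hashes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Count how many of numMessages messages land in each partition.
    static long[] spread(int numMessages, int numPartitions, boolean distinctKeys) {
        long[] counts = new long[numPartitions];
        for (int i = 0; i < numMessages; i++) {
            // distinctKeys == false mimics every message sharing one key,
            // i.e. a producer that only ever writes to a single partition.
            String key = distinctKeys ? "event-" + i : "same-key";
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    // True only if every partition received at least one message.
    static boolean coversAllPartitions(long[] counts) {
        return Arrays.stream(counts).allMatch(c -> c > 0);
    }

    public static void main(String[] args) {
        long[] skewed = spread(100_000, 10, false);
        long[] even = spread(100_000, 10, true);
        System.out.println("single key:    " + Arrays.toString(skewed)
                + " all partitions covered = " + coversAllPartitions(skewed));
        System.out.println("distinct keys: " + Arrays.toString(even)
                + " all partitions covered = " + coversAllPartitions(even));
    }
}
```

With a single shared key, all 100,000 messages land in one partition and the check fails. With a distinct key per event (here a made-up `event-<n>` ID), every partition receives data. In a real deployment the equivalent check is whether the latest offset of each partition keeps advancing while you publish.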
On Tue, Feb 3, 2015 at 12:05 AM, Vineet Mishra wrote:
> Hi Harsha,
>
> Based on the proposed metric, I ensured the specified changes by changing
> the Kafka-Storm Version bundle.
>
> Although I could see the difference from the last bundle used to the
> current change but was not satisfied by the way Spouts were processing. The
> observation which I had was,
>
> The Spout were running with Executor counts as 10, while initiating the
> job around half of the executors (5) started processing in parallel to
> ingest the data.
>
> As soon as the counts reached around a million or so the state of
> parallelism dropped and eventually it started processing in serially (One
> Executor at a time).
>
> Executors (All time)
> Id      Uptime   Host   Port  Emitted  Transferred  Complete latency (ms)   Acked  Failed
> [2-2]   13m 54s  host3  6703        0            0                  0.000       0       0
> [3-3]   13m 52s  host2  6702   318300       318300                  4.789  318160       0
> [4-4]   13m 52s  host3  6702   434200       434200                  7.064  434380       0
> [5-5]   13m 53s  host2  6701       20           20                  0.000       0       0
> [6-6]   13m 55s  host3  6701        0            0                  0.000       0       0
> [7-7]   13m 51s  host2  6700    25000        25000                  4.122   24500       0
> [8-8]   13m 51s  host3  6700   248360       248360                  9.514  245780       0
> [9-9]   13m 52s  host2  6703        0            0                  0.000       0       0
> [10-10] 13m 54s  host3  6703   235220       235220                  9.250  233200       0
> [11-11] 13m 52s  host2  6702   204420       204420                 10.382  205800       0
>
> I am having around .2 Billion Events ingested to Kafka which needs to be
> processed through Storm in Real time but I am not sure what is making this
> unexpected intermittent behavior of the storm and how can I prevent this in
> near future.
>
> Expecting Expert Suggestions.
>
> Thanks!
>
> On Mon, Feb 2, 2015 at 11:53 PM, Vineet Mishra wrote:
>
>> Well I am already running Kafka with 10 Partitions and Replication factor
>> as 3 which is the default size of my cluster.
>>
>> bin/kafka-topics.sh --create --zookeeper host1:2181,host2:2181,host3:2181
>> --replication-factor 3 --partitions 10 --topic test
>>
>> and I am also running Kafka Storm topology with Executors count as 10
>>
>> TopologyBuilder builder = new TopologyBuilder();
>> builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 10);
>>
>> I am having a notion that since the time I have started running Kafka
>> from last* changed RF and # of Partitions I am landing up with latency.
>>
>> * bin/kafka-topics.sh --create --zookeeper
>> host1:2181,host2:2181,host3:2181 --replication-factor 1 --partitions 1
>> --topic test
>>
>> Well I will try with above provided Storm Kafka bundle. Hope that could
>> help out!
>>
>> Thanks!
>>
>> On Mon, Feb 2, 2015 at 10:30 PM, Harsha wrote:
>>
>>> Vineet,
>>> Can you try using the one in storm
>>> https://github.com/apache/storm/tree/master/external/storm-kafka . This
>>> is published into maven repo. So you can use the following
>>> <dependency>
>>>   <groupId>org.apache.storm</groupId>
>>>   <artifactId>storm-kafka</artifactId>
>>>   <version>0.9.3</version>
>>> </dependency>
>>>
>>> If you are using topic with partitions size 10 make sure you configured
>>> your kafka spout with parallelism set to 10. Also make sure on the producer
>>> side you are pushing data onto all of the 10 partitions so that your kafka
>>> spout is fetching data from all of the 10 partitions.
>>> -Harsha
>>>
>>> On Mon, Feb 2, 2015, at 08:55 AM, Vineet Mishra wrote:
>>>
>>> Hi Harsha,
>>>
>>> I am using storm.kafka.KafkaSpout.KafkaSpout implementation from
>>>
>>> https://github.com/wurstmeister/storm-kafka-0.8-plus
>>>
>>> Thanks!
>>>
>>> On Mon, Feb 2, 2015 at 8:14 PM, Harsha wrote:
>>>
>>> Vineet,
>>> Which kafka spout are you using?
>>>
>>> -Harsha
>>>
>>> On Mon, Feb 2, 2015, at 05:25 AM, Vineet Mishra wrote:
>>>
>>> Hi,
>>>
>>> I am running Kafka Storm Engine to process real time data generated on a
>>> 3 node distributed cluster.
>>>
>>> Currently I have set 10 Executors for Storm Spout, which I don't think
>>> is running in parallel.
>>> Moreover earlier I was running the Kafka Topology with Replication
>>> Factor and Partitions as 1 (which seems to have run comparatively faster),
>>> now I gave the Replication Factor as 3 and Partitions as 10 and I could see
>>> the performance degradation.
>>>
>>> Is there any way I can max utilize the available resource and get the
>>> max throughput of event processing.
>>>
>>> Looking for the expert suggestions at URGENT.
>>>
>>> Thanks!
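Two things in this thread follow from how the spout splits partitions among its tasks: Harsha's advice to match spout parallelism to the partition count, and the idle rows in the executor table. The Java sketch below models a round-robin split in which task `t` reads partitions `t`, `t + numTasks`, and so on. This is assumed to resemble storm-kafka's static assignment, not copied from it, so check the source of the version you deploy. Under that assumption, 10 partitions and 10 tasks give each task exactly one partition. An executor that shows 0 emitted (like [2-2], [6-6] and [9-9] in the table above) is then most likely one whose partition received no new messages. With 1 partition and 10 tasks, nine tasks own nothing at all.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of a static round-robin assignment of topic partitions
 * to spout tasks (assumed to resemble storm-kafka's; not its actual code).
 */
public class SpoutAssignment {

    // Task t owns partitions t, t + numTasks, t + 2 * numTasks, ...
    static List<List<Integer>> assign(int numPartitions, int numTasks) {
        List<List<Integer>> byTask = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            List<Integer> owned = new ArrayList<>();
            for (int p = task; p < numPartitions; p += numTasks) {
                owned.add(p);
            }
            byTask.add(owned);
        }
        return byTask;
    }

    // A task that owns no partition has nothing to fetch, so it never emits.
    static int idleTasks(int numPartitions, int numTasks) {
        int idle = 0;
        for (List<Integer> owned : assign(numPartitions, numTasks)) {
            if (owned.isEmpty()) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        // 10 partitions, parallelism 10: one partition per task.
        System.out.println("10 partitions, 10 tasks: " + assign(10, 10));
        // 1 partition, parallelism 10: only one task can ever emit.
        System.out.println("1 partition, 10 tasks, idle tasks = " + idleTasks(1, 10));
        // 10 partitions, parallelism 4: some tasks carry more partitions.
        System.out.println("10 partitions, 4 tasks: " + assign(10, 4));
    }
}
```

For example, `assign(10, 4)` yields `[[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]`, so with fewer tasks than partitions the load is split unevenly. With more tasks than partitions the extra tasks sit idle however many executors you configure.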
