Do you mean that the events published to Kafka are not being distributed across the partitions?

When creating the topic I made sure to set the number of partitions to 10 and the replication factor to 3. Could this be affected by how I am writing to Kafka?

Thanks!

On Tue, Feb 3, 2015 at 1:50 PM, Andrew Neilson wrote:

> The behaviour you are describing sounds like your topology is processing a
> small backlog of events built up in each partition and then catching up to
> realtime where events are only being published to one of the 10 partitions
> at a time. I will echo Harsha in suggesting that you verify you are
> actually publishing to all partitions (important: this is *not* the
> default behaviour).
>
> On Tue, Feb 3, 2015 at 12:05 AM, Vineet Mishra wrote:
>
>> Hi Harsha,
>>
>> Following your suggestion, I made the specified changes by switching the
>> Kafka-Storm bundle version.
>>
>> I could see a difference between the previous bundle and the current one,
>> but I was not satisfied with the way the spouts were processing. What I
>> observed was this:
>>
>> The spout was running with an executor count of 10. When the job started,
>> around half of the executors (5) began processing in parallel to ingest
>> the data.
>>
>> As soon as the counts reached around a million or so, the parallelism
>> dropped and eventually processing became serial (one executor at a time).
>>
>> Executors (All time)
>>
>> Id       Uptime   Host   Port  Emitted  Transferred  Complete latency (ms)  Acked   Failed
>> [2-2]    13m 54s  host3  6703  0        0            0.000                  0       0
>> [3-3]    13m 52s  host2  6702  318300   318300       4.789                  318160  0
>> [4-4]    13m 52s  host3  6702  434200   434200       7.064                  434380  0
>> [5-5]    13m 53s  host2  6701  20       20           0.000                  0       0
>> [6-6]    13m 55s  host3  6701  0        0            0.000                  0       0
>> [7-7]    13m 51s  host2  6700  25000    25000        4.122                  24500   0
>> [8-8]    13m 51s  host3  6700  248360   248360       9.514                  245780  0
>> [9-9]    13m 52s  host2  6703  0        0            0.000                  0       0
>> [10-10]  13m 54s  host3  6703  235220   235220       9.250                  233200  0
>> [11-11]  13m 52s  host2  6702  204420   204420       10.382                 205800  0
>>
>> I have around 0.2 billion events ingested into Kafka that need to be
>> processed through Storm in real time, but I am not sure what is causing
>> this unexpected intermittent behaviour in Storm or how I can prevent it
>> in the future.
>>
>> Expecting expert suggestions.
>>
>> Thanks!
>>
>> On Mon, Feb 2, 2015 at 11:53 PM, Vineet Mishra wrote:
>>
>>> I am already running Kafka with 10 partitions and a replication factor
>>> of 3, which is the default size of my cluster.
>>>
>>> bin/kafka-topics.sh --create --zookeeper host1:2181,host2:2181,host3:2181 --replication-factor 3 --partitions 10 --topic test
>>>
>>> I am also running the Kafka Storm topology with an executor count of 10:
>>>
>>> TopologyBuilder builder=new TopologyBuilder();
>>> builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 10);
>>>
>>> I suspect that the latency started when I changed the replication factor
>>> and number of partitions from my previous* setup.
>>>
>>> * bin/kafka-topics.sh --create --zookeeper host1:2181,host2:2181,host3:2181 --replication-factor 1 --partitions 1 --topic test
>>>
>>> I will try the Storm Kafka bundle provided above. Hope that helps!
>>>
>>> Thanks!
>>>
>>> On Mon, Feb 2, 2015 at 10:30 PM, Harsha wrote:
>>>
>>>> Vineet,
>>>> Can you try using the one in storm
>>>> https://github.com/apache/storm/tree/master/external/storm-kafka .
>>>> This is published to the maven repo, so you can use the following:
>>>>
>>>> <dependency>
>>>>   <groupId>org.apache.storm</groupId>
>>>>   <artifactId>storm-kafka</artifactId>
>>>>   <version>0.9.3</version>
>>>> </dependency>
>>>>
>>>> If you are using a topic with 10 partitions, make sure you have
>>>> configured your kafka spout with parallelism set to 10. Also make sure
>>>> that on the producer side you are pushing data onto all 10 partitions
>>>> so that your kafka spout is fetching data from all 10 partitions.
>>>> -Harsha
>>>>
>>>> On Mon, Feb 2, 2015, at 08:55 AM, Vineet Mishra wrote:
>>>>
>>>> Hi Harsha,
>>>>
>>>> I am using the storm.kafka.KafkaSpout.KafkaSpout implementation from
>>>>
>>>> https://github.com/wurstmeister/storm-kafka-0.8-plus
>>>>
>>>> Thanks!
>>>>
>>>> On Mon, Feb 2, 2015 at 8:14 PM, Harsha wrote:
>>>>
>>>> Vineet,
>>>> Which kafka spout are you using?
>>>>
>>>> -Harsha
>>>>
>>>> On Mon, Feb 2, 2015, at 05:25 AM, Vineet Mishra wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am running a Kafka Storm engine to process real-time data generated
>>>> on a 3-node distributed cluster.
>>>>
>>>> Currently I have set 10 executors for the Storm spout, which I don't
>>>> think are running in parallel.
>>>> Moreover, earlier I was running the Kafka topology with the replication
>>>> factor and partitions both set to 1 (which seems to have run
>>>> comparatively faster). Now I have set the replication factor to 3 and
>>>> partitions to 10, and I can see the performance degradation.
>>>>
>>>> Is there any way I can fully utilize the available resources and get
>>>> the maximum throughput of event processing?
>>>>
>>>> Looking for expert suggestions urgently.
>>>>
>>>> Thanks!
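
The advice in this thread to check that the producer is really writing to all 10 partitions can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not call Kafka or Storm, and the class `PartitionSpread`, its methods, and the `event-N` keys are hypothetical names made up for illustration. It assumes a hash-based partitioner (non-negative key hash modulo the partition count) and compares it with a producer whose events all land on one partition.

```java
import java.util.Arrays;

public class PartitionSpread {
    // Hypothetical stand-in for a hash-based partitioner (not Kafka's code):
    // non-negative hash of the key, modulo the number of partitions.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Events per partition when every event carries a distinct key.
    static int[] countKeyed(int events, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < events; i++) {
            counts[partitionFor("event-" + i, numPartitions)]++;
        }
        return counts;
    }

    // Events per partition when every event lands on one fixed partition.
    static int[] countPinned(int events, int numPartitions, int pinnedPartition) {
        int[] counts = new int[numPartitions];
        counts[pinnedPartition] = events;
        return counts;
    }

    // Number of partitions that received at least one event.
    static int activePartitions(int[] counts) {
        int active = 0;
        for (int c : counts) {
            if (c > 0) {
                active++;
            }
        }
        return active;
    }

    public static void main(String[] args) {
        int[] keyed = countKeyed(100_000, 10);
        int[] pinned = countPinned(100_000, 10, 3);
        System.out.println("keyed:  " + Arrays.toString(keyed)
                + " active=" + activePartitions(keyed));
        System.out.println("pinned: " + Arrays.toString(pinned)
                + " active=" + activePartitions(pinned));
    }
}
```

If only one partition is active at a time, only the spout executor assigned to that partition has anything to fetch, which would match the pattern of a few busy executors and several idle ones in the table above. Comparing per-partition offsets on the real topic would be the way to confirm whether that is what is happening here.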
