Re: Storm Kafka Processing

Andrew Neilson Tue, 03 Feb 2015 00:20:51 -0800

The behaviour you are describing sounds like your topology is processing a
small backlog of events built up in each partition and then catching up to
realtime where events are only being published to one of the 10 partitions
at a time. I will echo Harsha in suggesting that you verify you are
actually publishing to all partitions (important: this is *not* the default
behaviour).
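
To make the difference concrete, here is a minimal, self-contained Java
sketch. It is a toy model, not Kafka's actual partitioner: the class
PartitionSpread, its method names, the "event-N" keys and the window size
are all hypothetical. The assumption behind it is that a 0.8 producer
sending messages without a key tends to reuse one partition until its next
metadata refresh, whereas keyed messages are hashed across partitions. The
sketch compares the two patterns for a 10-partition topic.

```java
import java.util.Arrays;

/** Toy model of message-to-partition assignment; not Kafka's real code. */
class PartitionSpread {

    /** Keyed case: hash the key, then take it modulo the partition count. */
    static int partitionForKey(String key, int numPartitions) {
        // Clear the sign bit so the modulo result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    /** Per-partition message counts when every message carries a distinct key. */
    static int[] spreadKeyed(int messages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < messages; i++) {
            counts[partitionForKey("event-" + i, numPartitions)]++;
        }
        return counts;
    }

    /**
     * Unkeyed case (assumption): the producer sticks to one partition for a
     * whole window of messages before moving on. A real producer would pick
     * the next partition at random; a fixed rotation is used here only to
     * keep the result deterministic.
     */
    static int[] spreadSticky(int messages, int numPartitions, int window) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < messages; i++) {
            counts[(i / window) % numPartitions]++;
        }
        return counts;
    }

    /** Number of partitions that received at least one message. */
    static int activePartitions(int[] counts) {
        int active = 0;
        for (int c : counts) {
            if (c > 0) {
                active++;
            }
        }
        return active;
    }

    public static void main(String[] args) {
        int[] keyed = spreadKeyed(1000, 10);
        int[] sticky = spreadSticky(1000, 10, 1000);
        System.out.println("keyed:  " + Arrays.toString(keyed)
                + " active=" + activePartitions(keyed));
        System.out.println("sticky: " + Arrays.toString(sticky)
                + " active=" + activePartitions(sticky));
    }
}
```

If the sticky pattern matches what the Storm UI shows, giving each message
a key on the producer side (or configuring a partitioner there) is the
usual way to spread writes over all partitions.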


On Tue, Feb 3, 2015 at 12:05 AM, Vineet Mishra <[email protected]>
wrote:

> Hi Harsha,
>
> As suggested, I made the change by switching the Kafka-Storm bundle
> version.
>
> I can see a difference between the previous bundle and the current one,
> but I am still not satisfied with the way the spouts are processing. What
> I observed was:
>
> The spouts were running with an executor count of 10. When the job
> started, around half of the executors (5) began ingesting data in
> parallel.
>
> Once the counts reached around a million, the parallelism dropped and
> processing eventually became serial (one executor at a time).
>
> Executors (All time)
> Id       Uptime   Host   Port  Emitted  Transferred  Complete latency (ms)  Acked   Failed
> [2-2]    13m 54s  host3  6703  0        0            0.000                  0       0
> [3-3]    13m 52s  host2  6702  318300   318300       4.789                  318160  0
> [4-4]    13m 52s  host3  6702  434200   434200       7.064                  434380  0
> [5-5]    13m 53s  host2  6701  20       20           0.000                  0       0
> [6-6]    13m 55s  host3  6701  0        0            0.000                  0       0
> [7-7]    13m 51s  host2  6700  25000    25000        4.122                  24500   0
> [8-8]    13m 51s  host3  6700  248360   248360       9.514                  245780  0
> [9-9]    13m 52s  host2  6703  0        0            0.000                  0       0
> [10-10]  13m 54s  host3  6703  235220   235220       9.250                  233200  0
> [11-11]  13m 52s  host2  6702  204420   204420       10.382                 205800  0
>
> I have around 0.2 billion events ingested into Kafka that need to be
> processed through Storm in real time, but I am not sure what is causing
> this unexpected intermittent behavior in Storm or how I can prevent it in
> the future.
>
> Expert suggestions would be appreciated.
>
> Thanks!
>
>
>
> On Mon, Feb 2, 2015 at 11:53 PM, Vineet Mishra <[email protected]>
> wrote:
>
>> Well, I am already running Kafka with 10 partitions and a replication
>> factor of 3, which matches the size of my cluster.
>>
>> bin/kafka-topics.sh --create --zookeeper host1:2181,host2:2181,host3:2181
>> --replication-factor 3 --partitions 10 --topic test
>>
>> and I am also running the Kafka Storm topology with an executor count of
>> 10:
>>
>> TopologyBuilder builder = new TopologyBuilder();
>> builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 10);
>>
>> My impression is that the latency started when I changed the replication
>> factor and the number of partitions from my previous* setup.
>>
>> * bin/kafka-topics.sh --create --zookeeper
>> host1:2181,host2:2181,host3:2181 --replication-factor 1 --partitions 1
>> --topic test
>>
>> Well, I will try the Storm Kafka bundle you suggested above. Hopefully
>> that helps!
>>
>> Thanks!
>>
>> On Mon, Feb 2, 2015 at 10:30 PM, Harsha <[email protected]> wrote:
>>
>>>  Vineet,
>>>        Can you try using the one in Storm:
>>> https://github.com/apache/storm/tree/master/external/storm-kafka . It
>>> is published to the Maven repository, so you can use the following:
>>> <dependency>
>>>   <groupId>org.apache.storm</groupId>
>>>   <artifactId>storm-kafka</artifactId>
>>>   <version>0.9.3</version>
>>> </dependency>
>>>
>>> If you are using a topic with 10 partitions, make sure your Kafka spout
>>> is configured with parallelism set to 10. Also make sure that on the
>>> producer side you are pushing data to all 10 partitions, so that your
>>> Kafka spout fetches data from all 10 partitions.
>>> -Harsha
>>>
>>>
>>> On Mon, Feb 2, 2015, at 08:55 AM, Vineet Mishra wrote:
>>>
>>> Hi Harsha,
>>>
>>> I am using storm.kafka.KafkaSpout.KafkaSpout implementation from
>>>
>>> https://github.com/wurstmeister/storm-kafka-0.8-plus
>>>
>>> Thanks!
>>>
>>> On Mon, Feb 2, 2015 at 8:14 PM, Harsha <[email protected]> wrote:
>>>
>>>
>>> Vineet,
>>>         Which kafka spout are you using?
>>>
>>> -Harsha
>>>
>>>
>>>
>>> On Mon, Feb 2, 2015, at 05:25 AM, Vineet Mishra wrote:
>>>
>>> Hi,
>>>
>>> I am running a Kafka Storm engine to process real-time data generated
>>> on a 3-node distributed cluster.
>>>
>>> Currently I have set 10 executors for the Storm spout, but I don't think
>>> they are running in parallel.
>>> Moreover, I was earlier running the Kafka topic with the replication
>>> factor and partitions both set to 1 (which seemed to run comparatively
>>> faster). Now that I have set the replication factor to 3 and partitions
>>> to 10, I can see a performance degradation.
>>>
>>> Is there any way I can make full use of the available resources and get
>>> the maximum event-processing throughput?
>>>
>>> Looking for expert suggestions urgently.
>>>
>>> Thanks!
>>>
