Re: Storm Kafka Processing

Vineet Mishra Tue, 03 Feb 2015 00:46:15 -0800

Do you mean that the events published to Kafka are not being distributed
across the partitions?


When I created the topic I made sure it had 10 partitions and a
replication factor of 3.

Could this be affected by how I am writing to Kafka?
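
To make the question concrete, here is a minimal sketch (plain Java, no
Kafka dependency) of how the message key alone could decide how many
partitions receive data. The class name, the helper names, the key strings
and the hash-modulo rule are all assumptions for illustration; this is not
Kafka's actual partitioner, whose behaviour depends on the producer version
and configuration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a hash-modulo rule standing in for a producer-side
// partitioner. This is NOT Kafka's real implementation.
class PartitionSpreadSketch {

    // Map a key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Return the distinct partitions that a batch of keys would land on.
    static Set<Integer> partitionsHit(String[] keys, int numPartitions) {
        Set<Integer> hit = new TreeSet<>();
        for (String key : keys) {
            hit.add(partitionFor(key, numPartitions));
        }
        return hit;
    }

    // Hypothetical workload 1: every event shares the same key.
    static String[] constantKeys(int count) {
        String[] keys = new String[count];
        Arrays.fill(keys, "events");
        return keys;
    }

    // Hypothetical workload 2: every event carries its own key.
    static String[] distinctKeys(int count) {
        String[] keys = new String[count];
        for (int i = 0; i < count; i++) {
            keys[i] = "event-" + i;
        }
        return keys;
    }

    public static void main(String[] args) {
        int numPartitions = 10; // same partition count as the topic in this thread
        System.out.println("constant key  -> partitions "
                + partitionsHit(constantKeys(1000), numPartitions));
        System.out.println("distinct keys -> partitions "
                + partitionsHit(distinctKeys(1000), numPartitions));
    }
}
```

Under this rule a single shared key only ever reaches one partition, so
only the spout executor reading that partition would have work, while
distinct keys spread the events across the partitions.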

Thanks!

On Tue, Feb 3, 2015 at 1:50 PM, Andrew Neilson <[email protected]> wrote:

> The behaviour you are describing sounds like your topology is processing a
> small backlog of events built up in each partition and then catching up to
> realtime where events are only being published to one of the 10 partitions
> at a time. I will echo Harsha in suggesting that you verify you are
> actually publishing to all partitions (important: this is *not* the
> default behaviour).
>
> On Tue, Feb 3, 2015 at 12:05 AM, Vineet Mishra <[email protected]>
> wrote:
>
>> Hi Harsha,
>>
>> Following your suggestion, I made the specified changes by switching the
>> version of the Kafka-Storm bundle.
>>
>> Although I could see a difference between the previous bundle and the
>> current one, I was not satisfied with the way the spouts were processing.
>> What I observed was:
>>
>> The spout was running with an executor count of 10. When the job started,
>> around half of the executors (5) began processing in parallel to ingest
>> the data.
>>
>> As soon as the counts reached around a million or so, the parallelism
>> dropped, and eventually processing became serial (one executor at a
>> time).
>>
>> Executors (All time)
>> Id       Uptime   Host   Port  Emitted  Transferred  Complete latency (ms)  Acked   Failed
>> [2-2]    13m 54s  host3  6703  0        0            0.000                  0       0
>> [3-3]    13m 52s  host2  6702  318300   318300       4.789                  318160  0
>> [4-4]    13m 52s  host3  6702  434200   434200       7.064                  434380  0
>> [5-5]    13m 53s  host2  6701  20       20           0.000                  0       0
>> [6-6]    13m 55s  host3  6701  0        0            0.000                  0       0
>> [7-7]    13m 51s  host2  6700  25000    25000        4.122                  24500   0
>> [8-8]    13m 51s  host3  6700  248360   248360       9.514                  245780  0
>> [9-9]    13m 52s  host2  6703  0        0            0.000                  0       0
>> [10-10]  13m 54s  host3  6703  235220   235220       9.250                  233200  0
>> [11-11]  13m 52s  host2  6702  204420   204420       10.382                 205800  0
>>
>> I have around 0.2 billion (200 million) events ingested into Kafka which
>> need to be processed through Storm in real time, but I am not sure what is
>> causing this unexpected intermittent behaviour in Storm or how I can
>> prevent it in future.
>>
>> Expecting Expert Suggestions.
>>
>> Thanks!
>>
>>
>>
>> On Mon, Feb 2, 2015 at 11:53 PM, Vineet Mishra <[email protected]>
>> wrote:
>>
>>> Well, I am already running Kafka with 10 partitions and a replication
>>> factor of 3, which is the size of my cluster.
>>>
>>> bin/kafka-topics.sh --create --zookeeper
>>> host1:2181,host2:2181,host3:2181 --replication-factor 3 --partitions 10
>>> --topic test
>>>
>>> and I am also running the Kafka Storm topology with an executor count
>>> of 10:
>>>
>>> TopologyBuilder builder = new TopologyBuilder();
>>> builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 10);
>>>
>>> I suspect that the latency started when I changed the replication factor
>>> and number of partitions from my previous* setup.
>>>
>>> * bin/kafka-topics.sh --create --zookeeper
>>> host1:2181,host2:2181,host3:2181 --replication-factor 1 --partitions 1
>>> --topic test
>>>
>>> I will try the Storm Kafka bundle provided above. I hope that helps!
>>>
>>> Thanks!
>>>
>>> On Mon, Feb 2, 2015 at 10:30 PM, Harsha <[email protected]> wrote:
>>>
>>>>  Vineet,
>>>>        Can you try using the one in Storm,
>>>> https://github.com/apache/storm/tree/master/external/storm-kafka ?
>>>> It is published to the Maven repo, so you can use the following:
>>>> <dependency>
>>>> <groupId>org.apache.storm</groupId>
>>>> <artifactId>storm-kafka</artifactId>
>>>> <version>0.9.3</version>
>>>> </dependency>
>>>>
>>>> If you are using a topic with 10 partitions, make sure you have
>>>> configured your Kafka spout with parallelism set to 10. Also make sure
>>>> that on the producer side you are pushing data to all 10 partitions, so
>>>> that your Kafka spout fetches data from all of them.
>>>> -Harsha
>>>>
>>>>
>>>> On Mon, Feb 2, 2015, at 08:55 AM, Vineet Mishra wrote:
>>>>
>>>> Hi Harsha,
>>>>
>>>> I am using storm.kafka.KafkaSpout.KafkaSpout implementation from
>>>>
>>>> https://github.com/wurstmeister/storm-kafka-0.8-plus
>>>>
>>>> Thanks!
>>>>
>>>> On Mon, Feb 2, 2015 at 8:14 PM, Harsha <[email protected]> wrote:
>>>>
>>>>
>>>> Vineet,
>>>>         Which kafka spout are you using?
>>>>
>>>> -Harsha
>>>>
>>>>
>>>>
>>>> On Mon, Feb 2, 2015, at 05:25 AM, Vineet Mishra wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am running a Kafka Storm engine to process real-time data generated
>>>> on a 3-node distributed cluster.
>>>>
>>>> Currently I have set 10 executors for the Storm spout, which I don't
>>>> think are running in parallel.
>>>> Moreover, earlier I was running the Kafka topology with the replication
>>>> factor and partitions both set to 1 (which seemed to run comparatively
>>>> faster); now that I have set the replication factor to 3 and partitions
>>>> to 10, I can see a performance degradation.
>>>>
>>>> Is there any way I can fully utilize the available resources and get
>>>> the maximum event-processing throughput?
>>>>
>>>> Looking for expert suggestions urgently.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
