From https://dl.dropboxusercontent.com/u/48250946/1.png it looks like the
two Kafka spout instances on host "pablo04" are having trouble consuming
from Kafka -- their execute latencies are almost 30ms, compared to about
4ms for the spout instances on the other two hosts.

You should check the Kafka broker on "pablo04" to determine why consuming
from it is so slow.

Also, you should not leave "topology.max.spout.pending" unset. Be careful
though: when using Trident, it should be set to a number of _batches_, not
tuples (took me a while to figure this out :)
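
For reference, here is a minimal sketch of how that could look when
submitting a Trident topology. The class name, the topology name and the
value of 10 batches are just illustrative assumptions, not taken from your
setup:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import storm.trident.TridentTopology;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        TridentTopology topology = new TridentTopology();
        // ... wire up the Kafka spout and the rest of the topology here ...

        Config conf = new Config();
        // With Trident, max.spout.pending limits the number of *batches*
        // in flight, not individual tuples. 10 is only an illustrative
        // starting point; tune it for your workload.
        conf.setMaxSpoutPending(10);
        conf.setNumWorkers(3);

        StormSubmitter.submitTopology("example-topology", conf, topology.build());
    }
}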



On Fri, Apr 25, 2014 at 3:45 PM, Carlos Rodriguez <[email protected]> wrote:

> Hi,
> I took new screenshots from Storm UI
>
> Topology Stats: https://dl.dropboxusercontent.com/u/48250946/2.png
> Spout3: https://dl.dropboxusercontent.com/u/48250946/1.png
> $mastercoord-bg3: https://dl.dropboxusercontent.com/u/48250946/4.png
>
> Topology Configuration: http://pastebin.com/uLsVa5Hn
>
> I hope these are useful,
> Thanks for your time!
>
>
> 2014-04-25 14:59 GMT+02:00 Danijel Schiavuzzi <[email protected]>:
>
>> Could you provide the Storm UI stats for the "spout3" component?
>>
>> Also, it would be helpful if you could provide the topology configuration.
>>
>> 4k messages/s is a very low throughput for Kafka, and should be easily
>> handled, especially with a three-broker cluster.
>>
>>
>>
>> On Fri, Apr 25, 2014 at 1:43 PM, Carlos Rodriguez <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm using Storm 0.9.1 and Kafka 0.8.
>>> I've used Trident to write a topology that reads tuples from the six
>>> partitions of a Kafka topic.
>>>
>>> Every Kafka message is about 2000 bytes on average, and that topic
>>> receives about 4,000 messages per second.
>>>
>>> In the topology I use the Kafka spout provided by wurstmeister (
>>> https://github.com/wurstmeister/storm-kafka-0.8-plus) with a
>>> parallelismHint of six, so that each executor matches one partition. The
>>> version of the Kafka spout is 0.5.0-SNAPSHOT (I updated it today to the
>>> latest commit on the master branch).
>>>
>>> I use a cluster of 4 machines to run the topology. Three of them run
>>> Kafka (so each machine holds 2 partitions) and Storm supervisors, and the
>>> last one runs the Nimbus and other services I need for my setup, unrelated
>>> to Storm or Kafka.
>>>
>>> I've configured the topology to use 3 workers, and each supervisor only
>>> has one slot.
>>>
>>> The problem is that I see high latencies with this configuration, and
>>> I'm pretty sure it is related to Kafka, because I've removed almost every
>>> line of code from my topology except the Kafka spouts, and the problem
>>> persists.
>>>
>>> Let me show you the stats from storm-ui with a picture:
>>> https://dl.dropboxusercontent.com/u/48250946/stormScreenshot.png
>>>
>>> The last spout (spout3) is the Kafka spout, and as you can see it has
>>> about 2 seconds of latency on each tuple. The $mastercoord-bg3 also has a
>>> lot of latency, and when I click on it, its "Output Stats" shows high
>>> latency on a stream called "$batch".
>>>
>>> I don't know if the problem is that the throughput is too high for this
>>> configuration (4k msg/sec), or that I have too few Kafka partitions.
>>>
>>> I would appreciate any information about what is causing this, and any
>>> tips about Kafka & Storm performance :)
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Danijel Schiavuzzi
>>
>> E: [email protected]
>> W: www.schiavuzzi.com
>> T: +385989035562
>> Skype: danijels7
>>
>
>


-- 
Danijel Schiavuzzi

E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
