What does your CPU usage look like at 23,000 tuples/s? Is it still low? Have you profiled to see if anything is blocking? Is your spout constantly doing work?
Michael Rose
Senior Platform Engineer, FullContact | t: @xorlev

On Wed, Feb 4, 2015 at 8:20 PM, clay teahouse <[email protected]> wrote:

> I bumped the kafka buffer/fetch sizes to
>
>     kafka.fetch.size.bytes: 12582912
>     kafka.buffer.size.bytes: 12582912
>
> The throughput almost doubled (to about 23,000 un-acked tuples/second). It
> seems that increasing these two parameters further does not improve
> performance any more. Is there anything else that I can try?
>
> On Wed, Feb 4, 2015 at 6:51 PM, clay teahouse <[email protected]> wrote:
>
>> 100,000 records is about 12MB.
>> I'll try bumping the numbers by 100-fold to see if it makes any
>> difference.
>> thanks,
>> -Clay
>>
>> On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <[email protected]> wrote:
>>
>>> I would bump these numbers up by a lot:
>>>
>>>     kafka.fetch.size.bytes: 102400
>>>     kafka.buffer.size.bytes: 102400
>>>
>>> Say 10 or 100 times that, or more. I don't know by heart how much I
>>> increased those numbers on my topology.
>>>
>>> How many bytes are you writing per minute to kafka? Try dumping 1
>>> minute of messages to a file to figure out how many bytes that is.
>>>
>>> I am producing (sending data to the topic) about 100,000 records per
>>> second. My kafka consumer can consume the 3 million records in less than
>>> 50 seconds. I have disabled the ack. But with the ack enabled, I won't even
>>> get 1,500 records per second from the topology. With ack disabled, I get
>>> about 12,000/second.
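A quick sanity check on the numbers quoted above: "100,000 records is about 12MB" implies roughly 120 bytes per record, so the 12,582,912-byte (12 MiB) fetch size clay settled on holds on the order of one second of data at a 100,000 records/s ingest rate. A minimal sketch of that arithmetic (the per-record size is derived from the figures in the thread, not measured):

```java
public class FetchSizing {
    public static void main(String[] args) {
        // Figures quoted in the thread: ~100,000 records is "about 12MB".
        long recordsPerBatch = 100_000L;
        long batchBytes = 12_000_000L;
        long bytesPerRecord = batchBytes / recordsPerBatch; // ~120 bytes/record

        // The fetch size clay bumped to: 12 * 1024 * 1024 bytes.
        long fetchSizeBytes = 12_582_912L;
        long recordsPerFetch = fetchSizeBytes / bytesPerRecord;

        System.out.println("bytes/record  ~ " + bytesPerRecord);   // ~ 120
        System.out.println("records/fetch ~ " + recordsPerFetch);  // ~ 104857
    }
}
```

So each fetch already carries roughly a second's worth of production, which is consistent with larger fetch sizes showing diminishing returns.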
>>> I don't lose any data; it is just that the data is emitted from the spout
>>> to the bolt very slowly.
>>>
>>> I did bump my buffer sizes but I am not sure if they are sufficient.
>>>
>>>     topology.transfer.buffer.size: 2048
>>>     topology.executor.buffer.size: 65536
>>>     topology.receiver.buffer.size: 16
>>>     topology.executor.send.buffer.size: 65536
>>>
>>>     kafka.fetch.size.bytes: 102400
>>>     kafka.buffer.size.bytes: 102400
>>>
>>> thanks
>>> Clay
>>>
>>> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <[email protected]> wrote:
>>>
>>>> can you share a screenshot of the Storm UI for your spout?
>>>>
>>>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <[email protected]> wrote:
>>>>
>>>>> I have this issue with any amount of load. Different max spout pending
>>>>> values do not seem to make much of a difference. I've lowered this
>>>>> parameter to 100; still little difference. At this point the bolt
>>>>> consuming the data does no processing at all.
>>>>>
>>>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <[email protected]> wrote:
>>>>>
>>>>>> I'm not sure that I understand your problem, but here are a few points:
>>>>>> If you have a large max spout pending and slow processing, you will
>>>>>> probably see large latency at the kafka spout. The spout emits a message,
>>>>>> it stays in the queue for a long time (that adds latency), and finally it
>>>>>> is processed and an ack is received. You will see queue time plus
>>>>>> processing time in the kafka spout latency.
>>>>>> Take a look at the load factors of your bolts: are they close to 1 or
>>>>>> more? And check the load factor of the kafka spout as well.
>>>>>>
>>>>>> On 4 February 2015 at 21:19, Andrey Yegorov <[email protected]> wrote:
>>>>>>
>>>>>>> have you tried increasing the max spout pending parameter for the spout?
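For anyone mapping the topology.* settings above into code: in Storm 0.9.x these queue sizes can also be set programmatically on the topology `Config`. A sketch under those assumptions (the values mirror the ones quoted above; note the internal disruptor queue sizes must be powers of 2, and `topology.executor.buffer.size` in the message above presumably refers to the executor receive buffer):

```java
import backtype.storm.Config;

Config conf = new Config();
// Internal disruptor queue sizes; must be powers of 2.
conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 2048);
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);
conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 16);
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);
```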
>>>>>>>
>>>>>>>     builder.setSpout("kafka",
>>>>>>>             new KafkaSpout(spoutConfig),
>>>>>>>             TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>         .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>         // the maximum parallelism you can have on a KafkaSpout is
>>>>>>>         // the number of partitions
>>>>>>>         .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>>>
>>>>>>> ----------
>>>>>>> Andrey Yegorov
>>>>>>>
>>>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> In my topology, the kafka spout is responsible for over 85% of the
>>>>>>>> latency. I have tried different max spout pending values and played
>>>>>>>> with the buffer size and fetch size, still no luck. Any hint on how to
>>>>>>>> optimize the spout? The issue doesn't seem to be on the kafka side, as
>>>>>>>> I see high throughput with the simple kafka consumer.
>>>>>>>>
>>>>>>>> thank you for your feedback
>>>>>>>> Clay
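For reference, the kafka.fetch.size.bytes / kafka.buffer.size.bytes settings discussed throughout this thread correspond to public fields on storm-kafka's `SpoutConfig` (inherited from `KafkaConfig`). A minimal sketch, assuming storm-kafka 0.9.x; the ZooKeeper host, topic name, zkRoot, and consumer id are placeholders:

```java
import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

BrokerHosts hosts = new ZkHosts("zkhost:2181");  // placeholder
SpoutConfig spoutConfig = new SpoutConfig(
        hosts, "my-topic", "/kafka-spout", "my-id");  // placeholders
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

// These fields back kafka.fetch.size.bytes / kafka.buffer.size.bytes.
spoutConfig.fetchSizeBytes = 12 * 1024 * 1024;   // 12582912, as in the thread
spoutConfig.bufferSizeBytes = 12 * 1024 * 1024;

KafkaSpout spout = new KafkaSpout(spoutConfig);
```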
