I would bump these numbers up by a lot:
kafka.fetch.size.bytes: 102400
kafka.buffer.size.bytes: 102400
Say 10 or 100 times that, or more. I don't remember offhand how much I
increased those numbers in my topology.
How many bytes are you writing to Kafka per minute? Try dumping 1 minute
of messages to a file to figure out how many bytes that is.
I am producing (sending data to the topic) about 100,000 records per second.
My Kafka consumer can consume the 3 million records in less than 50
seconds. I have disabled acking. With acking enabled, I don't even get
1,500 records per second out of the topology; with acking disabled, I get
about 12,000 per second.
I don't lose any data; the data is just emitted from the spout to the
bolt very slowly.
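To put numbers on the fetch-size advice at the top of the thread: at the stated 100,000 records per second, even a modest per-record size makes a 102,400-byte fetch very small. A rough sketch (the 200-byte average record size is an assumption for illustration, not a figure from the thread):

```java
public class FetchSizing {
    // Milliseconds of topic traffic that a single fetch of
    // fetchSizeBytes can hold at the given byte rate.
    public static double msPerFetch(long fetchSizeBytes, long bytesPerSec) {
        return 1000.0 * fetchSizeBytes / bytesPerSec;
    }

    public static void main(String[] args) {
        long recordsPerSec = 100_000;   // stated in the thread
        long bytesPerRecord = 200;      // assumed average; measure your own
        long bytesPerSec = recordsPerSec * bytesPerRecord; // 20 MB/s

        // The current setting covers only ~5 ms of traffic per fetch,
        // so the spout must issue fetches almost continuously:
        System.out.println(msPerFetch(102_400, bytesPerSec));     // ~5.12
        // Bumping it 10x covers ~51 ms per fetch:
        System.out.println(msPerFetch(1_024_000, bytesPerSec));   // ~51.2
    }
}
```

This is why the dump-one-minute-to-a-file measurement matters: the right fetch size depends entirely on your actual byte rate, not your record rate.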
I did bump my buffer sizes, but I am not sure they are sufficient:
topology.transfer.buffer.size: 2048
topology.executor.buffer.size: 65536
topology.receiver.buffer.size: 16
topology.executor.send.buffer.size: 65536
kafka.fetch.size.bytes: 102400
kafka.buffer.size.bytes: 102400
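For reference, this is what the "bump by 10x or more" suggestion at the top of the thread would look like in storm.yaml (the exact values are illustrative, not prescriptive; measure your byte rate first):

```yaml
kafka.fetch.size.bytes: 1048576    # was 102400
kafka.buffer.size.bytes: 1048576   # was 102400
```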
thanks
Clay
On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <[email protected]>
wrote:
> can you share a screenshot of the Storm UI for your spout?
>
> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <[email protected]>
> wrote:
>
>> I have this issue with any amount of load. Different max spout pending
>> values do not seem to make much of a difference. I've lowered this
>> parameter to 100 and still see little difference. At this point the bolt
>> consuming the data does no processing at all.
>>
>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <[email protected]>
>> wrote:
>>
>>> I'm not sure I understand your problem, but here are a few points:
>>> If you have a large max spout pending and slow processing, you will
>>> probably see high latency at the Kafka spout. The spout emits a tuple,
>>> it sits in a queue for a long time (which adds latency), and finally it
>>> is processed and the ack is received. The Kafka spout latency you see is
>>> queue time plus processing time.
>>> Take a look at the capacity of your bolts in the Storm UI - is it close
>>> to 1 or above? And the capacity of the Kafka spout.
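This queue-time-plus-processing-time point can be made quantitative with Little's law: with acking on, steady-state throughput is capped at roughly max.spout.pending divided by the complete latency per tuple. A sketch (the 67 ms complete latency is a hypothetical figure, chosen only because it would explain the ~1,500 tuples/s reported earlier in the thread with a pending value of 100):

```java
public class SpoutThroughput {
    // Little's law: in-flight tuples = throughput * latency, so with a
    // cap of maxSpoutPending in-flight tuples the throughput is at most
    // maxSpoutPending / completeLatencySec tuples per second.
    public static double maxThroughput(int maxSpoutPending,
                                       double completeLatencySec) {
        return maxSpoutPending / completeLatencySec;
    }

    public static void main(String[] args) {
        // max.spout.pending = 100 and a hypothetical 67 ms complete
        // latency cap the spout near 1,500 tuples/s:
        System.out.println(maxThroughput(100, 0.067)); // ~1492.5
    }
}
```

The implication is that raising max.spout.pending alone only helps until the downstream queues fill; past that point the complete latency grows with it and throughput stops improving.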
>>>
>>> On 4 February 2015 at 21:19, Andrey Yegorov <[email protected]>
>>> wrote:
>>>
>>>> have you tried increasing max spout pending parameter for the spout?
>>>>
>>>> builder.setSpout("kafka",
>>>>                  new KafkaSpout(spoutConfig),
>>>>                  TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>        .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>        // the maximum parallelism you can have on a KafkaSpout is the
>>>>        // number of partitions
>>>>        .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>
>>>> ----------
>>>> Andrey Yegorov
>>>>
>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> In my topology, the Kafka spout is responsible for over 85% of the
>>>>> latency. I have tried different max spout pending values and played
>>>>> with the buffer size and fetch size, still no luck. Any hints on how
>>>>> to optimize the spout? The issue doesn't seem to be on the Kafka side,
>>>>> as I see high throughput with the simple Kafka consumer.
>>>>>
>>>>> thank you for your feedback
>>>>> Clay
>>>>>
>>>>>
>>>>
>>>
>>
>