100,000 records is about 12MB. I'll try bumping the numbers by 100-fold to see if it makes any difference. Thanks, -Clay
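[Editor's note: for reference, a 100-fold bump of the two Kafka settings quoted below would look like this in the topology's storm.yaml; the multiplied values are my arithmetic, the thread only says "100 fold":]

```yaml
# 100x the 102400-byte values discussed in the thread (sketch)
kafka.fetch.size.bytes: 10240000
kafka.buffer.size.bytes: 10240000
```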
On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <[email protected]> wrote:

> I would bump these numbers up by a lot:
>
> kafka.fetch.size.bytes: 102400
> kafka.buffer.size.bytes: 102400
>
> Say 10 or 100 times that, or more. I don't know by heart how much I
> increased those numbers on my topology.
>
> How many bytes are you writing per minute to Kafka? Try dumping 1 minute
> of messages to a file to figure out how many bytes that is.
>
> I am reading (sending data to the topic) about 100,000 records per second.
> My Kafka consumer can consume the 3 million records in less than 50
> seconds. I have disabled the ack. But with the ack enabled, I won't even
> get 1,500 records per second from the topology. With ack disabled, I get
> about 12,000/second.
> I don't lose any data; it is just that the data is emitted from the spout
> to the bolt very slowly.
>
> I did bump my buffer sizes, but I am not sure if they are sufficient.
>
> topology.transfer.buffer.size: 2048
> topology.executor.buffer.size: 65536
> topology.receiver.buffer.size: 16
> topology.executor.send.buffer.size: 65536
>
> kafka.fetch.size.bytes: 102400
> kafka.buffer.size.bytes: 102400
>
> thanks
> Clay
>
> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <[email protected]> wrote:
>
>> Can you share a screenshot of the Storm UI for your spout?
>>
>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <[email protected]> wrote:
>>
>>> I have this issue with any amount of load. Different max spout pending
>>> values do not seem to make much of a difference. I've lowered this
>>> parameter to 100; still little difference. At this point the bolt
>>> consuming the data does no processing.
>>>
>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <[email protected]> wrote:
>>>
>>>> I'm not sure that I understand your problem, but here are a few points:
>>>> if you have a large pending spout size and slow processing, you will
>>>> probably see large latency at the Kafka spout. The spout emits a
>>>> message; it stays in the queue for a long time (that adds latency), and
>>>> finally it is processed and the ack is received. You will see queue
>>>> time plus processing time in the Kafka spout latency.
>>>> Take a look at the load factors of your bolts - are they close to 1 or
>>>> more? Also look at the load factor of the Kafka spout.
>>>>
>>>> On 4 February 2015 at 21:19, Andrey Yegorov <[email protected]> wrote:
>>>>
>>>>> Have you tried increasing the max spout pending parameter for the
>>>>> spout?
>>>>>
>>>>> builder.setSpout("kafka",
>>>>>                  new KafkaSpout(spoutConfig),
>>>>>                  TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>        .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>        // the maximum parallelism you can have on a KafkaSpout is
>>>>>        // the number of partitions
>>>>>        .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>
>>>>> ----------
>>>>> Andrey Yegorov
>>>>>
>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> In my topology, the Kafka spout is responsible for over 85% of the
>>>>>> latency. I have tried different spout max pending values and played
>>>>>> with the buffer size and fetch size, still no luck. Any hint on how
>>>>>> to optimize the spout? The issue doesn't seem to be on the Kafka
>>>>>> side, as I see high throughput with the simple Kafka consumer.
>>>>>>
>>>>>> Thank you for your feedback,
>>>>>> Clay
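[Editor's note: the figures quoted in the thread make Filipa's sizing point concrete. From "100,000 records is about 12MB", each record is roughly 126 bytes, so a 102400-byte fetch covers well under a thousand records per request. A back-of-the-envelope sketch; the per-record size is derived from the thread, not stated in it:]

```python
# Sizing check from the figures quoted in the thread.
records_per_sec = 100_000
# "100,000 records is about 12MB" -> ~126 bytes per record
bytes_per_record = 12 * 1024 * 1024 / 100_000

# Bytes written to the topic per minute
bytes_per_minute = records_per_sec * bytes_per_record * 60

# kafka.fetch.size.bytes as configured in the thread
fetch_size = 102_400
records_per_fetch = fetch_size // bytes_per_record

print(f"~{bytes_per_minute / 1024**2:.0f} MB/minute written to the topic")
print(f"one {fetch_size}-byte fetch holds only ~{records_per_fetch:.0f} records")
```

At roughly 720 MB written per minute, a fetch size in the tens of kilobytes forces the spout into many small round trips, which is consistent with the advice above to raise it 10-100x.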
