What does your CPU usage look like at 23,000 tuples/s? Is it still low? Have you profiled to see if anything is blocking? Is your spout constantly doing work?
Michael Rose
Senior Platform Engineer, FullContact | t: @xorlev

On Wed, Feb 4, 2015 at 8:20 PM, clay teahouse <[email protected]> wrote:

> I bumped the kafka buffer/fetch sizes to
>
>     kafka.fetch.size.bytes: 12582912
>     kafka.buffer.size.bytes: 12582912
>
> The throughput almost doubled (to about 23,000 un-acked tuples/second). It
> seems that increasing these two parameters further does not improve
> performance any more. Is there anything else that I can try?
>
> On Wed, Feb 4, 2015 at 6:51 PM, clay teahouse <[email protected]> wrote:
>
>> 100,000 records is about 12MB.
>> I'll try bumping the numbers by 100-fold to see if it makes any
>> difference.
>> thanks,
>> -Clay
>>
>> On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <[email protected]> wrote:
>>
>>> I would bump these numbers up by a lot:
>>>
>>>     kafka.fetch.size.bytes: 102400
>>>     kafka.buffer.size.bytes: 102400
>>>
>>> Say 10 or 100 times that, or more. I don't know by heart how much I
>>> increased those numbers on my topology.
>>>
>>> How many bytes are you writing per minute to kafka? Try dumping 1
>>> minute of messages to a file to figure out how many bytes that is.
>>>
>>> I am producing (sending data to the topic) about 100,000 records per
>>> second. My kafka consumer can consume the 3 million records in less than
>>> 50 seconds. I have disabled the ack. But with the ack enabled, I won't even
>>> get 1,500 records per second from the topology. With ack disabled, I get
>>> about 12,000/second.
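A quick sanity check on the numbers quoted above: "100,000 records is about 12MB" implies roughly 120 bytes per record, so the 12,582,912-byte (12 MiB) fetch size clay settled on holds on the order of one second of data at a 100,000 records/s ingest rate. A minimal sketch of that arithmetic (the per-record size is derived from the figures in the thread, not measured):

```java
public class FetchSizing {
    public static void main(String[] args) {
        // Figures quoted in the thread: ~100,000 records is "about 12MB".
        long recordsPerBatch = 100_000L;
        long batchBytes = 12_000_000L;
        long bytesPerRecord = batchBytes / recordsPerBatch; // ~120 bytes/record

        // The fetch size clay bumped to: 12 * 1024 * 1024 bytes.
        long fetchSizeBytes = 12_582_912L;
        long recordsPerFetch = fetchSizeBytes / bytesPerRecord;

        System.out.println("bytes/record  ~ " + bytesPerRecord);   // ~ 120
        System.out.println("records/fetch ~ " + recordsPerFetch);  // ~ 104857
    }
}
```

So each fetch already carries roughly a second's worth of production, which is consistent with larger fetch sizes showing diminishing returns.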
>>> I don't lose any data; it is just that the data is emitted from the spout
>>> to the bolt very slowly.
>>>
>>> I did bump my buffer sizes but I am not sure if they are sufficient.
>>>
>>>     topology.transfer.buffer.size: 2048
>>>     topology.executor.buffer.size: 65536
>>>     topology.receiver.buffer.size: 16
>>>     topology.executor.send.buffer.size: 65536
>>>
>>>     kafka.fetch.size.bytes: 102400
>>>     kafka.buffer.size.bytes: 102400
>>>
>>> thanks
>>> Clay
>>>
>>> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <[email protected]> wrote:
>>>
>>>> can you share a screenshot of the Storm UI for your spout?
>>>>
>>>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <[email protected]> wrote:
>>>>
>>>>> I have this issue with any amount of load. Different max spout pending
>>>>> values do not seem to make much of a difference. I've lowered this
>>>>> parameter to 100; still little difference. At this point the bolt
>>>>> consuming the data does no processing at all.
>>>>>
>>>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <[email protected]> wrote:
>>>>>
>>>>>> I'm not sure that I understand your problem, but here are a few points:
>>>>>> If you have a large max spout pending and slow processing, you will
>>>>>> probably see large latency at the kafka spout. The spout emits a message,
>>>>>> it stays in the queue for a long time (that adds latency), and finally it
>>>>>> is processed and an ack is received. You will see queue time plus
>>>>>> processing time in the kafka spout latency.
>>>>>> Take a look at the load factors of your bolts: are they close to 1 or
>>>>>> more? And check the load factor of the kafka spout as well.
>>>>>>
>>>>>> On 4 February 2015 at 21:19, Andrey Yegorov <[email protected]> wrote:
>>>>>>
>>>>>>> have you tried increasing the max spout pending parameter for the spout?
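For anyone mapping the topology.* settings above into code: in Storm 0.9.x these queue sizes can also be set programmatically on the topology `Config`. A sketch under those assumptions (the values mirror the ones quoted above; note the internal disruptor queue sizes must be powers of 2, and `topology.executor.buffer.size` in the message above presumably refers to the executor receive buffer):

```java
import backtype.storm.Config;

Config conf = new Config();
// Internal disruptor queue sizes; must be powers of 2.
conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 2048);
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);
conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 16);
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);
```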
>>>>>>>
>>>>>>>     builder.setSpout("kafka",
>>>>>>>             new KafkaSpout(spoutConfig),
>>>>>>>             TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>         .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>         // the maximum parallelism you can have on a KafkaSpout is
>>>>>>>         // the number of partitions
>>>>>>>         .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>>>
>>>>>>> ----------
>>>>>>> Andrey Yegorov
>>>>>>>
>>>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> In my topology, the kafka spout is responsible for over 85% of the
>>>>>>>> latency. I have tried different max spout pending values and played
>>>>>>>> with the buffer size and fetch size, still no luck. Any hint on how to
>>>>>>>> optimize the spout? The issue doesn't seem to be on the kafka side, as
>>>>>>>> I see high throughput with the simple kafka consumer.
>>>>>>>>
>>>>>>>> thank you for your feedback
>>>>>>>> Clay
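For reference, the kafka.fetch.size.bytes / kafka.buffer.size.bytes settings discussed throughout this thread correspond to public fields on storm-kafka's `SpoutConfig` (inherited from `KafkaConfig`). A minimal sketch, assuming storm-kafka 0.9.x; the ZooKeeper host, topic name, zkRoot, and consumer id are placeholders:

```java
import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

BrokerHosts hosts = new ZkHosts("zkhost:2181");  // placeholder
SpoutConfig spoutConfig = new SpoutConfig(
        hosts, "my-topic", "/kafka-spout", "my-id");  // placeholders
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

// These fields back kafka.fetch.size.bytes / kafka.buffer.size.bytes.
spoutConfig.fetchSizeBytes = 12 * 1024 * 1024;   // 12582912, as in the thread
spoutConfig.bufferSizeBytes = 12 * 1024 * 1024;

KafkaSpout spout = new KafkaSpout(spoutConfig);
```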
