CPU is around 100%.

On Wed, Feb 4, 2015 at 9:30 PM, Michael Rose <[email protected]> wrote:
> How does your CPU look at 23,000 tuples/s? Still low?
>
> Have you profiled to see if anything is blocking? Is your spout constantly doing work?
>
> Michael Rose
> Senior Platform Engineer
> FullContact | fullcontact.com
>
> On Wed, Feb 4, 2015 at 8:20 PM, clay teahouse <[email protected]> wrote:
>
>> I bumped the kafka buffer/fetch sizes to
>>
>> kafka.fetch.size.bytes: 12582912
>> kafka.buffer.size.bytes: 12582912
>>
>> The throughput almost doubled (to about 23,000 un-acked tuples/second). It seems increasing these two parameters further does not improve the performance any more. Is there anything else that I can try?
>>
>> On Wed, Feb 4, 2015 at 6:51 PM, clay teahouse <[email protected]> wrote:
>>
>>> 100,000 records is about 12MB.
>>> I'll try bumping the numbers by 100-fold to see if it makes any difference.
>>> thanks,
>>> -Clay
>>>
>>> On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <[email protected]> wrote:
>>>
>>>> I would bump these numbers up by a lot:
>>>>
>>>> kafka.fetch.size.bytes: 102400
>>>> kafka.buffer.size.bytes: 102400
>>>>
>>>> Say 10 or 100 times that, or more. I don't know by heart how much I increased those numbers on my topology.
>>>>
>>>> How many bytes are you writing per minute on kafka? Try dumping 1 minute of messages to a file to figure out how many bytes that is.
>>>>
>>>> I am reading (sending data to the topic) about 100,000 records per second. My kafka consumer can consume the 3 million records in less than 50 seconds.
>>>> I have disabled the ack. But with the ack enabled, I won't even get 1,500 records per second from the topology. With ack disabled, I get about 12,000/second.
>>>> I don't lose any data; it is just that the data is emitted from the spout to the bolt very slowly.
>>>>
>>>> I did bump my buffer sizes, but I am not sure if they are sufficient.
>>>>
>>>> topology.transfer.buffer.size: 2048
>>>> topology.executor.buffer.size: 65536
>>>> topology.receiver.buffer.size: 16
>>>> topology.executor.send.buffer.size: 65536
>>>>
>>>> kafka.fetch.size.bytes: 102400
>>>> kafka.buffer.size.bytes: 102400
>>>>
>>>> thanks
>>>> Clay
>>>>
>>>> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <[email protected]> wrote:
>>>>
>>>>> can you share a screenshot of the Storm UI for your spout?
>>>>>
>>>>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <[email protected]> wrote:
>>>>>
>>>>>> I have this issue with any amount of load. Different max spout pending values do not seem to make much of a difference. I've lowered this parameter to 100; still little difference. At this point the bolt consuming the data does no processing.
>>>>>>
>>>>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <[email protected]> wrote:
>>>>>>
>>>>>>> I'm not sure that I understand your problem, but here are a few points:
>>>>>>> If you have a large pending spout size and slow processing, you will probably see large latency at the kafka spout. The spout emits a message, it stays in the queue for a long time (that adds latency), and finally it is processed and the ack is received. You will see queue time + processing time in the kafka spout latency.
>>>>>>> Take a look at the load factors of your bolts - are they close to 1 or more? And the load factor of the kafka spout.
>>>>>>>
>>>>>>> On 4 February 2015 at 21:19, Andrey Yegorov <[email protected]> wrote:
>>>>>>>
>>>>>>>> have you tried increasing the max spout pending parameter for the spout?
>>>>>>>>
>>>>>>>> builder.setSpout("kafka",
>>>>>>>>         new KafkaSpout(spoutConfig),
>>>>>>>>         TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>>     .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>>     // the maximum parallelism you can have on a KafkaSpout is the number of partitions
>>>>>>>>     .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>>>>
>>>>>>>> ----------
>>>>>>>> Andrey Yegorov
>>>>>>>>
>>>>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> In my topology, the kafka spout is responsible for over 85% of the latency. I have tried different max spout pending values and played with the buffer size and fetch size, still no luck. Any hint on how to optimize the spout? The issue doesn't seem to be on the kafka side, as I see high throughput with the simple kafka consumer.
>>>>>>>>>
>>>>>>>>> thank you for your feedback
>>>>>>>>> Clay
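As a side note, the batch arithmetic in the thread (100,000 records is about 12MB, and Clay's final fetch size of 12582912 bytes) can be sanity-checked with a standalone sketch; it uses only figures quoted above, and the class name is my own:

```java
public class FetchSizeMath {
    public static void main(String[] args) {
        // "100,000 records is about 12MB" (from the thread)
        long batchBytes = 12L * 1024 * 1024;        // 12582912 bytes
        long records = 100_000;
        long avgRecordBytes = batchBytes / records; // ~125 bytes per record

        // Clay's final setting: kafka.fetch.size.bytes = 12582912
        long fetchSizeBytes = 12_582_912;
        long recordsPerFetch = fetchSizeBytes / avgRecordBytes;

        System.out.println("average record size ~" + avgRecordBytes + " bytes");
        System.out.println("records per fetch  ~" + recordsPerFetch);
    }
}
```

One fetch at that size already holds roughly a full second of production (~100,000 records), which is consistent with the observation above that raising the fetch size further stopped helping: past that point the bottleneck is elsewhere in the topology.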

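For readers landing on this thread later: the knobs discussed above map onto the storm-kafka `SpoutConfig` fields and the topology `Config` roughly as sketched below. This is a hedged sketch against the pre-Apache (`backtype.storm` / storm-kafka 0.9.x) API current at the time of the thread; the ZooKeeper host, topic, ids, and parallelism values are placeholders, not from the thread:

```java
import backtype.storm.Config;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class KafkaSpoutTuning {
    public static void main(String[] args) {
        // Placeholder connection details -- not from the thread.
        BrokerHosts hosts = new ZkHosts("zookeeper:2181");
        SpoutConfig spoutConfig =
                new SpoutConfig(hosts, "my-topic", "/kafka-spout", "spout-id");

        // The two sizes Clay raised (kafka.fetch.size.bytes /
        // kafka.buffer.size.bytes) correspond to these SpoutConfig fields:
        spoutConfig.fetchSizeBytes = 12 * 1024 * 1024;
        spoutConfig.bufferSizeBytes = 12 * 1024 * 1024;

        TopologyBuilder builder = new TopologyBuilder();
        // Parallelism beyond the partition count buys nothing on a KafkaSpout.
        builder.setSpout("kafka", new KafkaSpout(spoutConfig), 4);

        Config conf = new Config();
        conf.setMaxSpoutPending(1000); // topology.max.spout.pending, per Andrey's suggestion
        // Disruptor queue sizes from the thread; these must be powers of two:
        conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 2048);
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);
        conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);
        conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 16);
        // Submit with StormSubmitter.submitTopology(...) or a LocalCluster for tests.
    }
}
```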