Re: heartbeat defaults and profiling--potential performance bottleneck?

John Yost Sun, 10 Jan 2016 04:38:03 -0800

Hi Abhishek,

Good thoughts, thank you! I figured the heartbeat defaults would be
sufficient for the vast majority of Storm use cases, and it's very likely
that upping the heartbeat intervals won't have an effect. I agree with you
that it's worth trying, though, and I'll give it a shot tomorrow and let
you know how it goes.

The topology is indeed under heavy load, with a general description as
follows:

200 workers, 20 KafkaSpout -> 1200 Bolt A -> 200 Bolt B -> 50 Bolt C, where
Bolt A is a highly CPU-intensive bolt that processes incoming tuples from a
KafkaSpout, followed by a fan-out where each incoming tuple produces 20
tuples emitted from Bolt A. Bolt B is a "fan-in" bolt that is sent tuples
via localOrShuffleGrouping; Bolt B emits via fieldsGrouping to Bolt C,
which persists the processed tuples to a NoSQL database. The fan-in bolt is
a concept I introduced to solve a severe slowdown that appears to result
from combining the dramatic fan-in with the fieldsGrouping step in the
original configuration, where Bolt A emitted directly to Bolt C. Note: I
need to use fieldsGrouping as we are organizing tuples based upon DB shard,
so all tuples for a given shard have to go to the same Bolt C executor.
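
The shard-affinity requirement above can be illustrated with a minimal
sketch. It assumes a simplified model of a fields grouping (hash the
grouping key, then take it modulo the number of downstream tasks); this is
not Storm's actual routing code, and the class name, method name, and shard
ids are hypothetical.

```java
import java.util.Objects;

public class ShardRouting {
    // Simplified stand-in for a fields grouping (an assumption, not Storm's
    // implementation): hash the grouping key and map it onto one of the
    // downstream tasks. floorMod keeps the index non-negative.
    public static int targetTask(String shardId, int numTasks) {
        return Math.floorMod(Objects.hash(shardId), numTasks);
    }

    public static void main(String[] args) {
        int boltCTasks = 50; // Bolt C parallelism from the description above
        // Hypothetical shard ids; repeated ids must land on the same task.
        String[] shards = {"shard-01", "shard-02", "shard-01", "shard-17", "shard-02"};
        for (String shard : shards) {
            System.out.println(shard + " -> Bolt C task " + targetTask(shard, boltCTasks));
        }
    }
}
```

Because the target depends only on the key, every tuple for a given shard
reaches the same task, which a shuffle grouping would not guarantee.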

With the available hardware we *barely* keep up, so I am hoping that
decreasing the heartbeat frequency will give us the CPU cycles needed to
get the throughput we need.
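
A minimal sketch of the experiment, assuming the heartbeat intervals are
exposed under the keys task.heartbeat.frequency.secs and
worker.heartbeat.frequency.secs (as in Storm's defaults.yaml; check them
against the version in use). The class, the build method, and the 3x safety
margin against the heartbeat timeout are hypothetical choices for
illustration, and a plain Map stands in for Storm's Config object.

```java
import java.util.HashMap;
import java.util.Map;

public class HeartbeatSettings {
    // Builds override settings for a longer heartbeat interval.
    // timeoutSecs is whatever timeout the cluster uses to declare a
    // heartbeat missed; the caller must supply it (assumption).
    public static Map<String, Object> build(int heartbeatSecs, int timeoutSecs) {
        // Arbitrary safety margin: allow at least three heartbeats per
        // timeout window so one delayed beat does not look like a failure.
        if (heartbeatSecs <= 0 || heartbeatSecs * 3 > timeoutSecs) {
            throw new IllegalArgumentException(
                "heartbeat interval " + heartbeatSecs
                + "s is too close to timeout " + timeoutSecs + "s");
        }
        Map<String, Object> conf = new HashMap<>();
        conf.put("task.heartbeat.frequency.secs", heartbeatSecs);
        conf.put("worker.heartbeat.frequency.secs", heartbeatSecs);
        return conf;
    }

    public static void main(String[] args) {
        // Try the 5-second interval from the original question,
        // assuming a 30-second timeout.
        System.out.println(build(5, 30));
    }
}
```

Whether a given key can be overridden per topology or only cluster-wide
depends on the Storm version, so the resulting map may need to go into
storm.yaml rather than the topology configuration.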

Again, many thanks for your time and thoughts--much appreciated.

--John

On Sat, Jan 9, 2016 at 12:52 PM, Abhishek Agarwal <[email protected]>
wrote:

> IMO a one-second interval is not small enough for context switching to be
> a problem. You can try increasing the heartbeat interval and see if it
> affects the throughput. Though for it to make a difference, the topology
> has to be under heavy load for sure.
>
> On Sat, Jan 9, 2016 at 8:59 PM, John Yost <[email protected]> wrote:
>
>> Hi Abhishek,
>>
>> Good points, thanks!  I guess to phrase my question more precisely, if
>> there are Java threads dedicated to sending messages to zookeeper that are
>> constantly in a running mode, I am thinking that would lead to more context
>> switching for the Bolt and Spout threads actually doing work, which I am
>> thinking would decrease throughput. Do you think this is a valid concern, or
>> am I worrying about something that's really not a big deal? :)
>>
>> Please confirm--thanks again for your thoughts!
>>
>> --John
>>
>> On Sat, Jan 9, 2016 at 9:59 AM, Abhishek Agarwal <[email protected]>
>> wrote:
>>
>>> Heartbeats are lightweight operations on ZooKeeper. In the topology, so
>>> long as heartbeats are happening in a different thread, a 1-second
>>> interval isn't much of a performance bottleneck.
>>>
>>> A contrasting example is the ZooKeeper state update in the Kafka spout.
>>> There the interval is directly related to performance because it is a
>>> write operation (costly in ZooKeeper) and it happens in the same thread
>>> that generates the tuples.
>>>
>>>
>>>
>>> On Sat, Jan 9, 2016 at 6:11 PM, John Yost <[email protected]> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> During the course of profiling (actually sampling) a worker process
>>>> with jvisualvm, I noticed the executor and worker zk send threads are
>>>> always active. I then checked the heartbeat settings on my topology and
>>>> noted that the default executor heartbeat is 1 second.
>>>>
>>>> Question: is this a potential performance bottleneck? Would it make
>>>> sense to bump this up to 5 or 10 seconds?  Would like to see what y'all
>>>> think.
>>>>
>>>> Thanks
>>>>
>>>> --John
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Abhishek Agarwal
>>>
>>>
>>
>
>
> --
> Regards,
> Abhishek Agarwal
>
>
