Hi Abhishek,

Good thoughts, thank you! I figured the heartbeat defaults would be sufficient for the vast majority of Storm use cases, and it's very likely that upping the heartbeat intervals won't have an effect. I agree with you that it's worth trying, though, and I'll give it a shot tomorrow and let you know how it goes.

The topology is indeed under heavy load. A general description follows:

200 workers, 20 KafkaSpout -> 1200 Bolt A -> 200 Bolt B -> 50 Bolt C

where Bolt A is a highly CPU-intensive bolt that processes incoming tuples from a KafkaSpout, followed by a fan-out where each incoming tuple produces 20 tuples emitted from Bolt A. Bolt B is a "fan-in" bolt that receives tuples via localOrShuffleGrouping. Bolt B emits via fieldsGrouping to Bolt C, which persists the processed tuples to a NoSQL database.

The fan-in bolt is a concept I introduced to solve a severe slowdown that appears to result from combining the dramatic fan-in with the fieldsGrouping step in the original configuration, where Bolt A fed Bolt C directly. Note: I need to use fieldsGrouping as we are organizing tuples based upon DB shard, so all tuples for a given shard have to go to the same Bolt C executor.

With the available hardware we *barely* keep up, so I am hoping that decreasing the heartbeat frequency will free up the CPU cycles needed to get the throughput we need.

Again, many thanks for your time and thoughts--much appreciated.

--John

On Sat, Jan 9, 2016 at 12:52 PM, Abhishek Agarwal <[email protected]> wrote:

> IMO one second of interval is not small enough for context switching to be
> a problem. You can try increasing the heartbeat interval and see if it
> affects the throughput. Though for it to be possible, the topology has to
> have a high load for sure.
>
> On Sat, Jan 9, 2016 at 8:59 PM, John Yost <[email protected]> wrote:
>
>> Hi Abhishek,
>>
>> Good points, thanks! I guess to phrase my question more precisely, if
>> there are Java threads dedicated to sending messages to zookeeper that are
>> constantly in a running mode, I am thinking that would lead to more context
>> switching for the Bolt and Spout threads actually doing work, which I am
>> thinking would decrease throughput. Do you think this is a valid concern, or
>> am I worrying about something that's really not a big deal? :)
>>
>> Please confirm--thanks again for your thoughts!
>>
>> --John
>>
>> On Sat, Jan 9, 2016 at 9:59 AM, Abhishek Agarwal <[email protected]>
>> wrote:
>>
>>> Heartbeats are lightweight operations on zookeeper. In the topology, so
>>> long as heartbeats are happening in a different thread, 1 second interval
>>> isn't much of a performance bottleneck.
>>>
>>> A contrasting example is zookeeper state update in kafka spout. Now
>>> there the interval is directly related to performance because it is a write
>>> operation (costly in zookeeper) and it happens in the same thread which
>>> generates the tuples.
>>>
>>>
>>>
>>> On Sat, Jan 9, 2016 at 6:11 PM, John Yost <[email protected]> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> During the course of profiling (actually sampling) a worker process
>>>> with jvisualvm I noticed the executor and worker zk send threads are
>>>> always active. I then checked the heartbeat settings on my topology and
>>>> noted that the default executor heartbeat is 1 second.
>>>>
>>>> Question: is this a potential performance bottleneck? Would it make
>>>> sense to bump this up to 5 or 10 seconds? Would like to see what y'all
>>>> think.
>>>>
>>>> Thanks
>>>>
>>>> --John
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Abhishek Agarwal
>>>
>>>
>>
>
>
> --
> Regards,
> Abhishek Agarwal
>
>
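
For readers following the thread: the fieldsGrouping constraint mentioned in the topology description (all tuples for a given DB shard must reach the same Bolt C executor) can be illustrated with a small standalone sketch. This is not Storm's actual implementation. It only shows the general idea of hashing the grouping field and taking it modulo the number of target tasks. The class name `ShardRouting`, the helper `taskFor`, and the `shard-N` identifiers are hypothetical; only the Bolt C parallelism of 50 comes from the message.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only: mimics the idea behind a fields grouping, not Storm's code.
final class ShardRouting {

    // Hypothetical helper: choose a target task index from the grouping field value.
    static int taskFor(String shardId, int numTasks) {
        // A fields grouping hashes the selected field values; here there is one field.
        List<Object> groupingFields = Collections.<Object>singletonList(shardId);
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(groupingFields.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int boltCTasks = 50; // Bolt C parallelism from the topology description
        // "shard-0" appears twice to show that it maps to the same task both times.
        for (String shard : new String[] {"shard-0", "shard-1", "shard-0"}) {
            System.out.println(shard + " -> Bolt C task " + taskFor(shard, boltCTasks));
        }
    }
}
```

Because the mapping depends only on the shard value and the task count, every tuple for one shard lands on one task. It also means a skewed shard distribution concentrates load on a few Bolt C tasks, which is one plausible reason a direct Bolt A to Bolt C fieldsGrouping can be slow.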
