Would like to second Stig’s suggestion of doing the fetches in nextTuple. Spouts and bolts already run asynchronously of each other; the computing model is designed so that you don’t have to do your own threading and synchronization. Currently, your background thread has effectively become a virtual spout that fetches the data and emits it to the actual Storm spout (via your own synchronized queue). The spout then merely drains that queue and forwards the tuples, so it is operating like a forwarding bolt. The native programming model is simpler and no less efficient.

Consider doing bulk fetches (with timeouts), as Stig suggested, whenever needed in the spout. You can hold the results in a simple (i.e. not synchronized) buffer. In nextTuple, the spout can first drain some or all of the buffer. After emitting, if the buffer is empty, it can reload it before returning from nextTuple. That way the spout is reloading while the downstream bolts are busy working through the just-emitted backlog. Something like the sketch below.

Roshan
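A minimal, untested sketch of that pattern (the fetchBatch hook, field names, and batch/timeout constants are illustrative, not from any real client API; method signatures assume Storm 2.x):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;

    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    public abstract class BufferedFetchSpout extends BaseRichSpout {
        private static final int BATCH_SIZE = 500;        // max messages per bulk fetch
        private static final int MAX_EMIT_PER_CALL = 100; // drain "some or all" per nextTuple
        private static final long FETCH_TIMEOUT_MS = 50;  // keep any blocking in nextTuple short

        private SpoutOutputCollector collector;
        // Plain, unsynchronized buffer: only the spout thread ever touches it.
        private transient Deque<String> buffer;

        /** Hypothetical hook: bulk-fetch up to maxCount messages, waiting at most the given timeout. */
        protected abstract List<String> fetchBatch(int maxCount, long timeout, TimeUnit unit);

        @Override
        public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.buffer = new ArrayDeque<>();
        }

        @Override
        public void nextTuple() {
            // 1. Drain some of the buffer first, so emitting is never delayed by a fetch.
            int emitted = 0;
            String msg;
            while (emitted < MAX_EMIT_PER_CALL && (msg = buffer.poll()) != null) {
                collector.emit(new Values(msg));
                emitted++;
            }
            // 2. If the buffer is now empty, reload it before returning, so the spout
            //    refills while the downstream bolts chew through the backlog just emitted.
            if (buffer.isEmpty()) {
                buffer.addAll(fetchBatch(BATCH_SIZE, FETCH_TIMEOUT_MS, TimeUnit.MILLISECONDS));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("message"));
        }
    }

With this shape, nextTuple emits nothing only when the source is genuinely empty, so the configured spout wait strategy throttles only idle spins rather than every call.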
On Saturday, March 2, 2019, 1:40 AM, Stig Rohde Døssing <[email protected]> wrote:

1. No, the worker heartbeats to Nimbus in a thread independent of the executors: https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L187

2. This will most likely not work. As you can see at https://github.com/apache/storm/blob/901f4aa81ccb9596ea46dac23bd7d82aba2cfaa2/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java#L194, the spout executor checks the emit count before and after calling nextTuple to decide whether to wait. You could pass the messages to the spout thread via a concurrent queue and emit them during nextTuple if you like (see the sketch below the quoted thread). Another option would be doing the fetching in nextTuple itself, capping how long you're willing to block while waiting for new messages. Blocking in nextTuple is okay, as long as the block is short.

On Fri, 1 Mar 2019 at 13:14, Jayant Sharma <[email protected]> wrote:

Hi,

We are using Storm in our production environment, and we recently noticed that most of the CPU time is being used by spout threads. I have a few questions about this:

1 - Can spout threads taking most of the CPU time delay the heartbeats sent to Nimbus? We noticed our workers getting relaunched by Nimbus frequently.

2 - To make the spouts non-blocking, we spawn a Hystrix thread from each spout, which takes care of fetching the messages and then emitting the tuples using the spout's collector object; the spout itself does nothing other than create the new Hystrix thread. To reduce the spouts' CPU usage we want to increase 'topology.sleep.spout.wait.strategy.time.ms'. My understanding is that this is the duration the spout executor will sleep for if nothing is emitted in a nextTuple call. But in the design we are using, will every nextTuple call be counted as "nextTuple did not emit anything", so that every further nextTuple call is delayed by this property's value?

Thanks,
Jayant Sharma
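To make Stig's queue suggestion concrete, here is a minimal, untested sketch: a background fetcher hands messages to the spout thread over a bounded LinkedBlockingQueue, and nextTuple emits on the spout thread while blocking for at most a few milliseconds. The fetchNext hook, queue capacity, and timeout are illustrative assumptions, not from the thread:

    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    public abstract class QueueHandoffSpout extends BaseRichSpout {
        private static final long POLL_TIMEOUT_MS = 10; // keep blocking in nextTuple short

        private SpoutOutputCollector collector;
        private transient BlockingQueue<String> queue;
        private transient Thread fetcher;
        private volatile boolean running;

        /** Hypothetical hook: block until the next message is available from your source. */
        protected abstract String fetchNext() throws InterruptedException;

        @Override
        public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.queue = new LinkedBlockingQueue<>(10_000); // bounded: applies backpressure to the fetcher
            this.running = true;
            this.fetcher = new Thread(() -> {
                try {
                    while (running) {
                        queue.put(fetchNext()); // blocks when the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "spout-fetcher");
            this.fetcher.setDaemon(true);
            this.fetcher.start();
        }

        @Override
        public void nextTuple() {
            try {
                // Emit on the spout thread so the executor's emit-count check sees it;
                // block at most POLL_TIMEOUT_MS so the executor loop is never starved.
                String msg = queue.poll(POLL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    collector.emit(new Values(msg));
                }
                // If msg was null, nothing was emitted, and the configured wait strategy
                // (e.g. topology.sleep.spout.wait.strategy.time.ms) kicks in for this call only.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        @Override
        public void close() {
            running = false;
            if (fetcher != null) {
                fetcher.interrupt();
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("message"));
        }
    }

Because the emit happens inside nextTuple, the executor's before/after emit-count comparison that Stig links to sees real progress, and the sleep wait strategy delays only the genuinely idle calls rather than every call.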
