Would like to second Stig’s suggestion of doing the fetches in nextTuple. Spouts and bolts already run asynchronously of each other; the computing model is designed so that you don’t have to do your own threading and synchronization. Currently, your background thread has effectively become a virtual spout that fetches the data and emits it to the actual Storm spout (via your own synchronized queue). The spout then merely drains that queue and forwards the tuples, so it is operating like a forwarding bolt. The native programming model is simpler and no less efficient.

Consider doing bulk fetches (with timeouts), as Stig suggested, whenever needed in the spout. You can hold the results in a simple (i.e. not synchronized) buffer. In nextTuple, the spout can first drain some or all of the buffer. After emitting, if the buffer is empty, it can reload it before returning from nextTuple. That way the spout is reloading while the downstream bolts are busy working through the just-emitted backlog. Something like the sketch below.

Roshan
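A minimal, untested sketch of that pattern (the fetchBatch hook, field names, and batch/timeout constants are illustrative, not from any real client API; method signatures assume Storm 2.x):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;

    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    public abstract class BufferedFetchSpout extends BaseRichSpout {
        private static final int BATCH_SIZE = 500;        // max messages per bulk fetch
        private static final int MAX_EMIT_PER_CALL = 100; // drain "some or all" per nextTuple
        private static final long FETCH_TIMEOUT_MS = 50;  // keep any blocking in nextTuple short

        private SpoutOutputCollector collector;
        // Plain, unsynchronized buffer: only the spout thread ever touches it.
        private transient Deque<String> buffer;

        /** Hypothetical hook: bulk-fetch up to maxCount messages, waiting at most the given timeout. */
        protected abstract List<String> fetchBatch(int maxCount, long timeout, TimeUnit unit);

        @Override
        public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.buffer = new ArrayDeque<>();
        }

        @Override
        public void nextTuple() {
            // 1. Drain some of the buffer first, so emitting is never delayed by a fetch.
            int emitted = 0;
            String msg;
            while (emitted < MAX_EMIT_PER_CALL && (msg = buffer.poll()) != null) {
                collector.emit(new Values(msg));
                emitted++;
            }
            // 2. If the buffer is now empty, reload it before returning, so the spout
            //    refills while the downstream bolts chew through the backlog just emitted.
            if (buffer.isEmpty()) {
                buffer.addAll(fetchBatch(BATCH_SIZE, FETCH_TIMEOUT_MS, TimeUnit.MILLISECONDS));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("message"));
        }
    }

With this shape, nextTuple emits nothing only when the source is genuinely empty, so the configured spout wait strategy throttles only idle spins rather than every call.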
On Saturday, March 2, 2019, 1:40 AM, Stig Rohde Døssing <[email protected]> wrote:

1. No, the worker heartbeats to Nimbus in a thread independent of the executors: https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L187

2. This will most likely not work. As you can see at https://github.com/apache/storm/blob/901f4aa81ccb9596ea46dac23bd7d82aba2cfaa2/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java#L194, the spout executor checks the emit count before and after calling nextTuple to decide whether to wait. You could pass the messages to the spout thread via a concurrent queue and emit them during nextTuple if you like (see the sketch below the quoted thread). Another option would be doing the fetching in nextTuple itself, capping how long you're willing to block while waiting for new messages. Blocking in nextTuple is okay, as long as the block is short.

On Fri, 1 Mar 2019 at 13:14, Jayant Sharma <[email protected]> wrote:

Hi,

We are using Storm in our production environment, and we recently noticed that most of the CPU time is being used by spout threads. I have a few questions about this:

1 - Can spout threads taking most of the CPU time delay the heartbeats sent to Nimbus? We noticed our workers getting relaunched by Nimbus frequently.

2 - To make the spouts non-blocking, we spawn a Hystrix thread from each spout, which takes care of fetching the messages and then emitting the tuples using the spout's collector object; the spout itself does nothing other than create the new Hystrix thread. To reduce the spouts' CPU usage we want to increase 'topology.sleep.spout.wait.strategy.time.ms'. My understanding is that this is the duration the spout executor will sleep for if nothing is emitted in a nextTuple call. But in the design we are using, will every nextTuple call be counted as "nextTuple did not emit anything", so that every further nextTuple call is delayed by this property's value?

Thanks,
Jayant Sharma
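To make Stig's queue suggestion concrete, here is a minimal, untested sketch: a background fetcher hands messages to the spout thread over a bounded LinkedBlockingQueue, and nextTuple emits on the spout thread while blocking for at most a few milliseconds. The fetchNext hook, queue capacity, and timeout are illustrative assumptions, not from the thread:

    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    public abstract class QueueHandoffSpout extends BaseRichSpout {
        private static final long POLL_TIMEOUT_MS = 10; // keep blocking in nextTuple short

        private SpoutOutputCollector collector;
        private transient BlockingQueue<String> queue;
        private transient Thread fetcher;
        private volatile boolean running;

        /** Hypothetical hook: block until the next message is available from your source. */
        protected abstract String fetchNext() throws InterruptedException;

        @Override
        public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.queue = new LinkedBlockingQueue<>(10_000); // bounded: applies backpressure to the fetcher
            this.running = true;
            this.fetcher = new Thread(() -> {
                try {
                    while (running) {
                        queue.put(fetchNext()); // blocks when the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "spout-fetcher");
            this.fetcher.setDaemon(true);
            this.fetcher.start();
        }

        @Override
        public void nextTuple() {
            try {
                // Emit on the spout thread so the executor's emit-count check sees it;
                // block at most POLL_TIMEOUT_MS so the executor loop is never starved.
                String msg = queue.poll(POLL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    collector.emit(new Values(msg));
                }
                // If msg was null, nothing was emitted, and the configured wait strategy
                // (e.g. topology.sleep.spout.wait.strategy.time.ms) kicks in for this call only.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        @Override
        public void close() {
            running = false;
            if (fetcher != null) {
                fetcher.interrupt();
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("message"));
        }
    }

Because the emit happens inside nextTuple, the executor's before/after emit-count comparison that Stig links to sees real progress, and the sleep wait strategy delays only the genuinely idle calls rather than every call.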
