> Yes, I considered a threadpool, but the confusion was about where to declare them, initiate a thread run and when to join the threads. Any code samples or pseudocode that could help?
You would mark the thread pool transient and allocate/shut it down in the lifecycle callback methods provided by Storm. If you mean Thread.join, you shouldn't be using raw threads. There is really nothing special about using threads in Storm, so you should be able to refer to generic threading documentation.

> Besides, there's this thread where a person advises
> <https://mail-archives.apache.org/mod_mbox/storm-user/201311.mbox/%3CCAAYLz+pUZ44GNsNNJ9O5hjTr2rZLW=CKM=fgvcfwbnw613r...@mail.gmail.com%3E>
> not using a thread pool.

Storm comes with its own concurrency scheme, so before using a raw thread pool one should ask whether it's really justified. Lots of computation frameworks discourage use of your own thread pool, because the problems are frequently better solved by the concurrency mechanism the framework provides. In this particular case, i.e. not wanting to block Storm's thread while you perform network IO, it is IMO justifiable.

> What exactly is the backpressure
> <https://issues.apache.org/jira/browse/STORM-431> concept? Is it
> something about having enough of bolts to process the tuples the spout
> emits so that acks would be received by the spout on time?

It's a mechanism to avoid killing the topology by overloading it. A very common example is an OOME due to too many pending tasks. In this particular case, all it means is that you should let Storm know that processing is falling behind by blocking Storm's thread. E.g. you'd submit tasks to your thread pool, and if you see too many tasks being queued up, you'd start blocking Storm's thread so that it knows not to (or rather is unable to) send more tuples until you have finished a few tasks and have capacity again. The easiest way to do this is to use a thread pool with a bounded queue plus the CallerRunsPolicy, to which you let Storm's thread submit the tasks. That way, when the queue is full, Storm's thread runs the rejected task itself and is therefore blocked until it is finished.
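The bounded-queue + CallerRunsPolicy idea above can be sketched with plain java.util.concurrent, no Storm dependency needed. The execute/cleanup methods here only mirror Storm's bolt callbacks (prepare/execute/cleanup) in name; the pool size of 4 and queue bound of 100 are arbitrary assumptions you would tune for your workload:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {
    // In a real bolt this would be a transient field, created in prepare()
    // and shut down in cleanup(); here it is a plain field for illustration.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            4, 4,                                       // fixed pool of 4 workers
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(100),              // bounded queue: at most 100 pending tasks
            new ThreadPoolExecutor.CallerRunsPolicy()); // full queue => submitting (Storm's) thread
                                                        // runs the task itself and is blocked by it

    private final AtomicInteger done = new AtomicInteger();

    // Stands in for Bolt.execute(tuple): Storm's thread submits the slow work.
    public void execute(int tuple) {
        pool.submit(() -> {
            // Simulate slow IO, e.g. a DB write.
            try { Thread.sleep(1); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            done.incrementAndGet();
        });
    }

    // Stands in for Bolt.cleanup(): drain and stop the pool.
    public void cleanup() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedPoolSketch bolt = new BoundedPoolSketch();
        for (int i = 0; i < 1000; i++) {
            bolt.execute(i); // slows down here whenever the queue is full (back pressure)
        }
        bolt.cleanup();
        System.out.println(bolt.done.get()); // 1000: every submitted task completed
    }
}
```

The point of the bounded queue is exactly the back pressure described above: submission is cheap while there is capacity, and degrades into blocking the caller when there isn't.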
On Sun, May 8, 2016 at 1:56 PM, Navin Ipe <[email protected]> wrote:

> Yes, I considered a threadpool, but the confusion was about where to
> declare them, initiate a thread run and when to join the threads. Any code
> samples or pseudocode that could help?
>
> Besides, there's this thread where a person advises
> <https://mail-archives.apache.org/mod_mbox/storm-user/201311.mbox/%3CCAAYLz+pUZ44GNsNNJ9O5hjTr2rZLW=CKM=fgvcfwbnw613r...@mail.gmail.com%3E>
> not using a threadpool.
>
> What exactly is the backpressure
> <https://issues.apache.org/jira/browse/STORM-431> concept? Is it
> something about having enough of bolts to process the tuples the spout
> emits so that acks would be received by the spout on time?
>
> On Sun, May 8, 2016 at 5:20 PM, Enno Shioji <[email protected]> wrote:
>
>> There's nothing that keeps you from simply having a thread pool in your
>> bolts. Or you could go for an async DB client.
>>
>> You will have to be careful about providing back pressure (e.g. by using
>> a bounded queue).
>>
>> On Sun, May 8, 2016 at 12:12 PM, Navin Ipe <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I've wanted to do this and this post confirms the idea:
>>> http://stackoverflow.com/a/36106683/453673
>>> But when I have a spout that constantly has nextTuple() being called by
>>> Storm and I have a bolt that constantly has execute() being called whenever
>>> it receives a tuple, how do I program the Spout to have a separate thread
>>> which reads from MongoDB or for the bolt to have a separate thread that
>>> writes to DB?
>>>
>>> If Storm is in complete charge of calling nextTuple() and execute(),
>>> then how do I start my own thread which does something? This is important,
>>> because I don't want my bolt to spend time writing to DB, when it should
>>> actually be busy receiving and processing hundreds of tuples.
>>>
>>> --
>>> Regards,
>>> Navin
>>
>
> --
> Regards,
> Navin
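The spout-side question in the quoted thread (a separate thread that reads from MongoDB) follows the same lifecycle idea: start the reader thread in open(), hand results to nextTuple() through a bounded queue, and stop the thread in close(). A minimal sketch in plain Java, with no Storm or MongoDB dependency; the open/nextTuple/close names just mirror Storm's spout callbacks, and the buffer size of 1000 and the 10 fake documents are arbitrary assumptions:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncSpoutSketch {
    // Bounded buffer between the background reader thread and nextTuple().
    private final LinkedBlockingQueue<String> buffer = new LinkedBlockingQueue<>(1000);
    private volatile boolean running = true;
    private Thread reader;

    // Stands in for Spout.open(): start the background reader.
    public void open() {
        reader = new Thread(() -> {
            int i = 0;
            while (running && i < 10) {
                try {
                    // In a real spout this would read a batch from MongoDB.
                    buffer.put("doc-" + i++); // blocks when the buffer is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        reader.setDaemon(true);
        reader.start();
    }

    // Stands in for Spout.nextTuple(): non-blocking poll; emit only if something is ready.
    public String nextTuple() {
        return buffer.poll(); // null when nothing is ready; Storm simply calls again later
    }

    // Stands in for Spout.close(): stop and join the reader thread.
    public void close() throws InterruptedException {
        running = false;
        reader.interrupt();
        reader.join();
    }

    public static void main(String[] args) throws Exception {
        AsyncSpoutSketch spout = new AsyncSpoutSketch();
        spout.open();
        int emitted = 0;
        while (emitted < 10) {
            if (spout.nextTuple() != null) emitted++;
        }
        spout.close();
        System.out.println(emitted);
    }
}
```

Note that nextTuple() never blocks: it only drains whatever the reader thread has already buffered, so Storm's calling thread is never held up by the slow DB read.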
