> Yes, I considered a threadpool, but the confusion was about where to declare them, initiate a thread run and when to join the threads. Any code samples or pseudocode that could help?
You would mark the thread pool transient and allocate/shut it down in the lifecycle callback methods provided by Storm. If you mean Thread.join, you shouldn't be using raw threads. There is really nothing special about using threads in Storm, so you should be able to refer to generic threading documentation.

> Besides, there's this thread where a person advises
> <https://mail-archives.apache.org/mod_mbox/storm-user/201311.mbox/%3CCAAYLz+pUZ44GNsNNJ9O5hjTr2rZLW=CKM=fgvcfwbnw613r...@mail.gmail.com%3E>
> not using a thread pool.

Storm comes with its own concurrency scheme, so before using a raw thread pool one should ask whether it's really justified. Lots of computation frameworks discourage use of your own thread pool, because the problems are frequently better solved by the concurrency mechanism the framework provides. In this particular case, i.e. not wanting to block Storm's thread while you perform network IO, it is IMO justifiable.

> What exactly is the backpressure
> <https://issues.apache.org/jira/browse/STORM-431> concept? Is it
> something about having enough of bolts to process the tuples the spout
> emits so that acks would be received by the spout on time?

It's a mechanism to avoid killing the topology by overloading it. A very common example is an OOME due to too many pending tasks. In this particular case, all it means is that you should let Storm know that processing is falling behind by blocking Storm's thread. E.g. you'd submit tasks to your thread pool, and if you see too many tasks being queued up, you'd start blocking Storm's thread so that it knows not to (or rather is unable to) send more tuples until you have finished a few tasks and have capacity again. The easiest way to do this is to use a thread pool with a bounded queue plus the CallerRunsPolicy, to which you let Storm's thread submit the tasks. That way, when the queue is full, Storm's thread runs the rejected task itself and is therefore blocked until it is finished.
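The bounded-queue + CallerRunsPolicy idea above can be sketched with plain java.util.concurrent, no Storm dependency needed. The execute/cleanup methods here only mirror Storm's bolt callbacks (prepare/execute/cleanup) in name; the pool size of 4 and queue bound of 100 are arbitrary assumptions you would tune for your workload:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {
    // In a real bolt this would be a transient field, created in prepare()
    // and shut down in cleanup(); here it is a plain field for illustration.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            4, 4,                                       // fixed pool of 4 workers
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(100),              // bounded queue: at most 100 pending tasks
            new ThreadPoolExecutor.CallerRunsPolicy()); // full queue => submitting (Storm's) thread
                                                        // runs the task itself and is blocked by it

    private final AtomicInteger done = new AtomicInteger();

    // Stands in for Bolt.execute(tuple): Storm's thread submits the slow work.
    public void execute(int tuple) {
        pool.submit(() -> {
            // Simulate slow IO, e.g. a DB write.
            try { Thread.sleep(1); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            done.incrementAndGet();
        });
    }

    // Stands in for Bolt.cleanup(): drain and stop the pool.
    public void cleanup() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedPoolSketch bolt = new BoundedPoolSketch();
        for (int i = 0; i < 1000; i++) {
            bolt.execute(i); // slows down here whenever the queue is full (back pressure)
        }
        bolt.cleanup();
        System.out.println(bolt.done.get()); // 1000: every submitted task completed
    }
}
```

The point of the bounded queue is exactly the back pressure described above: submission is cheap while there is capacity, and degrades into blocking the caller when there isn't.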
On Sun, May 8, 2016 at 1:56 PM, Navin Ipe <[email protected]> wrote:

> Yes, I considered a threadpool, but the confusion was about where to
> declare them, initiate a thread run and when to join the threads. Any code
> samples or pseudocode that could help?
>
> Besides, there's this thread where a person advises
> <https://mail-archives.apache.org/mod_mbox/storm-user/201311.mbox/%3CCAAYLz+pUZ44GNsNNJ9O5hjTr2rZLW=CKM=fgvcfwbnw613r...@mail.gmail.com%3E>
> not using a threadpool.
>
> What exactly is the backpressure
> <https://issues.apache.org/jira/browse/STORM-431> concept? Is it
> something about having enough of bolts to process the tuples the spout
> emits so that acks would be received by the spout on time?
>
> On Sun, May 8, 2016 at 5:20 PM, Enno Shioji <[email protected]> wrote:
>
>> There's nothing that keeps you from simply having a thread pool in your
>> bolts. Or you could go for an async DB client.
>>
>> You will have to be careful about providing back pressure (e.g. by using
>> a bounded queue).
>>
>> On Sun, May 8, 2016 at 12:12 PM, Navin Ipe <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I've wanted to do this and this post confirms the idea:
>>> http://stackoverflow.com/a/36106683/453673
>>> But when I have a spout that constantly has nextTuple() being called by
>>> Storm and I have a bolt that constantly has execute() being called whenever
>>> it receives a tuple, how do I program the Spout to have a separate thread
>>> which reads from MongoDB or for the bolt to have a separate thread that
>>> writes to DB?
>>>
>>> If Storm is in complete charge of calling nextTuple() and execute(),
>>> then how do I start my own thread which does something? This is important,
>>> because I don't want my bolt to spend time writing to DB, when it should
>>> actually be busy receiving and processing hundreds of tuples.
>>>
>>> --
>>> Regards,
>>> Navin
>>
>
> --
> Regards,
> Navin
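The spout-side question in the quoted thread (a separate thread that reads from MongoDB) follows the same lifecycle idea: start the reader thread in open(), hand results to nextTuple() through a bounded queue, and stop the thread in close(). A minimal sketch in plain Java, with no Storm or MongoDB dependency; the open/nextTuple/close names just mirror Storm's spout callbacks, and the buffer size of 1000 and the 10 fake documents are arbitrary assumptions:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncSpoutSketch {
    // Bounded buffer between the background reader thread and nextTuple().
    private final LinkedBlockingQueue<String> buffer = new LinkedBlockingQueue<>(1000);
    private volatile boolean running = true;
    private Thread reader;

    // Stands in for Spout.open(): start the background reader.
    public void open() {
        reader = new Thread(() -> {
            int i = 0;
            while (running && i < 10) {
                try {
                    // In a real spout this would read a batch from MongoDB.
                    buffer.put("doc-" + i++); // blocks when the buffer is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        reader.setDaemon(true);
        reader.start();
    }

    // Stands in for Spout.nextTuple(): non-blocking poll; emit only if something is ready.
    public String nextTuple() {
        return buffer.poll(); // null when nothing is ready; Storm simply calls again later
    }

    // Stands in for Spout.close(): stop and join the reader thread.
    public void close() throws InterruptedException {
        running = false;
        reader.interrupt();
        reader.join();
    }

    public static void main(String[] args) throws Exception {
        AsyncSpoutSketch spout = new AsyncSpoutSketch();
        spout.open();
        int emitted = 0;
        while (emitted < 10) {
            if (spout.nextTuple() != null) emitted++;
        }
        spout.close();
        System.out.println(emitted);
    }
}
```

Note that nextTuple() never blocks: it only drains whatever the reader thread has already buffered, so Storm's calling thread is never held up by the slow DB read.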
