I was partly right. In earlier versions (pre-1.0) was not thread safe (due to shuffle grouping). 1.0 introduced a new shuffle grouping implementation that is thread-safe, in addition to a load-aware shuffle grouping. I had missed the fact that the new shuffle grouping is thread safe.
-Taylor > On Apr 28, 2016, at 1:48 PM, Steven Lewis <[email protected]> wrote: > > We tried implementing our own thread safe output collector but it seems > ridiculous for such a concurrent system. Why don’t they implement it in core > Storm? They can have the current one, and then build a separate one called > Concurrent_OutputCollector or some such. I was hoping they fixed that in > version 1.0 and I missed it. > > From: Stephen Powis <[email protected] <mailto:[email protected]>> > Reply-To: "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Date: Thursday, April 28, 2016 at 10:58 AM > To: "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: thread safe output collector > > So the Spout documentation (assuming its correct...) here > (http://storm.apache.org/releases/current/Concepts.html#spouts > <http://storm.apache.org/releases/current/Concepts.html#spouts>) mentions > this: > > "The main method on spouts is nextTuple. nextTuple either emits a new tuple > into the topology or simply returns if there are no new tuples to emit. It is > imperative that nextTuple does not block for any spout implementation, > because Storm calls all the spout methods on the same thread." > > When developing a custom spout we interpreted it to mean that any "real work" > done by a spout should be done in a separate thread, and decided on the > following pattern which seems some what relevant to what you are trying to do > in your bolts. > > On Spout prepare, we create a concurrent/thread safe queue. We then create a > new Thread passing it a reference to our thread safe queue. This thread > handles finding new data that needs to be emitted. When that thread finds > data, it adds it to the shared queue. When the spout's nextTuple() method is > called, it looks for data on the shared queue and emits it. > > I imagine doing async processing in a bolt using one or more threads could > work with a similar pattern. On prepare you setup your thread(s) with > references to a shared queue. The bolt passes work to be completed to the > thread(s), the thread(s) communicate back to the bolt the result via a shared > queue. Add in the concept of tick tuples to ensure your bolt checks for > completed work on a regular basis? > > Is there a better way to do this? > > On Thu, Apr 28, 2016 at 11:22 AM, Julien Nioche > <[email protected] <mailto:[email protected]>> wrote: > Thanks for the clarification > > On 28 April 2016 at 15:12, P. Taylor Goetz <[email protected] > <mailto:[email protected]>> wrote: > The documentation is wrong. See: > > https://issues.apache.org/jira/browse/STORM-841 > <https://issues.apache.org/jira/browse/STORM-841> > > At some point it looks like the change made there got reverted. I will reopen > it to make sure the documentation is corrected. > > OutputCollector is NOT thread-safe. > > -Taylor > >> On Apr 28, 2016, at 9:06 AM, Stephen Powis <[email protected] >> <mailto:[email protected]>> wrote: >> >> "Its perfectly fine to launch new threads in bolts that do processing >> asynchronously. OutputCollector >> <http://storm.apache.org/releases/current/javadocs/org/apache/storm/task/OutputCollector.html> >> is thread-safe and can be called at any time." >> >> >> From the docs for 0.9.6: >> http://storm.apache.org/releases/0.9.6/Concepts.html#bolts >> <http://storm.apache.org/releases/0.9.6/Concepts.html#bolts> >> >> On Thu, Apr 28, 2016 at 9:03 AM, P. Taylor Goetz <[email protected] >> <mailto:[email protected]>> wrote: >> IIRC there was discussion about making it thread safe, but I don't believe >> it was implemented. >> >> -Taylor >> >> On Apr 28, 2016, at 3:52 AM, Julien Nioche <[email protected] >> <mailto:[email protected]>> wrote: >> >>> Hi Stephen >>> >>> I asked the same question in February but did not get a reply >>> >>> https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3cca+-fm0urpf3fuerozywpzmxu-kdbgf-zj3wbyr8evsaqjc6...@mail.gmail.com%3E >>> >>> <https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3cca+-fm0urpf3fuerozywpzmxu-kdbgf-zj3wbyr8evsaqjc6...@mail.gmail.com%3E> >>> >>> Anyone who could confirm this? >>> >>> Thanks >>> >>> On 27 April 2016 at 14:05, Steven Lewis <[email protected] >>> <mailto:[email protected]>> wrote: >>> I have conflicting information, and have not checked personally but has the >>> output collector finally been made thread safe for emitting in version 1.0 >>> or 0.10? I know it was a huge problem in 0.9.5 when trying to do threading >>> in a bolt for async future calls and emitting once it returns. >>> >>> This email and any files transmitted with it are confidential and intended >>> solely for the individual or entity to whom they are addressed. If you have >>> received this email in error destroy it immediately. *** Walmart >>> Confidential *** >>> >>> >>> >>> -- >>> >>> Open Source Solutions for Text Engineering >>> >>> http://www.digitalpebble.com <http://www.digitalpebble.com/> >>> http://digitalpebble.blogspot.com/ <http://digitalpebble.blogspot.com/> >>> #digitalpebble <http://twitter.com/digitalpebble> >> > > > > > -- > > Open Source Solutions for Text Engineering > > http://www.digitalpebble.com <http://www.digitalpebble.com/> > http://digitalpebble.blogspot.com/ <http://digitalpebble.blogspot.com/> > #digitalpebble <http://twitter.com/digitalpebble> > > This email and any files transmitted with it are confidential and intended > solely for the individual or entity to whom they are addressed. If you have > received this email in error destroy it immediately. *** Walmart Confidential > ***
signature.asc
Description: Message signed with OpenPGP using GPGMail
