I played around a little bit with Stephen's test, and it seems that the Collections.shuffle() call here is causing the problem (at least the problem Stephen is talking about):
https://github.com/apache/storm/blob/1.0.x-branch/storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java#L58

I created a ticket to address this uneven task distribution:
https://issues.apache.org/jira/browse/STORM-2210
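For illustration, here is a minimal standalone sketch of the choose-and-reshuffle pattern at that line, plus a plain atomic-counter round robin for contrast. This is my own simplification, not the actual Storm class: the class and field names are approximations of the linked source, the test harness is mine, and the round-robin variant is a guess at the kind of approach Stephen describes, not his gist. Hammering the chooser from several threads shows how the reshuffle-on-wrap step can skew the per-task counts:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicLong;

    public class ShuffleSkewDemo {

        // Simplified version of the pattern around ShuffleGrouping.java#L58:
        // walk a shuffled list, and reshuffle it each time a pass completes.
        static class ReshuffleChooser {
            private final List<Integer> choices = new ArrayList<>();
            private final AtomicInteger current = new AtomicInteger(0);
            private final Random random = new Random();

            ReshuffleChooser(int numTasks) {
                for (int i = 0; i < numTasks; i++) {
                    choices.add(i);
                }
                Collections.shuffle(choices, random);
            }

            int choose() {
                while (true) {
                    int rightNow = current.incrementAndGet();
                    if (rightNow < choices.size()) {
                        return choices.get(rightNow);
                    } else if (rightNow == choices.size()) {
                        // End of a pass: reset the index and reshuffle. With
                        // concurrent callers, other threads can read `choices`
                        // while this shuffle is rearranging it, so elements
                        // can be served twice or skipped within a pass.
                        current.set(0);
                        Collections.shuffle(choices, random);
                        return choices.get(0);
                    }
                    // Incremented past the end while another thread was
                    // resetting; retry.
                }
            }
        }

        // Plain atomic-counter round robin for contrast (a guess at the kind
        // of approach Stephen describes, not his actual gist).
        static class RoundRobinChooser {
            private final AtomicLong counter = new AtomicLong();
            private final int numTasks;

            RoundRobinChooser(int numTasks) {
                this.numTasks = numTasks;
            }

            int choose() {
                // floorMod keeps the result non-negative even if the
                // counter ever overflows.
                return (int) Math.floorMod(counter.getAndIncrement(), numTasks);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            int numTasks = 8;
            ReshuffleChooser chooser = new ReshuffleChooser(numTasks);
            AtomicLong[] counts = new AtomicLong[numTasks];
            for (int i = 0; i < numTasks; i++) {
                counts[i] = new AtomicLong();
            }

            // Hammer the chooser from several threads, roughly the way
            // multiple executors in one worker would.
            Thread[] threads = new Thread[4];
            for (int t = 0; t < threads.length; t++) {
                threads[t] = new Thread(() -> {
                    for (int i = 0; i < 1_000_000; i++) {
                        counts[chooser.choose()].incrementAndGet();
                    }
                });
                threads[t].start();
            }
            for (Thread t : threads) {
                t.join();
            }

            for (int i = 0; i < numTasks; i++) {
                System.out.println("task " + i + ": " + counts[i].get());
            }
        }
    }

The atomic-counter version trades the per-pass randomness for strict rotation, which keeps long-run counts even no matter how many threads call it.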
On Mon, Nov 21, 2016 at 11:20 AM, Stephen Powis <[email protected]> wrote:

> So we've seen some weird distributions using ShuffleGrouping as well. I
> noticed there's no test case for ShuffleGrouping and got curious. The
> implementation also seemed overly complicated (in my head anyhow; perhaps
> there's a reason for it?), so I put together a much simpler round-robin
> version of shuffling.
>
> Gist here: https://gist.github.com/Crim/61537958df65a5e13b3844b2d5e28cde
>
> It's possible I've set up my test cases incorrectly, but it seems that
> when using multiple threads in my test, ShuffleGrouping produces a wildly
> uneven distribution. In the Javadocs above each test case I've pasted the
> output that I get locally.
>
> Thoughts?
>
> On Sat, Nov 19, 2016 at 2:49 AM, Ohad Edelstein <[email protected]> wrote:
>
>> Has this happened to you as well?
>> We are upgrading from 0.9.3 to 1.0.1; in 0.9.3 we didn't have this
>> problem.
>>
>> But once I use localOrShuffle, the messages are sent only to the same
>> machine.
>>
>> From: Chien Le <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Saturday, 19 November 2016 at 6:05
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Testing serializers with multiple workers
>>
>> Ohad,
>>
>> We found that we had to use localOrShuffle grouping in order to see
>> activity in the same worker as the spout.
>>
>> -Chien
>>
>> ------------------------------
>> From: Ohad Edelstein <[email protected]>
>> Sent: Friday, November 18, 2016 8:38:35 AM
>> To: [email protected]
>> Subject: Re: Testing serializers with multiple workers
>>
>> Hello,
>>
>> We just finished setting up Storm 1.0.1 with 3 supervisors and one
>> nimbus machine, a total of 4 machines in AWS.
>>
>> We see the following phenomenon. Let's say the spout is on host2:
>> host1 - using 100% CPU
>> host3 - using 100% CPU
>> host2 - idle (some messages are being handled by it, but not many)
>> It's not a slots problem; we have an even number of bolts.
>>
>> We also tried deploying only 2 hosts, and the same thing happened: the
>> host with the spout was idle, the other host at 100% CPU.
>>
>> We switched from shuffleGrouping to noneGrouping, and that seems to
>> work. The documentation says:
>> None grouping: This grouping specifies that you don't care how the
>> stream is grouped. Currently, none groupings are equivalent to shuffle
>> groupings. Eventually though, Storm will push down bolts with none
>> groupings to execute in the same thread as the bolt or spout they
>> subscribe from (when possible).
>>
>> We are still trying to understand what is wrong with shuffleGrouping
>> in our system.
>>
>> Any ideas?
>>
>> Thanks!
>>
>> From: Aaron Niskodé-Dossett <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, 18 November 2016 at 17:04
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Testing serializers with multiple workers
>>
>> Hit send too soon... that really is the option :-)
>>
>> On Fri, Nov 18, 2016 at 9:03 AM Aaron Niskodé-Dossett <[email protected]>
>> wrote:
>>
>>> topology.testing.always.try.serialize = true
>>>
>>> On Fri, Nov 18, 2016 at 8:57 AM Kristopher Kane <[email protected]>
>>> wrote:
>>>
>>> Does anyone have techniques for testing serializers for bugs that
>>> would only surface when the serializer is used in a multi-worker
>>> topology?
>>>
>>> Kris
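For completeness, a minimal sketch of setting that option programmatically, assuming Storm 1.x's org.apache.storm.Config; as far as I know the constant below corresponds to the string key Aaron mentions:

    import org.apache.storm.Config;

    public class SerializerTestConfig {

        static Config buildTestConfig() {
            Config conf = new Config();
            // Force tuples through serialization even between executors in
            // the same worker, so serializer bugs surface without a
            // multi-worker deployment. Equivalent to setting the string key
            // "topology.testing.always.try.serialize" to true.
            conf.put(Config.TOPOLOGY_TESTING_ALWAYS_TRY_SERIALIZE, true);
            return conf;
        }
    }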
