Stephen, can you describe the distribution that you are seeing? What we see is as follows: in Storm 0.9.3, the machine with the spout works harder (it gets almost twice the work of the other machines).
In Storm 1.0.1, the machine with the spout actually does get work to do (with shuffleGrouping we only see a few tasks submitted to the bolts). Again, according to the documentation, noneGrouping should work the same as shuffleGrouping, but we see that noneGrouping distributes the load better. I didn't find any complaints about this on the web, so I guess this issue has something to do with what we do.

From: Kevin Peek <kp...@salesforce.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Monday, 21 November 2016 at 19:47
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: problem with shuffleGrouping

I played around a little bit with Stephen's test, and it seems that the Collections.shuffle() call here is causing the problem (at least the problem Stephen is talking about):
https://github.com/apache/storm/blob/1.0.x-branch/storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java#L58

I created a ticket to address this uneven task distribution: https://issues.apache.org/jira/browse/STORM-2210

On Mon, Nov 21, 2016 at 11:20 AM, Stephen Powis <spo...@salesforce.com> wrote:

So we've seen some weird distributions using ShuffleGrouping as well. I noticed there's no test case for ShuffleGrouping and got curious. The implementation also seemed overly complicated (in my head anyhow; perhaps there's a reason for it?), so I put together a much simpler version of round-robin shuffling. Gist here: https://gist.github.com/Crim/61537958df65a5e13b3844b2d5e28cde

It's possible I've set up my test cases incorrectly, but it seems that when using multiple threads in my test, ShuffleGrouping produces a wildly uneven distribution. In the Javadocs above each test case I've pasted the output that I get locally. Thoughts?
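For readers following along, here is a minimal, self-contained sketch of the two schemes being compared. The class names (GroupingSketch, ShuffleRing, RoundRobin) are made up for illustration and this is not Storm's actual source: ShuffleRing mimics the scheme Kevin points at in ShuffleGrouping.java (shuffle a list of task ids with Collections.shuffle() and re-shuffle each time a pass completes), and RoundRobin mimics the simpler counter-based alternative in Stephen's gist. Note that single-threaded, over complete passes, both are perfectly even; the skew Stephen measured showed up when multiple threads shared the grouping, which is what STORM-2210 addresses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only -- not Storm's actual code.
public class GroupingSketch {

    // Shuffle-based: walk a shuffled ring of task ids, re-shuffling per pass.
    static class ShuffleRing {
        private final List<Integer> ring = new ArrayList<>();
        private int index = 0;

        ShuffleRing(int numTasks) {
            for (int i = 0; i < numTasks; i++) ring.add(i);
            Collections.shuffle(ring);
        }

        int next() {
            if (index >= ring.size()) {
                Collections.shuffle(ring); // new random order for each pass
                index = 0;
            }
            return ring.get(index++);
        }
    }

    // Round-robin: a single atomic counter, trivially even and thread-safe.
    static class RoundRobin {
        private final AtomicLong counter = new AtomicLong();
        private final int numTasks;

        RoundRobin(int numTasks) { this.numTasks = numTasks; }

        int next() { return (int) (counter.getAndIncrement() % numTasks); }
    }

    public static void main(String[] args) {
        int tasks = 4, tuples = 4000; // 4000 tuples = 1000 complete passes
        int[] shuffleCounts = new int[tasks];
        int[] rrCounts = new int[tasks];
        ShuffleRing ring = new ShuffleRing(tasks);
        RoundRobin rr = new RoundRobin(tasks);
        for (int i = 0; i < tuples; i++) {
            shuffleCounts[ring.next()]++;
            rrCounts[rr.next()]++;
        }
        // Each complete pass hands each task exactly one tuple, so both
        // schemes come out exactly even in this single-threaded run.
        for (int c : shuffleCounts)
            if (c != 1000) throw new AssertionError("shuffle uneven: " + c);
        for (int c : rrCounts)
            if (c != 1000) throw new AssertionError("round robin uneven: " + c);
        System.out.println("both even over complete single-threaded passes");
    }
}
```

The single-threaded evenness above is exactly why the problem hid for so long: it only appears once several executor threads pull from the same grouping instance concurrently.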
On Sat, Nov 19, 2016 at 2:49 AM, Ohad Edelstein <oh...@mintigo.com> wrote:

It happened to you also? We are upgrading from 0.9.3 to 1.0.1; in 0.9.3 we didn't have that problem. But once I use localOrShuffle, the messages are sent only to the same machine.

From: Chien Le <chien...@ds-iq.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Saturday, 19 November 2016 at 6:05
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: Testing serializers with multiple workers

Ohad, we found that we had to use localOrShuffle grouping in order to see activity in the same worker as the spout.

-Chien

From: Ohad Edelstein <oh...@mintigo.com>
Sent: Friday, November 18, 2016 8:38:35 AM
To: user@storm.apache.org
Subject: Re: Testing serializers with multiple workers

Hello,

We just finished setting up Storm 1.0.1 with 3 supervisors and one nimbus machine, a total of 4 machines in AWS. We see the following phenomenon (say the spout is on host2):

host1 - using 100% CPU
host3 - using 100% CPU
host2 - idle (some messages are being handled by it, but not many)

It's not a slots problem; we have an even number of bolts. We also tried deploying only 2 hosts, and the same thing happened: the host with the spout is idle, the other host is at 100% CPU.

We switched from shuffleGrouping to noneGrouping, and it seems to work. The documentation says:

None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).
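For context, the three groupings discussed in this thread differ only in one call on the bolt declaration. A sketch of the wiring using Storm's TopologyBuilder API (MySpout, MyBolt, and the component names are placeholders, not from the thread):

```java
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new MySpout(), 1);

// What the thread started with: random-ish distribution across all tasks.
builder.setBolt("bolt", new MyBolt(), 4).shuffleGrouping("spout");

// The workaround Ohad describes: documented as currently equivalent to shuffle.
builder.setBolt("bolt", new MyBolt(), 4).noneGrouping("spout");

// Chien's suggestion: prefer in-worker tasks when any exist, else shuffle.
builder.setBolt("bolt", new MyBolt(), 4).localOrShuffleGrouping("spout");
```

This is a non-runnable fragment (it assumes storm-core on the classpath); it is only meant to show that switching groupings is a one-method change.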
We are still trying to understand what is wrong with shuffleGrouping in our system. Any ideas? Thanks!

From: Aaron Niskodé-Dossett <doss...@gmail.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Friday, 18 November 2016 at 17:04
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: Testing serializers with multiple workers

Hit send too soon... that really is the option :-)

On Fri, Nov 18, 2016 at 9:03 AM Aaron Niskodé-Dossett <doss...@gmail.com> wrote:

topology.testing.always.try.serialize = true

On Fri, Nov 18, 2016 at 8:57 AM Kristopher Kane <kkane.l...@gmail.com> wrote:

Does anyone have any techniques for testing serializers whose bugs would only surface when the serializer is used in a multi-worker topology?

Kris
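For anyone searching the archives later: the option Aaron names makes Storm attempt to serialize every tuple even when producer and consumer share a worker (as in local-mode tests), so serializer bugs surface without spinning up a multi-worker cluster. As a config fragment (the Java form assumes org.apache.storm.Config, which is a map of these same keys):

```
# topology configuration
topology.testing.always.try.serialize: true

# or equivalently in test code:
#   Config conf = new Config();
#   conf.put("topology.testing.always.try.serialize", true);
```

The cost is extra serialization work on every tuple, so it is meant for tests, not production topologies.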