Stephen, can you describe the distribution that you are seeing? What we see is as follows: in Storm 0.9.3, the machine with the spout works harder (it gets almost twice the work of the other machines).
In Storm 1.0.1, the machine with the spout actually does get work to do (with shuffleGrouping we only see a few tasks submitted to the bolts). Again, according to the documentation, noneGrouping should work the same as shuffleGrouping, but we see that noneGrouping distributes the load better. I didn't find any complaints about this on the web, so I guess this issue has something to do with what we do.

From: Kevin Peek <kp...@salesforce.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Monday, 21 November 2016 at 19:47
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: problem with shuffleGrouping

I played around a little bit with Stephen's test, and it seems that the Collections.shuffle() call here is causing the problem (at least the problem Stephen is talking about):
https://github.com/apache/storm/blob/1.0.x-branch/storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java#L58

I created a ticket to address this uneven task distribution: https://issues.apache.org/jira/browse/STORM-2210

On Mon, Nov 21, 2016 at 11:20 AM, Stephen Powis <spo...@salesforce.com> wrote:

So we've seen some weird distributions using ShuffleGrouping as well. I noticed there's no test case for ShuffleGrouping and got curious. The implementation also seemed overly complicated (in my head anyhow; perhaps there's a reason for it?), so I put together a much simpler version of round-robin shuffling. Gist here: https://gist.github.com/Crim/61537958df65a5e13b3844b2d5e28cde

It's possible I've set up my test cases incorrectly, but it seems that when using multiple threads in my test, ShuffleGrouping produces a wildly uneven distribution. In the Javadocs above each test case I've pasted the output that I get locally. Thoughts?
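For readers following along, here is a minimal, self-contained sketch of the two schemes being compared. The class names (GroupingSketch, ShuffleRing, RoundRobin) are made up for illustration and this is not Storm's actual source: ShuffleRing mimics the scheme Kevin points at in ShuffleGrouping.java (shuffle a list of task ids with Collections.shuffle() and re-shuffle each time a pass completes), and RoundRobin mimics the simpler counter-based alternative in Stephen's gist. Note that single-threaded, over complete passes, both are perfectly even; the skew Stephen measured showed up when multiple threads shared the grouping, which is what STORM-2210 addresses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only -- not Storm's actual code.
public class GroupingSketch {

    // Shuffle-based: walk a shuffled ring of task ids, re-shuffling per pass.
    static class ShuffleRing {
        private final List<Integer> ring = new ArrayList<>();
        private int index = 0;

        ShuffleRing(int numTasks) {
            for (int i = 0; i < numTasks; i++) ring.add(i);
            Collections.shuffle(ring);
        }

        int next() {
            if (index >= ring.size()) {
                Collections.shuffle(ring); // new random order for each pass
                index = 0;
            }
            return ring.get(index++);
        }
    }

    // Round-robin: a single atomic counter, trivially even and thread-safe.
    static class RoundRobin {
        private final AtomicLong counter = new AtomicLong();
        private final int numTasks;

        RoundRobin(int numTasks) { this.numTasks = numTasks; }

        int next() { return (int) (counter.getAndIncrement() % numTasks); }
    }

    public static void main(String[] args) {
        int tasks = 4, tuples = 4000; // 4000 tuples = 1000 complete passes
        int[] shuffleCounts = new int[tasks];
        int[] rrCounts = new int[tasks];
        ShuffleRing ring = new ShuffleRing(tasks);
        RoundRobin rr = new RoundRobin(tasks);
        for (int i = 0; i < tuples; i++) {
            shuffleCounts[ring.next()]++;
            rrCounts[rr.next()]++;
        }
        // Each complete pass hands each task exactly one tuple, so both
        // schemes come out exactly even in this single-threaded run.
        for (int c : shuffleCounts)
            if (c != 1000) throw new AssertionError("shuffle uneven: " + c);
        for (int c : rrCounts)
            if (c != 1000) throw new AssertionError("round robin uneven: " + c);
        System.out.println("both even over complete single-threaded passes");
    }
}
```

The single-threaded evenness above is exactly why the problem hid for so long: it only appears once several executor threads pull from the same grouping instance concurrently.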
On Sat, Nov 19, 2016 at 2:49 AM, Ohad Edelstein <oh...@mintigo.com> wrote:

It happened to you also? We are upgrading from 0.9.3 to 1.0.1; in 0.9.3 we didn't have that problem. But once I use localOrShuffle, the messages are sent only to the same machine.

From: Chien Le <chien...@ds-iq.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Saturday, 19 November 2016 at 6:05
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: Testing serializers with multiple workers

Ohad, we found that we had to use localOrShuffle grouping in order to see activity in the same worker as the spout.

-Chien

From: Ohad Edelstein <oh...@mintigo.com>
Sent: Friday, November 18, 2016 8:38:35 AM
To: user@storm.apache.org
Subject: Re: Testing serializers with multiple workers

Hello,

We just finished setting up Storm 1.0.1 with 3 supervisors and one nimbus machine, a total of 4 machines in AWS. We see the following phenomenon (say the spout is on host2):

host1 - using 100% CPU
host3 - using 100% CPU
host2 - idle (some messages are being handled by it, but not many)

It's not a slots problem; we have an even number of bolts. We also tried deploying only 2 hosts, and the same thing happened: the host with the spout is idle, the other host is at 100% CPU.

We switched from shuffleGrouping to noneGrouping, and it seems to work. The documentation says:

None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).
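For context, the three groupings discussed in this thread differ only in one call on the bolt declaration. A sketch of the wiring using Storm's TopologyBuilder API (MySpout, MyBolt, and the component names are placeholders, not from the thread):

```java
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new MySpout(), 1);

// What the thread started with: random-ish distribution across all tasks.
builder.setBolt("bolt", new MyBolt(), 4).shuffleGrouping("spout");

// The workaround Ohad describes: documented as currently equivalent to shuffle.
builder.setBolt("bolt", new MyBolt(), 4).noneGrouping("spout");

// Chien's suggestion: prefer in-worker tasks when any exist, else shuffle.
builder.setBolt("bolt", new MyBolt(), 4).localOrShuffleGrouping("spout");
```

This is a non-runnable fragment (it assumes storm-core on the classpath); it is only meant to show that switching groupings is a one-method change.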
We are still trying to understand what is wrong with shuffleGrouping in our system. Any ideas? Thanks!

From: Aaron Niskodé-Dossett <doss...@gmail.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Friday, 18 November 2016 at 17:04
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: Testing serializers with multiple workers

Hit send too soon... that really is the option :-)

On Fri, Nov 18, 2016 at 9:03 AM Aaron Niskodé-Dossett <doss...@gmail.com> wrote:

topology.testing.always.try.serialize = true

On Fri, Nov 18, 2016 at 8:57 AM Kristopher Kane <kkane.l...@gmail.com> wrote:

Does anyone have any techniques for testing serializers whose bugs would only surface when the serializer is used in a multi-worker topology?

Kris
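For anyone searching the archives later: the option Aaron names makes Storm attempt to serialize every tuple even when producer and consumer share a worker (as in local-mode tests), so serializer bugs surface without spinning up a multi-worker cluster. As a config fragment (the Java form assumes org.apache.storm.Config, which is a map of these same keys):

```
# topology configuration
topology.testing.always.try.serialize: true

# or equivalently in test code:
#   Config conf = new Config();
#   conf.put("topology.testing.always.try.serialize", true);
```

The cost is extra serialization work on every tuple, so it is meant for tests, not production topologies.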