I played around a little bit with Stephen's test, and it seems that the Collections.shuffle() call here is causing the problem (at least the problem Stephen is talking about):
https://github.com/apache/storm/blob/1.0.x-branch/storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java#L58

I created a ticket to address this uneven task distribution:
https://issues.apache.org/jira/browse/STORM-2210
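For illustration, here is a minimal standalone sketch of the choose-and-reshuffle pattern at that line, plus a plain atomic-counter round robin for contrast. This is my own simplification, not the actual Storm class: the class and field names are approximations of the linked source, the test harness is mine, and the round-robin variant is a guess at the kind of approach Stephen describes, not his gist. Hammering the chooser from several threads shows how the reshuffle-on-wrap step can skew the per-task counts:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicLong;

    public class ShuffleSkewDemo {

        // Simplified version of the pattern around ShuffleGrouping.java#L58:
        // walk a shuffled list, and reshuffle it each time a pass completes.
        static class ReshuffleChooser {
            private final List<Integer> choices = new ArrayList<>();
            private final AtomicInteger current = new AtomicInteger(0);
            private final Random random = new Random();

            ReshuffleChooser(int numTasks) {
                for (int i = 0; i < numTasks; i++) {
                    choices.add(i);
                }
                Collections.shuffle(choices, random);
            }

            int choose() {
                while (true) {
                    int rightNow = current.incrementAndGet();
                    if (rightNow < choices.size()) {
                        return choices.get(rightNow);
                    } else if (rightNow == choices.size()) {
                        // End of a pass: reset the index and reshuffle. With
                        // concurrent callers, other threads can read `choices`
                        // while this shuffle is rearranging it, so elements
                        // can be served twice or skipped within a pass.
                        current.set(0);
                        Collections.shuffle(choices, random);
                        return choices.get(0);
                    }
                    // Incremented past the end while another thread was
                    // resetting; retry.
                }
            }
        }

        // Plain atomic-counter round robin for contrast (a guess at the kind
        // of approach Stephen describes, not his actual gist).
        static class RoundRobinChooser {
            private final AtomicLong counter = new AtomicLong();
            private final int numTasks;

            RoundRobinChooser(int numTasks) {
                this.numTasks = numTasks;
            }

            int choose() {
                // floorMod keeps the result non-negative even if the
                // counter ever overflows.
                return (int) Math.floorMod(counter.getAndIncrement(), numTasks);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            int numTasks = 8;
            ReshuffleChooser chooser = new ReshuffleChooser(numTasks);
            AtomicLong[] counts = new AtomicLong[numTasks];
            for (int i = 0; i < numTasks; i++) {
                counts[i] = new AtomicLong();
            }

            // Hammer the chooser from several threads, roughly the way
            // multiple executors in one worker would.
            Thread[] threads = new Thread[4];
            for (int t = 0; t < threads.length; t++) {
                threads[t] = new Thread(() -> {
                    for (int i = 0; i < 1_000_000; i++) {
                        counts[chooser.choose()].incrementAndGet();
                    }
                });
                threads[t].start();
            }
            for (Thread t : threads) {
                t.join();
            }

            for (int i = 0; i < numTasks; i++) {
                System.out.println("task " + i + ": " + counts[i].get());
            }
        }
    }

The atomic-counter version trades the per-pass randomness for strict rotation, which keeps long-run counts even no matter how many threads call it.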
On Mon, Nov 21, 2016 at 11:20 AM, Stephen Powis <[email protected]> wrote:

> So we've seen some weird distributions using ShuffleGrouping as well. I
> noticed there's no test case for ShuffleGrouping and got curious. The
> implementation also seemed overly complicated (in my head anyhow; perhaps
> there's a reason for it?), so I put together a much simpler round-robin
> version of shuffling.
>
> Gist here: https://gist.github.com/Crim/61537958df65a5e13b3844b2d5e28cde
>
> It's possible I've set up my test cases incorrectly, but it seems that
> when using multiple threads in my test, ShuffleGrouping produces a wildly
> uneven distribution. In the Javadocs above each test case I've pasted the
> output that I get locally.
>
> Thoughts?
>
> On Sat, Nov 19, 2016 at 2:49 AM, Ohad Edelstein <[email protected]> wrote:
>
>> Has this happened to you as well?
>> We are upgrading from 0.9.3 to 1.0.1; in 0.9.3 we didn't have this
>> problem.
>>
>> But once I use localOrShuffle, the messages are sent only to the same
>> machine.
>>
>> From: Chien Le <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Saturday, 19 November 2016 at 6:05
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Testing serializers with multiple workers
>>
>> Ohad,
>>
>> We found that we had to use localOrShuffle grouping in order to see
>> activity in the same worker as the spout.
>>
>> -Chien
>>
>> ------------------------------
>> From: Ohad Edelstein <[email protected]>
>> Sent: Friday, November 18, 2016 8:38:35 AM
>> To: [email protected]
>> Subject: Re: Testing serializers with multiple workers
>>
>> Hello,
>>
>> We just finished setting up Storm 1.0.1 with 3 supervisors and one
>> nimbus machine, a total of 4 machines in AWS.
>>
>> We see the following phenomenon. Let's say the spout is on host2:
>> host1 - using 100% CPU
>> host3 - using 100% CPU
>> host2 - idle (some messages are being handled by it, but not many)
>> It's not a slots problem; we have an even number of bolts.
>>
>> We also tried deploying only 2 hosts, and the same thing happened: the
>> host with the spout was idle, the other host at 100% CPU.
>>
>> We switched from shuffleGrouping to noneGrouping, and that seems to
>> work. The documentation says:
>> None grouping: This grouping specifies that you don't care how the
>> stream is grouped. Currently, none groupings are equivalent to shuffle
>> groupings. Eventually though, Storm will push down bolts with none
>> groupings to execute in the same thread as the bolt or spout they
>> subscribe from (when possible).
>>
>> We are still trying to understand what is wrong with shuffleGrouping
>> in our system.
>>
>> Any ideas?
>>
>> Thanks!
>>
>> From: Aaron Niskodé-Dossett <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, 18 November 2016 at 17:04
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Testing serializers with multiple workers
>>
>> Hit send too soon... that really is the option :-)
>>
>> On Fri, Nov 18, 2016 at 9:03 AM Aaron Niskodé-Dossett <[email protected]>
>> wrote:
>>
>>> topology.testing.always.try.serialize = true
>>>
>>> On Fri, Nov 18, 2016 at 8:57 AM Kristopher Kane <[email protected]>
>>> wrote:
>>>
>>> Does anyone have techniques for testing serializers for bugs that
>>> would only surface when the serializer is used in a multi-worker
>>> topology?
>>>
>>> Kris
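For completeness, a minimal sketch of setting that option programmatically, assuming Storm 1.x's org.apache.storm.Config; as far as I know the constant below corresponds to the string key Aaron mentions:

    import org.apache.storm.Config;

    public class SerializerTestConfig {

        static Config buildTestConfig() {
            Config conf = new Config();
            // Force tuples through serialization even between executors in
            // the same worker, so serializer bugs surface without a
            // multi-worker deployment. Equivalent to setting the string key
            // "topology.testing.always.try.serialize" to true.
            conf.put(Config.TOPOLOGY_TESTING_ALWAYS_TRY_SERIALIZE, true);
            return conf;
        }
    }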
