I agree that shuffleGrouping will probably fix this problem for you.  How
much care is required when using localOrShuffleGrouping depends on how many
bolts / spouts / workers you have, but yes, in general I agree that care
should be taken.  As a rule of thumb, if the upstream component has fewer
executors than you have workers, it's better to use shuffleGrouping, since
otherwise the downstream executors in workers with no local upstream
executor sit idle.  If not, localOrShuffleGrouping will probably give you
better performance.
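
To make the scenario concrete, here is a rough simulation in plain Java.  It
is a simplified model, not Storm's actual grouping code.  Each executor is
identified only by the worker it runs in.  Shuffle is modeled as round-robin
over all target executors.  localOrShuffle is modeled as round-robin over the
targets in the sender's own worker, falling back to all targets when none are
local.  GroupingSim, route and preferShuffle are hypothetical names made up
for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupingSim {
    // Simplified model (not Storm's implementation): each executor is represented
    // only by the index of the worker process it runs in.
    // Returns the number of tuples each downstream executor receives.
    static int[] route(int[] upstreamWorkers, int[] downstreamWorkers,
                       int tuplesPerUpstream, boolean localOrShuffle) {
        int[] received = new int[downstreamWorkers.length];
        for (int worker : upstreamWorkers) {
            List<Integer> targets = new ArrayList<>();
            for (int i = 0; i < downstreamWorkers.length; i++) {
                // Modeled localOrShuffle only considers executors in the sender's worker.
                if (!localOrShuffle || downstreamWorkers[i] == worker) {
                    targets.add(i);
                }
            }
            if (targets.isEmpty()) {
                // No local target: fall back to every downstream executor.
                for (int i = 0; i < downstreamWorkers.length; i++) {
                    targets.add(i);
                }
            }
            // Round-robin over the candidate targets.
            for (int t = 0; t < tuplesPerUpstream; t++) {
                received[targets.get(t % targets.size())]++;
            }
        }
        return received;
    }

    // Hypothetical helper encoding the rule of thumb from this thread.
    static boolean preferShuffle(int upstreamExecutors, int workers) {
        return upstreamExecutors < workers;
    }

    public static void main(String[] args) {
        int workers = 2;
        int[] upstream = {0};          // 1 upstream executor, running in worker 0
        int[] terminal = {0, 1, 0, 1}; // 4 terminal-bolt executors spread over both workers
        System.out.println("localOrShuffle: "
                + Arrays.toString(route(upstream, terminal, 100, true)));
        System.out.println("shuffle:        "
                + Arrays.toString(route(upstream, terminal, 100, false)));
        System.out.println("prefer shuffle? " + preferShuffle(upstream.length, workers));
    }
}
```

In this model, with 1 upstream executor and 2 workers, the localOrShuffle run
prints [50, 0, 50, 0].  The two terminal executors in worker 1 get nothing,
which matches the "half of the executors are idle" symptom.  The shuffle run
spreads the load evenly ([25, 25, 25, 25]) at the cost of sending tuples
across workers.  Once there are upstream executors in every worker, each
worker has a local sender and localOrShuffle can avoid that network hop.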

On Fri, Nov 14, 2014 at 4:16 PM, Luke Rohde <[email protected]> wrote:

> Thanks, so if I read you correctly this immediate problem should be
> alleviated by using shuffleGrouping on the terminal bolt.
> In general, though, it sounds like care should be taken with
> localOrShuffle to avoid this sort of scenario.
>
> On Fri Nov 14 2014 at 4:08:57 PM Nathan Leung <[email protected]> wrote:
>
>> If you have 2 workers and 1 bolt with 1 executor that feeds into the
>> terminal bolt, and the terminal bolt subscribes using
>> localOrShuffleGrouping, then the upstream bolt will send all of its tuples
>> to the terminal bolt executors in its own worker process (that is what
>> localOrShuffleGrouping does), and the terminal bolt executors in the other
>> worker, roughly half of them, will be idle.  Without knowing more details
>> it's hard to say if this is what you're seeing.
>>
>> On Fri, Nov 14, 2014 at 4:02 PM, Luke Rohde <[email protected]> wrote:
>>
>>> Hi, I have a topology that’s bottlenecked right now by a terminal bolt
>>> that’s writing small batches to an endpoint. I’ve increased the number of
>>> executors several times so that it’s no longer bottlenecked there, but I
>>> still notice when there’s a traffic spike that despite capacity hovering
>>> around 1.0, probably half of the executors are idle.
>>>
>>> Can anyone give insight as to why this might be? I’ve read the docs on
>>> storm parallelism and can’t understand why this is happening. FWIW, all of
>>> the non-fieldsGrouping bolts are declared using localOrShuffleGrouping -
>>> perhaps this has something to do with it? I have a feeling that this is the
>>> core of the problem, but it’s not clear to me why exactly you wouldn’t use
>>> localOrShuffle over Shuffle.
>>>
>>> Thanks, Luke
>>>
>>
>>
