It doesn't say so explicitly, but if you have 4 workers, the 4 executors will be
spread evenly over the 4 workers. Likewise, 16 executors will be partitioned 4 per
worker. The only case where a worker will not get an executor is when there are
fewer executors than workers (e.g. 8 workers, 4 executors): 4 of the workers will
receive an executor and the others will not.
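
For example, a rough sketch (ExampleSpout/ExampleBolt are placeholders, and this
assumes the usual TopologyBuilder/StormSubmitter setup rather than your exact code):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class ExampleTopology {
    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        conf.setNumWorkers(4);                    // 4 worker JVMs across the cluster

        TopologyBuilder tb = new TopologyBuilder();
        tb.setSpout("spout1", new ExampleSpout());
        tb.setBolt("b1", new ExampleBolt(), 16)   // 16 executors -> roughly 4 per worker
          .shuffleGrouping("spout1");

        StormSubmitter.submitTopology("example-topology", conf, tb.createTopology());
    }
}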

It sounds like for your case, shuffle+parallelism is more than sufficient.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]


On Wed, Jul 16, 2014 at 5:53 PM, Andrew Xor <[email protected]>
wrote:

> Hey Stephen, Michael,
>
> Yeah, I feared as much... searching the docs and API did not surface any
> reliable and elegant way of doing that unless you had a "RouterBolt". If
> setting the parallelism of a component is enough to load-balance the
> processing across the different machines that are part of the Storm cluster,
> then that would suffice for my use case. Although here
> <https://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html>
> the documentation says executors are threads, and it does not explicitly say
> anywhere that those threads are spawned across different nodes of the cluster...
> I want to avoid the possibility of these threads only spawning locally rather
> than in a distributed fashion among the cluster nodes.
>
> Andrew.
>
>
> On Thu, Jul 17, 2014 at 2:46 AM, Michael Rose <[email protected]>
> wrote:
>
>> Maybe we can help with your topology design if you let us know what
>> you're doing that requires you to shuffle half of the whole stream output
>> to each of the two different types of bolts.
>>
>> If bolt b1 and bolt b2 are both instances of ExampleBolt (and not two
>> different types) as above, there's no point in doing this. Setting the
>> parallelism will make sure that data is partitioned across machines (by
>> default, setting the parallelism hint sets tasks = executors = parallelism).
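>>
>> For example, something along these lines (a sketch; the names are made up):
>>
>> tb.setBolt("b1", new ExampleBolt(), 4)   // parallelism hint -> 4 executors (threads)
>>   .setNumTasks(8)                        // optional: 8 tasks spread over those 4 executors
>>   .shuffleGrouping("spout1");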
>>
>> Unfortunately, I don't know of any way to do this other than shuffling the
>> output to a new bolt, e.g. a bolt "b0" acting as a 'RouterBolt', having b0
>> round-robin the received tuples between two streams, and then having b1 and
>> b2 shuffle over those streams instead.
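>>
>> A minimal sketch of such a RouterBolt (assuming the spout emits a single field;
>> the class and stream names are made up):
>>
>> import java.util.Map;
>> import backtype.storm.task.OutputCollector;
>> import backtype.storm.task.TopologyContext;
>> import backtype.storm.topology.OutputFieldsDeclarer;
>> import backtype.storm.topology.base.BaseRichBolt;
>> import backtype.storm.tuple.Fields;
>> import backtype.storm.tuple.Tuple;
>>
>> public class RouterBolt extends BaseRichBolt {
>>     private OutputCollector collector;
>>     private boolean toggle = false;
>>
>>     public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
>>         this.collector = collector;
>>     }
>>
>>     public void execute(Tuple tuple) {
>>         // Alternate each received tuple between the two declared streams.
>>         collector.emit(toggle ? "stream-a" : "stream-b", tuple, tuple.getValues());
>>         toggle = !toggle;
>>         collector.ack(tuple);
>>     }
>>
>>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>>         declarer.declareStream("stream-a", new Fields("value"));
>>         declarer.declareStream("stream-b", new Fields("value"));
>>     }
>> }
>>
>> Wired up, that would look roughly like:
>>
>> tb.setBolt("b0", new RouterBolt()).shuffleGrouping("spout1");
>> tb.setBolt("b1", new ExampleBolt(), 2).shuffleGrouping("b0", "stream-a");
>> tb.setBolt("b2", new ExampleBolt(), 2).shuffleGrouping("b0", "stream-b");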
>>
>>
>>
>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>> [email protected]
>>
>>
>> On Wed, Jul 16, 2014 at 5:40 PM, Andrew Xor <[email protected]>
>> wrote:
>>
>>> Hi Tomas,
>>>
>>>  As I said in my previous mail, the grouping applies to a bolt's *tasks*, not
>>> to the actual number of spawned bolts; for example, let's say you have two
>>> bolts that each have a parallelism hint of 2 and these two bolts are wired to
>>> the same spout. If you set the bolts up as such:
>>>
>>> tb.setBolt("b1", new ExampleBolt(), 2 /* p-hint
>>> */).shuffleGrouping("spout1");
>>> tb.setBolt("b2", new ExampleBolt(), 2 /* p-hint
>>> */).shuffleGrouping("spout1");
>>>
>>> Then each of the tasks will receive half of the spout tuples, but each
>>> actual spawned bolt will receive all of the tuples emitted from the spout.
>>> This is more evident if you set up a counter in the bolt counting how many
>>> tuples it has received and test this with no parallelism hint, as such:
>>>
>>> tb.setBolt("b1", new ExampleBolt(),).shuffleGrouping("spout1");
>>> tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("spout1");
>>>
>>> Now you will see that both bolts will receive all tuples emitted by
>>> spout1.
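>>>
>>> In case it helps, a counter bolt along those lines might look roughly like this
>>> (just a sketch, not the exact bolt I'm running):
>>>
>>> import java.util.Map;
>>> import backtype.storm.task.OutputCollector;
>>> import backtype.storm.task.TopologyContext;
>>> import backtype.storm.topology.OutputFieldsDeclarer;
>>> import backtype.storm.topology.base.BaseRichBolt;
>>> import backtype.storm.tuple.Tuple;
>>>
>>> public class ExampleBolt extends BaseRichBolt {
>>>     private OutputCollector collector;
>>>     private long count = 0;
>>>
>>>     public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
>>>         this.collector = collector;
>>>     }
>>>
>>>     public void execute(Tuple tuple) {
>>>         count++;  // count every tuple this bolt instance receives
>>>         System.out.println("received " + count + " tuples so far");
>>>         collector.ack(tuple);
>>>     }
>>>
>>>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>>>         // terminal bolt, nothing is emitted
>>>     }
>>> }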
>>>
>>> Hope this helps.
>>>
>>> Andrew.
>>>
>>>
>>> On Thu, Jul 17, 2014 at 2:33 AM, Tomas Mazukna <[email protected]>
>>> wrote:
>>>
>>>> Andrew,
>>>>
>>>> When you connect your bolt to your spout, you specify the grouping. If
>>>> you use shuffle grouping, then any free bolt gets the tuple - in my
>>>> experience, even in lightly loaded topologies the distribution amongst bolts
>>>> is pretty even. If you use all grouping, then all bolts receive a copy of
>>>> the tuple.
>>>> Use shuffle grouping and each of your bolts will get about 1/3 of the
>>>> workload.
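>>>>
>>>> For example (a sketch; the names are made up):
>>>>
>>>> // shuffle grouping: incoming tuples are spread across this bolt's executors
>>>> tb.setBolt("b1", new ExampleBolt(), 3).shuffleGrouping("spout1");
>>>>
>>>> // all grouping: every executor of this bolt receives a copy of every tuple
>>>> tb.setBolt("b2", new ExampleBolt(), 3).allGrouping("spout1");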
>>>>
>>>> Tomas
>>>>
>>>>
>>>> On Wed, Jul 16, 2014 at 7:05 PM, Andrew Xor <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>  I am trying to distribute the spout output to its subscribed bolts
>>>>> evenly; let's say that I have a spout that emits tuples and three bolts
>>>>> that are subscribed to it. I want each of the three bolts to receive one
>>>>> third of the output (or to emit a tuple to each one of these bolts in turn).
>>>>> Unfortunately, as far as I understand, all bolts will receive all of the
>>>>> emitted tuples of that particular spout regardless of the grouping defined
>>>>> (as grouping, from my understanding, is for bolt *tasks*, not actual bolts).
>>>>>
>>>>>  I've searched a bit and I can't seem to find a way to accomplish
>>>>> that... is there a way to do that, or am I searching in vain?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Tomas Mazukna
>>>> 678-557-3834
>>>>
>>>
>>>
>>
>
