Maybe we can help with your topology design if you let us know what you're doing that requires you to shuffle half of the whole stream output to each of the two different types of bolts.
If bolt b1 and bolt b2 are both instances of ExampleBolt (and not two different types) as above, there's no point to doing this. Setting the parallelism will make sure that the data is partitioned across that bolt's executors, and therefore across machines when the topology runs several workers (by default, setting parallelism sets tasks = executors = parallelism).

Unfortunately, I don't know of any way to do this other than shuffling the output to a new bolt, e.g. bolt "b0", a 'RouterBolt', then having bolt b0 round-robin the received tuples between two streams, and then having b1 and b2 shuffle over those streams instead.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>

On Wed, Jul 16, 2014 at 5:40 PM, Andrew Xor wrote:

> Hi Tomas,
>
> As I said in my previous mail, the grouping is for a bolt *task*, not for the actual number of spawned bolts. For example, let's say you have two bolts that have a parallelism hint of 2 and these two bolts are wired to the same spout. If you set the bolts as such:
>
>     tb.setBolt("b1", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>     tb.setBolt("b2", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>
> then each of the tasks will receive half of the spout tuples, but each actual spawned bolt will receive all of the tuples emitted from the spout. This is more evident if you set up a counter in the bolt counting how many tuples it has received and test this with no parallelism hint, as such:
>
>     tb.setBolt("b1", new ExampleBolt()).shuffleGrouping("spout1");
>     tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("spout1");
>
> Now you will see that both bolts receive all tuples emitted by spout1.
>
> Hope this helps.
>
> Andrew.
>
> On Thu, Jul 17, 2014 at 2:33 AM, Tomas Mazukna wrote:
>
>> Andrew,
>>
>> When you connect your bolt to your spout you specify the grouping.
>> If you use shuffle grouping then any free bolt gets the tuple; in my experience, even in lightly loaded topologies the distribution amongst bolts is pretty even. If you use all grouping then all bolts receive a copy of the tuple. Use shuffle grouping and each of your bolts will get about 1/3 of the workload.
>>
>> Tomas
>>
>> On Wed, Jul 16, 2014 at 7:05 PM, Andrew Xor wrote:
>>
>>> Hi,
>>>
>>> I am trying to distribute the spout output to its subscribed bolts evenly; let's say that I have a spout that emits tuples and three bolts that are subscribed to it. I want each of the three bolts to receive 1/3 of the output (or emit a tuple to each one of these bolts in turns). Unfortunately, as far as I understand, all bolts will receive all of the emitted tuples of that particular spout regardless of the grouping defined (as grouping, from my understanding, is for bolt *tasks*, not actual bolts).
>>>
>>> I've searched a bit and I can't seem to find a way to accomplish that... is there a way to do that or am I searching in vain?
>>>
>>> Thanks.
>>
>> --
>> Tomas Mazukna
>> 678-557-3834
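The confusion in this thread is about which unit a grouping divides tuples among. As we understand Storm's semantics, every bolt component subscribed to a stream receives every tuple, and the grouping only decides which of that component's tasks processes it. The sketch below is a plain-Java model of that rule, not Storm code: the class `GroupingModel` and its `deliver` method are hypothetical, and tasks are picked round-robin to keep the output deterministic, whereas Storm's shuffle grouping randomizes the pick.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model (no Storm dependency) of shuffle grouping semantics.
// Assumption: tasks inside a component are picked round-robin so the output
// is deterministic; Storm's shuffle grouping randomizes the pick instead.
class GroupingModel {

    // Returns, for each subscribed component, how many tuples each of its tasks received.
    // Every task count must be at least 1.
    static Map<String, int[]> deliver(int tupleCount, Map<String, Integer> tasksPerComponent) {
        Map<String, int[]> received = new LinkedHashMap<>();
        tasksPerComponent.forEach((name, tasks) -> received.put(name, new int[tasks]));
        for (int t = 0; t < tupleCount; t++) {
            for (int[] perTask : received.values()) {
                // Every subscribed component gets its own copy of tuple t;
                // the grouping only decides which of its tasks handles that copy.
                perTask[t % perTask.length]++;
            }
        }
        return received;
    }

    public static void main(String[] args) {
        Map<String, Integer> topology = new LinkedHashMap<>();
        topology.put("b1", 2); // like setBolt("b1", new ExampleBolt(), 2).shuffleGrouping("spout1")
        topology.put("b2", 2); // like setBolt("b2", new ExampleBolt(), 2).shuffleGrouping("spout1")
        deliver(10, topology).forEach((name, perTask) ->
                System.out.println(name + " tasks received " + Arrays.toString(perTask)));
        // prints:
        // b1 tasks received [5, 5]
        // b2 tasks received [5, 5]
    }
}
```

With the two-component wiring from Andrew's mail, each component sees all 10 tuples while each of its two tasks handles 5. A single component with three tasks, e.g. `deliver(9, ...)` with one entry of 3, gives the even 1/3 split per task that Tomas describes.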

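Michael's RouterBolt workaround can be modelled the same way. The sketch below captures only the routing decision of the hypothetical bolt `b0`: each incoming tuple is sent to the next stream in a fixed cycle. It takes a list of stream ids, so three streams would cover the three-bolt case from the original question; names such as `toB1` are made up for illustration. In an actual Storm bolt, the streams would be declared with `declarer.declareStream(...)` in `declareOutputFields`, the chosen id passed to the collector's `emit(streamId, values)`, and each downstream bolt subscribed with something like `shuffleGrouping("b0", "toB1")`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the routing step a hypothetical "RouterBolt" (b0) would perform:
// each incoming tuple goes to the next output stream in a fixed cycle.
// Stream ids are illustrative; in Storm the chosen id would be passed to emit(streamId, values).
class RoundRobinRouter {
    private final List<String> streams;
    private long next = 0; // one counter per router task; tasks cycle independently

    RoundRobinRouter(List<String> streams) {
        if (streams.isEmpty()) throw new IllegalArgumentException("need at least one stream");
        this.streams = List.copyOf(streams);
    }

    // Picks the output stream for the next tuple; the tuple's content is irrelevant.
    String route() {
        String stream = streams.get((int) (next % streams.size()));
        next++;
        return stream;
    }

    public static void main(String[] args) {
        RoundRobinRouter b0 = new RoundRobinRouter(List.of("toB1", "toB2"));
        Map<String, Integer> perStream = new LinkedHashMap<>();
        for (int i = 0; i < 7; i++) {
            perStream.merge(b0.route(), 1, Integer::sum); // count tuples sent to each stream
        }
        System.out.println(perStream); // prints {toB1=4, toB2=3}
    }
}
```

If b0 itself runs several tasks, each keeps its own counter, so the split is even per router task and therefore roughly even overall.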