Michael,

Thanks for the response, but I think another problem arises: I just cooked up a small example, and increasing the number of workers only spawns mirrors of the topology. This poses a problem for me because my spout reads from a very big file, converts each line into a tuple, and feeds that into the topology. What I wanted to do in the first place is to send each tuple produced to a different subscribed bolt each time (using round-robin or similar), so that each one of them gets 1/n of the input stream (where n is the number of bolts). If I spawn 2 workers, both will read the same file and emit the same tuples, so both topology workers will produce the same results.
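The small example I cooked up looks roughly like the following (a sketch only: the class names, the input path, and the 0.9.x backtype.storm packages are placeholders for my actual code):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class LineTopology {

    // Spout: reads a very big file line by line, one tuple per line.
    public static class FileSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private BufferedReader reader;

        @Override
        public void open(Map conf, TopologyContext ctx, SpoutOutputCollector collector) {
            this.collector = collector;
            try {
                reader = new BufferedReader(new FileReader("/data/big-input.txt")); // placeholder path
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void nextTuple() {
            try {
                String line = reader.readLine();
                if (line != null) {
                    collector.emit(new Values(line));
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

    // Stand-in for the ExampleBolt referenced in the snippets below.
    public static class ExampleBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // process one line; nothing emitted downstream in this sketch
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: no output fields
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder tb = new TopologyBuilder();
        tb.setSpout("spout1", new FileSpout());
        tb.setBolt("b1", new ExampleBolt(), 3).shuffleGrouping("spout1");

        Config conf = new Config();
        conf.setNumWorkers(2); // bumping this is where I see the duplicated reads
        StormSubmitter.submitTopology("line-topology", conf, tb.createTopology());
    }
}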
I wanted to avoid creating a spout that takes a file offset as input and wiring up a lot more than I have to, so I was trying to find a way to do what I described in an elegant and scalable fashion... so far I have found nothing.

On Thu, Jul 17, 2014 at 2:57 AM, Michael Rose <[email protected]> wrote:

> It doesn't say so, but if you have 4 workers, the 4 executors will be
> shared evenly over the 4 workers. Likewise, 16 executors will be
> partitioned 4 per worker. The only case where a worker will not get a
> specific executor is when there are fewer executors than workers (e.g.
> 8 workers, 4 executors): 4 of the workers will receive an executor but
> the others will not.
>
> It sounds like for your case, shuffle + parallelism is more than sufficient.
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> [email protected]
>
>
> On Wed, Jul 16, 2014 at 5:53 PM, Andrew Xor <[email protected]> wrote:
>
>> Hey Stephen, Michael,
>>
>> Yea, I feared as much... searching the docs and API did not surface any
>> reliable and elegant way of doing that unless you had a "RouterBolt". If
>> setting the parallelism of a component is enough to load-balance the
>> processing across the different machines of the Storm cluster, then that
>> would suffice for my use case. Although here
>> <https://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html>
>> the documentation says executors are threads, and it does not explicitly
>> say anywhere that those threads are spawned across different nodes of the
>> cluster... I want to rule out the possibility of these threads spawning
>> only locally rather than being distributed among the cluster nodes.
>>
>> Andrew.
>>
>>
>> On Thu, Jul 17, 2014 at 2:46 AM, Michael Rose <[email protected]> wrote:
>>
>>> Maybe we can help with your topology design if you let us know what
>>> you're doing that requires you to shuffle half of the whole stream
>>> output to each of the two different types of bolts.
>>>
>>> If bolt b1 and bolt b2 are both instances of ExampleBolt (and not two
>>> different types) as above, there's no point in doing this. Setting the
>>> parallelism will make sure that data is partitioned across machines (by
>>> default, setting parallelism sets tasks = executors = parallelism).
>>>
>>> Unfortunately, I don't know of any way to do this other than shuffling
>>> the output to a new bolt, e.g. bolt "b0", a 'RouterBolt', then having
>>> bolt b0 round-robin the received tuples between two streams, and then
>>> having b1 and b2 shuffle over those streams instead.
>>>
>>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>> [email protected]
>>>
>>>
>>> On Wed, Jul 16, 2014 at 5:40 PM, Andrew Xor <[email protected]> wrote:
>>>
>>>> Hi Tomas,
>>>>
>>>> As I said in my previous mail, the grouping applies to a bolt's *tasks*,
>>>> not to the actual number of spawned bolt components. For example, say
>>>> you have two bolts, each with a parallelism hint of 2, and both wired
>>>> to the same spout:
>>>>
>>>> tb.setBolt("b1", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>>>> tb.setBolt("b2", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>>>>
>>>> Then each of the tasks will receive half of the spout's tuples, but
>>>> each bolt component as a whole will still receive all of the tuples
>>>> emitted by the spout.
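>>>> (As far as I understand from the parallelism docs linked above, the
>>>> hint sets the number of executors, and tasks default to the same
>>>> number; tasks can also be set separately, e.g.:
>>>>
>>>> tb.setBolt("b1", new ExampleBolt(), 2).setNumTasks(4).shuffleGrouping("spout1");
>>>>
>>>> which runs 4 tasks on 2 threads.)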
>>>> This is more evident if you set up a counter in the bolt counting how
>>>> many tuples it has received, and test this with no parallelism hint,
>>>> as such:
>>>>
>>>> tb.setBolt("b1", new ExampleBolt()).shuffleGrouping("spout1");
>>>> tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("spout1");
>>>>
>>>> Now you will see that both bolts receive all tuples emitted by spout1.
>>>>
>>>> Hope this helps.
>>>>
>>>> Andrew.
>>>>
>>>>
>>>> On Thu, Jul 17, 2014 at 2:33 AM, Tomas Mazukna <[email protected]> wrote:
>>>>
>>>>> Andrew,
>>>>>
>>>>> When you connect your bolt to your spout, you specify the grouping.
>>>>> If you use shuffle grouping, then any free bolt gets the tuple; in my
>>>>> experience, even in lightly loaded topologies the distribution amongst
>>>>> bolts is pretty even. If you use all grouping, then all bolts receive
>>>>> a copy of the tuple.
>>>>> Use shuffle grouping and each of your bolts will get about 1/3 of the
>>>>> workload.
>>>>>
>>>>> Tomas
>>>>>
>>>>>
>>>>> On Wed, Jul 16, 2014 at 7:05 PM, Andrew Xor <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to distribute a spout's output to its subscribed bolts
>>>>>> evenly. Let's say I have a spout that emits tuples and three bolts
>>>>>> that are subscribed to it; I want each of the three bolts to receive
>>>>>> 1/3 of the output (or, equivalently, to emit each tuple to one of
>>>>>> these bolts in turn). Unfortunately, as far as I understand, all
>>>>>> bolts will receive all of the emitted tuples of that particular spout
>>>>>> regardless of the grouping defined (since grouping, from my
>>>>>> understanding, applies to bolt *tasks*, not actual bolts).
>>>>>>
>>>>>> I've searched a bit and I can't seem to find a way to accomplish
>>>>>> that... is there a way to do it, or am I searching in vain?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>> --
>>>>> Tomas Mazukna
>>>>> 678-557-3834
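For reference, the "RouterBolt" workaround Michael describes above might look roughly like this (a sketch only; the stream ids, the field name, and the 0.9.x backtype.storm packages are illustrative, not from the thread):

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Round-robins incoming tuples between two declared streams so that b1 and
// b2 can each subscribe to one stream and receive half of the spout output.
public class RouterBolt extends BaseRichBolt {
    private OutputCollector collector;
    private long counter = 0;

    @Override
    public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // Alternate the target stream tuple by tuple; anchor on the input
        // tuple so acking still works end to end.
        String stream = (counter++ % 2 == 0) ? "even" : "odd";
        collector.emit(stream, tuple, new Values(tuple.getString(0)));
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("even", new Fields("line"));
        declarer.declareStream("odd", new Fields("line"));
    }
}

Wired in with parallelism 1, so the round-robin is global rather than per executor:

tb.setBolt("b0", new RouterBolt(), 1).shuffleGrouping("spout1");
tb.setBolt("b1", new ExampleBolt()).shuffleGrouping("b0", "even");
tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("b0", "odd");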
