Here is what you can get: look at the prepare() method of a bolt - it receives a TopologyContext. From it you can get the component id, i.e. the name you defined in the topology builder; if you have more than one instance of the bolt, they all share that same name. You can also get the task id, which is unique across the whole topology.
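For reference, those lookups live on TopologyContext. A minimal sketch against the 2014-era backtype.storm API - ExampleBolt is an illustrative name, and the execute/declareOutputFields bodies are elided:

```java
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class ExampleBolt extends BaseRichBolt {
    private String componentId; // e.g. "b1" - shared by every instance of this bolt
    private int taskId;         // unique across the whole topology
    private int taskIndex;      // 0..parallelism-1 within this component

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        componentId = context.getThisComponentId(); // the id passed to setBolt()
        taskId = context.getThisTaskId();
        taskIndex = context.getThisTaskIndex();
    }

    @Override
    public void execute(Tuple tuple) { /* ... */ }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { /* ... */ }
}
```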
On Wed, Jul 16, 2014 at 9:24 PM, Andrew Xor <[email protected]> wrote:

> Let me rephrase that: I want to know the id of each component at runtime -
> can I do that? For example, when I set a bolt with an id of "b1" I want to
> be able to retrieve that string at runtime within that bolt, so I can
> perform something like what you are doing with the Kafka partition
> offsets... but again, I cannot find a call in the API that retrieves it.
>
> Andrew.
>
> On Thu, Jul 17, 2014 at 4:21 AM, Tomas Mazukna <[email protected]> wrote:
>
>> The Kafka client handles that; it is stored in ZooKeeper along with the
>> offset. I wrote a Kafka spout based on the Kafka group consumer API.
>> Kafka allows only one consumer per partition per group.
>>
>> On Wed, Jul 16, 2014 at 8:41 PM, Andrew Xor <[email protected]> wrote:
>>
>>> Ok, but at runtime how do you set which Kafka partition the spout
>>> subscribes to?
>>>
>>> Kindly yours,
>>>
>>> Andrew Grammenos
>>>
>>> -- PGP PKey --
>>> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
>>> https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt
>>>
>>> On Thu, Jul 17, 2014 at 3:30 AM, Tomas Mazukna <[email protected]> wrote:
>>>
>>>> So you want to define only one instance of the spout that reads the
>>>> file. The number of bolts depends only on how fast you need to process
>>>> the data.
>>>> I have a topology with a spout at parallelism 40, connected to the 40
>>>> partitions of a Kafka topic. It sends traffic to the first bolt, which
>>>> has parallelism 320. The whole topology is split across 4 workers; that
>>>> makes 10 spout instances in each JVM, feeding 80 bolts. In my case I
>>>> have grouping, so tuples get routed to different physical machines.
>>>> Tomas
>>>>
>>>> On Wed, Jul 16, 2014 at 8:10 PM, Andrew Xor <[email protected]> wrote:
>>>>
>>>>> Michael,
>>>>>
>>>>> Thanks for the response, but I think another problem arises; I just
>>>>> cooked up a small example, and increasing the number of workers only
>>>>> spawns mirrors of the topology. This poses a problem for me because my
>>>>> spout reads a very big file, converts each line into a tuple, and
>>>>> feeds it into the topology. What I wanted in the first place is to
>>>>> send each produced tuple to a different subscribed bolt each time
>>>>> (using round-robin or something similar), so that each bolt gets 1/n
>>>>> of the input stream (where n is the number of bolts). If I spawn 2
>>>>> workers, both will read the same file and emit the same tuples, so
>>>>> both topology workers will produce the same results.
>>>>>
>>>>> I wanted to avoid creating a spout that takes a file offset as an
>>>>> input and wiring a lot more stuff than I have to, so I was trying to
>>>>> find a way to do this in an elegant and scalable fashion... so far I
>>>>> have found nil.
>>>>>
>>>>> On Thu, Jul 17, 2014 at 2:57 AM, Michael Rose <[email protected]> wrote:
>>>>>
>>>>>> It doesn't say so, but if you have 4 workers, the 4 executors will be
>>>>>> shared evenly over the 4 workers. Likewise, 16 executors will be
>>>>>> partitioned 4 to each. The only case where a worker will not get an
>>>>>> executor is when there are fewer executors than workers (e.g. with 8
>>>>>> workers and 4 executors, 4 of the workers will receive an executor
>>>>>> and the others will not).
>>>>>>
>>>>>> It sounds like for your case, shuffle grouping plus parallelism is
>>>>>> more than sufficient.
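To make the executor/worker arithmetic above concrete, Tomas's 40-spout/320-bolt/4-worker layout from earlier in the thread would be wired up roughly like this - a sketch against the 2014-era backtype.storm API; KafkaSpout, ProcessBolt, and the topology name are placeholders, not code from this thread:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class ExampleTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // 40 spout executors, one per Kafka partition
        builder.setSpout("kafka-spout", new KafkaSpout(), 40);

        // 320 bolt executors; shuffle grouping spreads tuples evenly over them
        builder.setBolt("process", new ProcessBolt(), 320)
               .shuffleGrouping("kafka-spout");

        Config conf = new Config();
        // 4 worker JVMs: executors are divided evenly among them,
        // so each JVM runs 10 spout executors and 80 bolt executors
        conf.setNumWorkers(4);

        StormSubmitter.submitTopology("example", conf, builder.createTopology());
    }
}
```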
>>>>>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>>>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>>>>> [email protected]
>>>>>>
>>>>>> On Wed, Jul 16, 2014 at 5:53 PM, Andrew Xor <[email protected]> wrote:
>>>>>>
>>>>>>> Hey Stephen, Michael,
>>>>>>>
>>>>>>> Yeah, I feared as much... searching the docs and the API did not
>>>>>>> surface any reliable and elegant way of doing this short of a
>>>>>>> "RouterBolt". If setting the parallelism of a component is enough to
>>>>>>> load-balance the processing across the different machines of the
>>>>>>> Storm cluster, then that would suffice for my use case. However, here
>>>>>>> <https://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html>
>>>>>>> the documentation says executors are threads, and it does not say
>>>>>>> explicitly anywhere that those threads are spawned across different
>>>>>>> nodes of the cluster... I want to rule out the possibility of these
>>>>>>> threads spawning only locally and not distributed across the cluster
>>>>>>> nodes.
>>>>>>>
>>>>>>> Andrew.
>>>>>>>
>>>>>>> On Thu, Jul 17, 2014 at 2:46 AM, Michael Rose <[email protected]> wrote:
>>>>>>>
>>>>>>>> Maybe we can help with your topology design if you let us know what
>>>>>>>> you're doing that requires you to shuffle half of the whole stream
>>>>>>>> output to each of the two different types of bolts.
>>>>>>>>
>>>>>>>> If bolt b1 and bolt b2 are both instances of ExampleBolt (and not
>>>>>>>> two different types) as above, there's no point in doing this.
>>>>>>>> Setting the parallelism will make sure that data is partitioned
>>>>>>>> across machines (by default, setting parallelism sets tasks =
>>>>>>>> executors = parallelism).
>>>>>>>>
>>>>>>>> Unfortunately, I don't know of any way to do this other than
>>>>>>>> shuffling the output to a new bolt, e.g.
>>>>>>>> making bolt "b0" a 'RouterBolt', then having b0 round-robin the
>>>>>>>> received tuples between two streams, and having b1 and b2 shuffle
>>>>>>>> over those streams instead.
>>>>>>>>
>>>>>>>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>>>>>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> On Wed, Jul 16, 2014 at 5:40 PM, Andrew Xor <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Tomas,
>>>>>>>>>
>>>>>>>>> As I said in my previous mail, the grouping applies to a bolt's
>>>>>>>>> *tasks*, not to the actual number of spawned bolts. For example,
>>>>>>>>> say you have two bolts with a parallelism hint of 2, and these two
>>>>>>>>> bolts are wired to the same spout. If you set the bolts as such:
>>>>>>>>>
>>>>>>>>> tb.setBolt("b1", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>>>>>>>>> tb.setBolt("b2", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>>>>>>>>>
>>>>>>>>> then each of the tasks will receive half of the spout's tuples, but
>>>>>>>>> each declared bolt will receive all of the tuples emitted by the
>>>>>>>>> spout. This is more evident if you set up a counter in the bolt
>>>>>>>>> counting how many tuples it has received and test with no
>>>>>>>>> parallelism hint, as such:
>>>>>>>>>
>>>>>>>>> tb.setBolt("b1", new ExampleBolt()).shuffleGrouping("spout1");
>>>>>>>>> tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("spout1");
>>>>>>>>>
>>>>>>>>> Now you will see that both bolts receive all tuples emitted by
>>>>>>>>> spout1.
>>>>>>>>>
>>>>>>>>> Hope this helps.
>>>>>>>>>
>>>>>>>>> Andrew.
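Michael's 'RouterBolt' suggestion could be sketched as follows, against the 2014-era backtype.storm API - the RouterBolt class, the stream names, and the "line" field are assumptions for illustration, not code from this thread:

```java
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;

public class RouterBolt extends BaseRichBolt {
    private OutputCollector collector;
    private long counter = 0;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // Alternate between the two declared streams, round-robin style.
        String stream = (counter++ % 2 == 0) ? "stream-a" : "stream-b";
        collector.emit(stream, tuple, tuple.getValues()); // anchored for reliability
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("stream-a", new Fields("line"));
        declarer.declareStream("stream-b", new Fields("line"));
    }
}

// Wiring, so b1 and b2 each see half the stream:
//   tb.setBolt("b0", new RouterBolt()).shuffleGrouping("spout1");
//   tb.setBolt("b1", new ExampleBolt()).shuffleGrouping("b0", "stream-a");
//   tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("b0", "stream-b");
```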
>>>>>>>>> On Thu, Jul 17, 2014 at 2:33 AM, Tomas Mazukna <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Andrew,
>>>>>>>>>>
>>>>>>>>>> When you connect your bolt to your spout, you specify the
>>>>>>>>>> grouping. If you use shuffle grouping, then any free bolt gets
>>>>>>>>>> the tuple - in my experience, even in lightly loaded topologies
>>>>>>>>>> the distribution amongst bolts is pretty even. If you use all
>>>>>>>>>> grouping, then all bolts receive a copy of the tuple.
>>>>>>>>>> Use shuffle grouping and each of your bolts will get about 1/3
>>>>>>>>>> of the workload.
>>>>>>>>>>
>>>>>>>>>> Tomas
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 16, 2014 at 7:05 PM, Andrew Xor <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am trying to distribute a spout's output evenly to its
>>>>>>>>>>> subscribed bolts. Let's say I have a spout that emits tuples
>>>>>>>>>>> and three bolts subscribed to it; I want each of the three
>>>>>>>>>>> bolts to receive 1/3 of the output (or to emit a tuple to each
>>>>>>>>>>> one of these bolts in turn). Unfortunately, as far as I
>>>>>>>>>>> understand, all bolts will receive all of the emitted tuples of
>>>>>>>>>>> that particular spout regardless of the grouping defined (as
>>>>>>>>>>> grouping, from my understanding, applies to bolt *tasks*, not
>>>>>>>>>>> actual bolts).
>>>>>>>>>>>
>>>>>>>>>>> I've searched a bit and I can't seem to find a way to
>>>>>>>>>>> accomplish that... is there a way to do it, or am I searching
>>>>>>>>>>> in vain?
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>> --
>>>>>>>>>> Tomas Mazukna
>>>>>>>>>> 678-557-3834
