Thank you for all the replies. I now understand partial key grouping. So, from an application point of view, when I use "Partial Key Grouping", I need downstream bolts to merge the data, since tuples that have the same key may be forwarded to more than one bolt task.
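The effect described here can be illustrated with a small stand-alone Java simulation. Tuples with the same key land on two tasks, so a later bolt has to add up the partial results. This is a sketch under assumptions, not Storm's `PartialKeyGrouping` or `FieldsGrouping` code. The two hash functions are hypothetical. So are the per-sender load counters and the class and method names (`PartialKeyGroupingSketch`, `partialKeyGroupingTask`, `mergeCounts`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone simulation; NOT Storm's PartialKeyGrouping/FieldsGrouping code.
class PartialKeyGroupingSketch {

    // Fields-grouping-style routing (simplified): one hash, so one fixed task per key.
    static int fieldsGroupingTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Partial-key-grouping-style routing: two candidate tasks from two hashes
    // (the second hash is an assumption); pick the one this sender has loaded less.
    static int partialKeyGroupingTask(String key, long[] sentPerTask) {
        int n = sentPerTask.length;
        int first = Math.floorMod(key.hashCode(), n);
        int second = Math.floorMod(key.hashCode() * 31 + 17, n);
        int chosen = sentPerTask[first] <= sentPerTask[second] ? first : second;
        sentPerTask[chosen]++;
        return chosen;
    }

    // "Count" tasks behind partial key grouping: each one sees only part of a key's tuples.
    static List<Map<String, Integer>> partialCounts(List<String> keys, int numTasks) {
        long[] sentPerTask = new long[numTasks];
        List<Map<String, Integer>> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(new HashMap<>());
        }
        for (String key : keys) {
            int task = partialKeyGroupingTask(key, sentPerTask);
            perTask.get(task).merge(key, 1, Integer::sum);
        }
        return perTask;
    }

    // Downstream "merge" bolt: adds the partial counts so each key gets one total.
    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> perTask) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> counts : perTask) {
            counts.forEach((key, count) -> totals.merge(key, count, Integer::sum));
        }
        return totals;
    }
}
```

With two tasks, four "David" tuples are split two per task, and only `mergeCounts` recovers the total of 4. The two candidate tasks depend only on the key, so in this model a key keeps going to the same pair of tasks.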
I have the following questions about guaranteed message processing. I understand how it works at a high level, but some details are not clear.

According to the TOPOLOGY_ACKER_EXECUTORS documentation, "By not setting this variable or setting it as null, Storm will set the number of acker executors to be equal to the number of workers configured for this topology."
( http://storm.apache.org/releases/current/javadocs/org/apache/storm/Config.html#TOPOLOGY_ACKER_EXECUTORS )

So, in my example, it creates four acker executors in each worker. Are they connected internally via Netty in one of the following ways to exchange ack() or fail() messages?

1. spout acker <-> split acker <--> count acker <--> spout acker (to let Storm know the tuple is fully processed; the hash value is zero)
                              <--> count acker <-->
OR
2. spout acker <-> split acker <-> two count ackers

The second option seems more reasonable. Can someone explain Storm's internal behavior?

Thanks in advance.
Junguk

2016-06-21 7:54 GMT-04:00 Navin Ipe:

> But, as far as I know, if tuple A goes to task 1 and task 2, then tuple A
> will always continue going to task 1 and task 2. Partial key grouping is
> the same as fields grouping, but balanced across tasks.
>
> On Tue, Jun 21, 2016 at 4:24 PM, Satish Duggana wrote:
>
>>
>> In partial key grouping, the same field values (name) sometimes go to the
>> first and second nodes even though I used "name" as the partial key
>> grouping field. Is this the right behavior?
>>
>>
>> partial-key-grouping does not always send tuples with the same field
>> values to the same task. This grouping computes two hash values and finds
>> two tasks to which those tuples can be sent. It load balances between those
>> two tasks. That is why both tasks in your environment have processed almost
>> equal numbers of tuples.
>> For example, if you have 10 tasks, tuples whose selected name field is
>> David can go to task-1 and task-3.
>> So, all tuples with the name David are load balanced between task-1 and
>> task-3. That is why you do not see all the tuples with the same field
>> value going to the same task.
>>
>>
>> When I used WordCountExample, if errors happen in the "Count" bolt, how
>> is the "fail" from the Count bolt forwarded to the spout?
>>
>> 1) The Count bolt sends "fail" to the previous bolt (split), and then the
>> split bolt sends it to the "spout"
>> 2) The Count bolt sends "fail" directly to the "spout"
>>
>>
>> You can go through the link below to understand guaranteed message
>> processing. You can send queries if you have any after that.
>>
>> http://storm.apache.org/releases/current/Guaranteeing-message-processing.html
>>
>>
>>
>> Thanks,
>> Satish.
>>
>> On Mon, Jun 20, 2016 at 7:16 PM, Junguk Cho wrote:
>>
>>> Hi.
>>>
>>> I have two questions.
>>>
>>> First, it is about "Partial Key grouping" & "Fields Grouping".
>>>
>>> In my example, I used an employee class with the fields "name",
>>> "phonenumber", and "salary" as the tuple sent to the next worker.
>>> I only used "name" as the key for the groupings.
>>>
>>> Fields Grouping works as I expected.
>>> Based on the "Fields" values, it sends tuples to the next hop.
>>> However, Partial Key grouping did not work as I expected.
>>> Below are the outputs from the programs.
>>>
>>>
>>> # From fieldsgrouping
>>> # First node
>>> Mike 12345 13451
>>> David 12345 13451
>>> Andy 12345 13451
>>> Junguk 12345 13452
>>> Mike 12345 13452
>>> David 12345 13452
>>> Andy 12345 13452
>>>
>>> # Second node
>>> Bob 12345 13451
>>> Bob 12345 13452
>>>
>>>
>>> # From partial key grouping
>>> # First node
>>> Mike 12345 13451
>>> David 12345 13451
>>> Mike 12345 13452
>>> Bob 12345 13452
>>>
>>> # Second node
>>> Junguk 12345 13451
>>> Bob 12345 13451
>>> Andy 12345 13451
>>> Junguk 12345 13452
>>> David 12345 13452
>>> Andy 12345 13452
>>>
>>> In partial key grouping, the same field values (name) sometimes go to the
>>> first and second nodes even though I used "name" as the partial key
>>> grouping field.
>>> Is this the right behavior?
>>> Or, when we use partial key grouping, do we need other nodes to
>>> aggregate the information from the first and second nodes?
>>>
>>> # Second question
>>> It is about "guaranteeing message processing".
>>> If I want to make a topology reliable,
>>> first, in the spout, I used the emit(java.util.List<java.lang.Object>
>>> tuple, java.lang.Object messageId) method from SpoutOutputCollector
>>> <https://nathanmarz.github.io/storm/doc/backtype/storm/spout/SpoutOutputCollector.html#emit(java.util.List,%20java.lang.Object)>
>>> and then, in the bolts, I used collector.emit(Tuple anchor, List<Object>
>>> tuple).
>>>
>>> When I used WordCountExample, if errors happen in the "Count" bolt, how
>>> is the "fail" from the Count bolt forwarded to the spout?
>>>
>>> 1) The Count bolt sends "fail" to the previous bolt (split), and then the
>>> split bolt sends it to the "spout"
>>> 2) The Count bolt sends "fail" directly to the "spout"
>>>
>>>
>>> Thanks in advance.
>>> - Junguk
>>>
>>>
>>
>
>
>
> --
> Regards,
> Navin
>
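The guaranteed-processing page Satish links describes the bookkeeping behind the "hash value is zero" remark in the acker question. Each acker task keeps a map from a spout tuple id to two values: the id of the spout task that emitted it, and a 64-bit "ack val". The ack val is the XOR of every tuple id created or acked in that tuple tree. Storm maps a spout tuple id to an acker task by mod hashing. The sketch below models that bookkeeping for a single acker task in plain Java. It is a simplified model resting on assumptions, not Storm's acker. The class and method names, the separate `init`/`ack`/`fail` calls, the simplified fail path, and the small fixed ids are all hypothetical. It has no networking, timeouts, or executor placement.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, HYPOTHETICAL model of one acker task's bookkeeping, following the
// "Guaranteeing message processing" description. Not Storm's actual acker code.
class AckerSketch {

    static final class Entry {
        final int spoutTask; // task to notify when the tree completes or fails
        long ackVal;         // XOR of all tuple ids created/acked so far
        Entry(int spoutTask) { this.spoutTask = spoutTask; }
    }

    private final Map<Long, Entry> pending = new HashMap<>();

    // Mod hashing from spout tuple id to acker task.
    static int ackerFor(long rootId, int numAckers) {
        return (int) Math.floorMod(rootId, (long) numAckers);
    }

    // Spout emitted a root tuple: start tracking its tree.
    void init(long rootId, int spoutTask, long rootTupleId) {
        Entry e = new Entry(spoutTask);
        e.ackVal = rootTupleId;
        pending.put(rootId, e);
    }

    // A bolt acked a tuple: xorValue = acked tuple id XOR ids of tuples newly anchored to it.
    // Returns the spout task to notify once the ack val reaches zero, or -1 if still pending.
    int ack(long rootId, long xorValue) {
        Entry e = pending.get(rootId);
        if (e == null) return -1; // unknown root (already completed, failed, or never seen)
        e.ackVal ^= xorValue;
        if (e.ackVal == 0L) {
            pending.remove(rootId);
            return e.spoutTask;
        }
        return -1;
    }

    // A bolt failed a tuple (simplified): stop tracking and return the spout task to tell.
    int fail(long rootId) {
        Entry e = pending.remove(rootId);
        return e == null ? -1 : e.spoutTask;
    }
}
```

As a usage example, take WordCountExample with hypothetical ids. The spout (task 7) emits a sentence with id 0x1111. The split bolt emits two words (0x2222, 0x4444) anchored to it and acks the sentence, sending 0x1111 ^ 0x2222 ^ 0x4444. The count bolt then acks each word. After the last ack the ack val is zero, so `ack` returns 7. In this model every bolt reports to the acker, not to the previous bolt, and only the acker talks back to the spout task.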
