Re: Storm partialkeygrouping & reliability

Junguk Cho Tue, 21 Jun 2016 11:36:21 -0700

Thank you for all the replies.

Partial key grouping is clear to me now.
So, from a single application's point of view, when I use "Partial Key Grouping", I need
a downstream bolt to merge the data, since tuples with the same key may be
forwarded to more than one bolt task.
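
To make this concrete, here is a minimal sketch in plain Java (no Storm dependency) of the "two candidate tasks per key, then merge downstream" idea. The class PartialKeySketch, its method names, and the "#2" suffix used to derive a second hash are all hypothetical and only for illustration; Storm's actual grouping uses its own hash functions and load tracking.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: this is NOT Storm's PartialKeyGrouping class.
class PartialKeySketch {
    // Number of tuples this sender has routed to each downstream task so far.
    private final long[] sent;

    PartialKeySketch(int numTasks) {
        sent = new long[numTasks];
    }

    // Two candidate tasks per key, from two different (hypothetical) hashes.
    int[] candidates(String key) {
        int first = Math.floorMod(key.hashCode(), sent.length);
        int second = Math.floorMod((key + "#2").hashCode(), sent.length);
        return new int[] {first, second};
    }

    // Route the tuple to whichever candidate has received fewer tuples.
    int chooseTask(String key) {
        int[] c = candidates(key);
        int task = sent[c[0]] <= sent[c[1]] ? c[0] : c[1];
        sent[task]++;
        return task;
    }

    // Downstream step: sum the partial per-key counts held by each task.
    static Map<String, Long> merge(Map<Integer, Map<String, Long>> perTask) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : perTask.values()) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        PartialKeySketch grouping = new PartialKeySketch(10);
        Map<Integer, Map<String, Long>> perTask = new HashMap<>();
        for (String name : new String[] {"Mike", "David", "Mike", "Bob", "David", "Bob"}) {
            int task = grouping.chooseTask(name);
            perTask.computeIfAbsent(task, t -> new HashMap<>()).merge(name, 1L, Long::sum);
        }
        // Each task only holds a partial count; the merged map holds the totals.
        System.out.println(merge(perTask));
    }
}
```

Because one name can land on either of its two candidate tasks, neither task alone has the full count for that name, which is why the extra merge step is needed.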


I have the following questions about guaranteed message processing.
I understand how it works at a high level, but some details are not
clear.
According to TOPOLOGY_ACKER_EXECUTORS: by not setting this variable or
setting it to null, Storm will set the number of acker executors to be
equal to the number of workers configured for this topology. (
http://storm.apache.org/releases/current/javadocs/org/apache/storm/Config.html#TOPOLOGY_ACKER_EXECUTORS
)

So, in my example, it creates four acker executors in each worker.

Are they connected this way (internally) via Netty to exchange ack() or
fail() messages between them?
1. spout acker <-> split acker <--> count acker <--> spout acker
                               <--> count acker <-->
   (to let Storm know the tuple is fully processed; hash value is zero)

OR
2. spout acker <-> split acker
                    <-> two count ackers

The second option seems more reasonable to me.
Could someone explain Storm's internal behavior?
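
For reference, my reading of the Guaranteeing-message-processing page linked in the quoted reply below is that each acker keeps one XOR value per spout tuple and treats the tuple tree as fully processed when that value reaches zero. A minimal sketch of that bookkeeping in plain Java follows. The class AckerSketch and its method names are hypothetical, the sketch assumes init() is called before any ack(), and it says nothing about how the ackers are actually wired together over Netty.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: this is NOT Storm's acker implementation.
class AckerSketch {
    // Root (spout) tuple id -> running XOR of every tuple id reported for that tree.
    private final Map<Long, Long> pending = new HashMap<>();

    // The spout registers a new tuple tree with the id of its first tuple.
    void init(long rootId, long tupleId) {
        pending.put(rootId, tupleId);
    }

    // A bolt acks one input tuple and reports the ids of the tuples it anchored to it.
    // Returns true when the XOR reaches zero, i.e. the tree is fully processed.
    boolean ack(long rootId, long ackedId, long... emittedIds) {
        long value = pending.getOrDefault(rootId, 0L) ^ ackedId;
        for (long id : emittedIds) {
            value ^= id;
        }
        if (value == 0) {
            pending.remove(rootId);
            return true;
        }
        pending.put(rootId, value);
        return false;
    }

    // A bolt fails a tuple: the tree is dropped.
    // Returns true if the tree was still pending, meaning the spout should be told.
    boolean fail(long rootId) {
        return pending.remove(rootId) != null;
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        acker.init(11L, 11L);                              // spout emits sentence tuple 11
        System.out.println(acker.ack(11L, 11L, 22L, 33L)); // split acks 11, emits words 22 and 33
        System.out.println(acker.ack(11L, 22L));           // first count task acks 22
        System.out.println(acker.ack(11L, 33L));           // second count task acks 33
    }
}
```

Every tuple id is XORed in once when it is created and once when it is acked, so the value can only return to zero after all tuples in the tree have been acked.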

Thanks in advance.
Junguk



2016-06-21 7:54 GMT-04:00 Navin Ipe <[email protected]>:

> But, as far as I know, if tuple A goes to task 1 and task 2, then tuple A
> will always continue going to task 1 and task 2. Partial key grouping is
> the same as fields grouping, but balanced across tasks.
>
> On Tue, Jun 21, 2016 at 4:24 PM, Satish Duggana <[email protected]>
> wrote:
>
>>
>> In partial key grouping, tuples with the same field (name) sometimes go to
>> both the first and second nodes even though I used "name" as the partial key grouping field.
>> Is this the right behavior?
>>
>>
>> partial-key-grouping does not always send tuples with the same field
>> values to the same task. This grouping computes two hash values and finds
>> two tasks to which those tuples can be sent. It load balances between those
>> two tasks. That is why both tasks in your environment have processed an
>> almost equal number of tuples.
>> For example: if you have 10 tasks, tuples whose selected name field is
>> David can go to task-1 and task-3. So, all tuples with the name
>> David are load balanced between task-1 and task-3. That is why you do not
>> see all the tuples with the same field value going to the same task.
>>
>>
>> When I used WordCountExample, if an error happens in the "Count" bolt, how is
>> "fail" from the Count bolt forwarded to the spout?
>>
>> 1) Count bolt sends "fail" to previous bolt (split) and then split bolt
>> sends it to "spout"
>> 2) Count bolt directly sends "fail" to "spout"
>>
>>
>> You can go through the below link for understanding guaranteed message
>> processing. You can send queries if you have any after that.
>>
>> http://storm.apache.org/releases/current/Guaranteeing-message-processing.html
>>
>>
>>
>> Thanks,
>> Satish.
>>
>> On Mon, Jun 20, 2016 at 7:16 PM, Junguk Cho <[email protected]> wrote:
>>
>>> Hi.
>>>
>>> I have two questions.
>>>
>>> First, it is about "Partial Key grouping" & "Fields Grouping".
>>>
>>> In my example, I used an employee class with "name", "phonenumber", and
>>> "salary" as the tuple sent to the next worker.
>>> I only used "name" as the key for the groupings.
>>>
>>> Fields Grouping works as I expected.
>>> Based on the "Fields" values, it sends tuples to the next hop.
>>> However, Partial Key Grouping did not work as I expected.
>>> Below is the output from the programs.
>>>
>>>
>>> # From fieldsgrouping
>>> # First node
>>> Mike 12345 13451
>>> David 12345 13451
>>> Andy 12345 13451
>>> Junguk 12345 13452
>>> Mike 12345 13452
>>> David 12345 13452
>>> Andy 12345 13452
>>>
>>> # Second node
>>> Bob 12345 13451
>>> Bob 12345 13452
>>>
>>>
>>> # From partial key grouping
>>> # First node
>>> Mike 12345 13451
>>> David 12345 13451
>>> Mike 12345 13452
>>> Bob 12345 13452
>>>
>>> # Second node
>>> Junguk 12345 13451
>>> Bob 12345 13451
>>> Andy 12345 13451
>>> Junguk 12345 13452
>>> David 12345 13452
>>> Andy 12345 13452
>>>
>>> In partial key grouping, tuples with the same field (name) sometimes go to
>>> both the first and second nodes even though I used "name" as the partial key grouping field.
>>> Is this the right behavior?
>>> Or, when we use partial key grouping, do we need other nodes to
>>> aggregate information from the first and second nodes?
>>>
>>> # Second question
>>> It is about "guaranteeing message processing"
>>> If I want to make a topology reliable,
>>> first, in the spout, I used the emit(java.util.List<java.lang.Object> tuple,
>>> java.lang.Object messageId) method from SpoutOutputCollector
>>> <https://nathanmarz.github.io/storm/doc/backtype/storm/spout/SpoutOutputCollector.html#emit(java.util.List,%20java.lang.Object)>,
>>> and then, in the bolts, I used collector.emit(Tuple anchor, List<Object>
>>> tuple).
>>>
>>> When I used WordCountExample, if an error happens in the "Count" bolt, how is
>>> "fail" from the Count bolt forwarded to the spout?
>>>
>>> 1) Count bolt sends "fail" to previous bolt (split) and then split bolt
>>> sends it to "spout"
>>> 2) Count bolt directly sends "fail" to "spout"
>>>
>>>
>>> Thanks in advance.
>>> - Junguk
>>>
>>>
>>
>
>
> --
> Regards,
> Navin
>
