Hello! Thanks for your reply. I was thinking of using such a cache mechanism (the Guava cache as a static field in the final bolt). The question I have now is: how do you keep track of the upstream bolts? Suppose you have 3 bolts; do you count the number of bolts that have processed one message, e.g. Map<msgId, count_of_bolts>? Or can you send the names of the bolts that have processed the message to the final bolt, and in the final bolt check whether the list of all processing bolts is complete?
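To make my question concrete, here is a minimal sketch of the bookkeeping I have in mind (the class and names are made up, and I left the Storm API out so it is just the tracking logic). In the real final bolt I imagine taking the bolt name from tuple.getSourceComponent() and keeping this map in a Guava cache with expireAfterWrite set just above the tuple timeout, so stale incomplete joins get purged as Michael suggested:

```java
import java.util.*;

// Tracks which processing bolts have reported each msgId, and signals
// when all expected bolts have been seen. In the real topology this
// plain HashMap would be a Guava cache (expireAfterWrite > tuple timeout)
// so that msgIds that never complete are evicted automatically.
public class BarrierTracker {
    private final Set<String> expectedBolts;
    private final Map<String, Set<String>> seen = new HashMap<>();

    public BarrierTracker(Set<String> expectedBolts) {
        this.expectedBolts = expectedBolts;
    }

    /**
     * Record that boltName has processed msgId.
     * Returns true exactly once: when the last expected bolt reports,
     * i.e. when the final bolt may start its work on that msgId.
     */
    public boolean record(String msgId, String boltName) {
        Set<String> reported = seen.computeIfAbsent(msgId, k -> new HashSet<>());
        reported.add(boltName);
        if (reported.containsAll(expectedBolts)) {
            seen.remove(msgId); // complete: free the entry
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BarrierTracker tracker =
            new BarrierTracker(new HashSet<>(Arrays.asList("bolt1", "bolt2", "bolt3")));
        System.out.println(tracker.record("msg1", "bolt1")); // not yet complete
        System.out.println(tracker.record("msg1", "bolt2")); // not yet complete
        System.out.println(tracker.record("msg1", "bolt3")); // all three seen
    }
}
```

Using the set of bolt names rather than a plain count also makes the barrier safe against the same bolt reporting twice (e.g. on a replay), which a Map<msgId, count> would miscount.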
What is the best approach here?

Thanks.
Regards,
Florin

On Fri, Aug 1, 2014 at 7:51 AM, Michael Rose <[email protected]> wrote:

> It's another case of a streaming join. I've done this before; there aren't
> too many gotchas, other than you need a data structure which purges stale
> unresolved joins beyond the tuple timeout time (I used a Guava cache for
> this).
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> [email protected]
>
>
> On Thu, Jul 31, 2014 at 10:44 PM, Varun Vijayaraghavan <
> [email protected]> wrote:
>
>> That's interesting. Note that I have not used such a pattern before - but
>> have done something similar. I have not used Trident - so this probably
>> will not answer your last question completely :)
>>
>> If you set up the topology such that the links between bolts {1, 2, 3} and
>> the final bolt are stream-grouped by "msgId", you could keep the partially
>> processed results in memory (or in a persisted state somewhere) until you
>> see the processed result from all the bolts.
>>
>> I would also expire msgIds which have not seen further results beyond
>> a certain threshold time.
>>
>> What do you think?
>>
>>
>>
>> On Fri, Aug 1, 2014 at 12:35 AM, Spico Florin <[email protected]>
>> wrote:
>>
>>> Hello!
>>> I have a case study where the same message (identified by an id) is
>>> spread over a couple of processing bolts, and a final bolt should act as a
>>> barrier. This final bolt should do its job on the message ID only when all
>>> the upstream bolts have finished their processing of the message.
>>>
>>> A sketch is below:
>>>
>>> Spout -> (msgId1, payload) -> process bolt 1 (msgId1, payloadb1) |
>>>                            -> process bolt 2 (msgId1, payloadb2) |-> final bolt (msgId1, finalp)
>>>                            -> process bolt 3 (msgId1, payloadb3) |
>>>
>>> So the final bolt should not start work on a message id until the
>>> message has been processed by all 3 processing bolts.
>>> My questions are:
>>> 1. Is this case viable for Storm?
>>> 2. If the answer to the first question is yes, how can I achieve this?
>>> 3. Is it possible to achieve this without using Trident?
>>> I look forward to your answers.
>>> Thanks,
>>> Florin
>>>
>>
>>
>>
>> --
>> - varun :)
>>
>
>
