Thanks Kishore,

Yes, you got my point: I want to use it as a PaaS system to propagate updates.
As you mentioned, I was initially considering a queue like Kafka, with
workers pulling tasks from the queue and running the updates. I turned to
Storm when I realized how hard it would be to scale that architecture out.

One more clarification: I thought I could simply push a new task to
Storm without the need for a queue. Can you clarify? I guess I'm missing
something; I urgently need to get some practice with Storm.
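(In case it helps frame the question: here is a minimal, plain-Java sketch of
the per-event update I have in mind. It has no Storm or Kafka dependencies,
the class and field names are hypothetical, and the in-memory map stands in
for the NoSQL store; it just simulates what a bolt's execute() would do when a
"user renamed" event arrives.)

```java
import java.util.*;

// Sketch of the work done per "user renamed" event (hypothetical names,
// no Storm deps). Comments are modeled as maps embedding the author's
// userId and userName; a real bolt would issue partial updates against
// the NoSQL store instead.
public class PropagateRename {
    // comment documents keyed by comment id
    static Map<String, Map<String, String>> comments = new HashMap<>();

    // Update the embedded user name in every comment by this user;
    // returns how many documents were actually changed.
    static int propagate(String userId, String newName) {
        int updated = 0;
        for (Map<String, String> c : comments.values()) {
            if (userId.equals(c.get("userId"))
                    && !newName.equals(c.get("userName"))) {
                c.put("userName", newName); // real system: partial update
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        comments.put("c1", new HashMap<>(Map.of("userId", "u1", "userName", "Michel")));
        comments.put("c2", new HashMap<>(Map.of("userId", "u2", "userName", "John")));
        int n = propagate("u1", "Michel B.");
        System.out.println(n + " comment(s) updated");
    }
}
```

With Storm in the picture, this method body is roughly what the bolt would run
for each tuple pulled off the Kafka queue.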

Thanks all for your help!
Michel



On Sun, Aug 16, 2015 at 12:15 AM, Kishore Senji <[email protected]> wrote:

> If the number of such events (user name updates) in your system is not too
> huge, then you can do it online (i.e. you acknowledge the user's request to
> update his name on your site only after you have updated all the relevant
> documents in your NoSQL db). But if there are too many events and you would
> like to process them asynchronously so that the latency to the end user is
> low, then you can decouple them into a nearline system where you stage such
> events in a Kafka queue. You can have some modules running that poll the
> Kafka queue and update all the systems you have to. Even for this you may
> not need Storm. But you can use Storm if you view it as a PaaS system.
> Taking care of fault tolerance of failed nodes and pushing out new code to
> all your nodes is also not an easy task to maintain. Storm does this for
> you for free. So you can use Storm, which pulls from the Kafka queue and
> updates the appropriate data stores. If you just package the bolt and give
> it to Storm, it will make sure that a number of instances are running for
> you, so it acts as a PaaS system and you do not have to keep monitoring
> your otherwise batch system.
>
> On Sat, Aug 15, 2015 at 4:40 AM John Yost <[email protected]>
> wrote:
>
>> Hi Michel,
>>
>> I am actually doing something very similar.  I am processing data coming
>> in from a KafkaSpout in Bolt A and then sending the processed tuples to
>> Bolt B where I cache 'em until the collections reach a certain size and
>> then flush the key/value pairs to SequenceFiles that are then loaded into
>> HBase.
>>
>> My topology is working well except for the Bolt A to Bolt B step, but I
>> got some great feedback and ideas from Javier and Kishore, and will apply
>> their thoughts.  I think you are on the right track, especially given the
>> message processing guarantees embedded within Storm.
>>
>> --John
>>
>> On Fri, Aug 14, 2015 at 2:17 PM, Michel Blase <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I'm very new to apache-storm and I have to admit I don't have a lot of
>>> experience with NoSQL in general either.
>>>
>>> I'm modelling my data using a document-based approach, and I'm trying to
>>> figure out how to update versions of a (sub)document that is stored inside
>>> different "documents". It's the classic scenario where you store the
>>> user's info in the comments table. Updates to the user's name (for
>>> example) should be propagated to all the comments.
>>>
>>> My understanding is that in this scenario people would trigger a
>>> procedure on the user's name update that scans all the related documents to
>>> update the user's name.
>>>
>>> I was considering using apache-storm to propagate updates, and I would
>>> like to get some feedback from more experienced developers on this kind
>>> of problem.
>>>
>>> Would apache-storm be too much? Should I just use ZooKeeper?
>>>
>>> My understanding is that apache-storm is mostly used for complex data
>>> manipulations, and here all I need to do is keep the data in sync for
>>> consistency when accessed by users. Am I going in the wrong direction?
>>> How do you guys solve this kind of problem?
>>>
>>> Thanks,
>>> Michel
>>>
>>
>>
