We give Storm a topology. It contains a spout (which emits the messages to
act upon into the topology) and bolts which act on those messages and can in
turn emit further messages, so we can build up a network to process
messages.
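
For illustration, wiring such a network might look like the following
minimal sketch (EventSpout, EnrichBolt and StoreBolt are hypothetical
placeholders, not real classes):

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.topology.TopologyBuilder;

    public class ExampleTopology {
      public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // The spout emits the messages to act upon into the topology.
        builder.setSpout("events", new EventSpout(), 1);
        // A bolt acts on each message and can emit further messages...
        builder.setBolt("enrich", new EnrichBolt(), 2).shuffleGrouping("events");
        // ...which a downstream bolt consumes, forming a processing network.
        builder.setBolt("store", new StoreBolt(), 2).shuffleGrouping("enrich");
        new LocalCluster().submitTopology("example", new Config(), builder.createTopology());
      }
    }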

In your case you would have a bolt that processes a message to update your
NoSQL data stores. The spout which emits the messages for the bolt to act on
would typically read them from a queue. There is another reason to host them
in a Kafka queue: if your bolt fails to process a message for any reason
(the datastore being down temporarily, or the new code you rolled out having
some bugs, etc.), you would still have that message in the Kafka queue and
can replay it (processing reliability). Storm has built-in reliability
guarantees, and it works well with a KafkaSpout.
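
As a rough sketch of that setup (the topic name, ZooKeeper address and the
datastore write are assumptions; the spout config follows the storm-kafka
module of that era):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.spout.SchemeAsMultiScheme;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.FailedException;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;

    public class NameUpdateTopology {

      public static class NameUpdateBolt extends BaseBasicBolt {
        public void execute(Tuple tuple, BasicOutputCollector collector) {
          try {
            updateDocuments(tuple.getString(0));  // hypothetical datastore write
          } catch (Exception e) {
            // Failing the tuple makes Storm replay it from the KafkaSpout, so a
            // temporary datastore outage or a buggy rollout does not lose events.
            throw new FailedException(e);
          }
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
        private void updateDocuments(String event) { /* update NoSQL stores */ }
      }

      public static void main(String[] args) throws Exception {
        SpoutConfig cfg = new SpoutConfig(
            new ZkHosts("zk1:2181"), "name-updates", "/kafka-spout", "update-reader");
        cfg.scheme = new SchemeAsMultiScheme(new StringScheme());
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka", new KafkaSpout(cfg), 1);
        builder.setBolt("update", new NameUpdateBolt(), 4).shuffleGrouping("kafka");
        StormSubmitter.submitTopology("name-updates", new Config(), builder.createTopology());
      }
    }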


On Sun, Aug 16, 2015 at 9:51 PM Michel Blase <[email protected]> wrote:

> Thanks Kishore,
>
> Yes, you got my point, I want it to be a PaaS system to propagate updates.
> As you mentioned, I initially considered a queue like Kafka, with workers
> pulling tasks from the queue and running the updates. I turned to Storm
> when I realized how hard it would be to scale that architecture out.
>
> One more clarification: I thought I could simply push a new task to Storm
> without the need for a queue. Can you clarify? I guess I'm missing
> something; I urgently need to get some practice with Storm.
>
> Thanks all for your help!
> Michel
>
>
>
> On Sun, Aug 16, 2015 at 12:15 AM, Kishore Senji <[email protected]> wrote:
>
>> If the number of such events (user name updates) in your system is not too
>> huge, then you can do it online (i.e., when the user interacts with your
>> site to update his name, you acknowledge the change only once you have
>> updated all the relevant documents in your NoSQL db). But if there are too
>> many events and you would like to process them asynchronously, so that the
>> latency to the end user stays low, then you can decouple them into a
>> nearline system where you stage such events in a Kafka queue. You can have
>> some modules running that pull from the Kafka queue and update all the
>> systems you have to. Even for this you may not need Storm. But you can use
>> Storm if you view it as a PaaS system. Taking care of fault tolerance of
>> failed nodes and pushing out new code to all your nodes is also not an easy
>> task. Storm does this for you for free. So you can use Storm to pull from
>> the Kafka queue and update the appropriate data stores. If you just package
>> the bolt and give it to Storm, it will make sure that a number of instances
>> are running for you, so it acts as a PaaS system and you do not have to
>> keep monitoring your otherwise batch system.
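>>
>> For concreteness, such a module without Storm could be as simple as this
>> sketch (the topic name and the update call are assumptions, and it uses
>> today's Java Kafka consumer API rather than the 2015-era one):
>>
>>     import java.time.Duration;
>>     import java.util.Collections;
>>     import java.util.Properties;
>>     import org.apache.kafka.clients.consumer.ConsumerRecord;
>>     import org.apache.kafka.clients.consumer.ConsumerRecords;
>>     import org.apache.kafka.clients.consumer.KafkaConsumer;
>>
>>     public class NameUpdateWorker {
>>       public static void main(String[] args) {
>>         Properties props = new Properties();
>>         props.put("bootstrap.servers", "kafka1:9092");
>>         props.put("group.id", "name-update-workers");
>>         props.put("key.deserializer",
>>             "org.apache.kafka.common.serialization.StringDeserializer");
>>         props.put("value.deserializer",
>>             "org.apache.kafka.common.serialization.StringDeserializer");
>>         try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>>           consumer.subscribe(Collections.singletonList("name-updates"));
>>           while (true) {
>>             ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
>>             for (ConsumerRecord<String, String> record : records) {
>>               applyUpdate(record.value());  // hypothetical datastore writes
>>             }
>>           }
>>         }
>>       }
>>       private static void applyUpdate(String event) { /* ... */ }
>>     }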
>>
>> On Sat, Aug 15, 2015 at 4:40 AM John Yost <[email protected]>
>> wrote:
>>
>>> Hi Michel,
>>>
>>> I am actually doing something very similar.  I am processing data coming
>>> in from a KafkaSpout in Bolt A and then sending the processed tuples to
>>> Bolt B where I cache 'em until the collections reach a certain size and
>>> then flush the key/value pairs to SequenceFiles that are then loaded into
>>> HBase.
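>>>
>>> A rough sketch of that Bolt B pattern (the batch size and the flush step
>>> here are simplified placeholders, not my actual code):
>>>
>>>     import java.util.ArrayList;
>>>     import java.util.List;
>>>     import java.util.Map;
>>>     import backtype.storm.task.OutputCollector;
>>>     import backtype.storm.task.TopologyContext;
>>>     import backtype.storm.topology.OutputFieldsDeclarer;
>>>     import backtype.storm.topology.base.BaseRichBolt;
>>>     import backtype.storm.tuple.Tuple;
>>>
>>>     public class BatchingBolt extends BaseRichBolt {
>>>       private static final int BATCH_SIZE = 1000;  // assumed threshold
>>>       private OutputCollector collector;
>>>       private List<Tuple> buffer;
>>>
>>>       public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
>>>         this.collector = collector;
>>>         this.buffer = new ArrayList<Tuple>();
>>>       }
>>>
>>>       public void execute(Tuple tuple) {
>>>         buffer.add(tuple);
>>>         if (buffer.size() >= BATCH_SIZE) {
>>>           flushToSequenceFile(buffer);  // hypothetical SequenceFile write
>>>           for (Tuple t : buffer) {
>>>             collector.ack(t);           // ack only once the batch is durable
>>>           }
>>>           buffer.clear();
>>>         }
>>>       }
>>>
>>>       public void declareOutputFields(OutputFieldsDeclarer declarer) { }
>>>
>>>       private void flushToSequenceFile(List<Tuple> batch) { /* HDFS write */ }
>>>     }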
>>>
>>> My topology is working well except for the Bolt A to Bolt B step, but I
>>> got some great feedback and ideas from Javier and Kishore, and will apply
>>> their thoughts.  I think you are on the right track, especially given the
>>> message-processing guarantees embedded within Storm.
>>>
>>> --John
>>>
>>> On Fri, Aug 14, 2015 at 2:17 PM, Michel Blase <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm very new to Apache Storm, and I have to admit I don't have a lot of
>>>> experience with NoSQL in general either.
>>>>
>>>> I'm modelling my data using a document-based approach, and I'm trying to
>>>> figure out how to update versions of a (sub)document stored in different
>>>> "documents". It's the classic scenario where you store a user's info in
>>>> the comments collection: updates to the user's name (for example) should
>>>> be propagated to all the comments.
>>>>
>>>> My understanding is that in this scenario people would trigger a
>>>> procedure on the user's name update that scans all the related documents to
>>>> update the user's name.
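>>>>
>>>> (For example, assuming MongoDB as the document store, my understanding is
>>>> that the procedure could boil down to a single fan-out update like this:)
>>>>
>>>>     import com.mongodb.client.MongoClients;
>>>>     import com.mongodb.client.MongoCollection;
>>>>     import org.bson.Document;
>>>>     import static com.mongodb.client.model.Filters.eq;
>>>>     import static com.mongodb.client.model.Updates.set;
>>>>
>>>>     public class PropagateNameUpdate {
>>>>       public static void propagate(String userId, String newName) {
>>>>         MongoCollection<Document> comments = MongoClients.create("mongodb://localhost:27017")
>>>>             .getDatabase("app").getCollection("comments");
>>>>         // Rewrite the embedded user name in every comment by this user.
>>>>         comments.updateMany(eq("user.id", userId), set("user.name", newName));
>>>>       }
>>>>     }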
>>>>
>>>> I was considering using Apache Storm to propagate the updates, and I
>>>> would like some feedback from more experienced developers on this kind of
>>>> problem.
>>>>
>>>> Would Apache Storm be too much? Should I just use ZooKeeper?
>>>>
>>>> My understanding is that Apache Storm is mostly used for complex data
>>>> manipulations, and here all I need to do is keep the data in sync for
>>>> consistency when it is accessed by users. Am I going in the wrong
>>>> direction? How do you guys solve this kind of problem?
>>>>
>>>> Thanks,
>>>> Michel
>>>>
>>>
>>>
>
