Hi Michel,

I am actually doing something very similar.  I am processing data coming in
from a KafkaSpout in Bolt A and then sending the processed tuples to Bolt B,
where I cache them until the collections reach a certain size and then flush
the key/value pairs to SequenceFiles that are then loaded into HBase.
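In case it helps, here is a minimal sketch of that cache-until-threshold-then-flush
logic, pulled out of the Storm API so it stands alone. The class name and the
in-memory "flushed batches" list are just illustrative; in the real bolt this
would live in execute(), the flush would write a SequenceFile, and the cached
tuples would be acked only after the flush succeeds so Storm's guarantees cover
the whole batch:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: buffer key/value pairs and flush once the cache
// reaches flushSize. Stand-in for the real SequenceFile write in Bolt B.
public class BatchBuffer<K, V> {
    private final int flushSize;
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();
    // Stand-in for SequenceFile writes; lets us see how many flushes happened.
    private final List<List<K>> flushedBatches = new ArrayList<>();

    public BatchBuffer(int flushSize) {
        this.flushSize = flushSize;
    }

    // Called once per incoming tuple; triggers a flush at the threshold.
    public void add(K key, V value) {
        keys.add(key);
        values.add(value);
        if (keys.size() >= flushSize) {
            flush();
        }
    }

    private void flush() {
        // Real bolt: write keys/values to a SequenceFile, then ack the
        // cached tuples so a failed flush causes a replay, not data loss.
        flushedBatches.add(new ArrayList<>(keys));
        keys.clear();
        values.clear();
    }

    public int flushCount() { return flushedBatches.size(); }

    public int pending() { return keys.size(); }
}
```

The one thing to be careful of with this pattern is tuples stranded below the
threshold; a tick tuple (or timer) that forces a flush periodically avoids
holding the last partial batch forever.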

My topology is working well except for the Bolt A to Bolt B step, but I got
some great feedback and ideas from Javier and Kishore, and will apply their
thoughts.  I think you are on the right track, especially given the message
processing guarantees built into Storm.

--John

On Fri, Aug 14, 2015 at 2:17 PM, Michel Blase <[email protected]> wrote:

> Hi all,
>
> I'm very new to apache-storm and I have to admit I don't have a lot of
> experience with NoSQL in general either.
>
> I'm modelling my data using a document-based approach and I'm trying to
> figure out how to update versions of a (sub) document stored in different
> "documents". It's the classic scenario where you store user's info in the
> comments table. Updates to the user's name (for example) should be
> propagated to all the comments.
>
> My understanding is that in this scenario people would trigger a procedure
> on the user's name update that scans all the related documents to update
> the user's name.
>
> I was considering using apache-storm to propagate updates and I would like
> to have some feedback from more experienced developers on this kind of
> problems.
>
> Would apache-storm be too much? Should I just use ZooKeeper?
>
> My understanding is that apache-storm is mostly used for complex data
> manipulations, and here all I need to do is keep the data in sync for
> consistency when accessed by users. Am I going in the wrong direction? How
> do you guys solve this kind of problem?
>
> Thanks,
> Michel
>
