Hi Michel, I am actually doing something very similar. I process data coming in from a KafkaSpout in Bolt A and then send the processed tuples to Bolt B, where I cache them until the collection reaches a certain size and then flush the key/value pairs to SequenceFiles, which are then loaded into HBase.
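For what it's worth, the cache-and-flush step in Bolt B boils down to something like the sketch below. This is a minimal illustration, not my actual bolt: the class and method names are made up, and the flush callback stands in for the SequenceFile writer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of the buffering logic inside Bolt B: tuples accumulate in an
// in-memory map and are handed to a flush callback (in the real bolt,
// the SequenceFile writer) once the map reaches a configured size.
class BatchBuffer<K, V> {
    private final int flushSize;
    private final Consumer<Map<K, V>> flusher;
    private Map<K, V> buffer = new LinkedHashMap<>();

    BatchBuffer(int flushSize, Consumer<Map<K, V>> flusher) {
        this.flushSize = flushSize;
        this.flusher = flusher;
    }

    // Called once per incoming tuple, e.g. from execute(Tuple).
    void add(K key, V value) {
        buffer.put(key, value);
        if (buffer.size() >= flushSize) {
            flush();
        }
    }

    // Flush whatever is buffered; also worth calling from cleanup()
    // so a worker shutdown does not drop a partial batch.
    void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(buffer);
            buffer = new LinkedHashMap<>();
        }
    }

    int size() { return buffer.size(); }
}
```

One design note: flushing on a size threshold alone can leave a small batch sitting in memory indefinitely on a quiet stream, so in practice you also want a time-based flush (a tick tuple works well for that).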
My topology is working well except for the Bolt A to Bolt B step, but I got some great feedback and ideas from Javier and Kishore and will apply their thoughts. I think you are on the right track, especially given the message-processing guarantees built into Storm.

--John

On Fri, Aug 14, 2015 at 2:17 PM, Michel Blase <[email protected]> wrote:
> Hi all,
>
> I'm very new to apache-storm, and I have to admit I don't have a lot of
> experience with NoSQL in general either.
>
> I'm modelling my data using a document-based approach, and I'm trying to
> figure out how to update versions of a (sub)document stored in different
> "documents". It's the classic scenario where you store the user's info in
> the comments table: updates to the user's name (for example) should be
> propagated to all the comments.
>
> My understanding is that in this scenario people would trigger a procedure
> on the user's name update that scans all the related documents and updates
> the user's name in each.
>
> I was considering using apache-storm to propagate the updates, and I would
> like some feedback from more experienced developers on this kind of
> problem.
>
> Would apache-storm be too much? Should I just use ZooKeeper?
>
> My understanding is that apache-storm is mostly used for complex data
> manipulations, and here all I need to do is keep the data in sync for
> consistency when it is accessed by users. Am I going in the wrong
> direction? How do you guys solve these kinds of problems?
>
> Thanks,
> Michel
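To make the fan-out you describe concrete, here is a rough sketch of the "procedure triggered on the user's name update". All the names here are hypothetical, and the in-memory list stands in for your document store; in a Storm topology this loop would live in a bolt that receives a (userId, newName) tuple.

```java
import java.util.ArrayList;
import java.util.List;

// A comment document that embeds a denormalized copy of the author's name.
class Comment {
    final String userId;
    String userName;   // duplicated from the user document
    final String text;

    Comment(String userId, String userName, String text) {
        this.userId = userId;
        this.userName = userName;
        this.text = text;
    }
}

// Stand-in for the comments collection; the scan-and-rewrite below is
// the update-propagation step from the thread.
class CommentStore {
    private final List<Comment> comments = new ArrayList<>();

    void add(Comment c) { comments.add(c); }

    List<Comment> all() { return comments; }

    // Scan all comments by this user and rewrite the embedded name.
    // Returns how many documents were actually changed, which makes
    // the operation idempotent under replay (a useful property given
    // Storm's at-least-once delivery).
    int propagateNameUpdate(String userId, String newName) {
        int updated = 0;
        for (Comment c : comments) {
            if (c.userId.equals(userId) && !c.userName.equals(newName)) {
                c.userName = newName;
                updated++;
            }
        }
        return updated;
    }
}
```

Whether you run this in a bolt or as a plain background job, the logic is the same; Storm mainly buys you the replay-on-failure guarantee and easy parallelism if the comment set per user is large.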
