If the number of such events (user name updates) in your system is not too large, you can do it online, i.e. acknowledge the name change to the user when he interacts with your site only after you have updated all the relevant documents in your NoSQL db. But if there are too many events and you would like to process them asynchronously so that the latency to the end user stays low, you can decouple them into a nearline system where you stage such events in a Kafka queue and have some modules running that pull from the Kafka queue and update all the systems you have to.

Even for this you may not need Storm. But you can use Storm if you view it as a PaaS: taking care of fault tolerance for failed nodes and pushing new code out to all your nodes is not an easy task to handle yourself, and Storm does this for you for free. So you can use Storm to pull from the Kafka queue and update the appropriate data stores. If you just package the bolt and give it to Storm, it will make sure the requested number of instances keeps running for you, so it acts as a PaaS and you do not have to keep monitoring the batch system you would otherwise have to run yourself.
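To make that concrete, here is a minimal sketch of such a topology using the storm-kafka spout (0.9.x-era backtype.storm APIs). The ZooKeeper host, topic name, message format, and the actual document-store call are placeholders for whatever you actually use:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class UserNameSyncTopology {

    // Applies one "user renamed" event to every document embedding the name.
    public static class PropagateNameBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // Assumed message format "userId:newName"; adapt to your events.
            String[] parts = input.getStringByField("str").split(":", 2);
            String userId = parts[0];
            String newName = parts[1];
            // Placeholder: issue the fan-out update with your NoSQL client,
            // e.g. set author.name = newName in every comment document
            // where author.id == userId.
            // BaseBasicBolt acks automatically when execute() returns;
            // throwing FailedException instead makes the spout replay the event.
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Terminal bolt: nothing emitted downstream.
        }
    }

    public static void main(String[] args) throws Exception {
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("zkhost:2181"),   // your ZooKeeper ensemble
                "user-updates",               // assumed topic name
                "/kafka-offsets", "name-sync");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("events", new KafkaSpout(spoutConfig), 1);
        builder.setBolt("propagate", new PropagateNameBolt(), 4)
               .shuffleGrouping("events");

        StormSubmitter.submitTopology("user-name-sync", new Config(),
                builder.createTopology());
    }
}

Because the bolt extends BaseBasicBolt, each Kafka message is acked automatically once execute() returns, so if a worker node dies the pending updates are simply replayed on another node.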
On Sat, Aug 15, 2015 at 4:40 AM John Yost <[email protected]> wrote:

> Hi Michel,
>
> I am actually doing something very similar. I am processing data coming
> in from a KafkaSpout in Bolt A and then sending the processed tuples to
> Bolt B, where I cache 'em until the collections reach a certain size and
> then flush the key/value pairs to SequenceFiles that are then loaded into
> HBase.
>
> My topology is working well except for the Bolt A to Bolt B step, but I
> got some great feedback and ideas from Javier and Kishore, and will apply
> their thoughts. I think you are on the right track, especially given the
> message processing guarantees embedded within Storm.
>
> --John
>
> On Fri, Aug 14, 2015 at 2:17 PM, Michel Blase <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm very new to apache-storm and I have to admit I don't have a lot of
>> experience with NoSQL in general either.
>>
>> I'm modelling my data using a document-based approach and I'm trying to
>> figure out how to update versions of a (sub)document stored in different
>> "documents". It's the classic scenario where you store the user's info
>> in the comments table. Updates to the user's name (for example) should
>> be propagated to all the comments.
>>
>> My understanding is that in this scenario people would trigger a
>> procedure on the user's name update that scans all the related documents
>> to update the user's name.
>>
>> I was considering using apache-storm to propagate updates and I would
>> like to have some feedback from more experienced developers on this kind
>> of problem.
>>
>> Would apache-storm be too much? Should I just use zookeeper?
>>
>> My understanding is that apache-storm is mostly used for complex data
>> manipulations, and here all I need to do is keep the data in sync for
>> consistency when accessed by users. Am I going in the wrong direction?
>> How do you guys solve this kind of problem?
>>
>> Thanks,
>> Michel
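A minimal sketch of the batch-and-flush bolt John describes above, with manual acking so a failed flush gets replayed; the flush threshold and the writeBatch() body are placeholder assumptions:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class BatchingBolt extends BaseRichBolt {
    private static final int FLUSH_SIZE = 1000;  // assumed threshold

    private OutputCollector collector;
    private List<Tuple> buffer;

    @Override
    public void prepare(Map conf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
        this.buffer = new ArrayList<Tuple>();
    }

    @Override
    public void execute(Tuple tuple) {
        buffer.add(tuple);
        if (buffer.size() >= FLUSH_SIZE) {
            try {
                writeBatch(buffer);          // e.g. append to a SequenceFile
                for (Tuple t : buffer) {
                    collector.ack(t);        // ack only after a durable write
                }
            } catch (Exception e) {
                for (Tuple t : buffer) {
                    collector.fail(t);       // fail the batch so Storm replays it
                }
            }
            buffer.clear();
        }
    }

    private void writeBatch(List<Tuple> batch) {
        // Placeholder: write key/value pairs to a SequenceFile for HBase load.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt in this sketch: nothing emitted downstream.
    }
}

Note that topology.max.spout.pending has to be set larger than the flush size, or the spout will stall waiting for acks that only arrive once the buffer fills.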
