Thanks Kishore. Yes, you got my point: I want a PaaS-style system to propagate updates. As you mentioned, I initially considered a queue like Kafka, with workers pulling tasks from the queue and running the updates. I turned to Storm when I realized how hard it would be to scale that architecture out.
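Just to make that idea concrete, here is roughly the pattern I was considering, with an in-memory queue standing in for Kafka and a plain dict standing in for the NoSQL document store (all names and data here are invented for illustration):

```python
import queue

# Stand-ins for Kafka and the document store (purely illustrative).
events = queue.Queue()          # plays the role of the Kafka topic
comments = {                    # comments with the user's name denormalized
    "c1": {"user_id": "u42", "user_name": "Michel"},
    "c2": {"user_id": "u42", "user_name": "Michel"},
    "c3": {"user_id": "u7",  "user_name": "Kishore"},
}

def propagate_name_update(event):
    """Scan all comments owned by the user and rewrite the embedded name."""
    for comment in comments.values():
        if comment["user_id"] == event["user_id"]:
            comment["user_name"] = event["new_name"]

# A real worker would loop forever pulling from Kafka; one iteration shown.
events.put({"user_id": "u42", "new_name": "Michel B."})
propagate_name_update(events.get())
```

The worry is exactly what Kishore raised: once there are many workers, I have to handle failed nodes and code pushes myself, which is what pointed me at Storm.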
One more clarification: I thought I could simply push a new task to Storm without needing a queue. Can you clarify? I guess I'm missing something; I urgently need to get some practice with Storm. Thanks all for your help!

Michel

On Sun, Aug 16, 2015 at 12:15 AM, Kishore Senji <[email protected]> wrote:

> If the number of such events (user name updates) in your system is not too
> huge, then you can do it online (i.e. you acknowledge the user's name
> update on your site only once you have updated all the relevant documents
> in your NoSQL db). But if there are too many events and you would like to
> process them asynchronously so that the latency to the end user is low,
> then you can decouple them into a nearline system where you stage such
> events in a Kafka queue. You can have some modules running that pull from
> the Kafka queue and update all the systems you have to. Even for this you
> may not need Storm. But you can use Storm if you view it as a PaaS system.
> Taking care of fault tolerance for failed nodes and pushing out new code
> to all your nodes is not an easy task to maintain; Storm does this for you
> for free. So you can use Storm to pull from the Kafka queue and update the
> appropriate data stores. If you just package the bolt and give it to
> Storm, it will make sure a number of instances are running for you, so it
> acts as a PaaS system and you do not have to keep monitoring your
> otherwise batch system.
>
> On Sat, Aug 15, 2015 at 4:40 AM John Yost <[email protected]> wrote:
>
>> Hi Michel,
>>
>> I am actually doing something very similar. I am processing data coming
>> in from a KafkaSpout in Bolt A and then sending the processed tuples to
>> Bolt B, where I cache 'em until the collections reach a certain size and
>> then flush the key/value pairs to SequenceFiles that are then loaded into
>> HBase.
>>
>> My topology is working well except for the Bolt A to Bolt B step, but I
>> got some great feedback and ideas from Javier and Kishore, and will apply
>> their thoughts. I think you are on the right track, especially given the
>> message processing guarantees embedded within Storm.
>>
>> --John
>>
>> On Fri, Aug 14, 2015 at 2:17 PM, Michel Blase <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I'm very new to Apache Storm, and I have to admit I don't have a lot of
>>> experience with NoSQL in general either.
>>>
>>> I'm modelling my data using a document-based approach and I'm trying to
>>> figure out how to update versions of a (sub)document stored in different
>>> "documents". It's the classic scenario where you store the user's info in
>>> the comments table: updates to the user's name (for example) should be
>>> propagated to all of their comments.
>>>
>>> My understanding is that in this scenario people would trigger a
>>> procedure on the user-name update that scans all the related documents
>>> and updates the user's name.
>>>
>>> I was considering using Apache Storm to propagate updates, and I would
>>> like some feedback from more experienced developers on this kind of
>>> problem.
>>>
>>> Would Apache Storm be too much? Should I just use ZooKeeper?
>>>
>>> My understanding is that Apache Storm is mostly used for complex data
>>> manipulations, and here all I need to do is keep the data in sync for
>>> consistency when accessed by users. Am I going in the wrong direction?
>>> How do you guys solve this kind of problem?
>>>
>>> Thanks,
>>> Michel
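P.S. For anyone finding this thread later, the cache-and-flush idea John describes for Bolt B can be sketched roughly like this; this is plain Python standing in for the actual bolt, and the flush threshold and the in-memory "flushed" list are invented stand-ins for the real SequenceFile writes:

```python
class BatchingBolt:
    """Mimics Bolt B: buffer incoming key/value pairs, flush in batches."""

    def __init__(self, flush_size=3):
        self.flush_size = flush_size
        self.buffer = {}
        self.flushed = []           # stands in for SequenceFiles bound for HBase

    def execute(self, key, value):
        """Cache the pair; flush once the collection reaches the threshold."""
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        # In the real topology this would write a SequenceFile for HBase.
        self.flushed.append(dict(self.buffer))
        self.buffer.clear()

bolt = BatchingBolt(flush_size=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    bolt.execute(k, v)
```

After the three tuples above, one batch of two pairs has been flushed and one pair is still buffered, which is the behavior John describes between Bolt B and HBase.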
