We do something kind of similar. I think you will need another store to keep track of these. I'm not sure about Storm's distributed cache; we use Cassandra, but you could use ZooKeeper or some other store. The issue is that since rows come out of Storm in no guaranteed order, you really don't know when you are done. You need to know when you are complete in order to remove the messages from your source (or otherwise update things over there).
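To make the "know when you are done" part concrete, here is a rough sketch of the bookkeeping, in Java since that is what most Storm topologies are written in. It is not our production code: `CompletionTracker`, `setExpected`, `recordWritten` and `isComplete` are names I made up for illustration, and the in-memory maps just stand in for whatever external store you pick (Cassandra, ZooKeeper, ...).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for an external tracking store.
 * It keeps, per transaction id, the expected row count (read side)
 * and the number of rows written so far (write side).
 */
final class CompletionTracker {
    private final Map<String, Long> expected = new HashMap<>();
    private final Map<String, Long> written = new HashMap<>();

    /** Read side: record the total row count for a transaction. */
    synchronized boolean setExpected(String txnId, long rowCount) {
        expected.put(txnId, rowCount);
        return isComplete(txnId);
    }

    /** Write side: add to the number of rows written for a transaction. */
    synchronized boolean recordWritten(String txnId, long rows) {
        written.merge(txnId, rows, Long::sum);
        return isComplete(txnId);
    }

    /**
     * Complete only once the total is known and the written count has reached it.
     * Caveat: replayed tuples would inflate the written count, so a real
     * store needs idempotent updates; this sketch does not handle that.
     */
    synchronized boolean isComplete(String txnId) {
        Long want = expected.get(txnId);
        return want != null && written.getOrDefault(txnId, 0L) >= want;
    }

    public static void main(String[] args) {
        CompletionTracker tracker = new CompletionTracker();
        tracker.recordWritten("txn-1", 2);   // writes can land before the total is known
        tracker.setExpected("txn-1", 3);
        System.out.println(tracker.isComplete("txn-1"));        // false
        System.out.println(tracker.recordWritten("txn-1", 1));  // true
    }
}
```

The point of returning the completion flag from every update is that whichever side finishes last finds out that the transaction is done, regardless of the order in which rows arrive.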
So what we do is keep track of how many rows are on the read side (in some store). Then, as we process, we update on the write side how many rows we wrote. By comparing how many we have written against how many we expected in total, we know when we are done. It sounds like your situation might be more complicated than ours if you are talking about rows from many different tables all inside the same transaction, but in any event some pattern like this should work. For perspective, we essentially ETL data out of hundreds of tables like this into Cassandra, and it works quite well. You just need to be super careful with the completion logic; there are many edge cases to consider.

On Thu, Apr 14, 2016 at 9:00 AM, Nikos R. Katsipoulakis <[email protected]> wrote:
> Hello Sreekumar,
>
> Have you thought of using Storm's distributed cache? If not, that might be a
> way to cache messages before you push them to the target DB. Another way to
> do so, is if you can create your own Bolt to periodically push messages in
> the database.
>
> I hope I helped.
>
> Cheers,
> Nikos
>
> On Thu, Apr 14, 2016 at 12:54 AM, pradeep s <[email protected]>
> wrote:
>>
>> Hi,
>> We are using Storm for processing CDC messages from Oracle Golden Gate .
>> Pipeline is as below
>> Oracle GoldenGate-->Queue-->Storm-->Relational DB
>> We have a requirement to hold the messages for a transaction Id till all
>> the messages for that transaction is available in Storm. There can be
>> scenarios like 1 million updates happening in one transaction source oracle
>> system.
>> Can you please suggest a best approach for holding the messages and then
>> pushing to target db only when all messages for tran id is available in
>> storm.
>>
>> Regards
>> Pradeep S
>
> --
> Nikos R. Katsipoulakis,
> Department of Computer Science
> University of Pittsburgh

--
John Bush
Trax Technologies, Inc.
M: 480-227-2910
TraxTech.Com
