We do something similar.  I think you will need another store to
keep track of these.  I'm not sure about Storm's distributed cache; we
use Cassandra, but you could use ZooKeeper or some other store.  The
issue is that since rows come out of Storm in no guaranteed order, you
really don't know when you are done.  You need to know when you are
complete in order to remove the messages from your source (or
otherwise update things over there).

So what we do is keep track of how many rows are on the read side (in
some store).  Then, as we process, we update how many rows we wrote on
the write side.  By comparing the count of rows written against the
total we expected, we know when we are done.  It sounds like your
situation might be more complicated than ours if you are talking about
rows from many different tables all inside the same transaction, but
in any event some pattern like this should work.
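
As a rough illustration of that counting pattern, here is a minimal
Python sketch.  It is not our actual code: the names CompletionTracker,
register and record_written are made up for the example, and a plain
dict stands in for the external store (Cassandra, ZooKeeper, etc.).

```python
from dataclasses import dataclass


@dataclass
class TxnProgress:
    expected: int      # rows counted on the read side
    written: int = 0   # rows confirmed on the write side


class CompletionTracker:
    """Hypothetical in-memory stand-in for the external store."""

    def __init__(self):
        self._progress = {}

    def register(self, txn_id, expected_rows):
        # Read side: record how many rows this transaction contains.
        self._progress[txn_id] = TxnProgress(expected=expected_rows)

    def record_written(self, txn_id, rows=1):
        # Write side: bump the count, then report whether the
        # transaction has reached its expected total.
        progress = self._progress[txn_id]
        progress.written += rows
        return progress.written >= progress.expected


tracker = CompletionTracker()
tracker.register("txn-42", expected_rows=3)
results = [tracker.record_written("txn-42") for _ in range(3)]
print(results)  # [False, False, True]
```

Because only the counts are compared, the order in which rows arrive
does not matter.  With a real store the increment and the comparison
would need to be atomic (or otherwise safe under concurrent writers).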

For perspective, we essentially ETL data out of hundreds of tables like
this into Cassandra, and it works quite well.  You just need to be
super careful with the completion logic; there are many edge cases to
consider.
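
One example of such an edge case (my illustration, not necessarily one
from our pipeline): Storm's acking gives at-least-once processing, so a
replayed tuple could be counted twice and make a transaction look
complete too early.  A hedged sketch of one way around that is to track
distinct row ids rather than a bare counter.  IdempotentTracker and the
row ids below are hypothetical, and an in-memory set stands in for the
real store.

```python
class IdempotentTracker:
    """Hypothetical tracker that counts each row id at most once."""

    def __init__(self, expected_rows):
        self.expected_rows = expected_rows
        self._seen = set()

    def record_written(self, row_id):
        # Adding an id that is already present leaves the set unchanged,
        # so a replayed row cannot inflate the count.
        self._seen.add(row_id)
        return len(self._seen) >= self.expected_rows


dedup = IdempotentTracker(expected_rows=3)
outcomes = [dedup.record_written(r) for r in ["a", "b", "b", "c"]]
print(outcomes)  # [False, False, False, True]
```

The trade-off is storage: for a transaction with a very large number of
rows, keeping every id is far more expensive than keeping one counter.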

On Thu, Apr 14, 2016 at 9:00 AM, Nikos R. Katsipoulakis
<[email protected]> wrote:
> Hello Sreekumar,
>
> Have you thought of using Storm's distributed cache? If not, that might be a
> way to cache messages before you push them to the target DB. Another way to
> do so is to create your own Bolt that periodically pushes messages to
> the database.
>
> I hope I helped.
>
> Cheers,
> Nikos
>
> On Thu, Apr 14, 2016 at 12:54 AM, pradeep s <[email protected]>
> wrote:
>>
>> Hi,
>> We are using Storm for processing CDC messages from Oracle GoldenGate.
>> The pipeline is as below:
>> Oracle GoldenGate-->Queue-->Storm-->Relational DB
>> We have a requirement to hold the messages for a transaction ID until all
>> the messages for that transaction are available in Storm. There can be
>> scenarios like 1 million updates happening in one transaction in the source
>> Oracle system.
>> Can you please suggest the best approach for holding the messages and then
>> pushing them to the target DB only when all messages for the transaction ID
>> are available in Storm?
>>
>> Regards
>> Pradeep S
>
>
>
>
> --
> Nikos R. Katsipoulakis,
> Department of Computer Science
> University of Pittsburgh



-- 

John Bush
Trax Technologies, Inc.
M: 480-227-2910
TraxTech.Com

