On Tue, Dec 7, 2010 at 10:55 PM, Hiller, Dean (Contractor) <[email protected]> wrote:
> We are going to move 7 terabytes (set to grow to 35 when our SLA goes
> from 2 years to 10 years of storage) from an RDBMS to hadoop/hbase type
> system and I was wondering if anyone knew of how to get events from
> hbase on persisted/modified entities so that changes can be replicated
> to our RDBMS easily.

Again, I would suggest you take a look at the RowLog library I mentioned in my previous post. We use it as a message queue to asynchronously feed SOLR indices, which sounds similar to what you need during your transition week. The RowLog processor scans a WAL of row update entries at regular intervals, so your RDBMS replication queue stays up to date in near real time.

Nothing works out of the box, though: you will have some code to write, but at the very least the tricky bits are already solved for you. As Todd suggests, you will need to be careful not to overload your RDBMS, but a more tightly integrated mechanism than mass M/R might reduce that risk.

Kind regards,

Steven.
-- 
Steven Noels
http://outerthought.org/
Open Source Content Applications
Makers of Kauri, Daisy CMS and Lily
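
To illustrate the interval-based approach described above, here is a minimal sketch of a poller that drains a log of row updates into an RDBMS in bounded batches, so the database is never hit with more than a fixed number of writes per interval. Everything in it is an assumption made for illustration: the in-memory list standing in for the update log, the `entities` table, the names `make_db`, `drain_once` and `poll`, and SQLite standing in for the RDBMS. It does not use the RowLog API.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory SQLite database standing in for the target RDBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entities (row_key TEXT PRIMARY KEY, value TEXT)")
    return conn


def drain_once(log, conn, offset, max_batch=100):
    """Apply at most max_batch pending log entries; return the new offset.

    `log` is a hypothetical stand-in for a log of row updates: a list of
    (row_key, value) tuples in write order. Capping the batch size limits
    how much load one polling round can put on the RDBMS.
    """
    batch = log[offset:offset + max_batch]
    if batch:
        with conn:  # one transaction per batch, committed on success
            conn.executemany(
                "INSERT OR REPLACE INTO entities (row_key, value) VALUES (?, ?)",
                batch,
            )
    return offset + len(batch)


def poll(log, conn, interval_s=1.0, max_batch=100, rounds=3):
    """Drain the log at regular intervals for a fixed number of rounds."""
    offset = 0
    for _ in range(rounds):
        offset = drain_once(log, conn, offset, max_batch)
        time.sleep(interval_s)
    return offset
```

Tuning `interval_s` and `max_batch` trades replication lag against peak write load on the RDBMS; a real consumer would also need to persist the offset so it can resume after a restart.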
