Hey there,

Sqoop is capable of performing incremental imports ( http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_incremental_imports ). You can also import into HBase ( http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_importing_data_into_hbase ).
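
For illustration, here is a rough sketch of how an incremental import into HBase could be put together. It is a small Python helper that only builds the "sqoop import" argument list using the options described in the two guide sections above; it does not run Sqoop. The JDBC URL, table, column family, and column names are all hypothetical, and credentials and other connection options are left out.

```python
import shlex


def build_import_cmd(jdbc_url, table, hbase_table, column_family,
                     row_key, check_column, last_value):
    """Return the argv list for one incremental Sqoop import into HBase."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # source database (JDBC URL)
        "--table", table,                  # source table to read
        "--hbase-table", hbase_table,      # target HBase table
        "--column-family", column_family,  # target column family
        "--hbase-row-key", row_key,        # source column used as the row key
        "--incremental", "append",         # only fetch rows added since last run
        "--check-column", check_column,    # column compared against --last-value
        "--last-value", str(last_value),   # highest value seen by the previous run
    ]


# Hypothetical values for one source database; adjust to your own setup.
cmd = build_import_cmd(
    jdbc_url="jdbc:mysql://db-a.example.com/app_a",
    table="users",
    hbase_table="users",
    column_family="d",
    row_key="id",
    check_column="id",
    last_value=0,
)
print(shlex.join(cmd))
```

The printed command could be pasted into a shell or handed to a scheduler. Each run would need the '--last-value' from the previous run passed in again.
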
Sqoop should be able to update a single table for all three databases, but you'll need to make sure that the row keys Sqoop generates don't overlap. Also, you'll likely have to manage '--last-value'.

I highly recommend testing such a setup first and reporting back with your findings!

-Abe

On Sat, Aug 3, 2013 at 2:14 PM, shengjie min <[email protected]> wrote:

> Hi All,
>
> I've asked this question in HBase mailing list, people suggested me better
> off ask it here :) so here I am. I am new to sqoop and having a use case
> where there is a few applications running in house independently, Let's say
> applications A, B, C. Each has its own DB associated. I wanna create a
> aggregated view on all the databases so that I don't have to jump into
> different dbs to find the info I need. Simply example will be all three
> applications have a table called "users", they are v similar, I wanna union
> the "users" table.
>
> I've had a look at sqoop, looks like it allows me to move data from
> database A,B,C to a single/centralised place - e.g. HBase?
>
> The solution I am looking for ideally need to do the followings:
>
> 1. the centralised storage keeps updated reasonably quick as the original
> db (A, B, C) gets updated. By all means, I am not looking for one time bulk
> import, I wanna have incremental updates after the initial import.
> 2. As long as I provide a schema mapping, Can A,B,C be imported to a
> single place, e.g. single HBase table.
>
> now, my question is:
>
> Is Sqoop a suitable tool for this? I was originally considering to use
> mangodb and write the periodic/parallel import piece myself. But for now, I
> am leaning towards sqoop more since in house we have hadoop running
> already. Any advices are highly appreciated!
>
> Thanks,
> Shengjie
