Add an update_time column to the source table and run the incremental load on that column, i.e. use Sqoop's lastmodified mode with update_time as the check column.
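To show why this covers both goals, here is a rough sketch of the idea. It is not Sqoop code: an in-memory SQLite table stands in for the MySQL source, a plain dict stands in for HBase, integer ticks stand in for timestamps, and the helper names (upsert_source, incremental_load) are made up for illustration. In MySQL itself the column could be maintained automatically with something like `TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP`, and Sqoop would do the filtering through `--incremental lastmodified --check-column update_time --last-value ...`.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL source; a dict stands in for HBase.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, sex INTEGER, "
    "update_time INTEGER NOT NULL)"
)


def upsert_source(row_id, name, sex, now):
    """Insert or update a source row and stamp update_time (hypothetical helper)."""
    conn.execute(
        "INSERT OR REPLACE INTO student (id, name, sex, update_time) "
        "VALUES (?, ?, ?, ?)",
        (row_id, name, sex, now),
    )


def incremental_load(target, last_value):
    """Copy rows changed after last_value into target.

    Returns (new_watermark, number_of_rows_copied).
    """
    rows = conn.execute(
        "SELECT id, name, sex, update_time FROM student "
        "WHERE update_time > ? ORDER BY update_time",
        (last_value,),
    ).fetchall()
    for row_id, name, sex, update_time in rows:
        # Keyed by id, so an updated row overwrites the old copy.
        target[row_id] = {"name": name, "sex": sex}
        last_value = max(last_value, update_time)
    return last_value, len(rows)


hbase = {}
for i, (name, sex) in enumerate([("Alice", 0), ("Bob", 1), ("Charles", 1)], start=1):
    upsert_source(i, name, sex, now=100)
watermark, n_first = incremental_load(hbase, last_value=0)  # initial full pass

upsert_source(4, "David", 1, now=200)   # goal A: a new record
upsert_source(2, "Robert", 1, now=210)  # goal B: an update to an existing row
watermark, n_second = incremental_load(hbase, watermark)    # only the two changes
```

The second run reads only the rows whose update_time is past the saved watermark, so the new row and the changed row both arrive without re-importing the whole table.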

On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <[email protected]> wrote:

> Hi guys,
>
> To simplify my question, let's say I have a MySQL table called 'student'
> that looks like this:
>
> +----+----------+-----+
> | id | name     | sex |
> +----+----------+-----+
> |  1 | Alice    |   0 |
> |  2 | Bob      |   1 |
> |  3 | Charles  |   1 |
> +----+----------+-----+
>
> I want to import this table to HBase periodically, which means I will run
> this sqoop job periodically. There are two goals:
>
> A. Every time a new record is inserted into the MySQL table, e.g. (4,
> David, 1), I hope my next sqoop import will catch it and put it in HBase.
> B. If any updates have been made to MySQL rows 1, 2, or 3, I want
> to have the updates in HBase too after the next round of sqoop import.
>
> I checked the two types of incremental updates sqoop has: Append mode seems
> to satisfy only goal A, while Last-modified mode requires my MySQL table to
> have a timestamp column for each row (which I don't have in real life). I
> know that if I don't use the incremental update options at all, I can get
> away with running a fresh import every time, but if my MySQL table is really
> huge, a fresh import might be a performance killer.
>
> Is there any way I can just do incremental updates instead of having to
> re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
>
>
> Shengjie

--
JChan
