An RDBMS works on the basis of write-ahead logging: changes are written to the redo or transaction log before they are committed.

 

To get a true feed into Hive you will need the committed log deliveries, in the form of delimited text files, loaded into a Hive staging (temporary) table and then inserted into the Hive table following the initial load done with Sqoop or anything else. The target Hive table needs two additional columns, namely op_type = insert|update|delete and op_time = from_unixtime(unix_timestamp()), populated when each new row is loaded from the staging table. Then you run your ETL to populate the final Hive table with up-to-date data.
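
As a minimal sketch, assuming a simple source table and made-up names (staging_orders, orders_changes, the landing path), the staging and change tables could look like this:

-- Staging table matching the delimited text files delivered from the log
-- (the feed itself is assumed to carry the operation type for each row)
CREATE TABLE staging_orders (
  order_id INT,
  amount   DECIMAL(18,2),
  op_type  STRING                -- insert | update | delete
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load one delivery of committed changes (path is a placeholder)
LOAD DATA INPATH '/landing/orders_changes.txt' INTO TABLE staging_orders;

-- Change table: the source columns plus op_type and op_time
CREATE TABLE orders_changes (
  order_id INT,
  amount   DECIMAL(18,2),
  op_type  STRING,
  op_time  STRING
)
STORED AS ORC;

-- Stamp op_time as each batch moves from staging to the change table
INSERT INTO TABLE orders_changes
SELECT order_id, amount, op_type, from_unixtime(unix_timestamp())
FROM   staging_orders;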

 

Incremental append using Sqoop will not reflect updates or deletes in the source table. It will just add new rows based on the primary key and the last value.
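
Something along these lines (connection details, table and column names are placeholders) only picks up rows whose check column has grown past the last value:

# Append-only import: rows with ORDER_ID > --last-value are pulled in;
# updated or deleted rows below that value are never revisited
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/scott/.password \
  --table ORDERS \
  --hive-import \
  --hive-table orders \
  --incremental append \
  --check-column ORDER_ID \
  --last-value 100000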

 

Most RDBMS applications do not really delete rows unless they are purged or archived once older than a certain date, say past a year or so. A row in an RDBMS is created once, updated many times and deleted once. So the prime interest would be to look for inserts and updates.
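
For example, the ETL mentioned above could rebuild the current image of the data from the change table, here reusing the illustrative orders_changes table from the earlier sketch and assuming the final orders table was created by the initial load. It keeps the latest change per primary key and drops keys whose last operation was a delete:

-- Rebuild the final table from the change history:
-- rank the changes per key by op_time, keep the newest one,
-- and filter out keys whose latest operation was a delete
INSERT OVERWRITE TABLE orders
SELECT order_id, amount
FROM (
  SELECT order_id,
         amount,
         op_type,
         ROW_NUMBER() OVER (PARTITION BY order_id
                            ORDER BY op_time DESC) AS rn
  FROM   orders_changes
) latest
WHERE  latest.rn = 1
AND    latest.op_type <> 'delete';

With deletes being rare, as noted above, the op_type filter does little work and the bulk of the job is ranking the inserts and updates.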

 

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Author of the book "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.

Co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only; if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free; therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

 

From: Ashok Kumar [mailto:ashok34...@yahoo.com] 
Sent: 05 May 2015 18:41
To: user@hive.apache.org
Subject: downloading RDBMS table data to Hive with Sqoop import

 


Hi gurus,

I can use Sqoop import to get RDBMS data, say from Oracle, into Hive first and then use incremental append for new rows with a PK and last value.

However, how do you account for updates and deletes with Sqoop without a full load of the table from RDBMS to Hive?

Thanks

 
