An RDBMS works on a write-ahead basis: changes are written to the redo or transaction log before they are committed.
To get a true feed into Hive you will need the committed log deliveries in the form of delimited text files, loaded into a Hive temporary (staging) table and then inserted into the Hive target table, following an initial load done with Sqoop or any other tool. The target Hive table needs two additional columns: op_type (insert | update | delete) and op_time (from_unixtime(unix_timestamp())), populated when each new row is loaded from the staging table. You then run your ETL to populate the final Hive table with up-to-date data.

Incremental append using Sqoop will not reflect updates or deletes in the source table; it will only add new rows, based on the primary key and the last value. Note that most RDBMSs do not really delete rows unless they are purged or archived after passing a certain age, say a year or so. A row in an RDBMS is created once, updated many times and deleted once, so the prime interest is inserts and updates.

HTH

Mich Talebzadeh

http://talebzadehmich.wordpress.com

Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7, and co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4. Publications due shortly: "Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache" and "Oracle and Sybase, Concepts and Contrasts", ISBN 978-0-9563693-1-4, volume one out shortly.
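A minimal HiveQL sketch of the pattern described above. The table and column names (staging_orders, orders_hist, order_id, amount) are illustrative assumptions; only the op_type/op_time columns and the from_unixtime(unix_timestamp()) stamp come from the description itself:

```sql
-- Staging table over the delimited CDC files delivered from the RDBMS log
-- (names and layout are illustrative assumptions).
CREATE TABLE staging_orders (
  order_id INT,
  amount   DECIMAL(10,2),
  op_type  STRING   -- 'insert' | 'update' | 'delete', supplied by the CDC feed
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Target table carries the two extra columns.
CREATE TABLE orders_hist (
  order_id INT,
  amount   DECIMAL(10,2),
  op_type  STRING,
  op_time  STRING
);

-- Load from staging, stamping each row as it arrives.
INSERT INTO TABLE orders_hist
SELECT order_id, amount, op_type, from_unixtime(unix_timestamp())
FROM staging_orders;

-- ETL step: latest state per primary key, filtering out deleted rows.
SELECT order_id, amount
FROM (
  SELECT order_id, amount, op_type,
         row_number() OVER (PARTITION BY order_id ORDER BY op_time DESC) AS rn
  FROM orders_hist
) t
WHERE rn = 1 AND op_type <> 'delete';
```

The final query is one way to materialise the "up-to-date" view: the row_number() window (available in Hive 0.11+) keeps only the most recent operation per key, and rows whose last operation was a delete are dropped.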
From: Ashok Kumar [mailto:ashok34...@yahoo.com]
Sent: 05 May 2015 18:41
To: user@hive.apache.org
Subject: downloading RDBMS table data to Hive with Sqoop import

Hi gurus,

I can use Sqoop import to get RDBMS data, say from Oracle, into Hive first and then use incremental append for new rows with a primary key and last value. However, how do you account for updates and deletes with Sqoop without a full load of the table from the RDBMS to Hive?

Thanks