Thanks to Boglarka. It seems Sqoop2 is quite different from Sqoop 1.

On Sat, Dec 10, 2016 at 1:11 AM, Boglarka Egyed <[email protected]> wrote:
> *About Sqoop2:*
> There are some incremental import related features implemented in Sqoop2,
> but unfortunately I have no experience with them.
> A good starting point could be to check their configuration here:
> https://github.com/apache/sqoop/blob/sqoop2/connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/configuration/IncrementalRead.java
>
> *About using Sqoop1 incremental import on Hive tables:*
> Sqoop1 does not support incremental import with the --hive-import option
> because the Hive import functionality is not able to merge data.
>
> However, there is a workaround for this issue:
> * first, all the data should be imported into Hive tables from the DB
> * then the changed data should be imported with incremental import from
> the DB by setting the --target-dir option to the directory holding the
> Hive table data; we also have to specify the field delimiter, line
> delimiter and escape character used by Hive
>
> An example:
> * first:
> sqoop import --connect <conn> --username <uname> --password <pwd> --table
> <table_name> --num-mappers 1 --hive-import --hive-table <table_name>
>
> This command will create the Hive table and load all the data from the DB.
>
> * then:
> sqoop import --connect <conn> --username <uname> --password <pwd> --table
> <table_name> --incremental lastmodified --check-column <col> --merge-key id
> --num-mappers 1 --target-dir /user/hive/warehouse/<table_name>
> --last-value <last_value> --fields-terminated-by <field_del>
> --lines-terminated-by <lines_del> --escaped-by <esc>
>
> When the second command runs, it queries the records in the DB that were
> modified after <last_value>, puts them in a temporary directory, and then
> merges this result with the already imported data. The merged result is
> written to the target directory, which is the directory backing the Hive
> table, so the next time we query the <table_name> table in Hive we will
> see the merged data.
>
> *About handling DELETION:*
> Sqoop incremental import doesn't handle deletion on the source; it
> supports two modes: append (for new rows) and lastmodified (for updated
> rows).
>
> On Thu, Dec 8, 2016 at 2:56 AM, wenxing zheng <[email protected]> wrote:
>
>> And another important case is how the incremental imports can cope with
>> "DELETION" on the source.
>>
>> Thanks again,
>>
>> On Thu, Dec 8, 2016 at 9:42 AM, Buntu Dev <[email protected]> wrote:
>>
>>> I'm interested in this as well and would like to use Sqoop to load data
>>> into a Hadoop cluster as a Hive table. How does one handle updates on
>>> the table using incremental imports?
>>>
>>> On Wed, Dec 7, 2016 at 5:29 PM, wenxing zheng <[email protected]> wrote:
>>>
>>>> Thanks to Bogi. Is it available in Sqoop2?
>>>>
>>>> On Thu, Dec 8, 2016 at 4:16 AM, Boglarka Egyed <[email protected]> wrote:
>>>>
>>>>> Hi Wenxing,
>>>>>
>>>>> Yes, Sqoop is capable of supporting CDC; this can be done with
>>>>> "incremental" imports.
>>>>>
>>>>> Please see the user guide for details:
>>>>> https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports
>>>>>
>>>>> Cheers,
>>>>> Bogi
>>>>>
>>>>> On Wed, Dec 7, 2016 at 12:21 PM, wenxing zheng <[email protected]> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I am new to Sqoop, and would like to know whether the architecture of
>>>>>> Sqoop is able to support "change data capture", or can I say that the
>>>>>> "Generic JDBC Connector" is only for bulk data transfer?
>>>>>>
>>>>>> Appreciated for any hints.
>>>>>>
>>>>>> Thanks, Wenxing
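P.S. To make Bogi's two commands concrete, here is the same workaround with
made-up values plugged in. The JDBC URL, credentials, table name, check
column and last-value below are only examples, not taken from this thread.
With a plain --hive-import Sqoop writes Hive's default text-table delimiters
(Ctrl-A fields, newline lines), so the incremental run must use the same:

    # step 1 (hypothetical values): full load, creates the Hive table
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username sqoop_user \
      --password secret \
      --table orders \
      --num-mappers 1 \
      --hive-import \
      --hive-table orders

    # step 2 (hypothetical values): incremental merge straight into the
    # directory backing the Hive table; \001 is Sqoop's octal notation
    # for the Ctrl-A field delimiter that step 1 produced
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username sqoop_user \
      --password secret \
      --table orders \
      --incremental lastmodified \
      --check-column last_updated \
      --merge-key id \
      --num-mappers 1 \
      --target-dir /user/hive/warehouse/orders \
      --last-value "2016-12-01 00:00:00" \
      --fields-terminated-by '\001' \
      --lines-terminated-by '\n'

If the first import was run with an explicit --escaped-by, the same value
has to be repeated in step 2 as Bogi's template shows; otherwise it can be
left out.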

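P.P.S. On the DELETION point: since Sqoop only ever sees rows that still
exist in the source, one common workaround (my assumption, not something
Sqoop itself provides) is soft deletes: keep the row, flip a flag, and let
lastmodified mode carry the change into Hive. A sketch, assuming the source
table has is_deleted and last_updated columns (hypothetical names):

    # instead of DELETE, mark the row so the incremental import can see it
    mysql -h dbhost -u sqoop_user -p \
      -e "UPDATE orders SET is_deleted = 1, last_updated = NOW() WHERE id = 42;" \
      sales

    # the regular lastmodified import (see P.S. above) then picks the row up
    # like any other update; downstream Hive queries simply filter it out,
    # e.g. ... WHERE is_deleted = 0

Otherwise the only option I know of is to periodically re-run a full import
to reconcile hard deletes.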