[jira] [Commented] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs
[ https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573571#comment-15573571 ]

Ruslan Dautkhanov commented on SQOOP-3022:
------------------------------------------

Thank you [~maugli], I will definitely try it and report back here. The other performance problem we had with Sqoop was related to wide datasets, so it is not related to this JIRA (this issue is about Oracle-side bottlenecks like redo log generation, direct-path insert contention, etc.). The wide-dataset issue was a purely Sqoop-side problem (again, not related to this issue); it was partially fixed in SQOOP-2920, and other fixes are coming in your SQOOP-2983 - thank you for your work on that.

> sqoop export for Oracle generates tremendous amounts of redo logs
> -----------------------------------------------------------------
>
>                 Key: SQOOP-3022
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3022
>             Project: Sqoop
>          Issue Type: Bug
>          Components: codegen, connectors, connectors/oracle
>    Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6
>            Reporter: Ruslan Dautkhanov
>              Labels: export, oracle
>
> Sqoop export for Oracle generates tremendous amounts of redo logs (comparable
> to the export size or more).
> We have put the target tables in NOLOGGING mode, but Oracle will still generate
> redo logs unless the APPEND Oracle insert hint is used.
> See https://oracle-base.com/articles/misc/append-hint for examples.
> Please add an option for sqoop to generate insert statements for Oracle with
> the APPEND hint. Our databases are swamped with redo/archived logs whenever
> we sqoop data into them. This is easily avoidable, and from a business
> perspective, sqooping into staging tables in NOLOGGING mode is totally fine.
> Thank you.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
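The requested behavior can be sketched as follows. This is a hypothetical illustration of the statement generation (the function name and parameterized-statement shape are assumptions, not actual Sqoop code). One caveat worth noting: for INSERT ... VALUES statements executed over JDBC, the applicable direct-path hint on Oracle 11gR2+ is APPEND_VALUES; the plain APPEND hint applies to INSERT ... SELECT.

```python
def build_oracle_insert(table, columns, use_append_hint=False):
    """Build a parameterized Oracle INSERT statement.

    With use_append_hint=True, inject Oracle's direct-path hint so that
    inserts into a NOLOGGING staging table generate minimal redo.
    APPEND_VALUES is used because this is an INSERT ... VALUES statement;
    the plain APPEND hint only takes effect for INSERT ... SELECT.
    """
    hint = "/*+ APPEND_VALUES */ " if use_append_hint else ""
    placeholders = ", ".join("?" for _ in columns)
    return "INSERT %sINTO %s (%s) VALUES (%s)" % (
        hint, table, ", ".join(columns), placeholders)
```

For example, `build_oracle_insert("STG_EMPLOYEES", ["ID", "NAME"], use_append_hint=True)` yields `INSERT /*+ APPEND_VALUES */ INTO STG_EMPLOYEES (ID, NAME) VALUES (?, ?)`.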
[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
[ https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572070#comment-15572070 ]

Hudson commented on SQOOP-2986:
-------------------------------

SUCCESS: Integrated in Jenkins build Sqoop-hadoop200 #1067 (See [https://builds.apache.org/job/Sqoop-hadoop200/1067/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java

> Add validation check for --hive-import and --incremental lastmodified
> ---------------------------------------------------------------------
>
>                 Key: SQOOP-2986
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2986
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Szabolcs Vasas
>            Assignee: Szabolcs Vasas
>             Fix For: 1.4.7
>
>         Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is not
> supported; however, the application is able to run with these parameters, but
> it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table, for example in MySQL:
> CREATE TABLE `Employees` (
>   `id` int(11) NOT NULL,
>   `name` varchar(45) DEFAULT NULL,
>   `salary` varchar(45) DEFAULT NULL,
>   `change_date` datetime DEFAULT NULL,
>   PRIMARY KEY (`id`)
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (1,'employee1','1000',now());
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (2,'employee2','2000',now());
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (3,'employee3','3000',now());
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (4,'employee4','4000',now());
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (5,'employee5','5000',now());
> 2) Import the table into Hive:
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop \
>   --username sqoop --password sqoop --table Employees --num-mappers 1 \
>   --hive-import --hive-table Employees
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() WHERE id=1;
> UPDATE Employees SET salary=2010, change_date=now() WHERE id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect jdbc:mysql://servername:3306/sqoop \
>   --username sqoop --password sqoop --table Employees \
>   --incremental lastmodified --check-column change_date --merge-key id \
>   --num-mappers 1 --hive-import --hive-table Employees --last-value "last_timestamp"
> 5) As a result, the employees with ids 1 and 2 will not be updated, but we will see
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which makes the Sqoop
> import fail if it was submitted with the --hive-import and --incremental
> lastmodified options.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
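The fail-fast validation requested above might look like the following sketch. This is hypothetical Python for illustration, not the actual BaseSqoopTool.java change; the exception name and the dictionary-based option representation are assumptions.

```python
class InvalidOptionsError(Exception):
    """Raised when mutually incompatible command-line options are combined."""


def validate_import_options(options):
    """Fail fast on --hive-import combined with --incremental lastmodified.

    Running that combination silently produces duplicate rows in the Hive
    table, so the job is rejected up front instead of being submitted.
    """
    if options.get("hive-import") and options.get("incremental") == "lastmodified":
        raise InvalidOptionsError(
            "--incremental lastmodified cannot be used with --hive-import")
```

For example, `validate_import_options({"hive-import": True, "incremental": "append"})` passes, while the lastmodified combination raises before any MapReduce job is launched.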
[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
[ https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572019#comment-15572019 ]

Hudson commented on SQOOP-2986:
-------------------------------

FAILURE: Integrated in Jenkins build Sqoop-hadoop100 #1027 (See [https://builds.apache.org/job/Sqoop-hadoop100/1027/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload
[ https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571970#comment-15571970 ]

Hudson commented on SQOOP-2952:
-------------------------------

FAILURE: Integrated in Jenkins build Sqoop-hadoop23 #1264 (See [https://builds.apache.org/job/Sqoop-hadoop23/1264/])
SQOOP-2952: Fixing bug (row key not added into column family using (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1])
* (edit) build.xml
* (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java
* (edit) src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java
* (edit) src/java/org/apache/sqoop/hbase/PutTransformer.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java
* (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java
* (edit) ivy.xml
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java

> row key not added into column family using --hbase-bulkload
> -----------------------------------------------------------
>
>                 Key: SQOOP-2952
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2952
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Xiaomin Zhang
>            Assignee: Szabolcs Vasas
>             Fix For: 1.4.7
>
>         Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not
> added into the column family even though sqoop.hbase.add.row.key=true was set.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
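A sketch of what sqoop.hbase.add.row.key=true is supposed to do when mapping an imported record to an HBase Put. This is hypothetical Python for illustration, not the actual ToStringPutTransformer logic; the function name and record representation are assumptions.

```python
def record_to_put(record, row_key_column, add_row_key=False):
    """Split an imported record into (row key, column-family cells).

    By default the row key column is dropped from the cells once it has
    been used as the Put's row key. With add_row_key=True the column is
    kept as a cell in the column family as well, which is the behavior
    sqoop.hbase.add.row.key=true requests (and which the bulkload path
    was not honoring in this bug).
    """
    row_key = record[row_key_column]
    cells = {column: value for column, value in record.items()
             if add_row_key or column != row_key_column}
    return row_key, cells
```

For example, with `record_to_put({"ID": "1", "NAME": "employee1"}, "ID", add_row_key=True)` the `ID` column appears both as the row key and as a cell in the column family.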
[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
[ https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571968#comment-15571968 ]

Hudson commented on SQOOP-2986:
-------------------------------

FAILURE: Integrated in Jenkins build Sqoop-hadoop23 #1264 (See [https://builds.apache.org/job/Sqoop-hadoop23/1264/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Review Request 52426: row key not added into column family using --hbase-bulkload
> On Oct. 13, 2016, 12:38 p.m., Attila Szabo wrote:
> > Hi Szabi,
> >
> > Thanks so much for the update, it seems to work on my side as well.
> > +1 for creating the JIRA tickets for cleaning up the profiles, especially
> > because unit testing + third-party testing with different profiles (without
> > clean) could cause unexpected behaviours/errors (I'd already run into that
> > even with your changeset). So I kindly ask you to create those items on
> > issues.apache.org as a follow-up of this issue.
> >
> > A big +1 for using the DDT tools of JUnit.
> >
> > Nice and clean solution!

Hi Attila!

Thank you for reviewing and committing this patch! I have created a JIRA for the profile cleanup: https://issues.apache.org/jira/browse/SQOOP-3023

Regards,
Szabolcs


- Szabolcs


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52426/#review152486
-----------------------------------------------------------


On Oct. 10, 2016, 1:42 p.m., Szabolcs Vasas wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52426/
> -----------------------------------------------------------
>
> (Updated Oct. 10, 2016, 1:42 p.m.)
>
>
> Review request for Sqoop and Attila Szabo.
>
>
> Bugs: SQOOP-2952
>     https://issues.apache.org/jira/browse/SQOOP-2952
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> row key not added into column family using --hbase-bulkload
>
>
> Diffs
> -----
>
>   build.xml 97e5502
>   ivy.xml a502530
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java b2431ac
>   src/java/org/apache/sqoop/hbase/PutTransformer.java 8d6bcac
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java b5cad1d
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java 363b5d7
>   src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java cfbb1d3
>   src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java 37dc004
>
> Diff: https://reviews.apache.org/r/52426/diff/
>
>
> Testing
> -------
>
> New unit test cases are added.
>
> HBaseImportAddRowKeyTest can be run with the following command:
>
> ant clean test -Dtestcase=HBaseImportAddRowKeyTest -Dhadoopversion=260 -Dhbaseprofile=95
>
>
> Thanks,
>
> Szabolcs Vasas
>
>
[jira] [Created] (SQOOP-3023) Cleanup build profiles in build.xml
Szabolcs Vasas created SQOOP-3023:
----------------------------------

             Summary: Cleanup build profiles in build.xml
                 Key: SQOOP-3023
                 URL: https://issues.apache.org/jira/browse/SQOOP-3023
             Project: Sqoop
          Issue Type: Task
    Affects Versions: 1.4.6
            Reporter: Szabolcs Vasas


Sqoop's build.xml contains a number of profiles developers can choose from to build the project. Some of these profiles build with really old dependencies and may not be needed anymore. The task is to clean up the build script and remove the unnecessary profiles (hbaseprofile, hadoopversion, etc.).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
[ https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571938#comment-15571938 ]

Hudson commented on SQOOP-2986:
-------------------------------

FAILURE: Integrated in Jenkins build Sqoop-hadoop20 #1062 (See [https://builds.apache.org/job/Sqoop-hadoop20/1062/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload
[ https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571939#comment-15571939 ]

Hudson commented on SQOOP-2952:
-------------------------------

FAILURE: Integrated in Jenkins build Sqoop-hadoop20 #1062 (See [https://builds.apache.org/job/Sqoop-hadoop20/1062/])
SQOOP-2952: Fixing bug (row key not added into column family using (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1])
* (edit) src/java/org/apache/sqoop/hbase/PutTransformer.java
* (edit) ivy.xml
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java
* (edit) build.xml
* (edit) src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java
* (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java
* (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload
[ https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571908#comment-15571908 ]

Hudson commented on SQOOP-2952:
-------------------------------

FAILURE: Integrated in Jenkins build Sqoop-hadoop200 #1066 (See [https://builds.apache.org/job/Sqoop-hadoop200/1066/])
SQOOP-2952: Fixing bug (row key not added into column family using (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1])
* (edit) src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java
* (edit) build.xml
* (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java
* (edit) ivy.xml
* (edit) src/java/org/apache/sqoop/hbase/PutTransformer.java
* (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
[ https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571909#comment-15571909 ]

Attila Szabo commented on SQOOP-2986:
-------------------------------------

Thank you [~vasas] for your contribution!
IMHO this validation will make the user expectations much clearer, and prevents users from running into unexpected scenarios.
Also, nice job around the testing!
Nice hit, sir!

Thanks,
[~maugli]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Closed] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
[ https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Szabo closed SQOOP-2986.
-------------------------------

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Review Request 50566: Add validation check for --hive-import and --incremental lastmodified
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50566/#review152491
-----------------------------------------------------------


Ship it!

Nice job Szabi!
+1 for using the @Rule feature, it makes the unit tests much more concise and declarative.
Keep up the good work!

- Attila Szabo


On Oct. 13, 2016, 1:11 p.m., Szabolcs Vasas wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50566/
> -----------------------------------------------------------
>
> (Updated Oct. 13, 2016, 1:11 p.m.)
>
>
> Review request for Sqoop.
>
>
> Bugs: SQOOP-2986
>     https://issues.apache.org/jira/browse/SQOOP-2986
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> Add validation check for --hive-import and --incremental lastmodified
>
>
> Diffs
> -----
>
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b71bc5e
>   src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/50566/diff/
>
>
> Testing
> -------
>
> New unit test cases are added, also tested manually by executing the sqoop import
> command.
>
>
> Thanks,
>
> Szabolcs Vasas
>
>
[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
[ https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571897#comment-15571897 ]

ASF subversion and git services commented on SQOOP-2986:
--------------------------------------------------------

Commit 14754342d3a9bd6e146b9628b2e103ff30f310d8 in sqoop's branch refs/heads/trunk from [~maugli]
[ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=1475434 ]

SQOOP-2986: Add validation check for --hive-import and --incremental lastmodified (Szabolcs Vasas via Attila Szabo)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (SQOOP-3011) sqoop import to HIVE external table based on file system other than HDFS
[ https://issues.apache.org/jira/browse/SQOOP-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] VISHNU S NAIR updated SQOOP-3011: - Assignee: (was: VISHNU S NAIR) > sqoop import to HIVE external table based on file system other than HDFS > > > Key: SQOOP-3011 > URL: https://issues.apache.org/jira/browse/SQOOP-3011 > Project: Sqoop > Issue Type: Bug >Affects Versions: 1.4.6 >Reporter: Hui Cao > > 1, Create external Hive table using swift URI in Hive shell > CREATE TABLE foo(id INT, msg STRING, INT) ROW FORMAT > DELIMITED FIELDS TERMINATED BY ',' > LINES TERMINATED BY '\n' > STORED AS TEXTFILE > LOCATION 'swift://swift.location/'; > This table is created on an external file system instead of hdfs, in this > case, I'm using swift Object Store > 2, Use sqoop to insert data to this table, > $SQOOP_PATH/sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect > jdbc:db://:/ --username --password > --table FOO --hive-import --hive-home hive > the process shows following error: > FAILED: SemanticException [Error 10028]: Line 2:17 Path is not legal > ''hdfs://:8020/user/hdfs/FOO'': Move from: > hdfs://:8020/user/hdfs/FOO to: swift://swift.location/ is not > valid. Please check that values for params "default.fs.name" and > "hive.metastore.warehouse.dir" do not conflict. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
[ https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Vasas updated SQOOP-2986: -- Attachment: SQOOP-2986.patch > Add validation check for --hive-import and --incremental lastmodified > - > > Key: SQOOP-2986 > URL: https://issues.apache.org/jira/browse/SQOOP-2986 > Project: Sqoop > Issue Type: Bug >Affects Versions: 1.4.6 >Reporter: Szabolcs Vasas >Assignee: Szabolcs Vasas > Fix For: 1.4.7 > > Attachments: SQOOP-2986.patch, SQOOP-2986.patch > > > Sqoop import with --hive-import and --incremental lastmodified options is not > supported, however the application is able to run with these parameters but > it produces unexpected results, the output can contain duplicate rows. > Steps to reproduce the issue: > 1) Create the necessary table for example in MySQL: > CREATE TABLE "Employees" ( > "id" int(11) NOT NULL, > "name" varchar(45) DEFAULT NULL, > "salary" varchar(45) DEFAULT NULL, > "change_date" datetime DEFAULT NULL, > PRIMARY KEY ("id") > ) ENGINE=MyISAM DEFAULT CHARSET=latin1; > INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES > (1,'employee1',1000,now()); > INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES > (2,'employee2','2000',now()); > INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES > (3,'employee3','3000',now()); > INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES > (4,'employee4','4000',now()); > INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES > (5,'employee5','5000',now()); > 2) Import the table to Hive > sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop > --username sqoop --password sqoop --table Employees --num-mappers 1 > --hive-import --hive-table Employees > 3) Update some rows in MySQL: > UPDATE Employees SET salary=1010, change_date=now() where id=1; > UPDATE Employees SET salary=2010, change_date=now() where id=2; > 4) Execute the incremental import command: > 
sudo -u hdfs sqoop import --verbose --connect > jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table > Employees --incremental lastmodified --check-column change_date --merge-key > id --num-mappers 1 --hive-import --hive-table Employees --last-value > "last_timestamp" > 5) As a result employees with ids 1 and 2 will not be updated but we will see > duplicate rows in the Hive table. > The task is to introduce a fail-fast validation which will make the Sqoop > import fail if it was submitted with --hive-import and --incremental > lastmodified options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 50566: Add validation check for --hive-import and --incremental lastmodified
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50566/ --- (Updated Oct. 13, 2016, 1:11 p.m.) Review request for Sqoop. Bugs: SQOOP-2986 https://issues.apache.org/jira/browse/SQOOP-2986 Repository: sqoop-trunk Description --- Add validation check for --hive-import and --incremental lastmodified Diffs (updated) - src/java/org/apache/sqoop/tool/BaseSqoopTool.java b71bc5e src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java PRE-CREATION Diff: https://reviews.apache.org/r/50566/diff/ Testing --- New unit test cases are added, also tested manually by executing sqoop import command. Thanks, Szabolcs Vasas
[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload
[ https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571852#comment-15571852 ] Hudson commented on SQOOP-2952: --- FAILURE: Integrated in Jenkins build Sqoop-hadoop100 #1026 (See [https://builds.apache.org/job/Sqoop-hadoop100/1026/]) SQOOP-2952: Fixing bug (row key not added into column family using (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1]) * (edit) src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java * (edit) src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java * (edit) build.xml * (edit) ivy.xml * (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java * (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java * (edit) src/java/org/apache/sqoop/hbase/PutTransformer.java * (edit) src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java > row key not added into column family using --hbase-bulkload > --- > > Key: SQOOP-2952 > URL: https://issues.apache.org/jira/browse/SQOOP-2952 > Project: Sqoop > Issue Type: Bug >Reporter: Xiaomin Zhang >Assignee: Szabolcs Vasas > Fix For: 1.4.7 > > Attachments: SQOOP-2952.patch, SQOOP-2952.patch > > > While using --hbase-bulkload to import a table to HBase, the row key were not > added into the column family even sqoop.hbase.add.row.key=true was defined > Example command line: > sqoop import -Dsqoop.hbase.add.row.key=true --connect > jdbc:mysql://localhost:3306/XXX --username xxx --password xxx > --hbase-create-table --hbase-table XXX --column-family cf --table TBL > --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload -- This message was sent by Atlassian JIRA (v6.3.4#6332)
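The behaviour fixed above can be illustrated with a minimal sketch: when sqoop.hbase.add.row.key=true, the row-key column must also be emitted as a cell in the column family rather than being dropped before the bulk load. All names below are invented for illustration; they are not Sqoop's actual PutTransformer/HBaseBulkImportMapper code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the fixed behaviour; names are illustrative,
// not Sqoop's actual classes.
public class RowKeyCellSketch {

    /**
     * Builds the cells written to the column family for one row.
     * The buggy bulk-load path removed the row-key column unconditionally;
     * with addRowKey=true it must be kept.
     */
    public static Map<String, String> toCells(Map<String, String> fields,
                                              String rowKeyColumn,
                                              boolean addRowKey) {
        Map<String, String> cells = new LinkedHashMap<>(fields);
        if (!addRowKey) {
            cells.remove(rowKeyColumn);
        }
        return cells;
    }
}
```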
Re: Review Request 50566: Add validation check for --hive-import and --incremental lastmodified
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50566/#review152487 --- Hi @Szabi, It seems trunk has diverged since you created the latest patch. Although your solution looks good, could you please provide an updated version of the patch file, which applies to the current trunk version? Many thanks in advance! - Attila Szabo On July 28, 2016, 1:07 p.m., Szabolcs Vasas wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50566/ > --- > > (Updated July 28, 2016, 1:07 p.m.) > > > Review request for Sqoop. > > > Bugs: SQOOP-2986 > https://issues.apache.org/jira/browse/SQOOP-2986 > > > Repository: sqoop-trunk > > > Description > --- > > Add validation check for --hive-import and --incremental lastmodified > > > Diffs > - > > src/java/org/apache/sqoop/tool/BaseSqoopTool.java fecdf43 > src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/50566/diff/ > > > Testing > --- > > New unit test cases are added, also tested manually by executing sqoop import > command. > > > Thanks, > > Szabolcs Vasas > >
[jira] [Closed] (SQOOP-2952) row key not added into column family using --hbase-bulkload
[ https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Szabo closed SQOOP-2952. --- > row key not added into column family using --hbase-bulkload > --- > > Key: SQOOP-2952 > URL: https://issues.apache.org/jira/browse/SQOOP-2952 > Project: Sqoop > Issue Type: Bug >Reporter: Xiaomin Zhang >Assignee: Szabolcs Vasas > Fix For: 1.4.7 > > Attachments: SQOOP-2952.patch, SQOOP-2952.patch > > > While using --hbase-bulkload to import a table to HBase, the row key were not > added into the column family even sqoop.hbase.add.row.key=true was defined > Example command line: > sqoop import -Dsqoop.hbase.add.row.key=true --connect > jdbc:mysql://localhost:3306/XXX --username xxx --password xxx > --hbase-create-table --hbase-table XXX --column-family cf --table TBL > --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-2952) row key not added into column family using --hbase-bulkload
[ https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Szabo updated SQOOP-2952: Fix Version/s: 1.4.7 > row key not added into column family using --hbase-bulkload > --- > > Key: SQOOP-2952 > URL: https://issues.apache.org/jira/browse/SQOOP-2952 > Project: Sqoop > Issue Type: Bug >Reporter: Xiaomin Zhang >Assignee: Szabolcs Vasas > Fix For: 1.4.7 > > Attachments: SQOOP-2952.patch, SQOOP-2952.patch > > > While using --hbase-bulkload to import a table to HBase, the row key were not > added into the column family even sqoop.hbase.add.row.key=true was defined > Example command line: > sqoop import -Dsqoop.hbase.add.row.key=true --connect > jdbc:mysql://localhost:3306/XXX --username xxx --password xxx > --hbase-create-table --hbase-table XXX --column-family cf --table TBL > --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload
[ https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571803#comment-15571803 ] Attila Szabo commented on SQOOP-2952: - Hi [~vasas], Thank you for your contribution! Especially for figuring out the profile/dependency related problems in connection with the HBase related 3rd party tests. Nice and clean job! > row key not added into column family using --hbase-bulkload > --- > > Key: SQOOP-2952 > URL: https://issues.apache.org/jira/browse/SQOOP-2952 > Project: Sqoop > Issue Type: Bug >Reporter: Xiaomin Zhang >Assignee: Szabolcs Vasas > Attachments: SQOOP-2952.patch, SQOOP-2952.patch > > > While using --hbase-bulkload to import a table to HBase, the row key were not > added into the column family even sqoop.hbase.add.row.key=true was defined > Example command line: > sqoop import -Dsqoop.hbase.add.row.key=true --connect > jdbc:mysql://localhost:3306/XXX --username xxx --password xxx > --hbase-create-table --hbase-table XXX --column-family cf --table TBL > --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload
[ https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571801#comment-15571801 ] ASF subversion and git services commented on SQOOP-2952: Commit b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1 in sqoop's branch refs/heads/trunk from [~maugli] [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=b4afcf4 ] SQOOP-2952: Fixing bug (row key not added into column family using --hbase-bulkload) (Szabolcs Vasas via Attila Szabo) > row key not added into column family using --hbase-bulkload > --- > > Key: SQOOP-2952 > URL: https://issues.apache.org/jira/browse/SQOOP-2952 > Project: Sqoop > Issue Type: Bug >Reporter: Xiaomin Zhang >Assignee: Szabolcs Vasas > Attachments: SQOOP-2952.patch, SQOOP-2952.patch > > > While using --hbase-bulkload to import a table to HBase, the row key were not > added into the column family even sqoop.hbase.add.row.key=true was defined > Example command line: > sqoop import -Dsqoop.hbase.add.row.key=true --connect > jdbc:mysql://localhost:3306/XXX --username xxx --password xxx > --hbase-create-table --hbase-table XXX --column-family cf --table TBL > --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 52426: row key not added into column family using --hbase-bulkload
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52426/#review152486 --- Ship it! Hi Szabi, Thanks so much for the update, it seems to work on my side as well. +1 for creating the JIRA tickets for cleaning up the profiles, especially because unittesting+3rd_party_testing with different profiles (without clean) could cause unexpected behaviours/errors (I'd already run into that even with your changeset). So I kindly ask you to create those items on issues.apache.org as a follow-up of this issue. A big +1 for using DDT tools of JUnit. Nice and clean solution! - Attila Szabo On Oct. 10, 2016, 1:42 p.m., Szabolcs Vasas wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52426/ > --- > > (Updated Oct. 10, 2016, 1:42 p.m.) > > > Review request for Sqoop and Attila Szabo. > > > Bugs: SQOOP-2952 > https://issues.apache.org/jira/browse/SQOOP-2952 > > > Repository: sqoop-trunk > > > Description > --- > > row key not added into column family using --hbase-bulkload > > > Diffs > - > > build.xml 97e5502 > ivy.xml a502530 > src/java/org/apache/sqoop/hbase/HBasePutProcessor.java b2431ac > src/java/org/apache/sqoop/hbase/PutTransformer.java 8d6bcac > src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java b5cad1d > src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java 363b5d7 > src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java cfbb1d3 > src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java 37dc004 > > Diff: https://reviews.apache.org/r/52426/diff/ > > > Testing > --- > > New unit test cases are added. > > HBaseImportAddRowKeyTest can be run with the following command: > > ant clean test -Dtestcase=HBaseImportAddRowKeyTest -Dhadoopversion=260 > -Dhbaseprofile=95 > > > Thanks, > > Szabolcs Vasas > >
[jira] [Issue Comment Deleted] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs
[ https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Szabo updated SQOOP-3022: Comment: was deleted (was: Hi [~Tagar], I would strongly encourage you to do exhaustive testing on your side before closing this ticket. Functionality-wise I do think OraOop is feature complete, and you could achieve your goals with it (as I've referred to before, and as [~david.robson] confirmed too). However, as the partitioned/non-partitioned versions of the import are very different from a performance POV, and as there is an ongoing performance related change ( [SQOOP-2983] which might be interesting for you ), I think it would make sense to evaluate whether the current solution+performance is satisfying for you (from the past I do remember you had serious performance related constraints in your system/pipeline). ) > sqoop export for Oracle generates tremendous amounts of redo logs > - > > Key: SQOOP-3022 > URL: https://issues.apache.org/jira/browse/SQOOP-3022 > Project: Sqoop > Issue Type: Bug > Components: codegen, connectors, connectors/oracle >Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6 >Reporter: Ruslan Dautkhanov > Labels: export, oracle > > Sqoop export for Oracle generates tremendous amounts of redo logs (comparable > to export size or more). > We have put target tables in nologging mode, but Oracle will still generate > redo logs unless the +APPEND Oracle insert hint is used. > See https://oracle-base.com/articles/misc/append-hint for examples. > Please add an option for sqoop to generate insert statements in Oracle with > the APPEND hint. Our databases are swamped with redo/archived logs whenever > we sqoop data to them. This is easily avoidable. And from a business > perspective sqooping to staging tables in nologging mode is totally fine. > Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs
[ https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571371#comment-15571371 ] Attila Szabo commented on SQOOP-3022: - Hi [~Tagar], I would strongly encourage you to do exhaustive testing on your side before closing this ticket. Functionality-wise I do think OraOop is feature complete, and you could achieve your goals with it (as I've referred to before, and as [~david.robson] confirmed too). However, as the partitioned/non-partitioned versions of the import are very different from a performance POV, and as there is an ongoing performance related change ( [SQOOP-2983] which might be interesting for you ), I think it would make sense to evaluate whether the current solution+performance is satisfying for you (from the past I do remember you had serious performance related constraints in your system/pipeline). > sqoop export for Oracle generates tremendous amounts of redo logs > - > > Key: SQOOP-3022 > URL: https://issues.apache.org/jira/browse/SQOOP-3022 > Project: Sqoop > Issue Type: Bug > Components: codegen, connectors, connectors/oracle >Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6 >Reporter: Ruslan Dautkhanov > Labels: export, oracle > > Sqoop export for Oracle generates tremendous amounts of redo logs (comparable > to export size or more). > We have put target tables in nologging mode, but Oracle will still generate > redo logs unless the +APPEND Oracle insert hint is used. > See https://oracle-base.com/articles/misc/append-hint for examples. > Please add an option for sqoop to generate insert statements in Oracle with > the APPEND hint. Our databases are swamped with redo/archived logs whenever > we sqoop data to them. This is easily avoidable. And from a business > perspective sqooping to staging tables in nologging mode is totally fine. > Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
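The feature requested in the quoted issue can be sketched as follows. Note that for per-row INSERT ... VALUES statements Oracle's direct-path variant of the hint is APPEND_VALUES (available from 11.2); the plain APPEND hint applies to INSERT ... SELECT. The builder below is a hypothetical illustration, not Sqoop's actual export code generation:

```java
// Hypothetical sketch: generating an Oracle INSERT with the APPEND_VALUES
// direct-path hint, so loads into NOLOGGING staging tables generate
// minimal redo. Names and structure are assumptions for illustration.
public class OracleAppendInsertSketch {

    /** Builds a parameterized INSERT, optionally with the direct-path hint. */
    public static String buildInsert(String table, String[] columns, boolean directPath) {
        StringBuilder cols = new StringBuilder();
        StringBuilder binds = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                cols.append(", ");
                binds.append(", ");
            }
            cols.append(columns[i]);
            binds.append("?");
        }
        String hint = directPath ? "/*+ APPEND_VALUES */ " : "";
        return "INSERT " + hint + "INTO " + table
            + " (" + cols + ") VALUES (" + binds + ")";
    }
}
```

A direct-path insert also takes an exclusive table lock and requires a commit before the same session can re-read the table, which is acceptable for the staging-table workflow the reporter describes.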