[jira] [Commented] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs

2016-10-13 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573571#comment-15573571
 ] 

Ruslan Dautkhanov commented on SQOOP-3022:
--

Thank you [~maugli], I will definitely try and report back here.

The other performance problem we had with Sqoop was related to wide datasets, 
so it is not related to this JIRA (this issue is about Oracle-side bottlenecks 
like redo log generation, direct-path insert contention, etc.). The issue with 
wide datasets was purely a Sqoop-side problem (again, not related to this 
issue); it was partially fixed in SQOOP-2920, and further fixes are coming in 
your SQOOP-2983 - thank you for your work on that.

> sqoop export for Oracle generates tremendous amounts of redo logs
> -
>
> Key: SQOOP-3022
> URL: https://issues.apache.org/jira/browse/SQOOP-3022
> Project: Sqoop
>  Issue Type: Bug
>  Components: codegen, connectors, connectors/oracle
>Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6
>Reporter: Ruslan Dautkhanov
>  Labels: export, oracle
>
> Sqoop export for Oracle generates tremendous amounts of redo logs (comparable 
> to the export size or more).
> We have put the target tables in nologging mode, but Oracle will still 
> generate redo logs unless the +APPEND insert hint is used.
> See https://oracle-base.com/articles/misc/append-hint for examples.
> Please add an option for sqoop to generate Oracle INSERT statements with the 
> APPEND hint. Our databases are swamped with redo/archived logs whenever we 
> sqoop data to them. This is easily avoidable, and from a business perspective 
> sqooping to staging tables in nologging mode is totally fine.
> Thank you.
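
A minimal Oracle SQL sketch of the requested behaviour (the table names 
staging_foo and source_foo are hypothetical, not from the ticket): with the 
target table in NOLOGGING mode, only a direct-path insert via the APPEND hint 
skips most redo generation; a conventional INSERT is still fully logged.

    -- Hypothetical staging table, switched to NOLOGGING mode.
    ALTER TABLE staging_foo NOLOGGING;

    -- Conventional insert (what Sqoop export emits today): full redo is
    -- written even though the table is in NOLOGGING mode.
    INSERT INTO staging_foo (id, msg) VALUES (1, 'row');

    -- Direct-path insert (what the requested option would emit): writes
    -- above the high-water mark and generates minimal redo/undo.
    -- (For multi-row INSERT ... VALUES batches, Oracle 11gR2+ offers the
    -- APPEND_VALUES hint instead.)
    INSERT /*+ APPEND */ INTO staging_foo
    SELECT id, msg FROM source_foo;
    COMMIT;  -- a direct-path insert must be committed before the table
             -- can be queried again in the same session

Note that a direct-path insert holds an exclusive lock on the table for the 
duration of the statement, which is usually acceptable for staging tables.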



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572070#comment-15572070
 ] 

Hudson commented on SQOOP-2986:
---

SUCCESS: Integrated in Jenkins build Sqoop-hadoop200 #1067 (See 
[https://builds.apache.org/job/Sqoop-hadoop200/1067/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java


> Add validation check for --hive-import and --incremental lastmodified
> -
>
> Key: SQOOP-2986
> URL: https://issues.apache.org/jira/browse/SQOOP-2986
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Szabolcs Vasas
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is 
> not supported; however, the application is able to run with these parameters, 
> and it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.
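
As a quick sanity check (a hypothetical query, assuming the Hive table is 
named Employees as in step 2), the duplicates produced in step 5 can be made 
visible directly from the Hive shell:

    -- Count the copies of each id in the Hive table; with the behaviour
    -- described above, ids 1 and 2 appear twice after the incremental run.
    SELECT id, COUNT(*) AS copies
    FROM Employees
    GROUP BY id
    HAVING COUNT(*) > 1;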



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572019#comment-15572019
 ] 

Hudson commented on SQOOP-2986:
---

FAILURE: Integrated in Jenkins build Sqoop-hadoop100 #1027 (See 
[https://builds.apache.org/job/Sqoop-hadoop100/1027/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java


> Add validation check for --hive-import and --incremental lastmodified
> -
>
> Key: SQOOP-2986
> URL: https://issues.apache.org/jira/browse/SQOOP-2986
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Szabolcs Vasas
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is 
> not supported; however, the application is able to run with these parameters, 
> and it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload

2016-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571970#comment-15571970
 ] 

Hudson commented on SQOOP-2952:
---

FAILURE: Integrated in Jenkins build Sqoop-hadoop23 #1264 (See 
[https://builds.apache.org/job/Sqoop-hadoop23/1264/])
SQOOP-2952: Fixing bug (row key not added into column family using (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1])
* (edit) build.xml
* (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java
* (edit) src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java
* (edit) src/java/org/apache/sqoop/hbase/PutTransformer.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java
* (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java
* (edit) ivy.xml
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java


> row key not added into column family using --hbase-bulkload
> ---
>
> Key: SQOOP-2952
> URL: https://issues.apache.org/jira/browse/SQOOP-2952
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Xiaomin Zhang
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not 
> added into the column family even though sqoop.hbase.add.row.key=true was 
> defined.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect 
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx 
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL 
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571968#comment-15571968
 ] 

Hudson commented on SQOOP-2986:
---

FAILURE: Integrated in Jenkins build Sqoop-hadoop23 #1264 (See 
[https://builds.apache.org/job/Sqoop-hadoop23/1264/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java


> Add validation check for --hive-import and --incremental lastmodified
> -
>
> Key: SQOOP-2986
> URL: https://issues.apache.org/jira/browse/SQOOP-2986
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Szabolcs Vasas
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is 
> not supported; however, the application is able to run with these parameters, 
> and it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 52426: row key not added into column family using --hbase-bulkload

2016-10-13 Thread Szabolcs Vasas


> On Oct. 13, 2016, 12:38 p.m., Attila Szabo wrote:
> > Hi Szabi,
> > 
> > Thanks so much for the update; it seems to work on my side as well.
> > +1 for creating the JIRA tickets for cleaning up the profiles, especially 
> > because unit testing + 3rd-party testing with different profiles (without 
> > clean) could cause unexpected behaviours/errors (I had already run into 
> > that even with your changeset). So I kindly ask you to create those items 
> > on issues.apache.org as a follow-up of this issue.
> > 
> > A big +1 for using the DDT tools of JUnit.
> > 
> > Nice and clean solution!

Hi Attila!

Thank you for reviewing and committing this patch! I have created a JIRA for 
profile cleanup: https://issues.apache.org/jira/browse/SQOOP-3023

Regards,
Szabolcs


- Szabolcs


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52426/#review152486
---


On Oct. 10, 2016, 1:42 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52426/
> ---
> 
> (Updated Oct. 10, 2016, 1:42 p.m.)
> 
> 
> Review request for Sqoop and Attila Szabo.
> 
> 
> Bugs: SQOOP-2952
> https://issues.apache.org/jira/browse/SQOOP-2952
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> row key not added into column family using --hbase-bulkload
> 
> 
> Diffs
> -
> 
>   build.xml 97e5502 
>   ivy.xml a502530 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java b2431ac 
>   src/java/org/apache/sqoop/hbase/PutTransformer.java 8d6bcac 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java b5cad1d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java 363b5d7 
>   src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java cfbb1d3 
>   src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java 37dc004 
> 
> Diff: https://reviews.apache.org/r/52426/diff/
> 
> 
> Testing
> ---
> 
> New unit test cases are added.
> 
> HBaseImportAddRowKeyTest can be run with the following command:
> 
> ant clean test -Dtestcase=HBaseImportAddRowKeyTest -Dhadoopversion=260 
> -Dhbaseprofile=95
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



[jira] [Created] (SQOOP-3023) Cleanup build profiles in build.xml

2016-10-13 Thread Szabolcs Vasas (JIRA)
Szabolcs Vasas created SQOOP-3023:
-

 Summary: Cleanup build profiles in build.xml
 Key: SQOOP-3023
 URL: https://issues.apache.org/jira/browse/SQOOP-3023
 Project: Sqoop
  Issue Type: Task
Affects Versions: 1.4.6
Reporter: Szabolcs Vasas


Sqoop's build.xml contains a number of profiles developers can choose from to 
build the project. Some of these profiles build with really old dependencies 
and may not be needed anymore. The task is to clean up the build script and 
remove the unnecessary profiles (hbaseprofile, hadoopversion, etc.).





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571938#comment-15571938
 ] 

Hudson commented on SQOOP-2986:
---

FAILURE: Integrated in Jenkins build Sqoop-hadoop20 #1062 (See 
[https://builds.apache.org/job/Sqoop-hadoop20/1062/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java


> Add validation check for --hive-import and --incremental lastmodified
> -
>
> Key: SQOOP-2986
> URL: https://issues.apache.org/jira/browse/SQOOP-2986
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Szabolcs Vasas
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is 
> not supported; however, the application is able to run with these parameters, 
> and it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload

2016-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571939#comment-15571939
 ] 

Hudson commented on SQOOP-2952:
---

FAILURE: Integrated in Jenkins build Sqoop-hadoop20 #1062 (See 
[https://builds.apache.org/job/Sqoop-hadoop20/1062/])
SQOOP-2952: Fixing bug (row key not added into column family using (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1])
* (edit) src/java/org/apache/sqoop/hbase/PutTransformer.java
* (edit) ivy.xml
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java
* (edit) build.xml
* (edit) src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java
* (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java
* (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java


> row key not added into column family using --hbase-bulkload
> ---
>
> Key: SQOOP-2952
> URL: https://issues.apache.org/jira/browse/SQOOP-2952
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Xiaomin Zhang
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not 
> added into the column family even though sqoop.hbase.add.row.key=true was 
> defined.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect 
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx 
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL 
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload

2016-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571908#comment-15571908
 ] 

Hudson commented on SQOOP-2952:
---

FAILURE: Integrated in Jenkins build Sqoop-hadoop200 #1066 (See 
[https://builds.apache.org/job/Sqoop-hadoop200/1066/])
SQOOP-2952: Fixing bug (row key not added into column family using (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1])
* (edit) src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java
* (edit) build.xml
* (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java
* (edit) ivy.xml
* (edit) src/java/org/apache/sqoop/hbase/PutTransformer.java
* (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java


> row key not added into column family using --hbase-bulkload
> ---
>
> Key: SQOOP-2952
> URL: https://issues.apache.org/jira/browse/SQOOP-2952
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Xiaomin Zhang
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not 
> added into the column family even though sqoop.hbase.add.row.key=true was 
> defined.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect 
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx 
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL 
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Attila Szabo (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571909#comment-15571909
 ] 

Attila Szabo commented on SQOOP-2986:
-

Thank you [~vasas] for your contribution!

IMHO this validation will make the user expectations much clearer and will 
keep them from running into unexpected scenarios!

Also, nice job on the testing!

Nice hit, sir!

Thanks,
[~maugli]

> Add validation check for --hive-import and --incremental lastmodified
> -
>
> Key: SQOOP-2986
> URL: https://issues.apache.org/jira/browse/SQOOP-2986
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Szabolcs Vasas
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is 
> not supported; however, the application is able to run with these parameters, 
> and it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Attila Szabo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Szabo closed SQOOP-2986.
---

> Add validation check for --hive-import and --incremental lastmodified
> -
>
> Key: SQOOP-2986
> URL: https://issues.apache.org/jira/browse/SQOOP-2986
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Szabolcs Vasas
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is 
> not supported; however, the application is able to run with these parameters, 
> and it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50566: Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Attila Szabo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50566/#review152491
---


Ship it!




Nice job Szabi!

+1 for using the @Rule feature; it makes the unit tests much more concise and 
declarative.

Keep up the good work!

- Attila Szabo


On Oct. 13, 2016, 1:11 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50566/
> ---
> 
> (Updated Oct. 13, 2016, 1:11 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2986
> https://issues.apache.org/jira/browse/SQOOP-2986
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Add validation check for --hive-import and --incremental lastmodified
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b71bc5e 
>   src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50566/diff/
> 
> 
> Testing
> ---
> 
> New unit test cases are added, also tested manually by executing sqoop import 
> command.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571897#comment-15571897
 ] 

ASF subversion and git services commented on SQOOP-2986:


Commit 14754342d3a9bd6e146b9628b2e103ff30f310d8 in sqoop's branch 
refs/heads/trunk from [~maugli]
[ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=1475434 ]

SQOOP-2986: Add validation check for --hive-import and --incremental 
lastmodified

(Szabolcs Vasas via Attila Szabo)


> Add validation check for --hive-import and --incremental lastmodified
> -
>
> Key: SQOOP-2986
> URL: https://issues.apache.org/jira/browse/SQOOP-2986
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Szabolcs Vasas
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is 
> not supported; however, the application is able to run with these parameters, 
> and it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-3011) sqoop import to HIVE external table based on file system other than HDFS

2016-10-13 Thread VISHNU S NAIR (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

VISHNU S NAIR updated SQOOP-3011:
-
Assignee: (was: VISHNU S NAIR)

> sqoop import to HIVE external table based on file system other than HDFS
> 
>
> Key: SQOOP-3011
> URL: https://issues.apache.org/jira/browse/SQOOP-3011
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Hui Cao
>
> 1, Create an external Hive table using a Swift URI in the Hive shell:
>   CREATE TABLE foo(id INT, msg STRING, INT) ROW FORMAT
>   DELIMITED FIELDS TERMINATED BY ','
>   LINES TERMINATED BY '\n' 
>   STORED AS TEXTFILE
>   LOCATION 'swift://swift.location/';
>   This table is created on an external file system instead of HDFS; in this 
> case I'm using the Swift object store.
> 2, Use sqoop to insert data into this table (angle brackets mark placeholder 
> values):
> $SQOOP_PATH/sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect 
> jdbc:db2://<host>:<port>/<database> --username <username> --password 
> <password> --table FOO --hive-import --hive-home hive
> The process shows the following error:
> FAILED: SemanticException [Error 10028]: Line 2:17 Path is not legal 
> ''hdfs://<namenode>:8020/user/hdfs/FOO'': Move from: 
> hdfs://<namenode>:8020/user/hdfs/FOO to: swift://swift.location/ is not 
> valid. Please check that values for params "default.fs.name" and 
> "hive.metastore.warehouse.dir" do not conflict.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Szabolcs Vasas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Vasas updated SQOOP-2986:
--
Attachment: SQOOP-2986.patch

> Add validation check for --hive-import and --incremental lastmodified
> -
>
> Key: SQOOP-2986
> URL: https://issues.apache.org/jira/browse/SQOOP-2986
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Szabolcs Vasas
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with the --hive-import and --incremental lastmodified options is 
> not supported; however, the application is able to run with these parameters, 
> and it produces unexpected results: the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50566: Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Szabolcs Vasas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50566/
---

(Updated Oct. 13, 2016, 1:11 p.m.)


Review request for Sqoop.


Bugs: SQOOP-2986
https://issues.apache.org/jira/browse/SQOOP-2986


Repository: sqoop-trunk


Description
---

Add validation check for --hive-import and --incremental lastmodified


Diffs (updated)
-

  src/java/org/apache/sqoop/tool/BaseSqoopTool.java b71bc5e 
  src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/50566/diff/


Testing
---

New unit test cases are added, also tested manually by executing sqoop import 
command.


Thanks,

Szabolcs Vasas



[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload

2016-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571852#comment-15571852
 ] 

Hudson commented on SQOOP-2952:
---

FAILURE: Integrated in Jenkins build Sqoop-hadoop100 #1026 (See 
[https://builds.apache.org/job/Sqoop-hadoop100/1026/])
SQOOP-2952: Fixing bug (row key not added into column family using (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1])
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java
* (edit) src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java
* (edit) build.xml
* (edit) ivy.xml
* (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java
* (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java
* (edit) src/java/org/apache/sqoop/hbase/PutTransformer.java
* (edit) src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java


> row key not added into column family using --hbase-bulkload
> ---
>
> Key: SQOOP-2952
> URL: https://issues.apache.org/jira/browse/SQOOP-2952
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Xiaomin Zhang
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not 
> added into the column family even though sqoop.hbase.add.row.key=true was 
> defined.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect 
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx 
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL 
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50566: Add validation check for --hive-import and --incremental lastmodified

2016-10-13 Thread Attila Szabo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50566/#review152487
---



Hi @Szabi,

It seems trunk has diverged since you created the latest patch. Although your 
solution looks good, could you please provide an updated version of the patch 
file that applies to the current trunk?

Many thanks in advance!

- Attila Szabo


On July 28, 2016, 1:07 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50566/
> ---
> 
> (Updated July 28, 2016, 1:07 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2986
> https://issues.apache.org/jira/browse/SQOOP-2986
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Add validation check for --hive-import and --incremental lastmodified
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java fecdf43 
>   src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50566/diff/
> 
> 
> Testing
> ---
> 
> New unit test cases are added, also tested manually by executing sqoop import 
> command.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



[jira] [Closed] (SQOOP-2952) row key not added into column family using --hbase-bulkload

2016-10-13 Thread Attila Szabo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Szabo closed SQOOP-2952.
---

> row key not added into column family using --hbase-bulkload
> ---
>
> Key: SQOOP-2952
> URL: https://issues.apache.org/jira/browse/SQOOP-2952
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Xiaomin Zhang
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not 
> added into the column family even though sqoop.hbase.add.row.key=true was 
> defined.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect 
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx 
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL 
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-2952) row key not added into column family using --hbase-bulkload

2016-10-13 Thread Attila Szabo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Szabo updated SQOOP-2952:

Fix Version/s: 1.4.7

> row key not added into column family using --hbase-bulkload
> ---
>
> Key: SQOOP-2952
> URL: https://issues.apache.org/jira/browse/SQOOP-2952
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Xiaomin Zhang
>Assignee: Szabolcs Vasas
> Fix For: 1.4.7
>
> Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not 
> added into the column family even though sqoop.hbase.add.row.key=true was 
> defined.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect 
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx 
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL 
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload

2016-10-13 Thread Attila Szabo (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571803#comment-15571803
 ] 

Attila Szabo commented on SQOOP-2952:
-

Hi [~vasas],

Thank you for your contribution!
Especially for figuring out the profile/dependency-related problems in 
connection with the HBase-related 3rd party tests.

Nice and clean job!

> row key not added into column family using --hbase-bulkload
> ---
>
> Key: SQOOP-2952
> URL: https://issues.apache.org/jira/browse/SQOOP-2952
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Xiaomin Zhang
>Assignee: Szabolcs Vasas
> Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not 
> added into the column family even though sqoop.hbase.add.row.key=true was 
> defined.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect 
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx 
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL 
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2952) row key not added into column family using --hbase-bulkload

2016-10-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571801#comment-15571801
 ] 

ASF subversion and git services commented on SQOOP-2952:


Commit b4afcf4179b13c25b5e9bd182d75cab5d2e6c8d1 in sqoop's branch 
refs/heads/trunk from [~maugli]
[ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=b4afcf4 ]

SQOOP-2952: Fixing bug (row key not added into column family using 
--hbase-bulkload)

(Szabolcs Vasas via Attila Szabo)


> row key not added into column family using --hbase-bulkload
> ---
>
> Key: SQOOP-2952
> URL: https://issues.apache.org/jira/browse/SQOOP-2952
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Xiaomin Zhang
>Assignee: Szabolcs Vasas
> Attachments: SQOOP-2952.patch, SQOOP-2952.patch
>
>
> While using --hbase-bulkload to import a table into HBase, the row key was not 
> added into the column family even though sqoop.hbase.add.row.key=true was 
> defined.
> Example command line:
> sqoop import -Dsqoop.hbase.add.row.key=true --connect 
> jdbc:mysql://localhost:3306/XXX --username xxx --password xxx 
> --hbase-create-table --hbase-table XXX --column-family cf --table TBL 
> --split-by ID --hbase-row-key ID --hbase-bulkload --target-dir /tmp/bulkload



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 52426: row key not added into column family using --hbase-bulkload

2016-10-13 Thread Attila Szabo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52426/#review152486
---


Ship it!




Hi Szabi,

Thanks so much for the update; it seems to work on my side as well.
+1 for creating the JIRA tickets for cleaning up the profiles, especially 
because unit testing + 3rd-party testing with different profiles (without 
clean) could cause unexpected behaviours/errors (I had already run into that 
even with your changeset). So I kindly ask you to create those items on 
issues.apache.org as a follow-up of this issue.

A big +1 for using the DDT tools of JUnit.

Nice and clean solution!

- Attila Szabo


On Oct. 10, 2016, 1:42 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52426/
> ---
> 
> (Updated Oct. 10, 2016, 1:42 p.m.)
> 
> 
> Review request for Sqoop and Attila Szabo.
> 
> 
> Bugs: SQOOP-2952
> https://issues.apache.org/jira/browse/SQOOP-2952
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> row key not added into column family using --hbase-bulkload
> 
> 
> Diffs
> -
> 
>   build.xml 97e5502 
>   ivy.xml a502530 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java b2431ac 
>   src/java/org/apache/sqoop/hbase/PutTransformer.java 8d6bcac 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java b5cad1d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java 363b5d7 
>   src/test/com/cloudera/sqoop/hbase/HBaseImportAddRowKeyTest.java cfbb1d3 
>   src/test/com/cloudera/sqoop/hbase/HBaseTestCase.java 37dc004 
> 
> Diff: https://reviews.apache.org/r/52426/diff/
> 
> 
> Testing
> ---
> 
> New unit test cases are added.
> 
> HBaseImportAddRowKeyTest can be run with the following command:
> 
> ant clean test -Dtestcase=HBaseImportAddRowKeyTest -Dhadoopversion=260 
> -Dhbaseprofile=95
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



[jira] [Issue Comment Deleted] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs

2016-10-13 Thread Attila Szabo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Szabo updated SQOOP-3022:

Comment: was deleted

(was: Hi [~Tagar],

I would strongly encourage you to do exhaustive testing on your side before 
closing this ticket.

Functionality-wise I do think OraOop is feature complete, and you could achieve 
your goals with it (as I have referred to before, and as [~david.robson] 
confirmed too).

However, since the partitioned and non-partitioned versions of the import 
differ greatly from a performance point of view, and since there is an ongoing 
performance-related change ([SQOOP-2983], which might be interesting for you), 
I think it would make sense to evaluate whether the current solution and its 
performance are satisfactory for you (from the past I remember you had serious 
performance-related constraints in your system/pipeline).
)

> sqoop export for Oracle generates tremendous amounts of redo logs
> -
>
> Key: SQOOP-3022
> URL: https://issues.apache.org/jira/browse/SQOOP-3022
> Project: Sqoop
>  Issue Type: Bug
>  Components: codegen, connectors, connectors/oracle
>Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6
>Reporter: Ruslan Dautkhanov
>  Labels: export, oracle
>
> Sqoop export for Oracle generates tremendous amounts of redo logs (comparable 
> to the export size or more).
> We have put the target tables in nologging mode, but Oracle will still 
> generate redo logs unless the +APPEND insert hint is used.
> See https://oracle-base.com/articles/misc/append-hint for examples.
> Please add an option for sqoop to generate Oracle INSERT statements with the 
> APPEND hint. Our databases are swamped with redo/archived logs whenever we 
> sqoop data to them. This is easily avoidable, and from a business perspective 
> sqooping to staging tables in nologging mode is totally fine.
> Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs

2016-10-13 Thread Attila Szabo (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571371#comment-15571371
 ] 

Attila Szabo commented on SQOOP-3022:
-

Hi [~Tagar],

I would strongly encourage you to do exhaustive testing on your side before 
closing this ticket.

Functionality-wise I do think OraOop is feature complete, and you could achieve 
your goals with it (as I have referred to before, and as [~david.robson] 
confirmed too).

However, since the partitioned and non-partitioned versions of the import 
differ greatly from a performance point of view, and since there is an ongoing 
performance-related change ([SQOOP-2983], which might be interesting for you), 
I think it would make sense to evaluate whether the current solution and its 
performance are satisfactory for you (from the past I remember you had serious 
performance-related constraints in your system/pipeline).


> sqoop export for Oracle generates tremendous amounts of redo logs
> -
>
> Key: SQOOP-3022
> URL: https://issues.apache.org/jira/browse/SQOOP-3022
> Project: Sqoop
>  Issue Type: Bug
>  Components: codegen, connectors, connectors/oracle
>Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6
>Reporter: Ruslan Dautkhanov
>  Labels: export, oracle
>
> Sqoop export for Oracle generates tremendous amounts of redo logs (comparable 
> to the export size or more).
> We have put the target tables in nologging mode, but Oracle will still 
> generate redo logs unless the +APPEND insert hint is used.
> See https://oracle-base.com/articles/misc/append-hint for examples.
> Please add an option for sqoop to generate Oracle INSERT statements with the 
> APPEND hint. Our databases are swamped with redo/archived logs whenever we 
> sqoop data to them. This is easily avoidable, and from a business perspective 
> sqooping to staging tables in nologging mode is totally fine.
> Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs

2016-10-13 Thread Attila Szabo (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571372#comment-15571372
 ] 

Attila Szabo commented on SQOOP-3022:
-

Hi [~Tagar],

I would strongly encourage you to do exhaustive testing on your side before 
closing this ticket.

Functionality-wise I do think OraOop is feature complete, and you could achieve 
your goals with it (as I have referred to before, and as [~david.robson] 
confirmed too).

However, since the partitioned and non-partitioned versions of the import 
differ greatly from a performance point of view, and since there is an ongoing 
performance-related change ([SQOOP-2983], which might be interesting for you), 
I think it would make sense to evaluate whether the current solution and its 
performance are satisfactory for you (from the past I remember you had serious 
performance-related constraints in your system/pipeline).


> sqoop export for Oracle generates tremendous amounts of redo logs
> -
>
> Key: SQOOP-3022
> URL: https://issues.apache.org/jira/browse/SQOOP-3022
> Project: Sqoop
>  Issue Type: Bug
>  Components: codegen, connectors, connectors/oracle
>Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6
>Reporter: Ruslan Dautkhanov
>  Labels: export, oracle
>
> Sqoop export for Oracle generates tremendous amounts of redo logs (comparable 
> to the export size or more).
> We have put the target tables in nologging mode, but Oracle will still 
> generate redo logs unless the +APPEND insert hint is used.
> See https://oracle-base.com/articles/misc/append-hint for examples.
> Please add an option for sqoop to generate Oracle INSERT statements with the 
> APPEND hint. Our databases are swamped with redo/archived logs whenever we 
> sqoop data to them. This is easily avoidable, and from a business perspective 
> sqooping to staging tables in nologging mode is totally fine.
> Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)