Release to support Hadoop 3

2018-04-12 Thread Dániel Vörös
Dear All,

After some development towards supporting Hadoop 3 (and the latest versions of
downstream components) I'd like to summarize the current state of the
upgrade and start the conversation about releasing a new version of Sqoop
with Hadoop 3 support.

Here's what happened so far:
 - Upgraded Hadoop dependency to 3.0.0
 - Hive had to be upgraded, since the old Hive version didn't work with Hadoop 3.
 - HBase had to be upgraded, since Hive 3 depends on HBase 2 (alpha).
 - Dealt with a bunch of minor issues like changed Hadoop configuration
names and different packaging of Maven artifacts.

For details please refer to this ticket and the attached review request:
https://issues.apache.org/jira/browse/SQOOP-3305

Remaining work:
 - Parquet importing doesn't work. It was broken by a standalone-metastore
change in Hive, and fixing it would require a new Kite version built
against Hive 3.
 - Hive 3 is going to enable ACID tables by default. We should support
importing into these. Details:
https://issues.apache.org/jira/browse/SQOOP-3311

Other blocking issues:
 - There's no Hive 3 release (no alpha/beta) yet.

I'd like to kindly ask you all to share any other tasks/issues you know of
that we should address to support the latest versions. Also, there are a
couple of open questions:
 1) How to get a new Kite release? Maybe we should remove the Kite
dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
 2) Should we drop support for Hadoop 2?
 3) What version number should we use? To avoid confusion with Sqoop2 I'd
go with 3.0.
 4) Does (should?) this affect the 1.5 release?

Regards,
Daniel


Re: Review Request 66361: Implement HiveServer2 client

2018-04-12 Thread Szabolcs Vasas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66361/
---

(Updated April 12, 2018, 2:10 p.m.)


Review request for Sqoop.


Changes
---

Fail-fast validation is introduced for the --hs2-url and --as-parquetfile options.
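
A minimal sketch of what such a fail-fast check could look like, assuming the
check rejects the combination of the two options. Only the option names come
from the change note; the class, field and exception choices below are
hypothetical and not taken from the patch.

```java
// Hypothetical sketch: reject an unsupported option combination up front,
// before any import job is started.
final class Hs2OptionCheck {

    /** Illustrative stand-in for the parsed command-line options. */
    static final class Options {
        final String hs2Url;          // value of --hs2-url, or null if absent
        final boolean asParquetFile;  // true if --as-parquetfile was given

        Options(String hs2Url, boolean asParquetFile) {
            this.hs2Url = hs2Url;
            this.asParquetFile = asParquetFile;
        }
    }

    /** Throws immediately if both options are present. */
    static void validate(Options options) {
        if (options.hs2Url != null && options.asParquetFile) {
            throw new IllegalArgumentException(
                "The --hs2-url and --as-parquetfile options cannot be used together.");
        }
    }
}
```

Running the check during option validation means the user sees the error
before any data is moved, rather than partway through an import.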


Bugs: SQOOP-3309
https://issues.apache.org/jira/browse/SQOOP-3309


Repository: sqoop-trunk


Description
---

This JIRA covers the implementation of the client for HiveServer2 and its 
integration into the classes which use HiveImport.

- HiveClient interface is introduced with two implementations:
  - HiveImport: the original implementation, which uses HiveCLI
  - HiveServer2Client: the new client, which connects to HS2 over a JDBC 
connection
  - The common code is extracted to the HiveClientCommon class
- HiveClient should be instantiated using HiveClientFactory which creates and 
configures the right HiveClient based on the configuration in SqoopOptions
- HiveMiniCluster is introduced with a couple of helper classes to enable 
end-to-end HS2 tests
- A couple of new options are added to SqoopOptions to be able to configure the 
connection to HS2
- Validation is implemented for these new options
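
The structure described in the list above can be sketched roughly as follows.
This is an illustration only: the method names, the stand-in implementation
classes and the selection rule (an HS2 URL being present) are assumptions, and
the real clients would invoke Hive CLI or execute statements over JDBC instead
of returning strings.

```java
// Hypothetical sketch of an interface with two implementations and a factory.

/** Common entry point for callers that previously depended on one concrete class. */
interface HiveClient {
    /** Returns a description of the action; a real client would execute it. */
    String importTable(String tableName);
}

/** Logic shared by both implementations. */
final class HiveClientCommon {
    private HiveClientCommon() { }

    static String requireTableName(String tableName) {
        if (tableName == null || tableName.trim().isEmpty()) {
            throw new IllegalArgumentException("Table name is required");
        }
        return tableName.trim();
    }
}

/** Stand-in for the CLI-based implementation. */
final class CliHiveClient implements HiveClient {
    @Override
    public String importTable(String tableName) {
        return "hive-cli: load " + HiveClientCommon.requireTableName(tableName);
    }
}

/** Stand-in for the JDBC-based HiveServer2 implementation. */
final class Hs2HiveClient implements HiveClient {
    private final String jdbcUrl;

    Hs2HiveClient(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Override
    public String importTable(String tableName) {
        return "jdbc(" + jdbcUrl + "): load " + HiveClientCommon.requireTableName(tableName);
    }
}

/** Picks the implementation from configuration so callers never do it themselves. */
final class HiveClientFactorySketch {
    private HiveClientFactorySketch() { }

    static HiveClient create(String hs2Url) {
        if (hs2Url == null || hs2Url.isEmpty()) {
            return new CliHiveClient();
        }
        return new Hs2HiveClient(hs2Url);
    }
}
```

With this shape, a tool class only calls
HiveClientFactorySketch.create(...).importTable(...) and stays unaware of
which transport is used.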


Diffs (updated)
---

  build.xml 7f68b573c65a61150ca78d158084586c87775d84
  ivy.xml 6be4fa20fbbf1f303c69d86942b1874e18a14afc
  src/docs/user/hive-args.txt 441f54e8e0cee63595937f4e1811abc2d89f9237
  src/docs/user/hive.txt 3dc8bb463d602d525fe5f2d07d52cb97efcbab7e
  src/java/org/apache/sqoop/SqoopOptions.java 651cebd69ee7e75d06c75945e3607c4fab7eb11c
  src/java/org/apache/sqoop/hive/HiveClient.java PRE-CREATION
  src/java/org/apache/sqoop/hive/HiveClientCommon.java PRE-CREATION
  src/java/org/apache/sqoop/hive/HiveClientFactory.java PRE-CREATION
  src/java/org/apache/sqoop/hive/HiveImport.java c2729119d31f7e585f204f2d31b2051eea71b72b
  src/java/org/apache/sqoop/hive/HiveServer2Client.java PRE-CREATION
  src/java/org/apache/sqoop/hive/HiveServer2ConnectionFactory.java PRE-CREATION
  src/java/org/apache/sqoop/hive/TableDefWriter.java b7a25b7809e0d50166966a77161dc8ff603fb2d2
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java b02e4fe7fda25c7f8171c7db17d15a7987459687
  src/java/org/apache/sqoop/tool/CreateHiveTableTool.java d259566180369a55d490144e6f865e728f4f2e61
  src/java/org/apache/sqoop/tool/ImportAllTablesTool.java 18f7a0af48d972d5186e9414475e080f1eb765f3
  src/java/org/apache/sqoop/tool/ImportTool.java e9920058858653bec7407bf7992eb6445401e813
  src/test/org/apache/sqoop/hive/TestHiveClientFactory.java PRE-CREATION
  src/test/org/apache/sqoop/hive/TestHiveMiniCluster.java PRE-CREATION
  src/test/org/apache/sqoop/hive/TestHiveServer2Client.java PRE-CREATION
  src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java PRE-CREATION
  src/test/org/apache/sqoop/hive/TestTableDefWriter.java 8bdc3beb3677312ec0ee2e612616358bca4ca838
  src/test/org/apache/sqoop/hive/minicluster/AuthenticationConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/hive/minicluster/HiveMiniCluster.java PRE-CREATION
  src/test/org/apache/sqoop/hive/minicluster/KerberosAuthenticationConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/hive/minicluster/NoAuthenticationConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/hive/minicluster/PasswordAuthenticationConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/HiveServer2TestUtil.java PRE-CREATION
  src/test/org/apache/sqoop/tool/TestHiveServer2OptionValidations.java PRE-CREATION
  src/test/org/apache/sqoop/tool/TestImportTool.java 1c0cf4d863692f75bb8831e834fae47fc18b5df5


Diff: https://reviews.apache.org/r/66361/diff/4/

Changes: https://reviews.apache.org/r/66361/diff/3-4/


Testing
---

Ran the unit and third-party test suites.


Thanks,

Szabolcs Vasas



Re: Review Request 66361: Implement HiveServer2 client

2018-04-12 Thread Szabolcs Vasas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66361/
---

(Updated April 12, 2018, 1:51 p.m.)


Review request for Sqoop.


Changes
---

Documentation is added.


Bugs: SQOOP-3309
https://issues.apache.org/jira/browse/SQOOP-3309


Repository: sqoop-trunk


Description (updated)
---

This JIRA covers the implementation of the client for HiveServer2 and its 
integration into the classes which use HiveImport.

- HiveClient interface is introduced with two implementations:
  - HiveImport: the original implementation, which uses HiveCLI
  - HiveServer2Client: the new client, which connects to HS2 over a JDBC 
connection
  - The common code is extracted to the HiveClientCommon class
- HiveClient should be instantiated using HiveClientFactory which creates and 
configures the right HiveClient based on the configuration in SqoopOptions
- HiveMiniCluster is introduced with a couple of helper classes to enable 
end-to-end HS2 tests
- A couple of new options are added to SqoopOptions to be able to configure the 
connection to HS2
- Validation is implemented for these new options


Diffs (updated)
---

  build.xml 7f68b573c65a61150ca78d158084586c87775d84
  ivy.xml 6be4fa20fbbf1f303c69d86942b1874e18a14afc
  src/docs/user/hive-args.txt 441f54e8e0cee63595937f4e1811abc2d89f9237
  src/docs/user/hive.txt 3dc8bb463d602d525fe5f2d07d52cb97efcbab7e
  src/java/org/apache/sqoop/SqoopOptions.java 651cebd69ee7e75d06c75945e3607c4fab7eb11c
  src/java/org/apache/sqoop/hive/HiveClient.java PRE-CREATION
  src/java/org/apache/sqoop/hive/HiveClientCommon.java PRE-CREATION
  src/java/org/apache/sqoop/hive/HiveClientFactory.java PRE-CREATION
  src/java/org/apache/sqoop/hive/HiveImport.java c2729119d31f7e585f204f2d31b2051eea71b72b
  src/java/org/apache/sqoop/hive/HiveServer2Client.java PRE-CREATION
  src/java/org/apache/sqoop/hive/HiveServer2ConnectionFactory.java PRE-CREATION
  src/java/org/apache/sqoop/hive/TableDefWriter.java b7a25b7809e0d50166966a77161dc8ff603fb2d2
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java b02e4fe7fda25c7f8171c7db17d15a7987459687
  src/java/org/apache/sqoop/tool/CreateHiveTableTool.java d259566180369a55d490144e6f865e728f4f2e61
  src/java/org/apache/sqoop/tool/ImportAllTablesTool.java 18f7a0af48d972d5186e9414475e080f1eb765f3
  src/java/org/apache/sqoop/tool/ImportTool.java e9920058858653bec7407bf7992eb6445401e813
  src/test/org/apache/sqoop/hive/TestHiveClientFactory.java PRE-CREATION
  src/test/org/apache/sqoop/hive/TestHiveMiniCluster.java PRE-CREATION
  src/test/org/apache/sqoop/hive/TestHiveServer2Client.java PRE-CREATION
  src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java PRE-CREATION
  src/test/org/apache/sqoop/hive/TestTableDefWriter.java 8bdc3beb3677312ec0ee2e612616358bca4ca838
  src/test/org/apache/sqoop/hive/minicluster/AuthenticationConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/hive/minicluster/HiveMiniCluster.java PRE-CREATION
  src/test/org/apache/sqoop/hive/minicluster/KerberosAuthenticationConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/hive/minicluster/NoAuthenticationConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/hive/minicluster/PasswordAuthenticationConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/HiveServer2TestUtil.java PRE-CREATION
  src/test/org/apache/sqoop/tool/TestHiveServer2OptionValidations.java PRE-CREATION
  src/test/org/apache/sqoop/tool/TestImportTool.java 1c0cf4d863692f75bb8831e834fae47fc18b5df5


Diff: https://reviews.apache.org/r/66361/diff/3/

Changes: https://reviews.apache.org/r/66361/diff/2-3/


Testing
---

Ran the unit and third-party test suites.


Thanks,

Szabolcs Vasas



Re: Review Request 66446: SQOOP-2567 SQOOP import for Oracle fails with invalid precision/scale for decimal

2018-04-12 Thread Szabolcs Vasas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66446/#review200998
---



Hi Feró,

Thank you for submitting this patch. Please find my comments inline.


src/java/org/apache/sqoop/config/ConfigurationHelper.java
Lines 255 (patched)


This variable can have a less specific name.



src/java/org/apache/sqoop/manager/oracle/OracleUtils.java
Lines 21 (patched)


Please remove unused imports.



src/test/org/apache/sqoop/TestAvroImportForNumericTypes.java
Lines 59 (patched)


Nice solution to parameterize the test cases instead of introducing 
inheritance!
I have a couple of suggestions here:
- ImportJobTestConfiguration contains methods which could be reused across 
different test cases (e.g. dropTableIfExists) and methods which are specific to 
TestAvroImportForNumericTypes. I think these should be split into separate 
hierarchies and packages.
- I would also move the failWithoutPadding and failWithoutDefaults boolean 
values out of the configuration classes and provide them as separate parameters 
to TestAvroImportForNumericTypes. This would improve the readability of the 
test, since one would not have to navigate to another class to determine 
whether a test case should succeed or fail.
- If a test case succeeds with a specific 
org.apache.sqoop.testutil.configuration.ImportJobTestConfiguration#getTypes 
array, that basically means that all of the enumerated data types work. However, 
if a test case fails with such an array, it means that Sqoop does not work with 
at least one of these types, but it is not visible which one. I think it would 
be great if we could somehow split these data types into separate 
configurations.
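
To illustrate the last two suggestions, here is a rough sketch of a test-case
row that carries exactly one column type and states the expected outcome
explicitly. The class and method names are made up for illustration, only the
two flag names come from the patch, and the example uses plain JDK classes
instead of a test framework. Any values passed in are placeholders, not
observed results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one parameter row per column type, with the expected
// outcome visible in the row itself rather than in a configuration class.
final class NumericImportCase {
    final String database;
    final String columnType;
    final boolean failWithoutPadding;
    final boolean failWithoutDefaults;

    NumericImportCase(String database, String columnType,
                      boolean failWithoutPadding, boolean failWithoutDefaults) {
        this.database = database;
        this.columnType = columnType;
        this.failWithoutPadding = failWithoutPadding;
        this.failWithoutDefaults = failWithoutDefaults;
    }

    /** Expands a list of column types into one case per type. */
    static List<NumericImportCase> perType(String database, String[] columnTypes,
                                           boolean failWithoutPadding,
                                           boolean failWithoutDefaults) {
        List<NumericImportCase> cases = new ArrayList<>();
        for (String columnType : columnTypes) {
            cases.add(new NumericImportCase(
                database, columnType, failWithoutPadding, failWithoutDefaults));
        }
        return cases;
    }

    /** Used as the case label, so a failure names the database and the type. */
    @Override
    public String toString() {
        return database + ":" + columnType
            + " failWithoutPadding=" + failWithoutPadding
            + " failWithoutDefaults=" + failWithoutDefaults;
    }
}
```

Because each row names a single type, a failing case points directly at the
type that does not work, and the reader sees the expected outcome without
opening another class.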



src/test/org/apache/sqoop/manager/mysql/MySQLTestUtils.java
Lines 26 (patched)


Unused import.


- Szabolcs Vasas


On April 12, 2018, 10:06 a.m., Fero Szabo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66446/
> ---
> 
> (Updated April 12, 2018, 10:06 a.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2567
> https://issues.apache.org/jira/browse/SQOOP-2567
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This fix allows the user to specify a default precision and scale for Avro 
> schemas. The defaults are then used to override "invalid" values (when the 
> database returns 0 as the precision) and, in the case of Oracle, the -127 
> scale value.
> 
> **Key points**
> - The implementation takes place in the ConnManager#toAvroLogicalType 
> function and the overriding functions in OraOopConnManager and OracleManager.
> - Testing is covered very thoroughly by the TestAvroImportForNumericTypes 
> class and multiple configurations are used to cover MySQL, Oracle, Postgres 
> and MS SQL.
> 
> **Implementation specific concerns**
> - The edge cases aren't well documented. These tests aim to cover the 
> NUMBER/NUMERIC and DECIMAL types with or without specified scale and 
> precision thoroughly. Are there any missed testcases?
> - The new parameters act as overrides only for PSQL and Oracle databases, 
> because the other databases translate the missing precision to valid 
> values. Even so, I've added test cases for MS SQL and MySQL.
> 
> - In the case of Oracle, if the user doesn't specify the default scale and 
> the database returns -127, we adjust the precision by that much.
> Should we throw an exception instead?
> 
> - The default precision has to be specified. If it's not there and the 
> database returns 0 we throw an exception. 
> - Instead, if the default precision and scale aren't there, we could just use 
> the maximum possible value i.e. 38 + 127 = 165 as precision and 127 as scale, 
> that would fit everything in a very inefficient manner, mostly containing 0s. 
> (This also opens up the question whether there is an efficient way to store 
> numbers with many 0s in avro.)
> 
> **Testing specific concerns**
> - The ImportJobTestConfiguration#dropTableIfExists method is not really a 
> test configuration related method, however at the time of development, it 
> made sense to have it there. This might be better off in another place, such 
> as BaseSqoopTest (though I'm unsure what that implementation would look like).
> - The SqlUtil class was created solely to provide a place for the 
> executeStatement method. This might also be better off in another class, such 
> as BaseSqoopTest.
> 
> 
> Diffs
> ---
> 
>   

Re: Review Request 66300: Upgrade to Hadoop 3.0.0

2018-04-12 Thread Boglarka Egyed

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66300/#review200992
---



Hi Dani,

Thank you for taking care of these upgrades!

Would it be possible to split this change up into two separate ones: Hadoop and 
Hive/HBase upgrades? I'm asking because we now depend only on the Hive 3 
release, and I'm wondering if we could proceed with upgrading the Hadoop version 
in the meantime.

I also agree that we should consider shipping these changes in a major Sqoop 
release. Could you maybe start a discussion about it on the sqoop-dev@ mailing 
list, please?

Many thanks,
Bogi

- Boglarka Egyed


On March 27, 2018, 8:50 a.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66300/
> ---
> 
> (Updated March 27, 2018, 8:50 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3305
> https://issues.apache.org/jira/browse/SQOOP-3305
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> 
> 
> Diffs
> ---
> 
>   ivy.xml 6be4fa2 
>   ivy/libraries.properties c44b50b 
>   src/java/org/apache/sqoop/SqoopOptions.java 651cebd 
>   src/java/org/apache/sqoop/config/ConfigurationHelper.java e07a699 
>   src/java/org/apache/sqoop/hive/HiveImport.java c272911 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 6d1e049 
>   src/java/org/apache/sqoop/mapreduce/hcat/DerbyPolicy.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java 784b5f2 
>   src/java/org/apache/sqoop/util/SqoopJsonUtil.java adf186b 
>   src/test/org/apache/sqoop/TestSqoopOptions.java bb7c20d 
>   src/test/org/apache/sqoop/util/TestSqoopJsonUtil.java fdf972c 
>   testdata/hcatalog/conf/hive-site.xml edac7aa 
> 
> 
> Diff: https://reviews.apache.org/r/66300/diff/3/
> 
> 
> Testing
> ---
> 
> Normal and third-party unit tests.
> 
> 
> Thanks,
> 
> daniel voros
> 
>



Re: Review Request 66446: SQOOP-2567 Sqoop import using oraoop fails validation intermittently

2018-04-12 Thread Fero Szabo via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66446/
---

(Updated April 12, 2018, 10:06 a.m.)


Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Bugs: SQOOP-2567
https://issues.apache.org/jira/browse/SQOOP-2567


Repository: sqoop-trunk


Description
---

This fix allows the user to specify a default precision and scale for Avro 
schemas. The defaults are then used to override "invalid" values (when the 
database returns 0 as the precision) and, in the case of Oracle, the -127 scale 
value.

**Key points**
- The implementation takes place in the ConnManager#toAvroLogicalType function 
and the overriding functions in OraOopConnManager and OracleManager.
- Testing is covered very thoroughly by the TestAvroImportForNumericTypes class 
and multiple configurations are used to cover MySQL, Oracle, Postgres and MS 
SQL.

**Implementation specific concerns**
- The edge cases aren't well documented. These tests aim to cover the 
NUMBER/NUMERIC and DECIMAL types with or without specified scale and precision 
thoroughly. Are there any missed testcases?
- The new parameters act as overrides only for PSQL and Oracle databases, 
because the other databases translate the missing precision to valid values. 
Even so, I've added test cases for MS SQL and MySQL.

- In the case of Oracle, if the user doesn't specify the default scale and the 
database returns -127, we adjust the precision by that much.
Should we throw an exception instead?

- The default precision has to be specified. If it's not there and the database 
returns 0 we throw an exception. 
- Instead, if the default precision and scale aren't there, we could just use 
the maximum possible value i.e. 38 + 127 = 165 as precision and 127 as scale, 
that would fit everything in a very inefficient manner, mostly containing 0s. 
(This also opens up the question whether there is an efficient way to store 
numbers with many 0s in avro.)
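
As a rough illustration of the override rule discussed above, the sketch below
resolves a (precision, scale) pair from what the database reports plus optional
user-supplied defaults. It is not the code from the patch: the class and method
names are hypothetical, and it takes the "throw an exception" option for a
missing default scale, which is only one of the alternatives raised here. The
widest-fallback values are the ones quoted above (38 + 127 = 165 and 127).

```java
// Hypothetical sketch of applying default precision/scale to invalid metadata.
final class DecimalDefaults {
    /** Scale value Oracle reports when no scale was declared (per the description). */
    static final int ORACLE_UNSPECIFIED_SCALE = -127;

    private DecimalDefaults() { }

    /**
     * Returns {precision, scale}. A null default means the user did not supply one.
     * Assumption: when the precision is invalid, a supplied default scale
     * replaces the reported scale as well.
     */
    static int[] resolve(int dbPrecision, int dbScale,
                         Integer defaultPrecision, Integer defaultScale) {
        int precision = dbPrecision;
        int scale = dbScale;

        if (precision <= 0) {
            if (defaultPrecision == null) {
                throw new IllegalStateException(
                    "Database returned no precision and no default precision was given");
            }
            precision = defaultPrecision;
            if (defaultScale != null) {
                scale = defaultScale;
            }
        }

        if (scale == ORACLE_UNSPECIFIED_SCALE) {
            if (defaultScale == null) {
                throw new IllegalStateException(
                    "Database returned scale -127 and no default scale was given");
            }
            scale = defaultScale;
        }

        // An Avro decimal needs precision > 0 and 0 <= scale <= precision.
        if (scale < 0 || scale > precision) {
            throw new IllegalStateException(
                "Invalid decimal: precision=" + precision + ", scale=" + scale);
        }
        return new int[] {precision, scale};
    }

    /** The "fits everything" alternative: precision 38 + 127 = 165, scale 127. */
    static int[] widestFallback() {
        return new int[] {38 + 127, 127};
    }
}
```

For example, resolve(0, 0, 38, 3) applies both defaults, while
resolve(10, 2, null, null) leaves valid metadata untouched.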

**Testing specific concerns**
- The ImportJobTestConfiguration#dropTableIfExists method is not really a test 
configuration related method, however at the time of development, it made sense 
to have it there. This might be better off in another place, such as 
BaseSqoopTest (though I'm unsure what that implementation would look like).
- The SqlUtil class was created solely to provide a place for the 
executeStatement method. This might also be better off in another class, such 
as BaseSqoopTest.


Diffs
---

  src/java/org/apache/sqoop/config/ConfigurationConstants.java 2197025b
  src/java/org/apache/sqoop/config/ConfigurationHelper.java e07a6998
  src/java/org/apache/sqoop/manager/ConnManager.java d88b59bd
  src/java/org/apache/sqoop/manager/OracleManager.java 929b5061
  src/java/org/apache/sqoop/manager/SqlManager.java fe997c5f
  src/java/org/apache/sqoop/manager/oracle/OraOopConnManager.java 09207bb4
  src/java/org/apache/sqoop/manager/oracle/OracleUtils.java aa56e708
  src/test/org/apache/sqoop/TestAvroImportForNumericTypes.java PRE-CREATION
  src/test/org/apache/sqoop/manager/mysql/MySQLLobAvroImportTest.java a6121c9a
  src/test/org/apache/sqoop/manager/mysql/MySQLTestUtils.java 75ecc357
  src/test/org/apache/sqoop/manager/oracle/util/OracleUtils.java 6d752aa4
  src/test/org/apache/sqoop/manager/postgresql/PostgresqlImportTest.java 846228a1
  src/test/org/apache/sqoop/manager/postgresql/PostgresqlTestUtil.java PRE-CREATION
  src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5
  src/test/org/apache/sqoop/testutil/SqlUtil.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/configuration/ImportJobTestConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/configuration/MSSQLServerImportJobTestConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/configuration/MySQLImportJobTestConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/configuration/OracleImportJobTestConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/configuration/PostgresqlImportJobTestConfiguration.java PRE-CREATION
  src/test/org/apache/sqoop/testutil/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java PRE-CREATION


Diff: https://reviews.apache.org/r/66446/diff/1/


Testing
---

unit tests and 3rd party tests.


Thanks,

Fero Szabo