[jira] [Assigned] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-04 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros reassigned SQOOP-3134:
---

Assignee: Daniel Voros  (was: Eric Lin)

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-04 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809951#comment-16809951
 ] 

Daniel Voros commented on SQOOP-3134:
-

Submitted PR: https://github.com/apache/sqoop/pull/78

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-03 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808645#comment-16808645
 ] 

Daniel Voros commented on SQOOP-3134:
-

Tests have passed for this patch: 
https://travis-ci.org/dvoros/sqoop/builds/515049441

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Eric Lin
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-02 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807898#comment-16807898
 ] 

Daniel Voros commented on SQOOP-3134:
-

[~ericlin] I've attached the change I had in mind. Would you mind if I were to 
take this over?

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Eric Lin
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-02 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3134:

Attachment: SQOOP-3134.1.patch

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Eric Lin
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-02 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807883#comment-16807883
 ] 

Daniel Voros commented on SQOOP-3134:
-

Just ran into this. Instead of introducing a new option, this could probably 
also be controlled with {{--class-name}}. It would only need a small change in 
the code path changed by SQOOP-2783 to also check for {{className == null}}.
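As a rough illustration of that idea (a self-contained sketch; apart from --class-name and the default file names observed in the description, the fallback logic below is an assumption, not the actual code path touched by SQOOP-2783):

{code}
// Hypothetical sketch only: the .avsc base name could follow --class-name when it
// is set, otherwise the table name, otherwise the "AutoGeneratedSchema" default.
public final class AvroSchemaFileNameSketch {
  static String schemaBaseName(String className, String tableName) {
    if (className != null && !className.isEmpty()) {
      return className;               // e.g. --class-name MyRecord -> MyRecord.avsc
    }
    if (tableName != null) {
      return tableName;               // table imports -> t1.avsc
    }
    return "AutoGeneratedSchema";     // free-form query imports
  }

  public static void main(String[] args) {
    System.out.println(schemaBaseName(null, "t1") + ".avsc");        // t1.avsc
    System.out.println(schemaBaseName("MyRecord", null) + ".avsc");  // MyRecord.avsc
    System.out.println(schemaBaseName(null, null) + ".avsc");        // AutoGeneratedSchema.avsc
  }
}
{code}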

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Eric Lin
>Priority: Major
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (SQOOP-3289) Add .travis.yml

2018-11-23 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros reassigned SQOOP-3289:
---

Assignee: Szabolcs Vasas  (was: Daniel Voros)

Thank you [~vasas] for your effort, this is a lot more than what I had in 
the old review request, so I've closed that one.

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Szabolcs Vasas
>Priority: Minor
> Fix For: 1.5.0, 3.0.0
>
> Attachments: SQOOP-3289.patch
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 69433: Setting up Travis CI using Gradle test categories

2018-11-23 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69433/#review210826
---


Ship it!




So cool, thanks for picking this up! I believe this can truly make a difference 
for present and future developers. (: Ship it!

- daniel voros


On Nov. 23, 2018, 10:33 a.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69433/
> ---
> 
> (Updated Nov. 23, 2018, 10:33 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3289
> https://issues.apache.org/jira/browse/SQOOP-3289
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> The patch includes the following changes:
> - Changed the default DB connection parameters to Docker image defaults so 
> the test tasks can be started without specifying connection parameters
> - Duplicated connection parameter settings are removed
> - Most of the JDBC drivers are downloaded from Maven repositories; the only 
> exception is Oracle. Contributors have to upload ojdbc6.jar to a public drive 
> and make it available to the CI job by setting the ORACLE_DRIVER_URL in Travis
> - Introduced separate test tasks for each database
> - An Oracle Express Edition Docker image is added to 
> sqoop-thirdpartytest-db-services.yml so Oracle tests that do not require 
> Oracle EE features can be executed much more easily
> - The ports for MySQL and PostgreSQL Docker containers are changed because 
> the default ones were used in the Travis VM already.
> - Introduced OracleEe test category for tests requiring an Oracle EE database. 
> These tests won't be executed on Travis. The good news is that only a few 
> tests require Oracle EE
> 
> Documentation is still coming; feel free to provide feedback!
> 
> 
> Diffs
> -
> 
>   .travis.yml PRE-CREATION 
>   COMPILING.txt b399ba825 
>   build.gradle efe980d67 
>   build.xml a0e25191e 
>   gradle.properties 722bc8bb2 
>   src/scripts/thirdpartytest/docker-compose/oraclescripts/ee-healthcheck.sh 
> PRE-CREATION 
>   src/scripts/thirdpartytest/docker-compose/oraclescripts/healthcheck.sh 
> fb7800efe 
>   
> src/scripts/thirdpartytest/docker-compose/sqoop-thirdpartytest-db-services.yml
>  b4cf48863 
>   src/test/org/apache/sqoop/manager/cubrid/CubridTestUtils.java 4fd522bae 
>   
> src/test/org/apache/sqoop/manager/db2/DB2ImportAllTableWithSchemaManualTest.java
>  ed949b98f 
>   src/test/org/apache/sqoop/manager/db2/DB2ManagerImportManualTest.java 
> 32dfc5eb2 
>   src/test/org/apache/sqoop/manager/db2/DB2TestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/db2/DB2XmlTypeImportManualTest.java 
> 494c75b08 
>   src/test/org/apache/sqoop/manager/mysql/MySQLTestUtils.java be205c877 
>   src/test/org/apache/sqoop/manager/oracle/ExportTest.java a60168719 
>   src/test/org/apache/sqoop/manager/oracle/ImportTest.java 5db9fe34e 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 1598813d8 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java 1f67c4697 
>   src/test/org/apache/sqoop/manager/oracle/OracleConnectionFactoryTest.java 
> 34e182f4c 
>   src/test/org/apache/sqoop/manager/oracle/TimestampDataTest.java be086c5c2 
>   src/test/org/apache/sqoop/manager/oracle/util/OracleUtils.java 14b57f91a 
>   
> src/test/org/apache/sqoop/manager/postgresql/DirectPostgreSQLExportManualTest.java
>  7dd6efcf9 
>   
> src/test/org/apache/sqoop/manager/postgresql/PGBulkloadManagerManualTest.java 
> 1fe264456 
>   src/test/org/apache/sqoop/manager/postgresql/PostgresqlExportTest.java 
> eb798fa99 
>   
> src/test/org/apache/sqoop/manager/postgresql/PostgresqlExternalTableImportTest.java
>  8c3d2fd90 
>   src/test/org/apache/sqoop/manager/postgresql/PostgresqlTestUtil.java 
> e9705e5da 
>   src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java bd12c5566 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerManagerExportTest.java 
> ab1e8ff2d 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerManagerImportTest.java 
> 3c5bb327e 
>   src/test/org/apache/sqoop/metastore/db2/DB2JobToolTest.java 81ef5fce6 
>   
> src/test/org/apache/sqoop/metastore/db2/DB2MetaConnectIncrementalImportTest.java
>  5403908e2 
>   src/test/org/apache/sqoop/metastore/db2/DB2SavedJobsTest.java b41eda110 
>   src/test/org/apache/sqoop/metastore/postgres/PostgresJobToolTest.java 
> 59ea151a5 
>   
> src/test/org/apache/sqoop/metastore/postgres/PostgresMetaConnectIncrementalImportTe

[jira] [Commented] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-10-16 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651992#comment-16651992
 ] 

Daniel Voros commented on SQOOP-3378:
-

Thanks for letting me know [~vasas]! I can confirm, this is failing for me on 
trunk as well when running on Linux. It passes on Mac however. I've opened 
SQOOP-3393 to look into this.

> Error during direct Netezza import/export can interrupt process in 
> uncontrolled ways
> 
>
> Key: SQOOP-3378
> URL: https://issues.apache.org/jira/browse/SQOOP-3378
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
> Attachments: SQOOP-3378.2.patch
>
>
> SQLException during JDBC operation in direct Netezza import/export signals 
> parent thread to fail fast by interrupting it (see 
> [here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).
> We're [trying to process the interrupt in the 
> parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
>  (main) thread, but there's no guarantee that we're not in some blocking 
> internal call that will process the interrupted flag and reset it before 
> we're able to check.
> It is also possible that the parent thread has passed the "checking part" 
> when it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} 
> this can interrupt the upload of log files.
> I'd recommend using some other means of communication between the threads 
> than interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3393) TestNetezzaExternalTableExportMapper hangs

2018-10-16 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3393:
---

 Summary: TestNetezzaExternalTableExportMapper hangs
 Key: SQOOP-3393
 URL: https://issues.apache.org/jira/browse/SQOOP-3393
 Project: Sqoop
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0, 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0, 3.0.0


Introduced in SQOOP-3378, spotted by [~vasas].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68687: SQOOP-3381 Upgrade the Parquet library

2018-10-16 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68687/#review209630
---


Ship it!




Thanks Fero, for taking care of this. Ship it!

- daniel voros


On Oct. 16, 2018, 9:37 a.m., Fero Szabo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68687/
> ---
> 
> (Updated Oct. 16, 2018, 9:37 a.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed, daniel voros, and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3381
> https://issues.apache.org/jira/browse/SQOOP-3381
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This change upgrades our parquet library to the newest version and a whole 
> lot of libraries to newer versions with it.
> 
> As we will need to register a data supplier in the fix for parquet decimal 
> support (SQOOP-3382), we will need a version that contains PARQUET-243. We 
> need to upgrade the Parquet library to a version that contains this fix and 
> is compatible with Hadoop 3.0.
> 
> A few things to note:
> - hadoop's version is still 2.8.0
> - hive is upgraded to 2.1.1
> - the rest of the dependency changes are required for the hive version bump.
> 
> There are a few changes in the codebase, but of course no new 
> functionality at all:
> - in the TestParquetImport class, the new implementation returns a Utf8 
> object for Strings written out.
> - Added the security policy and related code changes from the patch for 
> SQOOP-3305 (upgrade hadoop) written by Daniel Voros.
> - modified HiveMiniCluster config so it won't try to start a web ui (it's 
> unnecessary during tests anyway)
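On the TestParquetImport note above: a minimal, self-contained illustration (not the actual test code) of why string assertions may need adjusting when Avro returns Utf8 instead of String:

{code}
import org.apache.avro.util.Utf8;

// Illustration only: an Avro string field read back as org.apache.avro.util.Utf8
// does not equal() a java.lang.String, but its toString() value does.
public final class Utf8ComparisonSketch {
  public static void main(String[] args) {
    Object value = new Utf8("some data");                      // what the new reader may return
    System.out.println("some data".equals(value));             // false: Utf8 is not a String
    System.out.println("some data".equals(value.toString()));  // true: compare via toString()
  }
}
{code}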
> 
> 
> Diffs
> -
> 
>   build.gradle fc7fc0c4 
>   gradle.properties 0d30378d 
>   gradle/sqoop-package.gradle 1a8d994d 
>   ivy.xml 670cb32d 
>   ivy/libraries.properties 8f3dab2b 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1663b1d1 
>   src/java/org/apache/sqoop/hive/HiveImport.java 48800366 
>   src/java/org/apache/sqoop/mapreduce/hcat/DerbyPolicy.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java 784b5f2a 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java
>  2180cc20 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java
>  90b910a3 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java
>  66ebc5b8 
>   src/test/org/apache/sqoop/TestParquetExport.java be1d8164 
>   src/test/org/apache/sqoop/TestParquetImport.java 2810e318 
>   src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java adad0cc1 
>   src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java b55179a4 
>   src/test/org/apache/sqoop/hive/minicluster/HiveMiniCluster.java 9dd54486 
>   src/test/org/apache/sqoop/util/ParquetReader.java f1c2fe10 
>   testdata/hcatalog/conf/hive-site.xml 8a84a5d3 
> 
> 
> Diff: https://reviews.apache.org/r/68687/diff/5/
> 
> 
> Testing
> ---
> 
> Ant unit and 3rd party tests were successful.
> gradlew test and thirdpartytest were successful as well.
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>



[jira] [Commented] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-10-11 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646190#comment-16646190
 ] 

Daniel Voros commented on SQOOP-3378:
-

Uploaded, thank you [~vasas].

> Error during direct Netezza import/export can interrupt process in 
> uncontrolled ways
> 
>
> Key: SQOOP-3378
> URL: https://issues.apache.org/jira/browse/SQOOP-3378
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
> Attachments: SQOOP-3378.2.patch
>
>
> SQLException during JDBC operation in direct Netezza import/export signals 
> parent thread to fail fast by interrupting it (see 
> [here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).
> We're [trying to process the interrupt in the 
> parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
>  (main) thread, but there's no guarantee that we're not in some blocking 
> internal call that will process the interrupted flag and reset it before 
> we're able to check.
> It is also possible that the parent thread has passed the "checking part" 
> when it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} 
> this can interrupt the upload of log files.
> I'd recommend using some other means of communication between the threads 
> than interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-10-11 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3378:

Attachment: SQOOP-3378.2.patch

> Error during direct Netezza import/export can interrupt process in 
> uncontrolled ways
> 
>
> Key: SQOOP-3378
> URL: https://issues.apache.org/jira/browse/SQOOP-3378
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
> Attachments: SQOOP-3378.2.patch
>
>
> SQLException during JDBC operation in direct Netezza import/export signals 
> parent thread to fail fast by interrupting it (see 
> [here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).
> We're [trying to process the interrupt in the 
> parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
>  (main) thread, but there's no guarantee that we're not in some blocking 
> internal call that will process the interrupted flag and reset it before 
> we're able to check.
> It is also possible that the parent thread has passed the "checking part" 
> when it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} 
> this can interrupt the upload of log files.
> I'd recommend using some other means of communication between the threads 
> than interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3381) Upgrade the Parquet library from 1.6.0 to 1.9.0

2018-10-05 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639659#comment-16639659
 ] 

Daniel Voros commented on SQOOP-3381:
-

With SQOOP-3305 I've decided to hold off until there's an HBase release that 
supports Hadoop 3.x. I don't think Hive 3.1.0 would help in this regard, since 
parquet classes are still shaded in hive-exec:3.1.0.

> Upgrade the Parquet library from 1.6.0 to 1.9.0
> ---
>
> Key: SQOOP-3381
> URL: https://issues.apache.org/jira/browse/SQOOP-3381
> Project: Sqoop
>  Issue Type: Sub-task
>Affects Versions: 1.4.7
>Reporter: Fero Szabo
>Assignee: Fero Szabo
>Priority: Major
> Fix For: 3.0.0
>
>
> As we will need to register a data supplier in the fix for parquet decimal 
> support, we will need a version that contains PARQUET-243.
> We need to upgrade the Parquet library to a version that contains this fix 
> and is compatible with Hadoop. Most probably, the newest version will be 
> adequate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3381) Upgrade the Parquet library from 1.6.0 to 1.9.0

2018-09-12 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612402#comment-16612402
 ] 

Daniel Voros commented on SQOOP-3381:
-

Hey [~fero], thanks for keeping that in mind. What I've seen during the hadoop3 
upgrade is that Avro is added to the MR classpath from under Hadoop. So where 
this could lead to issues is with conflicting versions of Avro between Hadoop and 
the Parquet shipped with Sqoop.

Could you try your patch (having new parquet jar in lib/) on a cluster with 
current Hadoop versions? I don't think we should bother with testing with 
Hadoop 3, we'll face that in the Hadoop 3 patch.

(One more thing to keep in mind, is that parquet-hadoop-bundle is also shaded 
into the hive-exec artifact. However, I think the classes involved in 
PARQUET-243 are not bundled there.)

> Upgrade the Parquet library from 1.6.0 to 1.9.0
> ---
>
> Key: SQOOP-3381
> URL: https://issues.apache.org/jira/browse/SQOOP-3381
> Project: Sqoop
>  Issue Type: Sub-task
>Affects Versions: 1.4.7
>Reporter: Fero Szabo
>Assignee: Fero Szabo
>Priority: Major
> Fix For: 3.0.0
>
>
> As we will need to register a data supplier in the fix for parquet decimal 
> support, we will need a version that contains PARQUET-243.
> We need to upgrade the Parquet library to a version that contains this fix 
> and is compatible with Hadoop. Most probably, the newest version will be 
> adequate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3374) Assigning HDFS path to --bindir is giving error "java.lang.reflect.InvocationTargetException"

2018-09-04 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602719#comment-16602719
 ] 

Daniel Voros commented on SQOOP-3374:
-

[~amjosh911] setting an HDFS location for {{--bindir}} is not supported at the 
moment. What is your use case that would require you to do so? A workaround might 
be putting the generated files on HDFS manually after the Sqoop job finishes.
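For example, the copy-after-the-job workaround could look roughly like this (a sketch using the Hadoop FileSystem API; both paths are illustrative placeholders, not values from this ticket):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the workaround only: run the import with a local --bindir, then copy
// the generated artifacts to HDFS once the Sqoop job has finished.
public final class CopyBindirToHdfs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path localBindir = new Path("file:///tmp/sqoop-codegen");   // local --bindir used for the job
    Path hdfsTarget = new Path("/user/projects/codegen");       // desired HDFS destination
    fs.copyFromLocalFile(false /* keep source */, true /* overwrite */, localBindir, hdfsTarget);
  }
}
{code}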

> Assigning HDFS path to --bindir is giving error 
> "java.lang.reflect.InvocationTargetException"
> -
>
> Key: SQOOP-3374
> URL: https://issues.apache.org/jira/browse/SQOOP-3374
> Project: Sqoop
>  Issue Type: Wish
>  Components: sqoop2-api
>Reporter: Amit Joshi
>Priority: Blocker
>
> When I am trying to assign the HDFS directory path to --bindir in my sqoop 
> command, it is throwing error "java.lang.reflect.InvocationTargetException".
> My sqoop query looks like this:
> sqoop import -connect connection_string --username username --password-file 
> file_path --query 'select * from EDW_PROD.RXCLM_LINE_FACT_DENIED 
> PARTITION(RXCLM_LINE_FACTP201808) where $CONDITIONS' --as-parquetfile 
> --compression-codec org.apache.hadoop.io.compress.SnappyCodec --append 
> --target-dir target_dir *-bindir hdfs://user/projects/* --split-by RX_ID 
> --null-string '/N' --null-non-string '/N' --fields-terminated-by ',' -m 10
>  
> It is creating folder "hdfs:" in my home directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3058) Sqoop import with Netezza --direct fails properly but also produces NPE

2018-09-03 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602110#comment-16602110
 ] 

Daniel Voros commented on SQOOP-3058:
-

[~kuldeepkulkarn...@gmail.com], I don't think there's a workaround, but please 
note that this issue is only about reporting an extra NPE in case of an error.

I've submitted a patch to throw a more meaningful exception.

> Sqoop import with Netezza --direct fails properly but also produces NPE
> ---
>
> Key: SQOOP-3058
> URL: https://issues.apache.org/jira/browse/SQOOP-3058
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Markus Kemper
>    Assignee: Daniel Voros
>Priority: Major
>
> The [error] is expected however the [npe] seems like a defect, see [test 
> case] below
> [error]
> ERROR:  relation does not exist SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
> [npe]
> 16/11/18 09:19:44 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> java.lang.NullPointerException
> [test case]
> {noformat}
> #
> # STEP 01 - Setup Netezza Table and Data
> #
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DROP TABLE SQOOP_SME1.T1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "CREATE TABLE SQOOP_SME1.T1 (C1 INTEGER)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "INSERT INTO SQOOP_SME1.T1 VALUES (1)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> #
> # STEP 02 - Test Import and Export (baseline)
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --export-dir /user/root/t1 --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> ---
> | C1  | 
> ---
> | 1   | 
> ---
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --export-dir /user/root/t1 --num-mappers 1 --direct
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> ---
> | C1  | 
> ---
> | 1   | 
> ---
>   
> #
> # STEP 03 - Test Import and Export (with SCHEMA in --table option AND 
> --direct)
> #
> /* Notes: This failure seems correct however the NPE after the failure seems 
> like a defect  */
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "SQOOP_SME1.T1" --export-dir /user/root/t1 --num-mappers 1 --direct
> 16/11/18 09:19:44 ERROR manager.SqlManager: Error executing statement: 
> org.netezza.error.NzSQLException: ERROR:  relation does not exist 
> SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
> org.netezza.error.NzSQLException: ERROR:  relation does not exist 
> SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
>   at 
> org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:280)
>   at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:76)
>   at org.netezza.sql.NzConnection.execute(NzConnection.java:2869)
>   at 
> org.netezza.sql.NzPreparedStatament._execute(NzPreparedStatament.java:1126)
>   at 
> org.netezza.sql.NzPreparedStatament.prepare(NzPreparedStatament.java:1143)
>   at 
> org.netezza.sql.NzPreparedStatament.<init>(NzPreparedStatament.java:89)
>   at org.netezza.sql.NzConnection.prepareStatement(NzConnection.java:1589)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnNamesForRawQuery(SqlManager.java:151)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnNames(SqlManager.java:116)
>

Review Request 68607: Sqoop import with Netezza --direct fails properly but also produces NPE

2018-09-03 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68607/
---

Review request for Sqoop.


Bugs: SQOOP-3058
https://issues.apache.org/jira/browse/SQOOP-3058


Repository: sqoop-trunk


Description
---

We're not interrupting the import if we were unable to get column names, which 
leads to an NPE later. We should check for null instead and throw a more 
meaningful exception.
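A minimal sketch of that guard (illustrative names only; the actual change is presumably in NetezzaExternalTableExportJob, see the diff below):

{code}
import java.io.IOException;

// Sketch of the proposed guard only (not the actual Sqoop change): fail with a
// descriptive exception when column names cannot be determined, instead of
// letting a null propagate and surface later as a NullPointerException.
public final class ColumnNameGuardSketch {
  static String[] requireColumnNames(String[] columnNames, String tableName) throws IOException {
    if (columnNames == null) {
      throw new IOException("Could not retrieve column names for table " + tableName
          + "; check that the table exists and is accessible.");
    }
    return columnNames;
  }
}
{code}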


Diffs
-

  
src/java/org/apache/sqoop/mapreduce/netezza/NetezzaExternalTableExportJob.java 
11ac95df 
  
src/test/org/apache/sqoop/mapreduce/netezza/TestNetezzaExternalTableExportJob.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/68607/diff/1/


Testing
---

added UT


Thanks,

daniel voros



[jira] [Assigned] (SQOOP-3058) Sqoop import with Netezza --direct fails properly but also produces NPE

2018-09-03 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros reassigned SQOOP-3058:
---

Assignee: Daniel Voros

> Sqoop import with Netezza --direct fails properly but also produces NPE
> ---
>
> Key: SQOOP-3058
> URL: https://issues.apache.org/jira/browse/SQOOP-3058
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Markus Kemper
>    Assignee: Daniel Voros
>Priority: Major
>
> The [error] is expected however the [npe] seems like a defect, see [test 
> case] below
> [error]
> ERROR:  relation does not exist SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
> [npe]
> 16/11/18 09:19:44 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> java.lang.NullPointerException
> [test case]
> {noformat}
> #
> # STEP 01 - Setup Netezza Table and Data
> #
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DROP TABLE SQOOP_SME1.T1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "CREATE TABLE SQOOP_SME1.T1 (C1 INTEGER)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "INSERT INTO SQOOP_SME1.T1 VALUES (1)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> #
> # STEP 02 - Test Import and Export (baseline)
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --export-dir /user/root/t1 --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> ---
> | C1  | 
> ---
> | 1   | 
> ---
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --export-dir /user/root/t1 --num-mappers 1 --direct
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> ---
> | C1  | 
> ---
> | 1   | 
> ---
>   
> #
> # STEP 03 - Test Import and Export (with SCHEMA in --table option AND 
> --direct)
> #
> /* Notes: This failure seems correct however the NPE after the failure seems 
> like a defect  */
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "SQOOP_SME1.T1" --export-dir /user/root/t1 --num-mappers 1 --direct
> 16/11/18 09:19:44 ERROR manager.SqlManager: Error executing statement: 
> org.netezza.error.NzSQLException: ERROR:  relation does not exist 
> SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
> org.netezza.error.NzSQLException: ERROR:  relation does not exist 
> SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
>   at 
> org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:280)
>   at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:76)
>   at org.netezza.sql.NzConnection.execute(NzConnection.java:2869)
>   at 
> org.netezza.sql.NzPreparedStatament._execute(NzPreparedStatament.java:1126)
>   at 
> org.netezza.sql.NzPreparedStatament.prepare(NzPreparedStatament.java:1143)
>   at 
> org.netezza.sql.NzPreparedStatament.<init>(NzPreparedStatament.java:89)
>   at org.netezza.sql.NzConnection.prepareStatement(NzConnection.java:1589)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnNamesForRawQuery(SqlManager.java:151)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnNames(SqlManager.java:116)
>   at 
> org.apache.sqoop.mapreduce.netezza.NetezzaExternalTableExportJob.configureOutputFormat(NetezzaExternalTableExportJob.java:128)
>   at 
> org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433)
>   at

[jira] [Commented] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-09-03 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602058#comment-16602058
 ] 

Daniel Voros commented on SQOOP-3378:
-

Attached review request.

> Error during direct Netezza import/export can interrupt process in 
> uncontrolled ways
> 
>
> Key: SQOOP-3378
> URL: https://issues.apache.org/jira/browse/SQOOP-3378
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
>
> SQLException during JDBC operation in direct Netezza import/export signals 
> parent thread to fail fast by interrupting it (see 
> [here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).
> We're [trying to process the interrupt in the 
> parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
>  (main) thread, but there's no guarantee that we're not in some blocking 
> internal call that will process the interrupted flag and reset it before 
> we're able to check.
> It is also possible that the parent thread has passed the "checking part" 
> when it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} 
> this can interrupt the upload of log files.
> I'd recommend using some other means of communication between the threads 
> than interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review Request 68606: Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-09-03 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68606/
---

Review request for Sqoop.


Bugs: SQOOP-3378
https://issues.apache.org/jira/browse/SQOOP-3378


Repository: sqoop-trunk


Description
---

`SQLException` during JDBC operation in direct Netezza import/export signals 
parent thread to fail fast by interrupting it.
We're trying to process the interrupt in the parent (main) thread, but there's 
no guarantee that we're not in some internal call that will process the 
interrupted flag and reset it before we're able to check.

It is also possible that the parent thread has passed the "checking part" when 
it gets interrupted. In case of `NetezzaExternalTableExportMapper` this can 
interrupt the upload of log files.

I'd recommend using some other means of communication between the threads than 
interrupts.
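One possible alternative to interrupts is a shared failure holder that the parent thread checks explicitly; a self-contained sketch of the idea (not the patch itself):

{code}
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicReference;

// Self-contained sketch of the alternative signalling approach: the JDBC thread
// records its failure in a shared holder, and the parent thread checks the holder
// at well-defined points instead of relying on Thread.interrupt().
public final class FailureFlagSketch {
  private static final AtomicReference<Exception> failure = new AtomicReference<>();

  public static void main(String[] args) throws Exception {
    Thread jdbcThread = new Thread(() -> {
      try {
        throw new SQLException("simulated JDBC failure");  // stand-in for the real statement
      } catch (Exception e) {
        failure.compareAndSet(null, e);                    // signal the parent without interrupting it
      }
    });
    jdbcThread.start();
    jdbcThread.join();

    // The parent checks the flag where it previously checked the interrupted status.
    if (failure.get() != null) {
      System.err.println("JDBC thread failed: " + failure.get().getMessage());
    } else {
      System.out.println("No failure; log file upload can proceed safely.");
    }
  }
}
{code}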


Diffs
-

  
src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java
 5bf21880 
  
src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableImportMapper.java
 306062aa 
  
src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java 
cedfd235 
  
src/test/org/apache/sqoop/mapreduce/db/netezza/TestNetezzaExternalTableExportMapper.java
 PRE-CREATION 
  
src/test/org/apache/sqoop/mapreduce/db/netezza/TestNetezzaExternalTableImportMapper.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/68606/diff/1/


Testing
---

added new UTs and checked manual Netezza tests (NetezzaExportManualTest, 
NetezzaImportManualTest)


Thanks,

daniel voros



[jira] [Created] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-09-03 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3378:
---

 Summary: Error during direct Netezza import/export can interrupt 
process in uncontrolled ways
 Key: SQOOP-3378
 URL: https://issues.apache.org/jira/browse/SQOOP-3378
 Project: Sqoop
  Issue Type: Bug
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0, 3.0.0


SQLException during JDBC operation in direct Netezza import/export signals 
parent thread to fail fast by interrupting it (see 
[here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).

We're [trying to process the interrupt in the 
parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
 (main) thread, but there's no guarantee that we're not in some blocking 
internal call that will process the interrupted flag and reset it before we're 
able to check.

It is also possible that the parent thread has passed the "checking part" when 
it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} this can 
interrupt the upload of log files.

I'd recommend using some other means of communication between the threads than 
interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68569: HiveMiniCluster does not restore hive-site.xml location

2018-09-03 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68569/#review208249
---


Ship it!




Ship It!

- daniel voros


On Aug. 30, 2018, 11:27 a.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68569/
> ---
> 
> (Updated Aug. 30, 2018, 11:27 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3375
> https://issues.apache.org/jira/browse/SQOOP-3375
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> HiveMiniCluster sets the hive-site.xml location using 
> org.apache.hadoop.hive.conf.HiveConf#setHiveSiteLocation static method during 
> startup but it does not restore the original location during shutdown.
> 
> This makes HCatalogImportTest and HCatalogExportTest fail if they are ran in 
> the same JVM after any test using HiveMiniCluster.
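The save-and-restore idea could look roughly like this (a sketch; apart from HiveConf.setHiveSiteLocation, which the description mentions, the method names are assumptions):

{code}
import java.net.URL;
import org.apache.hadoop.hive.conf.HiveConf;

// Sketch of the save-and-restore idea only, not the actual HiveMiniCluster change.
// getHiveSiteLocation() and the cluster lifecycle methods are assumptions here.
public final class HiveSiteRestoreSketch {
  private URL originalHiveSiteLocation;

  void start(URL miniClusterHiveSite) {
    originalHiveSiteLocation = HiveConf.getHiveSiteLocation(); // remember the previous location
    HiveConf.setHiveSiteLocation(miniClusterHiveSite);
    // ... start the mini cluster ...
  }

  void stop() {
    // ... shut down the mini cluster ...
    HiveConf.setHiveSiteLocation(originalHiveSiteLocation);    // restore so later tests (e.g. HCatalog) see the original
  }
}
{code}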
> 
> 
> Diffs
> -
> 
>   src/test/org/apache/sqoop/hive/minicluster/HiveMiniCluster.java 19bb7605c 
> 
> 
> Diff: https://reviews.apache.org/r/68569/diff/1/
> 
> 
> Testing
> ---
> 
> Executed unit and third party tests.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



Re: Review Request 68541: SQOOP-3104: Create test categories instead of test suites and naming conventions

2018-08-29 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68541/#review208088
---


Ship it!




Great stuff! Checked `test`, `unitTest` and `integrationPlainTest`.

My only concern is forgetting to apply `@Category` on future test classes. Those 
wouldn't be executed without it, right? Any ideas how to prevent this from 
happening?

- daniel voros


On Aug. 28, 2018, 3:52 p.m., Nguyen Truong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68541/
> ---
> 
> (Updated Aug. 28, 2018, 3:52 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3104
> https://issues.apache.org/jira/browse/SQOOP-3104
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> We are currently using test naming conventions to differentiate between 
> manual tests, unit tests and 3rd party tests. Instead of that, I implemented 
> JUnit categories, which will allow us to have more categories in the future. 
> This would also remove the reliance on the test class name.
> 
> Test categories skeleton:
>   SqoopTest ___ UnitTest
>            |__ IntegrationTest
>            |__ ManualTest
> 
>   ThirdPartyTest ___ CubridTest
>                 |__ Db2Test
>                 |__ MainFrameTest
>                 |__ MysqlTest
>                 |__ NetezzaTest
>                 |__ OracleTest
>                 |__ PostgresqlTest
>                 |__ SqlServerTest
> 
>   KerberizedTest
> 
> Categories explanation:
> * SqoopTest: Group of the big categories, including:
> - UnitTest: It tests one class only, with its dependencies mocked or, 
> if a dependency is lightweight, kept as-is. It must not start a minicluster 
> or an hsqldb database. It does not need JDBC drivers.
> - IntegrationTest: It usually tests a whole scenario. It may start up 
> miniclusters,
> hsqldb and connect to external resources like RDBMSs.
> - ManualTest: This should be a deprecated category which should not 
> be used in the future.
> It only exists to mark the currently existing manual tests.
> * ThirdPartyTest: An orthogonal hierarchy for tests that need a JDBC 
> driver and/or a docker
> container/external RDBMS instance to run. Subcategories express what kind 
> of external
> resource the test needs. E.g: OracleTest needs an Oracle RDBMS and Oracle 
> driver on the classpath
> * KerberizedTest: Test that needs Kerberos, which needs to be run on a 
> separate JVM.
> 
> Opinions are very welcome. Thanks!
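For reference, JUnit categories as described above are plain marker interfaces applied with org.junit.experimental.categories.Category; a minimal sketch (the marker names mirror the proposal, their real package in the patch is an assumption):

{code}
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Minimal sketch of the category mechanism described above. The category hierarchy
// is expressed through interface inheritance, so filtering on SqoopTest also matches UnitTest.
interface SqoopTest {}
interface UnitTest extends SqoopTest {}
interface ThirdPartyTest {}
interface MysqlTest extends ThirdPartyTest {}

@Category(UnitTest.class)
public class SomeOptionParsingTest {
  @Test
  public void parsesDefaults() {
    // plain unit test: no minicluster, no hsqldb, no JDBC driver needed
  }
}
{code}

A MySQL-backed test would instead carry, say, @Category(MysqlTest.class), letting the build include or exclude it by category rather than by class name.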
> 
> 
> Diffs
> -
> 
>   build.gradle fc7fc0c4c 
>   src/test/org/apache/sqoop/TestConnFactory.java fb6c94059 
>   src/test/org/apache/sqoop/TestIncrementalImport.java 29c477954 
>   src/test/org/apache/sqoop/TestSqoopOptions.java e55682edf 
>   src/test/org/apache/sqoop/accumulo/TestAccumuloUtil.java 631eeff5e 
>   src/test/org/apache/sqoop/authentication/TestKerberosAuthenticator.java 
> f5700ce65 
>   src/test/org/apache/sqoop/db/TestDriverManagerJdbcConnectionFactory.java 
> 244831672 
>   
> src/test/org/apache/sqoop/db/decorator/TestKerberizedConnectionFactoryDecorator.java
>  d3e3fb23e 
>   src/test/org/apache/sqoop/hbase/HBaseKerberizedConnectivityTest.java 
> 3bfb39178 
>   src/test/org/apache/sqoop/hbase/TestHBasePutProcessor.java e78a535f4 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java ba05cabbb 
>   
> src/test/org/apache/sqoop/hive/HiveServer2ConnectionFactoryInitializerTest.java
>  4d2cb2f88 
>   src/test/org/apache/sqoop/hive/TestHiveClientFactory.java a3c2dc939 
>   src/test/org/apache/sqoop/hive/TestHiveMiniCluster.java 419f888c0 
>   src/test/org/apache/sqoop/hive/TestHiveServer2Client.java 02617295e 
>   src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java b55179a4f 
>   src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java 410724f37 
>   src/test/org/apache/sqoop/hive/TestHiveTypesForAvroTypeMapping.java 
> 276e9eaa4 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 626ad22f6 
>   src/test/org/apache/sqoop/hive/TestTableDefWriterForExternalTable.java 
> f1768ee76 
>   src/test/org/apache/sqoop/io/TestCodecMap.java e71921823 
>   src/test/org/apache/sqoop/io/TestLobFile.java 2bc95f283 
>   src/test/org/apache/sqoop/io/TestNamedFifo.java a93784e08 
>   src/test/org/apache/sqoop/io/TestSplittableBufferedWrite

[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically

2018-08-29 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596026#comment-16596026
 ] 

Daniel Voros commented on SQOOP-3042:
-

[~amjosh911] could you please open a new ticket for that with details?

> Sqoop does not clear compile directory under /tmp/sqoop-/compile 
> automatically
> 
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Critical
>  Labels: patch
> Fix For: 3.0.0
>
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, 
> SQOOP-3042.4.patch, SQOOP-3042.5.patch, SQOOP-3042.6.patch, 
> SQOOP-3042.7.patch, SQOOP-3042.9.patch
>
>
> After running sqoop, all the temp files generated by ClassWriter are left 
> behind on disk, so anyone can check those JAVA files to see the schema of 
> those tables that Sqoop has been interacting with. By default, the directory 
> is under /tmp/sqoop-/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can see 
> that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
> hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
> // We created the directory. Use it.
> // If this directory is not actually filled with files, delete it
> // when the JVM quits.
> hashDir.deleteOnExit();
> break;
>   }
> }
> {code}
> However, I believe it fails to delete because the directory is not empty.
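
A hypothetical sketch (not part of this ticket or its patches) of one way to 
handle this: since deleteOnExit() only removes empty directories, the cleanup 
has to recurse, for example from a JVM shutdown hook:

{code}
// Illustrative only; "hashDir" is the compile directory from the snippet above.
Runtime.getRuntime().addShutdownHook(new Thread(() -> deleteRecursively(hashDir)));

static void deleteRecursively(java.io.File dir) {
  java.io.File[] children = dir.listFiles();
  if (children != null) {
    for (java.io.File child : children) {
      deleteRecursively(child);
    }
  }
  dir.delete();
}
{code}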



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically

2018-08-28 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594827#comment-16594827
 ] 

Daniel Voros commented on SQOOP-3042:
-

[~amjosh911] use the `--bindir` option, see 
[here|https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html].

> Sqoop does not clear compile directory under /tmp/sqoop-/compile 
> automatically
> 
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Critical
>  Labels: patch
> Fix For: 3.0.0
>
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, 
> SQOOP-3042.4.patch, SQOOP-3042.5.patch, SQOOP-3042.6.patch, 
> SQOOP-3042.7.patch, SQOOP-3042.9.patch
>
>
> After running sqoop, all the temp files generated by ClassWriter are left 
> behind on disk, so anyone can check those JAVA files to see the schema of 
> those tables that Sqoop has been interacting with. By default, the directory 
> is under /tmp/sqoop-/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can see 
> that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
> hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
> // We created the directory. Use it.
> // If this directory is not actually filled with files, delete it
> // when the JVM quits.
> hashDir.deleteOnExit();
> break;
>   }
> }
> {code}
> However, I believe it fails to delete because the directory is not empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically

2018-08-28 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594718#comment-16594718
 ] 

Daniel Voros commented on SQOOP-3042:
-

[~amjosh911] it is going to be included in the next release we do from trunk. 
Not sure yet if it's going to be 1.4.8, 1.5.0 or 3.0.0.

> Sqoop does not clear compile directory under /tmp/sqoop-/compile 
> automatically
> 
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Critical
>  Labels: patch
> Fix For: 3.0.0
>
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, 
> SQOOP-3042.4.patch, SQOOP-3042.5.patch, SQOOP-3042.6.patch, 
> SQOOP-3042.7.patch, SQOOP-3042.9.patch
>
>
> After running sqoop, all the temp files generated by ClassWriter are left 
> behind on disk, so anyone can check those JAVA files to see the schema of 
> those tables that Sqoop has been interacting with. By default, the directory 
> is under /tmp/sqoop-/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can see 
> that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
> hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
> // We created the directory. Use it.
> // If this directory is not actually filled with files, delete it
> // when the JVM quits.
> hashDir.deleteOnExit();
> break;
>   }
> }
> {code}
> However, I believe it fails to delete because the directory is not empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68382: Upgrade Gradle version to 4.9

2018-08-21 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68382/#review207668
---


Ship it!




Thank you for picking this up! I've checked the following:
 - tar.gz contents (and lib/ in particular) are the same when generated with 
`./gradlew tar -x test`
 - publishing of snapshot and released artifacts works with local and remote 
repositories

I couldn't get the ant way of publishing to work with remote repositories, but 
comparing with Maven Central I've noticed that we've only released 1.4.7 with 
the classifier `hadoop260`. This is something we might need to revisit when 
deploying the next release: whether it makes sense to add a classifier if we're 
only releasing a single version. (For 1.4.6 there were multiple versions: 
http://central.maven.org/maven2/org/apache/sqoop/sqoop/1.4.6/)

- daniel voros


On Aug. 16, 2018, 2:37 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68382/
> ---
> 
> (Updated Aug. 16, 2018, 2:37 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3364
> https://issues.apache.org/jira/browse/SQOOP-3364
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Apart from the Gradle version bump the change contains the following:
> - html.destination type is modified to file to avoid deprecation warning
> - the task wrapper is replaced with wrapper {} to avoid deprecation warning
> - enableFeaturePreview('STABLE_PUBLISHING') is added to settings.gradle to 
> avoid deprecation warning. This is a change I could not test since we cannot 
> publish to maven repo now. However in a case of a future release we should 
> test it as described here: 
> https://docs.gradle.org/4.9/userguide/publishing_maven.html#publishing_maven:deferred_configuration
> - The HBase test cases failed at first because the regionserver web ui was 
> not able to start up most probably because of a bad version of a Jetty class 
> on the classpath. However we do not need the regionserver web ui for the 
> Sqoop tests so instead of playing around with libraries I disabled it just 
> like we have already disabled the master web ui.
> 
> 
> Diffs
> -
> 
>   build.gradle 709172cc0 
>   gradle/wrapper/gradle-wrapper.jar 99340b4ad18d3c7e764794d300ffd35017036793 
>   gradle/wrapper/gradle-wrapper.properties 90a06cec7 
>   settings.gradle 7d64af500 
>   src/test/org/apache/sqoop/hbase/HBaseTestCase.java 87fce34a8 
> 
> 
> Diff: https://reviews.apache.org/r/68382/diff/1/
> 
> 
> Testing
> ---
> 
> Executed unit and third party test suite successfully.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



Re: Review Request 68316: Debug toString() methods of OraOopOracleDataChunk

2018-08-16 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68316/#review207395
---


Ship it!




Ship It!

- daniel voros


On Aug. 15, 2018, 3:21 p.m., Nguyen Truong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68316/
> ---
> 
> (Updated Aug. 15, 2018, 3:21 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3362
> https://issues.apache.org/jira/browse/SQOOP-3362
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> The method currently returns the hash of the data chunk object. I 
> implemented the toString() methods inside the subclasses of 
> OraOopOracleDataChunk.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopDBInputSplit.java 948bdbb73 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleDataChunk.java 
> eb67fd2e4 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleDataChunkExtent.java 
> 20b39eea0 
>   
> src/java/org/apache/sqoop/manager/oracle/OraOopOracleDataChunkPartition.java 
> 59889b82b 
>   
> src/test/org/apache/sqoop/manager/oracle/TestOraOopDBInputSplitGetDebugDetails.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68316/diff/4/
> 
> 
> Testing
> ---
> 
> A test case is added named TestOraOopDBInputSplitGetDebugDetails.
> 
> 
> Thanks,
> 
> Nguyen Truong
> 
>



Re: Review Request 68316: Debug toString() methods of OraOopOracleDataChunk

2018-08-15 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68316/#review207321
---



Hi Nguyen,

Thanks for your contribution! Have you considered using 
ReflectionToStringBuilder from commons-lang3? You could achieve similar results 
by overriding toString() only in OraOopOracleDataChunk without having to worry 
about future fields added to the classes:

```
  // requires org.apache.commons.lang3.builder.ReflectionToStringBuilder
  // and org.apache.commons.lang3.builder.ToStringStyle
  @Override
  public String toString() {
    return ReflectionToStringBuilder.toString(this, ToStringStyle.MULTI_LINE_STYLE);
  }
```

If you decide to keep the current solution, I'd recommend replacing 
`super.getId()` with `getId()` in the toString methods.

Regards,
Daniel

- daniel voros


On Aug. 14, 2018, 12:53 p.m., Nguyen Truong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68316/
> ---
> 
> (Updated Aug. 14, 2018, 12:53 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3362
> https://issues.apache.org/jira/browse/SQOOP-3362
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> The method currently returns the hash of the data chunk object. I 
> implemented the toString() methods inside the subclasses of 
> OraOopOracleDataChunk.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopDBInputSplit.java 948bdbb73 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleDataChunk.java 
> eb67fd2e4 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleDataChunkExtent.java 
> 20b39eea0 
>   
> src/java/org/apache/sqoop/manager/oracle/OraOopOracleDataChunkPartition.java 
> 59889b82b 
>   
> src/test/org/apache/sqoop/manager/oracle/TestOraOopDBInputSplitGetDebugDetails.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68316/diff/3/
> 
> 
> Testing
> ---
> 
> No test case is added because the change is already covered.
> 
> 
> Thanks,
> 
> Nguyen Truong
> 
>



[jira] [Updated] (SQOOP-3052) Introduce Gradle based build for Sqoop to make it more developer friendly / open

2018-07-23 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3052:

Summary: Introduce Gradle based build for Sqoop to make it more developer 
friendly / open  (was: Introduce Maven/Gradle/etc. based build for Sqoop to 
make it more developer friendly / open)

> Introduce Gradle based build for Sqoop to make it more developer friendly / 
> open
> 
>
> Key: SQOOP-3052
> URL: https://issues.apache.org/jira/browse/SQOOP-3052
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Attila Szabo
>Assignee: Anna Szonyi
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: SQOOP-3052.patch
>
>
> The current trunk version can only be built with the Ant/Ivy combination, which 
> has some painful limitations (resolve is slow / needs to be tweaked to use 
> only caches, the current profile / variable based settings are not working in 
> IDEs out of the box, the current solution does not download the related 
> sources, etc.)
> It would be nice to provide a solution which would give the developers the 
> possibility to choose between the nowadays widely used build infrastructures 
> (e.g. Maven, Gradle, etc.). For this solution it would also be essential to 
> keep the different build files (if there is more than one) synchronized 
> easily, so that the configuration wouldn't diverge over time. Test execution 
> has to be solved as well, and should cover all the available test cases.
> In this scenario:
> Providing one good working solution is much better than providing 
> three different ones which become out of sync easily. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67929: Remove Kite dependency from the Sqoop project

2018-07-19 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206241
---


Ship it!




Thanks for the update! Verified on same cluster. Ship it!

- daniel voros


On July 19, 2018, 1:52 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67929/
> ---
> 
> (Updated July 19, 2018, 1:52 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3329
> https://issues.apache.org/jira/browse/SQOOP-3329
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> - Removed kitesdk dependency from ivy.xml
> - Removed Kite Dataset API based Parquet import implementation
> - Since Parquet library was a transitive dependency of the Kite SDK I added 
> org.apache.parquet.avro-parquet 1.9 as a direct dependency
> - In this dependency the parquet package has changed to org.apache.parquet so 
> I needed to make changes in several classes according to this
> - Removed all the Parquet related test cases from TestHiveImport. These 
> scenarios are already covered in TestHiveServer2ParquetImport.
> - Modified the documentation to reflect these changes.
> 
> 
> Diffs
> -
> 
>   ivy.xml 1f587f3eb 
>   ivy/libraries.properties 565a8bf50 
>   src/docs/user/hive-notes.txt af97d94b3 
>   src/docs/user/import.txt a2c16d956 
>   src/java/org/apache/sqoop/SqoopOptions.java cc1b75281 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java
>  050c85488 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 
> 02816d77f 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java
>  6ebc5a31b 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 
> 122ff3fc9 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java
>  7e179a27d 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 
> 0a91e4a20 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java
>  bd07c09f4 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java
>  ed045cd14 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java 
> a4768c932 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 87fc5e987 
>   src/test/org/apache/sqoop/TestMerge.java 2b3280a5a 
>   src/test/org/apache/sqoop/TestParquetExport.java 0fab1880c 
>   src/test/org/apache/sqoop/TestParquetImport.java b1488e8af 
>   src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512 
>   src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java dbda8b7f4 
> 
> 
> Diff: https://reviews.apache.org/r/67929/diff/2/
> 
> 
> Testing
> ---
> 
> Ran unit and third party tests.
> 
> 
> File Attachments
> 
> 
> trunkdependencies.graphml
>   
> https://reviews.apache.org/media/uploaded/files/2018/07/18/4df23fec-c7a7-4dc6-8ac1-0872ee6fdadf__trunkdependencies.graphml
> kiteremovaldependencies.graphml
>   
> https://reviews.apache.org/media/uploaded/files/2018/07/18/e8cbb4d3-1da3-4b64-96ea-09f647ece126__kiteremovaldependencies.graphml
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



Re: Review Request 67971: SQOOP-3346: Upgrade Hadoop version to 2.8.0

2018-07-19 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67971/#review206236
---


Ship it!




Ship It!

- daniel voros


On July 19, 2018, 9 a.m., Boglarka Egyed wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67971/
> ---
> 
> (Updated July 19, 2018, 9 a.m.)
> 
> 
> Review request for Sqoop, daniel voros, Fero Szabo, and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3346
> https://issues.apache.org/jira/browse/SQOOP-3346
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Upgrading Hadoop version from 2.6.0 to 2.8.0 and some related code changes.
> 
> 
> Diffs
> -
> 
>   ivy/libraries.properties 565a8bf50cdd88597a2a502d2fdbce2d5c8585ef 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 
> 666852c2af2f7636bd068c24e5df32173b185603 
>   src/java/org/apache/sqoop/config/ConfigurationHelper.java 
> fb2ab031caef023dfbd8130814d07416dbf4db14 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 
> 6d1e04992c0e1d45a24e22fcd765c286e7414578 
>   src/java/org/apache/sqoop/tool/ImportTool.java 
> f7310b939a667e4434a78bdbc50f9520fe72f8a6 
>   src/test/org/apache/sqoop/TestSqoopOptions.java 
> ba4a4d44f36c155318092bdcc71588c476e84e2d 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerParseMethodsTest.java 
> 833ebe8a14e438daa7fbb2eae13dc0d04bec3bb8 
>   src/test/org/apache/sqoop/orm/TestParseMethods.java 
> 46bb52d562991bc9c3443b8a26c7a7f9996d72d2 
> 
> 
> Diff: https://reviews.apache.org/r/67971/diff/1/
> 
> 
> Testing
> ---
> 
> Ran unit and 3rd party tests successfully.
> 
> 
> Thanks,
> 
> Boglarka Egyed
> 
>



[jira] [Commented] (SQOOP-3346) Upgrade Hadoop version to 2.8.0

2018-07-19 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549011#comment-16549011
 ] 

Daniel Voros commented on SQOOP-3346:
-

Yes, I agree with you. Don't block this until SQOOP-3305 is done!

> Upgrade Hadoop version to 2.8.0
> ---
>
> Key: SQOOP-3346
> URL: https://issues.apache.org/jira/browse/SQOOP-3346
> Project: Sqoop
>  Issue Type: Sub-task
>Reporter: Boglarka Egyed
>Assignee: Boglarka Egyed
>Priority: Major
>
> Support for AWS temporary credentials has been introduced in Hadoop 2.8.0 
> based on HADOOP-12537 and it would make more sense to test and support this 
> capability too with Sqoop.
> There is [SQOOP-3305|https://reviews.apache.org/r/66300/bugs/SQOOP-3305/] 
> open for upgrading Hadoop to 3.0.0, however it currently has several issues 
> described in [https://reviews.apache.org/r/66300/], thus I would like to 
> proceed with an "intermediate" upgrade to 2.8.0 to enable development on the 
> S3 front. [~dvoros] are you OK with this?
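
As a hedged illustration of what that capability looks like on the Hadoop side 
(the property names and the provider class come from Hadoop 2.8's S3A connector; 
the wrapper class and placeholder values are made up for this sketch):

{code}
import org.apache.hadoop.conf.Configuration;

public class S3ATemporaryCredentialsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Use session (temporary) credentials instead of long-lived keys.
    conf.set("fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
    conf.set("fs.s3a.access.key", "<access-key>");
    conf.set("fs.s3a.secret.key", "<secret-key>");
    conf.set("fs.s3a.session.token", "<session-token>");
  }
}
{code}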



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67929: Remove Kite dependency from the Sqoop project

2018-07-18 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206195
---



Hi!

I was trying to run this on a minicluster but got the following error:

```
2018-07-18 09:20:41,799 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error 
running child : java.lang.NoSuchMethodError: 
org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
at 
org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
at 
org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
at 
org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
at 
org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
at 
org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
at 
org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
at 
org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
at 
org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:389)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:350)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.(MapTask.java:653)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
```

This happens when we have a newer version of Parquet (1.8.1 IIRC) together with 
an older Avro (1.7.7 in this case).

Where is parquet coming from?
  - 1.9 is coming from Sqoop since this new patch
  - Hive's hive-exec jar also contains parquet classes shaded with the original 
packaging

Which one gets picked seems to be random to me (it even changes between 
re-executions of mappers!). Both are in the distributed cache.

Where is avro coming from?
  - There can be multiple versions under Sqoop/Hive but it doesn't really 
matter. Hadoop is packaged with avro under `share/hadoop/*/lib`. The jars there 
will take precedence over user classpath. This can be changed with 
`mapreduce.job.user.classpath.first=true`, but then we'd have to make sure not 
to override anything that Hadoop relies on.
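
As a side note, a minimal sketch of flipping that switch programmatically (only 
the property name comes from Hadoop; the class and job name below are made up 
for illustration):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UserClasspathFirstExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Prefer user-supplied jars (e.g. our Parquet/Avro) over Hadoop's bundled ones.
    conf.setBoolean("mapreduce.job.user.classpath.first", true);
    Job job = Job.getInstance(conf, "classpath-precedence-demo");
    // ... configure mapper/reducer/paths and submit as usual
  }
}
```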

I've come across this issue before and solved it with shading parquet classes. 
Note that this could be harder to do with Sqoop's ant build scripts.

Some other minor observations:
  - Hadoop 3.1.0 still has Avro 1.7.7
  - Hive has been using incompatible versions of Avro and Parquet for a long 
time, but they're not relying on parts of Parquet that require Avro.

Szabolcs, I've been struggling with this for too long, and a fresh pair of eyes 
might help spot some other options! Can you please take a look and validate 
what I've found?

Regards,
Daniel

- daniel voros


On July 16, 2018, 3:56 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67929/
> ---
> 
> (Updated July 16, 2018, 3:56 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3329
> https://issues.apache.org/jira/browse/SQOOP-3329
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> - Removed kitesdk dependency from ivy.xml
> - Removed Kite Dataset API based Parquet import implementation
> - Since Parquet library was a transitive dependency of the Kite SDK I added 
> org.apache.parquet.avro-parquet 1.9 as a direct dependency
> - In this dependency the parquet package has changed to org.apache.parquet so 
> I needed to make changes in several classes according to this
> - Removed all the Parquet related test cases from TestHiveImport. These 
> scenarios are already covered in TestHiveServer2ParquetImport.
> - Modified the documentation to reflect these changes.
> 
> 
> Diffs
> -
> 
>   ivy.xml 1f587f3eb 
>   ivy/libraries.properties 565a8bf50 
>   src/docs/user/hive-notes.txt af97d94b3 
>   src/docs/user/import.txt a2c16d956 
>   src/java/org/apache/sqoop/SqoopOptions.java cc1b75281 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1663b1d1a 
>   
> src/java/org/apach

Re: Review Request 66300: Upgrade to Hadoop 3.0.0

2018-07-17 Thread daniel voros
)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1826)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1567)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1561)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:221)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:313)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:326)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]
at 
org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)
at 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
at 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:613)
at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:409)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:798)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:794)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
... 36 more
```

- daniel voros


On March 27, 2018, 8:50 a.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66300/
> ---
> 
> (Updated March 27, 2018, 8:50 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3305
> https://issues.apache.org/jira/browse/SQOOP-3305
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> 
> 
> Diffs
> -
> 
>   ivy.xml 1f587f3e 
>   ivy/libraries.properties 565a8bf5 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/config/ConfigurationHelper.java fb2ab031 
>   src/java/org/apache/sqoop/hive/HiveImport.java 5da00a74 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 6d1e0499 
>   src/java/org/apache/sqoop/mapreduce/hcat/DerbyPolicy.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java 784b5f2a 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java 
> e68bba90 
>   src/java/org/apache/sqoop/util/SqoopJsonUtil.java adf186b7 
>   src/test/org/apache/sqoop/TestSqoopOptions.java bb7c20dd 
>   src/test/org/apache/sqoop/hive/minicluster/HiveMiniCluster.java 19bb7605 
>   
> src/test/org/apache/sqoop/hive/minicluster/KerberosAuthenticationConfiguration.java
>  549a8c6c 
>   
> src/test/org/apache/sqoop/hive/minicluster/PasswordAuthenticationConfiguration.java
>  79881f7b 
>   src/test/org/apache/sqoop/util/TestSqoopJsonUtil.java fdf972c1 
>   testdata/hcatalog/conf/hive-site.xml edac7aa9 
> 
> 
> Diff: https://reviews.apache.org/r/66300/diff/7/
> 
> 
> Testing
> ---
> 
> Normal and third-party unit tests.
> 
> 
> Thanks,
> 
> daniel voros
> 
>



[jira] [Resolved] (SQOOP-3343) format all DTA.bat SQOOP

2018-07-17 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros resolved SQOOP-3343.
-
Resolution: Invalid

see INFRA-16778

> format all DTA.bat SQOOP
> 
>
> Key: SQOOP-3343
> URL: https://issues.apache.org/jira/browse/SQOOP-3343
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Mohamedvolt 
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (SQOOP-3342) rformat.batalldata:assignee = currentUser() AND resolution = Unresolved order by updated DESC

2018-07-17 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros resolved SQOOP-3342.
-
Resolution: Invalid

see INFRA-16778

> rformat.batalldata:assignee = currentUser() AND resolution = Unresolved order 
> by updated DESC
> -
>
> Key: SQOOP-3342
> URL: https://issues.apache.org/jira/browse/SQOOP-3342
> Project: Sqoop
>  Issue Type: New Feature
>Reporter: Mohamedvolt 
>Priority: Major
>
> rformat.batalldata:assignee = currentUser() AND resolution = Unresolved order 
> by updated DESC



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67873: Add Hive support to the new Parquet writing implementation

2018-07-14 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67873/#review206088
---


Ship it!




Looks good, thank you! Ship it!

- daniel voros


On July 10, 2018, 11:26 a.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67873/
> ---
> 
> (Updated July 10, 2018, 11:26 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3335
> https://issues.apache.org/jira/browse/SQOOP-3335
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> SQOOP-3328 adds a new Parquet reading and writing implementation to Sqoop it 
> does not add support to Hive Parquet imports. The task of this Jira is to add 
> this missing functionality.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/hive/HiveTypes.java ad00535e5 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c53 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetImportJobConfigurator.java 
> eb6d08f8a 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java
>  3f35faf86 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java
>  feb3bf19b 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 8d318327a 
>   src/java/org/apache/sqoop/tool/ImportTool.java 25c3f7031 
>   src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java d8d3af40f 
>   src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java 
> PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java 3d115ab3e 
>   src/test/org/apache/sqoop/hive/TestHiveTypesForAvroTypeMapping.java 
> PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f646 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java ac6db0b14 
>   src/test/org/apache/sqoop/tool/TestHiveServer2OptionValidations.java 
> 4d3f93898 
> 
> 
> Diff: https://reviews.apache.org/r/67873/diff/1/
> 
> 
> Testing
> ---
> 
> Executed unit and third party test cases.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



Re: Review Request 67675: SQOOP-3332 Extend Documentation of --resilient flag and add warning message when detected

2018-06-28 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67675/#review205504
---


Ship it!




Ship It!

- daniel voros


On June 28, 2018, 12:29 p.m., Fero Szabo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67675/
> ---
> 
> (Updated June 28, 2018, 12:29 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed, daniel voros, and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3332
> https://issues.apache.org/jira/browse/SQOOP-3332
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This is the documentation part of SQOOP-.
> 
> 
> Diffs
> -
> 
>   src/docs/user/connectors.txt f1c7aebe 
>   src/java/org/apache/sqoop/manager/SQLServerManager.java c98ad2db 
>   src/java/org/apache/sqoop/manager/SqlServerManagerContextConfigurator.java 
> cf58f631 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerManagerImportTest.java 
> fc1c4895 
> 
> 
> Diff: https://reviews.apache.org/r/67675/diff/3/
> 
> 
> Testing
> ---
> 
> Unit tests, 3rdparty tests, ant docs.
> 
> I've also investigated how export and import works: 
> 
> Import has its retry mechanism in 
> org.apache.sqoop.mapreduce.db.SQLServerDBRecordReader#nextKeyValue
> In case of error, it re-calculates the db query, hence the implicit 
> requirements.
> 
> Export has its retry loop in 
> org.apache.sqoop.mapreduce.SQLServerAsyncDBExecThread#write
> It doesn't recalculate the query, thus it is a lot safer.
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>



Re: Review Request 67628: Implement an alternative solution for Parquet reading and writing

2018-06-28 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67628/#review205495
---


Ship it!




Thanks for the updates! Ship it!

- daniel voros


On June 26, 2018, 9:15 a.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67628/
> ---
> 
> (Updated June 26, 2018, 9:15 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3328
> https://issues.apache.org/jira/browse/SQOOP-3328
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> The new implementation uses classes from parquet.hadoop packages.
> TestParquetIncrementalImportMerge has been introduced to cover some gaps we 
> had in the Parquet merge support.
> The test infrastructure is also modified a bit which was needed because of 
> TestParquetIncrementalImportMerge.
> 
> Note that this JIRA does not cover the Hive Parquet import support I will 
> create another JIRA for that.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/SqoopOptions.java 
> d9984af369f901c782b1a74294291819e7d13cdd 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 
> 57c2062568778c5bb53cd4118ce4f030e4ff33f2 
>   src/java/org/apache/sqoop/manager/ConnManager.java 
> c80dd5d9cbaa9b114c12b693e9a686d2cbbe51a3 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 
> 3b5421028d3006e790ed4b711a06dbdb4035b8a0 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 
> 17c9ed39b1e613a6df36b54cd5395b80e5f8fb0b 
>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetConstants.java 
> ae53a96bddc523a52384715dd97705dc3d9db607 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetExportJobConfigurator.java 
> 8d7b87f6d6832ce8d81d995af4c4bd5eeae38e1b 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetImportJobConfigurator.java 
> fa1bc7d1395fbbbceb3cb72802675aebfdb27898 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactory.java
>  ed5103f1d84540ef2fa5de60599e94aa69156abe 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactoryProvider.java
>  2286a52030778925349ebb32c165ac062679ff71 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetMergeJobConfigurator.java 
> 67fdf6602bcbc6c091e1e9bf4176e56658ce5222 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportMapper.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportMapper.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetJobConfiguratorFactory.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 
> 7f21205e1c4be4200f7248d3f1c8513e0c8e490c 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java
>  ca02c7bdcaf2fa981e15a6a96b111dec38ba2b25 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 
> 2d88a9c8ea4eb32001e1eb03e636d9386719 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java
>  87828d1413eb71761aed44ad3b138535692f9c97 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 
> 20adf6e422cc4b661a74c8def114d44a14787fc6 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java
>  055e1166b07aeef711cd162052791500368c628d 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java
>  9fecf282885f7aeac011a66f7d5d05512624976f 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java 
> e68bba90d8b08ac3978fcc9ccae612bdf02388e8 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 
> c62ee98c2b22d819c9a994884b254f76eb518b6a 
>   src/java/org/apache/sqoop/tool/ImportTool.java 
> 2c474b7eeeff02b59204e4baca8554d668b6c61e 
>   src/java/org/apache/sqoop/tool/MergeTool.java 
> 4c20f7d151514b26a098dafdc1ee265cbde5ad20 
> 

Re: Review Request 67675: SQOOP-3332 Extend Documentation of --resilient flag and add warning message when detected

2018-06-28 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67675/#review205494
---



Hi Fero,

If I understand correctly, with this patch we're only displaying a warning when 
using --resilient to let the users know they should add --split-by (even if 
they do so?).

In the documentation you're saying omitting --split-by can lead to 
lost/duplicated records. Shouldn't we stop the import if there's no 
--split-by then? I understand we can't enforce the uniqueness and ascending 
order though, so keeping some kind of warning could make sense too.

What do you think?

Regards,
Daniel

- daniel voros


On June 25, 2018, 3:17 p.m., Fero Szabo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67675/
> ---
> 
> (Updated June 25, 2018, 3:17 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed, daniel voros, and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3332
> https://issues.apache.org/jira/browse/SQOOP-3332
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This is the documentation part of SQOOP-.
> 
> 
> Diffs
> -
> 
>   src/docs/user/connectors.txt f1c7aebe 
>   src/java/org/apache/sqoop/manager/SQLServerManager.java c98ad2db 
>   src/java/org/apache/sqoop/manager/SqlServerManagerContextConfigurator.java 
> cf58f631 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerManagerImportTest.java 
> fc1c4895 
> 
> 
> Diff: https://reviews.apache.org/r/67675/diff/2/
> 
> 
> Testing
> ---
> 
> Unit tests, 3rdparty tests, ant docs.
> 
> I've also investigated how export and import works: 
> 
> Import has its retry mechanism in 
> org.apache.sqoop.mapreduce.db.SQLServerDBRecordReader#nextKeyValue
> In case of error, it re-calculates the db query, hence the implicit 
> requirements.
> 
> Export has its retry loop in 
> org.apache.sqoop.mapreduce.SQLServerAsyncDBExecThread#write
> It doesn't recalculate the query, thus it is a lot safer.
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>



[jira] [Commented] (SQOOP-3323) Use hive executable in (non-JDBC) Hive imports

2018-06-22 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520178#comment-16520178
 ] 

Daniel Voros commented on SQOOP-3323:
-

Attached review request.

> Use hive executable in (non-JDBC) Hive imports
> --
>
> Key: SQOOP-3323
> URL: https://issues.apache.org/jira/browse/SQOOP-3323
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 3.0.0
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> When doing Hive imports the old way (not via JDBC that was introduced in 
> SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall 
> back to the {{hive}} executable (a.k.a. [Hive 
> Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
> that class is not found.
> Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
> [deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
>  (see also HIVE-10511), we should switch to using {{beeline}} to talk to 
> Hive. With recent additions (e.g. HIVE-18963) this should be easier than 
> before.
> As a first step we could switch to using {{hive}} executable. With HIVE-19728 
> it will be possible (in Hive 3.1) to configure hive to actually run beeline 
> when using the {{hive}} executable. This way we could leave it to the user to 
> decide whether to use the deprecated cli or use beeline instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3336) Splitting on integer column can create more splits than necessary

2018-06-22 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520176#comment-16520176
 ] 

Daniel Voros commented on SQOOP-3336:
-

Attached review request.

This also affects splitting on date/timestamp columns, since DateSplitter uses 
the same logic.

> Splitting on integer column can create more splits than necessary
> -
>
> Key: SQOOP-3336
> URL: https://issues.apache.org/jira/browse/SQOOP-3336
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>    Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
>
> Running an import with {{-m 2}} will result in three splits if there are only 
> three consecutive integers in the table ({{\{1, 2, 3\}}}).
> Work is (probably) spread more evenly between mappers this way, but ending up 
> with more files than expected could be an issue.
> Split-limit can also result in more values than asked for in the last chunk 
> (due to the closed interval in the end).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3336) Splitting on integer column can create more splits than necessary

2018-06-21 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3336:
---

 Summary: Splitting on integer column can create more splits than 
necessary
 Key: SQOOP-3336
 URL: https://issues.apache.org/jira/browse/SQOOP-3336
 Project: Sqoop
  Issue Type: Bug
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0, 3.0.0


Running an import with {{-m 2}} will result in three splits if there are only 
three consecutive integers in the table ({{\{1, 2, 3\}}}).

Work is (probably) spread more evenly between mappers this way, but ending up 
with more files than expected could be an issue.

Split-limit can also result in more values than asked for in the last chunk 
(due to the closed interval in the end).
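
To make the arithmetic concrete, here is a rough illustration of how a fixed 
step derived from (max - min) / numSplits over a closed range can produce more 
chunks than the requested mapper count (simplified, hypothetical code, not 
Sqoop's actual IntegerSplitter):

{code}
public class SplitCountIllustration {
  public static void main(String[] args) {
    long min = 1, max = 3, numSplits = 2;              // table holds {1, 2, 3}, -m 2
    long step = Math.max(1, (max - min) / numSplits);  // = 1
    java.util.List<long[]> splits = new java.util.ArrayList<>();
    for (long lo = min; lo <= max; lo += step) {
      long hi = Math.min(lo + step - 1, max);          // closed upper bound
      splits.add(new long[] {lo, hi});                 // -> [1,1], [2,2], [3,3]
    }
    System.out.println(splits.size() + " splits for " + numSplits + " requested mappers");
  }
}
{code}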



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3323) Use hive executable in (non-JDBC) Hive imports

2018-06-21 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3323:

Description: 
When doing Hive imports the old way (not via JDBC that was introduced in 
SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall back 
to the {{hive}} executable (a.k.a. [Hive 
Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
that class is not found.

Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
[deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
 (see also HIVE-10511), we should switch to using {{beeline}} to talk to Hive. 
With recent additions (e.g. HIVE-18963) this should be easier than before.

As a first step we could switch to using {{hive}} executable. With HIVE-19728 
it will be possible (in Hive 3.1) to configure hive to actually run beeline 
when using the {{hive}} executable. This way we could leave it to the user to 
decide whether to use the deprecated cli or use beeline instead.

  was:
When doing Hive imports the old way (not via JDBC that was introduced in 
SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall back 
to the {{hive}} executable (a.k.a. [Hive 
Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
that class is not found.

Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
[deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
 (see also HIVE-10511), we should switch to using {{beeline}} to talk to Hive. 
With recent additions (e.g. HIVE-18963) this should be easier than before.

Summary: Use hive executable in (non-JDBC) Hive imports  (was: Use 
beeline in (non-JDBC) Hive imports)

With HIVE-19728 (to be released in Hive 3.1) it will be possible to map the hive 
executable to beeline. I'm updating the goal of this Jira to use the {{hive}} 
executable and let the users decide whether they want to use beeline instead.

> Use hive executable in (non-JDBC) Hive imports
> --
>
> Key: SQOOP-3323
> URL: https://issues.apache.org/jira/browse/SQOOP-3323
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 3.0.0
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> When doing Hive imports the old way (not via JDBC that was introduced in 
> SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall 
> back to the {{hive}} executable (a.k.a. [Hive 
> Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
> that class is not found.
> Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
> [deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
>  (see also HIVE-10511), we should switch to using {{beeline}} to talk to 
> Hive. With recent additions (e.g. HIVE-18963) this should be easier than 
> before.
> As a first step we could switch to using {{hive}} executable. With HIVE-19728 
> it will be possible (in Hive 3.1) to configure hive to actually run beeline 
> when using the {{hive}} executable. This way we could leave it to the user to 
> decide whether to use the deprecated cli or use beeline instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67524: SQOOP-3333 Change default behavior of the MS SQL connector to non-resilient.

2018-06-18 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67524/#review204918
---



Hi Fero,

Thank you for taking care of this! I think it's always a good idea to avoid 
these negating options. I've posted a few minor issues/questions.

Regards,
Daniel


src/docs/user/connectors.txt
Lines 154 (patched)
<https://reviews.apache.org/r/67524/#comment287716>

*and in ascending order?



src/java/org/apache/sqoop/manager/ExportJobContext.java
Lines 38 (patched)
<https://reviews.apache.org/r/67524/#comment287720>

This new constructor is always called with outputFormatClass=null now. Are 
you planning on using this later?



src/java/org/apache/sqoop/manager/SqlServerManagerContextConfigurator.java
Lines 34 (patched)
<https://reviews.apache.org/r/67524/#comment287717>

*"to be resilient"?



src/java/org/apache/sqoop/manager/SqlServerManagerContextConfigurator.java
Lines 115 (patched)
<https://reviews.apache.org/r/67524/#comment287719>

Could you please add some javadoc about the return value?



src/test/org/apache/sqoop/manager/sqlserver/TestSqlServerManagerContextConfigurator.java
Lines 119 (patched)
<https://reviews.apache.org/r/67524/#comment287718>

Could this be a @Before method since it's called from every TC?


- daniel voros


On June 18, 2018, 10:25 a.m., Fero Szabo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67524/
> ---
> 
> (Updated June 18, 2018, 10:25 a.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-
> https://issues.apache.org/jira/browse/SQOOP-
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This change is about changing the default behavior of the MS SQL connector 
> from resilient to non-resilient. I was aiming for the fewest possible 
> modifications while also removing double negation where previously present.
> 
> I've refactored the context configuration into a separate class.
> 
> I've also changed the documentation of the non-resilient flag and added a 
> note about the implicit requirement of the feature (that the split-by column 
> has to be unique and ordered in ascending order). 
> 
> I plan to expand the documentation more in SQOOP-3332, as the (now named) 
> resilient flag works not just for export, but import as well (queries and 
> tables).
> 
> I've also added new tests that cover what classes get loaded in connection 
> with the resilient option. Also, I've refactored SQL Server import tests and 
> added a few more cases for better coverage. (The query import uses a 
> different method and wasn't covered by these tests at all.)
> 
> 
> Diffs
> -
> 
>   src/docs/user/connectors.txt 7c540718 
>   src/java/org/apache/sqoop/manager/ExportJobContext.java 773cf742 
>   src/java/org/apache/sqoop/manager/SQLServerManager.java b136087f 
>   src/java/org/apache/sqoop/manager/SqlServerManagerContextConfigurator.java 
> PRE-CREATION 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerManagerImportTest.java 
> c83c2c93 
>   
> src/test/org/apache/sqoop/manager/sqlserver/TestSqlServerManagerContextConfigurator.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/67524/diff/3/
> 
> 
> Testing
> ---
> 
> Added new unit tests for SqlServerConfigurator.
> unit and 3rd party tests.
> ant docs ran succesfully.
> manual testing.
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>



Re: Review Request 67628: Implement an alternative solution for Parquet reading and writing

2018-06-18 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67628/#review204915
---



Hey Szabolcs,

Thank you for submitting this! Verified UTs, opened some minor issues.

Could you please add a few lines of Javadoc to the new classes to make it clear 
what they're used for?

Thanks,
Daniel


src/java/org/apache/sqoop/SqoopOptions.java
Lines 2936 (patched)
<https://reviews.apache.org/r/67628/#comment287711>

Couldn't we store this as a field of SqoopOptions? That way it could have a 
default without this method.



src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactoryProvider.java
Lines 51 (patched)
<https://reviews.apache.org/r/67628/#comment287712>

Wrong error msg: Is unknown? Or is _not_ set?



src/test/org/apache/sqoop/TestParquetImport.java
Lines 152 (patched)
<https://reviews.apache.org/r/67628/#comment287713>

Why is this expected to fail? Could you please add some Javadoc?


- daniel voros


On June 18, 2018, 9:49 a.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67628/
> ---
> 
> (Updated June 18, 2018, 9:49 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3328
> https://issues.apache.org/jira/browse/SQOOP-3328
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> The new implementation uses classes from parquet.hadoop packages.
> TestParquetIncrementalImportMerge has been introduced to cover some gaps we 
> had in the Parquet merge support.
> The test infrastructure is also modified a bit which was needed because of 
> TestParquetIncrementalImportMerge.
> 
> Note that this JIRA does not cover the Hive Parquet import support I will 
> create another JIRA for that.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af36 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 57c206256 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 3b5421028 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 17c9ed39b 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactoryProvider.java
>  2286a5203 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopMergeParquetReducer.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportMapper.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportMapper.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetJobConfiguratorFactory.java
>  PRE-CREATION 
>   
> src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java
>  PRE-CREATION 
>   src/test/org/apache/sqoop/TestBigDecimalExport.java ccea17345 
>   src/test/org/apache/sqoop/TestMerge.java 11806fea6 
>   src/test/org/apache/sqoop/TestParquetExport.java 43dabb57b 
>   src/test/org/apache/sqoop/TestParquetImport.java 27d407aa3 
>   src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java 
> PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512 
>   src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java f6d591b73 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerHiveImportTest.java 
> e6b086550 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java a5f85a06b 
>   src/test/org/apache/sqoop/testutil/ImportJobTestCase.java dbefe2097 
>   src/test/org/apache/sqoop/util/ParquetReader.java 56e03a060 
> 
> 
> Diff: https://reviews.apache.org/r/67628/diff/1/
> 
> 
> Testing
> ---
> 
> Ran unit and third party tests successfully.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



[jira] [Resolved] (SQOOP-2471) Support arrays and structs datatypes with Sqoop Hcatalog integration

2018-05-30 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros resolved SQOOP-2471.
-
Resolution: Duplicate

I believe this has been superseded by SQOOP-2935.

> Support arrays and structs datatypes with Sqoop Hcatalog integration
> 
>
> Key: SQOOP-2471
> URL: https://issues.apache.org/jira/browse/SQOOP-2471
> Project: Sqoop
>  Issue Type: New Feature
>  Components: hive-integration
>Affects Versions: 1.4.6
>Reporter: Pavel Benes
>Priority: Critical
>
> Currently sqoop import is not able to handle any complex types. Hive, on the 
> other hand, already has support for the following complex types:
>  - arrays: ARRAY
>  - structs: STRUCT
> Since it is probably not possible to obtain all the necessary information about 
> those types from a general JDBC database, this feature should somehow use 
> external information provided by the arguments --map-column-java and 
> --map-column-hive. 
> For example it could look like this:
>  --map-column-java item='inventory_item(name text, supplier_id integer, price 
> numeric)'
>  --map-column-hive item='STRUCT<name:string, supplier_id:int, price:decimal>'
> In case no additional information is provided, some more general type should 
> be created if possible.
> It should be possible to serialize complex datatype values into strings 
> when the Hive target column's type is explicitly set to 'STRING'. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67086: SQOOP-3324 Document SQOOP-816: Sqoop add support for external Hive tables

2018-05-11 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67086/#review202947
---


Ship it!




Ship It!

- daniel voros


On May 11, 2018, 1:58 p.m., Fero Szabo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67086/
> ---
> 
> (Updated May 11, 2018, 1:58 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed, daniel voros, and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3324
> https://issues.apache.org/jira/browse/SQOOP-3324
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This documentation is missing from Sqoop.
> 
> 
> Diffs
> -
> 
>   src/docs/man/hive-args.txt 438c1dc4 
>   src/docs/user/hive-args.txt 75095641 
>   src/docs/user/hive.txt f8f7c27e 
> 
> 
> Diff: https://reviews.apache.org/r/67086/diff/2/
> 
> 
> Testing
> ---
> 
> ant docs completed successfully.
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>



Re: Review Request 67086: SQOOP-3324 Document SQOOP-816: Sqoop add support for external Hive tables

2018-05-11 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67086/#review202923
---



Thanks a lot for all these documentation issues! Are you going through the list of 
command line options to see if they're all documented?


src/docs/user/hive.txt
Lines 115 (patched)
<https://reviews.apache.org/r/67086/#comment284983>

nit: I'm sure we have this wrong elsewhere too, but I think we should say 
"switch" or "option" instead of "flag" if it takes an argument.



src/docs/user/hive.txt
Lines 127 (patched)
<https://reviews.apache.org/r/67086/#comment284981>

I think this command is missing "import --hive-import" after "sqoop".


- daniel voros


On May 11, 2018, 11:10 a.m., Fero Szabo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67086/
> ---
> 
> (Updated May 11, 2018, 11:10 a.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed, daniel voros, and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3324
> https://issues.apache.org/jira/browse/SQOOP-3324
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This documentation is missing from Sqoop.
> 
> 
> Diffs
> -
> 
>   src/docs/man/hive-args.txt 438c1dc4 
>   src/docs/user/hive-args.txt 75095641 
>   src/docs/user/hive.txt f8f7c27e 
> 
> 
> Diff: https://reviews.apache.org/r/67086/diff/1/
> 
> 
> Testing
> ---
> 
> ant docs completed successfully.
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>



[jira] [Updated] (SQOOP-3313) Remove Kite dependency

2018-05-10 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3313:

Fix Version/s: 3.0.0

> Remove Kite dependency
> --
>
> Key: SQOOP-3313
> URL: https://issues.apache.org/jira/browse/SQOOP-3313
> Project: Sqoop
>  Issue Type: Improvement
>    Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> Having Kite as a dependency makes it hard to release a version of Sqoop 
> compatible with Hadoop 3.
> For details see discussion on dev list in [this thread|http://example.com] 
> and also SQOOP-3305.
> Let's use this ticket to gather features that need to be 
> changed/reimplemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3305) Upgrade to Hadoop 3, Hive 3, and HBase 2

2018-05-10 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3305:

Fix Version/s: 3.0.0

> Upgrade to Hadoop 3, Hive 3, and HBase 2
> 
>
> Key: SQOOP-3305
> URL: https://issues.apache.org/jira/browse/SQOOP-3305
> Project: Sqoop
>  Issue Type: Task
>    Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> In this ticket I'll collect the necessary changes to do the upgrade. I'm not 
> setting a fix version yet, since this might mean a major release and may need 
> to be done together with the upgrade of related components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3322) Version differences between ivy configurations

2018-05-10 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3322:

Fix Version/s: 3.0.0

> Version differences between ivy configurations
> --
>
> Key: SQOOP-3322
> URL: https://issues.apache.org/jira/browse/SQOOP-3322
> Project: Sqoop
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
> Fix For: 3.0.0
>
>
> We have multiple ivy configurations defined in ivy.xml.
>  - The {{redist}} configuration is used to select the artifacts that need to 
> be distributed with Sqoop in its tar.gz.
>  - The {{common}} configuration is used to set the classpath during 
> compilation (also referred to as 'hadoop classpath')
>  -  The {{test}} configuration is used to set the classpath during junit 
> execution. It extends the {{common}} config.
> Some artifacts end up having different versions between these three 
> configurations, which means we're using different versions during 
> compilation/testing/runtime.
> Differences:
> ||Artifact||redist||common (compilation)||test||
> |commons-pool|not in redist|1.5.4|*1.6*|
> |commons-codec|1.4|1.9|*1.9*|
> |commons-io|1.4|2.4|*2.4*|
> |commons-logging|1.1.1|1.2|*1.2*|
> |slf4j-api|1.6.1|1.7.7|*1.7.7*|
> I'd suggest using the version *in bold* in all three configurations to use 
> the latest versions.
> To achieve this we should exclude these artifacts from the transitive 
> dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3323) Use beeline in (non-JDBC) Hive imports

2018-05-10 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3323:

Affects Version/s: 3.0.0
Fix Version/s: 3.0.0

Thank you!

> Use beeline in (non-JDBC) Hive imports
> --
>
> Key: SQOOP-3323
> URL: https://issues.apache.org/jira/browse/SQOOP-3323
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 3.0.0
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> When doing Hive imports the old way (not via JDBC that was introduced in 
> SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall 
> back to the {{hive}} executable (a.k.a. [Hive 
> Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
> that class is not found.
> Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
> [deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
>  (see also HIVE-10511), we should switch to using {{beeline}} to talk to 
> Hive. With recent additions (e.g. HIVE-18963) this should be easier than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3321) TestHiveImport is failing on Jenkins

2018-05-10 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470349#comment-16470349
 ] 

Daniel Voros commented on SQOOP-3321:
-

Thank you [~fero]! I've attached a patch on the RB.

> TestHiveImport is failing on Jenkins
> 
>
> Key: SQOOP-3321
> URL: https://issues.apache.org/jira/browse/SQOOP-3321
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Boglarka Egyed
>Priority: Major
> Attachments: TEST-org.apache.sqoop.hive.TestHiveImport.txt
>
>
> org.apache.sqoop.hive.TestHiveImport is failing since 
> [SQOOP-3318|https://reviews.apache.org/r/66761/bugs/SQOOP-3318/] has been 
> committed. This test seems to be failing only in the Jenkins environment as it 
> passes on several local machines. There may be some difference in the 
> filesystem which causes this issue; it shall be investigated. I am 
> attaching the log from a failed run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review Request 67057: TestHiveImport is failing on Jenkins

2018-05-10 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67057/
---

Review request for Sqoop.


Bugs: SQOOP-3321
https://issues.apache.org/jira/browse/SQOOP-3321


Repository: sqoop-trunk


Description
---

I believe this is due to case sensitivity of file names in Linux (as opposed to 
MacOS). The table name gets converted to lowercase when importing but we're 
referring to it with its original casing when trying to verify its contents in 
ParquetReader.

Tests are passing after converting these three table names to all lowercase in 
TestHiveImport:

- APPEND_HIVE_IMPORT_AS_PARQUET
- NORMAL_HIVE_IMPORT_AS_PARQUET
- CREATE_OVERWRITE_HIVE_IMPORT_AS_PARQUET
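
To illustrate the case-sensitivity point, here is a standalone sketch (not the
actual test code; the constant value is only an example taken from the list above):

```
// Standalone illustration, not Sqoop code: the import lowercases the table name,
// so looking it up with the original casing only works on a case-insensitive
// filesystem (e.g. macOS), not on Linux.
public class CaseSensitivityDemo {
  public static void main(String[] args) {
    String declaredTableName = "NORMAL_HIVE_IMPORT_AS_PARQUET";      // as used in the test
    String nameWrittenByImport = declaredTableName.toLowerCase();    // what ends up on disk
    // On Linux these refer to two different paths, so the verification misses the data.
    System.out.println(declaredTableName.equals(nameWrittenByImport)); // prints: false
  }
}
```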


Diffs
-

  src/test/org/apache/sqoop/hive/TestHiveImport.java bc19b697 


Diff: https://reviews.apache.org/r/67057/diff/1/


Testing
---

Run TestHiveImport.


Thanks,

daniel voros



[jira] [Created] (SQOOP-3323) Use beeline in (non-JDBC) Hive imports

2018-05-10 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3323:
---

 Summary: Use beeline in (non-JDBC) Hive imports
 Key: SQOOP-3323
 URL: https://issues.apache.org/jira/browse/SQOOP-3323
 Project: Sqoop
  Issue Type: Improvement
  Components: hive-integration
Reporter: Daniel Voros
Assignee: Daniel Voros


When doing Hive imports the old way (not via JDBC that was introduced in 
SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall back 
to the {{hive}} executable (a.k.a. [Hive 
Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
that class is not found.

Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
[deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
 (see also HIVE-10511), we should switch to using {{beeline}} to talk to Hive. 
With recent additions (e.g. HIVE-18963) this should be easier than before.
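
As a rough illustration of the direction (not a proposed implementation; it only 
assumes a {{beeline}} executable on the PATH and uses its standard {{-u}}/{{-n}}/{{-f}} 
options, with a placeholder JDBC URL and script path), shelling out to beeline could 
look something like this:

{code:java}
import java.util.Arrays;
import java.util.List;

// Sketch only: run a generated Hive script through beeline instead of the
// deprecated hive CLI.
public class BeelineLauncherSketch {
  public static void main(String[] args) throws Exception {
    List<String> command = Arrays.asList(
        "beeline",
        "-u", "jdbc:hive2://localhost:10000/default",   // HiveServer2 JDBC URL (placeholder)
        "-n", System.getProperty("user.name"),          // user to connect as
        "-f", "/tmp/sqoop-hive-import.q");              // generated CREATE TABLE / LOAD DATA script
    Process process = new ProcessBuilder(command)
        .inheritIO()   // stream beeline output to the Sqoop console
        .start();
    int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new RuntimeException("beeline exited with code " + exitCode);
    }
  }
}
{code}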



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3322) Version differences between ivy configurations

2018-05-08 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467438#comment-16467438
 ] 

Daniel Voros commented on SQOOP-3322:
-

Attaching review request.

> Version differences between ivy configurations
> --
>
> Key: SQOOP-3322
> URL: https://issues.apache.org/jira/browse/SQOOP-3322
> Project: Sqoop
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
>
> We have multiple ivy configurations defined in ivy.xml.
>  - The {{redist}} configuration is used to select the artifacts that need to 
> be distributed with Sqoop in its tar.gz.
>  - The {{common}} configuration is used to set the classpath during 
> compilation (also referred to as 'hadoop classpath')
>  -  The {{test}} configuration is used to set the classpath during junit 
> execution. It extends the {{common}} config.
> Some artifacts end up having different versions between these three 
> configurations, which means we're using different versions during 
> compilation/testing/runtime.
> Differences:
> ||Artifact||redist||common (compilation)||test||
> |commons-pool|not in redist|1.5.4|*1.6*|
> |commons-codec|1.4|1.9|*1.9*|
> |commons-io|1.4|2.4|*2.4*|
> |commons-logging|1.1.1|1.2|*1.2*|
> |slf4j-api|1.6.1|1.7.7|*1.7.7*|
> I'd suggest using the version *in bold* in all three configurations to use 
> the latest versions.
> To achieve this we should exclude these artifacts from the transitive 
> dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review Request 67005: Version differences between ivy configurations

2018-05-08 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67005/
---

Review request for Sqoop.


Bugs: SQOOP-3322
https://issues.apache.org/jira/browse/SQOOP-3322


Repository: sqoop-trunk


Description
---

We have multiple ivy configurations defined in ivy.xml.

- The `redist` configuration is used to select the artifacts that need to be 
distributed with Sqoop in its tar.gz.
- The `common` configuration is used to set the classpath during compilation 
(also referred to as 'hadoop classpath')
- The `test` configuration is used to set the classpath during junit execution. 
It extends the `common` config.

Some artifacts end up having different versions between these three 
configurations, which means we're using different versions during 
compilation/testing/runtime.


Diffs
-

  ivy.xml 6af94d9d 
  ivy/libraries.properties c44b50bc 


Diff: https://reviews.apache.org/r/67005/diff/1/


Testing
---

- compared the results of ivy-resolve-hadoop, ivy-resolve-test, 
ivy-resolve-redist tasks to make sure versions are the same
- checked unit tests just to be on the safe side, test versions weren't changed 
though (all passed apart from known issues in SQOOP-3321)


Thanks,

daniel voros



[jira] [Commented] (SQOOP-3322) Version differences between ivy configurations

2018-05-08 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467355#comment-16467355
 ] 

Daniel Voros commented on SQOOP-3322:
-

One more thing I'd include in this ticket is bumping the jackson-databind version 
(defining it explicitly, to be more precise, rather than just pulling it in via 
transitive dependencies) from 2.3.1 to 2.9.5, which isn't affected by CVE-2017-7525.

> Version differences between ivy configurations
> --
>
> Key: SQOOP-3322
> URL: https://issues.apache.org/jira/browse/SQOOP-3322
> Project: Sqoop
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
>
> We have multiple ivy configurations defined in ivy.xml.
>  - The {{redist}} configuration is used to select the artifacts that need to 
> be distributed with Sqoop in its tar.gz.
>  - The {{common}} configuration is used to set the classpath during 
> compilation (also referred to as 'hadoop classpath')
>  -  The {{test}} configuration is used to set the classpath during junit 
> execution. It extends the {{common}} config.
> Some artifacts end up having different versions between these three 
> configurations, which means we're using different versions during 
> compilation/testing/runtime.
> Differences:
> ||Artifact||redist||common (compilation)||test||
> |commons-pool|not in redist|1.5.4|*1.6*|
> |commons-codec|1.4|1.9|*1.9*|
> |commons-io|1.4|2.4|*2.4*|
> |commons-logging|1.1.1|1.2|*1.2*|
> |slf4j-api|1.6.1|1.7.7|*1.7.7*|
> I'd suggest using the version *in bold* in all three configurations to use 
> the latest versions.
> To achieve this we should exclude these artifacts from the transitive 
> dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3322) Version differences between ivy configurations

2018-05-07 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3322:

Description: 
We have multiple ivy configurations defined in ivy.xml.
 - The {{redist}} configuration is used to select the artifacts that need to be 
distributed with Sqoop in its tar.gz.
 - The {{common}} configuration is used to set the classpath during compilation 
(also referred to as 'hadoop classpath')
 -  The {{test}} configuration is used to set the classpath during junit 
execution. It extends the {{common}} config.

Some artifacts end up having different versions between these three 
configurations, which means we're using different versions during 
compilation/testing/runtime.

Differences:
||Artifact||redist||common (compilation)||test||
|commons-pool|not in redist|1.5.4|*1.6*|
|commons-codec|1.4|1.9|*1.9*|
|commons-io|1.4|2.4|*2.4*|
|commons-logging|1.1.1|1.2|*1.2*|
|slf4j-api|1.6.1|1.7.7|*1.7.7*|

I'd suggest using the version *in bold* in all three configurations to use the 
latest versions.

To achieve this we should exclude these artifacts from the transitive 
dependencies and define them explicitly.

  was:
We have multiple ivy configurations defined in ivy.xml.
 - The {{redist}} configuration is used to select the artifacts that need to be 
distributed with Sqoop in its tar.gz.
 - The {{common}} configuration is used to set the classpath during compilation 
(also referred to as 'hadoop classpath')
 -  The {{test}} configuration is used to set the classpath during junit 
execution. It extends the {{common}} config.

Some artifacts end up having different versions between these three 
configurations, which means we're using different versions during 
compilation/testing/runtime.

Differences:
||Artifact||redist||common (compilation)||test||
|commons-pool|not in redist|1.5.4|*1.6*|
|commons-codec|*1.4*|1.9|1.9|
|commons-io|*1.4*|2.4|2.4|
|commons-logging|*1.1.1*|1.2|1.2|
|slf4j-api|*1.6.1*|1.7.7|1.7.7|

I'd suggest using the version *in bold* in all three configurations, based on:
 - keep version from redist (where there is one), since that's the version we 
were shipping with and used in production
 - keep the latest version in case of commons-pool that is not part of the 
redist config

To achieve this we should exclude these artifacts from the transitive 
dependencies and define them explicitly.


Thanks for commenting [~vasas], I agree! I've updated the description.

> Version differences between ivy configurations
> --
>
> Key: SQOOP-3322
> URL: https://issues.apache.org/jira/browse/SQOOP-3322
> Project: Sqoop
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
>
> We have multiple ivy configurations defined in ivy.xml.
>  - The {{redist}} configuration is used to select the artifacts that need to 
> be distributed with Sqoop in its tar.gz.
>  - The {{common}} configuration is used to set the classpath during 
> compilation (also referred to as 'hadoop classpath')
>  -  The {{test}} configuration is used to set the classpath during junit 
> execution. It extends the {{common}} config.
> Some artifacts end up having different versions between these three 
> configurations, which means we're using different versions during 
> compilation/testing/runtime.
> Differences:
> ||Artifact||redist||common (compilation)||test||
> |commons-pool|not in redist|1.5.4|*1.6*|
> |commons-codec|1.4|1.9|*1.9*|
> |commons-io|1.4|2.4|*2.4*|
> |commons-logging|1.1.1|1.2|*1.2*|
> |slf4j-api|1.6.1|1.7.7|*1.7.7*|
> I'd suggest using the version *in bold* in all three configurations to use 
> the latest versions.
> To achieve this we should exclude these artifacts from the transitive 
> dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3322) Version differences between ivy configurations

2018-05-04 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3322:
---

 Summary: Version differences between ivy configurations
 Key: SQOOP-3322
 URL: https://issues.apache.org/jira/browse/SQOOP-3322
 Project: Sqoop
  Issue Type: Bug
  Components: build
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros


We have multiple ivy configurations defined in ivy.xml.
 - The {{redist}} configuration is used to select the artifacts that need to be 
distributed with Sqoop in its tar.gz.
 - The {{common}} configuration is used to set the classpath during compilation 
(also referred to as 'hadoop classpath')
 -  The {{test}} configuration is used to set the classpath during junit 
execution. It extends the {{common}} config.

Some artifacts end up having different versions between these three 
configurations, which means we're using different versions during 
compilation/testing/runtime.

Differences:
||Artifact||redist||common (compilation)||test||
|commons-pool|not in redist|1.5.4|*1.6*|
|commons-codec|*1.4*|1.9|1.9|
|commons-io|*1.4*|2.4|2.4|
|commons-logging|*1.1.1*|1.2|1.2|
|slf4j-api|*1.6.1*|1.7.7|1.7.7|

I'd suggest using the version *in bold* in all three configurations, based on:
 - keep version from redist (where there is one), since that's the version we 
were shipping with and used in production
 - keep the latest version in case of commons-pool that is not part of the 
redist config

To achieve this we should exclude these artifacts from the transitive 
dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3317) org.apache.sqoop.validation.RowCountValidator in live RDBMS system

2018-05-04 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463723#comment-16463723
 ] 

Daniel Voros commented on SQOOP-3317:
-

Hi [~srikumaran.t], thank you for reporting this!

As far as I can tell, currently the only option for validation is to check for 
an exact match for the number of records. "Percentage tolerant" validation was 
only mentioned in the documentation but is not implemented.

In my opinion this kind of validation (comparing the number of records) doesn't 
make much sense and should only be used as a sanity check, since it doesn't 
guarantee the equality of the contents.

However, we could improve the existing implementation by introducing another 
parameter (margin/threshold) to not require an exact match and we could also 
implement "Percentage tolerant".

> org.apache.sqoop.validation.RowCountValidator in live RDBMS system
> --
>
> Key: SQOOP-3317
> URL: https://issues.apache.org/jira/browse/SQOOP-3317
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Sri Kumaran Thirupathy
>Priority: Major
>
> org.apache.sqoop.validation.RowCountValidator is retrieving count from Source 
> after the MR completes. This fails in the live RDBMS case.
> org.apache.sqoop.validation.RowCountValidator could retrieve the count during the 
> MR execution phase.  
> Also, how do I use Percentage Tolerant? Reference: 
> [https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3321) TestHiveImport is failing on Jenkins

2018-05-04 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463608#comment-16463608
 ] 

Daniel Voros commented on SQOOP-3321:
-

[~BoglarkaEgyed] this is failing for me on Linux as well. I believe this is due 
to case sensitivity of file names there (as opposed to MacOS). The table name 
gets converted to lowercase when importing but we're referring to it with its 
original casing when trying to verify its contents in {{ParquetReader}}.

Tests are passing after converting these three table names to all lowercase in 
TestHiveImport:
 - APPEND_HIVE_IMPORT_AS_PARQUET
 - NORMAL_HIVE_IMPORT_AS_PARQUET
 - CREATE_OVERWRITE_HIVE_IMPORT_AS_PARQUET

Since SQOOP-3318 only changed the tests, I think we should adapt to the 
lowercase names in the tests too. Easiest solution would be to use lowercase 
names. What do you think [~vasas]?

> TestHiveImport is failing on Jenkins
> 
>
> Key: SQOOP-3321
> URL: https://issues.apache.org/jira/browse/SQOOP-3321
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Boglarka Egyed
>Priority: Major
> Attachments: TEST-org.apache.sqoop.hive.TestHiveImport.txt
>
>
> org.apache.sqoop.hive.TestHiveImport is failing since 
> [SQOOP-3318|https://reviews.apache.org/r/66761/bugs/SQOOP-3318/] has been 
> committed. This test seems to be failing only in the Jenkins environment as it 
> passes on several local machines. There may be some difference in the 
> filesystem which causes this issue; it shall be investigated. I am 
> attaching the log from a failed run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-05-02 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/
---

(Updated May 2, 2018, 12:12 p.m.)


Review request for Sqoop.


Changes
---

Patch #6 fixes `TestOrcImport#testDatetimeTypeOverrides` (fixed timezone).


Bugs: SQOOP-3311
https://issues.apache.org/jira/browse/SQOOP-3311


Repository: sqoop-trunk


Description
---

Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
by default. This will probably result in increased usage of ACID tables and the 
need to support importing into ACID tables with Sqoop.

Currently the only table format supporting full ACID tables is ORC.

The easiest and most effective way to support importing into these tables would 
be to write out files as ORC and keep using LOAD DATA as we do for all other 
Hive tables (supported since HIVE-17361).

A workaround could be to create the table as a textfile (as before) and then CTAS from 
that. This would push the responsibility of creating ORC format to Hive. 
However it would result in writing every record twice; in text format and in 
ORC.

Note that ORC is only necessary for full ACID tables. Insert-only (aka. 
micromanaged) ACID tables can use arbitrary file format.

Supporting full ACID tables would also be the first step in making 
"lastmodified" incremental imports work with Hive.


Diffs (updated)
-

  ivy.xml 6af94d9d 
  ivy/libraries.properties c44b50bc 
  src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
  src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba4 
  src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 783651a4 
  src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
  src/java/org/apache/sqoop/tool/ImportTool.java ee79d8b7 
  src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
  src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
  src/test/org/apache/sqoop/TestAllTables.java 56d1f577 
  src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
  src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
  src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
  src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 


Diff: https://reviews.apache.org/r/66548/diff/6/

Changes: https://reviews.apache.org/r/66548/diff/5-6/


Testing
---

- added some unit tests
- tested basic Hive import scenarios on a cluster


Thanks,

daniel voros



Re: Review Request 66067: SQOOP-3052: Introduce gradle-based build for Sqoop to make it more developer friendly / open

2018-04-26 Thread daniel voros
xml#L118).
4) SqoopVersion.java is now included. I think it makes sense to keep it. Any 
objections?

Regards,
Daniel

- daniel voros


On April 24, 2018, 2:23 p.m., Anna Szonyi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66067/
> ---
> 
> (Updated April 24, 2018, 2:23 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: Sqoop-3052
> https://issues.apache.org/jira/browse/Sqoop-3052
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> SQOOP-3052: Introduce gradle based build for Sqoop to make it more developer 
> friendly / open
> 
> 
> Diffs
> -
> 
>   .gitignore 68cbe28731e613607c208824443d1edf256d9c8a 
>   COMPILING.txt 3b82250488256871352056e9061ad08fabbd7fc5 
>   build.gradle PRE-CREATION 
>   config/checkstyle/checkstyle-java-header.txt PRE-CREATION 
>   config/checkstyle/checkstyle-noframes.xsl PRE-CREATION 
>   config/checkstyle/checkstyle.xml PRE-CREATION 
>   gradle.properties PRE-CREATION 
>   gradle/customUnixStartScript.txt PRE-CREATION 
>   gradle/customWindowsStartScript.txt PRE-CREATION 
>   gradle/sqoop-package.gradle PRE-CREATION 
>   gradle/sqoop-version-gen.gradle PRE-CREATION 
>   gradle/wrapper/gradle-wrapper.jar PRE-CREATION 
>   gradle/wrapper/gradle-wrapper.properties PRE-CREATION 
>   gradlew PRE-CREATION 
>   gradlew.bat PRE-CREATION 
>   settings.gradle PRE-CREATION 
>   src/scripts/rat-violations.sh 1cfbc1502b24dd1b8b7e7ce21f0b5d1880c06556 
>   testdata/hcatalog/conf/hive-site.xml 
> edac7aa9087a84b7a0c660907794adae684ae313 
> 
> 
> Diff: https://reviews.apache.org/r/66067/diff/10/
> 
> 
> Testing
> ---
> 
> ran all new tasks, except for internal maven publishing
> 
> Notes:
> - To try it out you can call ./gradlew tasks --all to see all the tasks and 
> compare them to current tasks/artifacts.
> - Replaced cobertura with jacoco, as it's easier/cleaner to configure, easier 
> to combine all test results into a single report.
> - Generated pom.xml now has correct dependencies/versions
> - Script generation is currently hardcoded and not based on sqoop help, as 
> previously - though added the possibility of hooking it in later
> 
> 
> Thanks,
> 
> Anna Szonyi
> 
>



Re: Review Request 66761: SQOOP-3318: Remove Kite dependency from test cases

2018-04-23 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66761/#review201737
---


Ship it!




Great stuff! Do you think we'll need ParquetReader in production code when 
removing Kite from the rest of the codebase? If we will, then it probably makes 
sense to move it under src/java now.

- daniel voros


On April 23, 2018, 12:21 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66761/
> ---
> 
> (Updated April 23, 2018, 12:21 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3318
> https://issues.apache.org/jira/browse/SQOOP-3318
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Some Sqoop tests use Kite to create test data and verify test results.
> 
> Since we want to remove the Kite dependency from Sqoop we should rewrite 
> these test cases not to use Kite anymore.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/util/FileSystemUtil.java 1493e0954 
>   src/test/org/apache/sqoop/TestAllTables.java 56d1f5772 
>   src/test/org/apache/sqoop/TestMerge.java 8eef8d4ac 
>   src/test/org/apache/sqoop/TestParquetExport.java c8bb663e0 
>   src/test/org/apache/sqoop/TestParquetImport.java 379529a8d 
>   src/test/org/apache/sqoop/hive/TestHiveImport.java 4e1f249a8 
>   src/test/org/apache/sqoop/util/ParquetReader.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66761/diff/1/
> 
> 
> Testing
> ---
> 
> Executed unit and third party tests.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



[jira] [Commented] (SQOOP-3314) Sqoop doesn't display full log on console

2018-04-17 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440999#comment-16440999
 ] 

Daniel Voros commented on SQOOP-3314:
-

Hi [~shailu.lahar], thank you for reporting this!

The {{... 19 more}} part is referring to the previous lines above; it's not 
truncated. I'm afraid {{--verbose}} is your best bet. The "method specified 
in wallet_location is not supported" message suggests you have misconfigured 
your Oracle wallet. Could you please confirm if it's working outside of Sqoop?
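
For what it's worth, here is a tiny standalone example of why the JVM prints 
{{... N more}} at the end of a {{Caused by:}} section: it elides the frames that are 
identical to the enclosing trace above, so nothing is actually lost.

{code:java}
// Running this prints a stack trace whose "Caused by:" section ends with "... 1 more":
// that frame is shared with the enclosing trace, so the JVM omits it instead of
// truncating information.
public class CausedByDemo {
  static void connect() {
    throw new IllegalStateException("simulated low-level failure");
  }
  public static void main(String[] args) {
    try {
      connect();
    } catch (IllegalStateException e) {
      throw new RuntimeException("could not establish the connection", e);
    }
  }
}
{code}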

> Sqoop doesn't display full log on console
> -
>
> Key: SQOOP-3314
> URL: https://issues.apache.org/jira/browse/SQOOP-3314
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Shailesh Lahariya
>Priority: Major
>
> I am running a sqoop command (using sqoop 1.4.7) and getting an error. I can't 
> see the full error;
> it seems some of the useful information is not being displayed on the console. 
> For example, instead of "...19 more" in the log below, it should give the 
> complete message to help debug the issue.
>  
>  
>  
> 18/04/17 01:59:12 WARN tool.EvalSqlTool: SQL exception executing statement: 
> java.sql.SQLRecoverableException: IO Error: The Network Adapter could not 
> establish the connection
>   at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:774)
>   at 
> oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:688)
>   at 
> oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:39)
>   at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:691)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at 
> org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:329)
>   at 
> org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
>   at org.apache.sqoop.tool.EvalSqlTool.run(EvalSqlTool.java:64)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>   at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> Caused by: oracle.net.ns.NetException: The Network Adapter could not 
> establish the connection
>   at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:523)
>   at 
> oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:521)
>   at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:660)
>   at oracle.net.ns.NSProtocol.connect(NSProtocol.java:286)
>   at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1438)
>   at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:518)
>   ... 14 more
> Caused by: oracle.net.ns.NetException: The method specified in 
> wallet_location is not supported. Location: /home/hadoop/wallet/jnetadmin_c
>   at 
> oracle.net.nt.CustomSSLSocketFactory.getSSLSocketEngine(CustomSSLSocketFactory.java:487)
>   at oracle.net.nt.TcpsNTAdapter.connect(TcpsNTAdapter.java:143)
>   at oracle.net.nt.ConnOption.connect(ConnOption.java:161)
>   at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:470)
>   ... 19 more
>  
>  
> Also, sharing the command that is producing the above error  (altered it to 
> remove  any confidential info)-
>  
> sqoop eval -D mapred.map.child.java.opts='-Doracle.net.tns_admin=. 
> -Doracle.net.wallet_location=.' -files 
> /home/hadoop/wallet/jnetadmin_c/ewallet.jks,/home/hadoop/wallet/jnetadmin_c/ewallet.jks,$HOME/wallet/sqlnet.ora,$HOME/wallet/tnsnames.ora
>  --username xx --password xx --connect 
> "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=xx)(PORT=2484))(CONNECT_DATA=(SERVICE_NAME=xx)))"
>   --query "select 1 from dual" --verbose --throw-on-error
>  
> Please let me know if there is any option to get more log than it is 
> producing currently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66361: Implement HiveServer2 client

2018-04-16 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66361/#review201202
---


Ship it!




Ship It!

- daniel voros


On April 16, 2018, 9:12 a.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66361/
> ---
> 
> (Updated April 16, 2018, 9:12 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3309
> https://issues.apache.org/jira/browse/SQOOP-3309
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This JIRA covers the implementation of the client for HiveServer2 and its 
> integration into the classes which use HiveImport.
> 
> - HiveClient interface is introduced with 2 implementations:
>   - HiveImport: this is the original implementation which uses HiveCLI
>   - HiveServer2Client: the new client which connects to HS2 using a JDBC 
> connection
>   - The common code is extracted to HiveCommon class
> - HiveClient should be instantiated using HiveClientFactory which creates and 
> configures the right HiveClient based on the configuration in SqoopOptions
> - HiveMiniCluster is introduced with a couple of helper classes to enable 
> end-to-end HS2 tests
> - A couple of new options are added to SqoopOptions to be able to configure 
> the connection to HS2
> - Validation is implemented for these new options
> 
> 
> Diffs
> -
> 
>   build.xml 7f68b573c65a61150ca78d158084586c87775d84 
>   ivy.xml 6be4fa20fbbf1f303c69d86942b1874e18a14afc 
>   src/docs/user/hive-args.txt 441f54e8e0cee63595937f4e1811abc2d89f9237 
>   src/docs/user/hive.txt 3dc8bb463d602d525fe5f2d07d52cb97efcbab7e 
>   src/java/org/apache/sqoop/SqoopOptions.java 
> 651cebd69ee7e75d06c75945e3607c4fab7eb11c 
>   src/java/org/apache/sqoop/hive/HiveClient.java PRE-CREATION 
>   src/java/org/apache/sqoop/hive/HiveClientCommon.java PRE-CREATION 
>   src/java/org/apache/sqoop/hive/HiveClientFactory.java PRE-CREATION 
>   src/java/org/apache/sqoop/hive/HiveImport.java 
> c2729119d31f7e585f204f2d31b2051eea71b72b 
>   src/java/org/apache/sqoop/hive/HiveServer2Client.java PRE-CREATION 
>   src/java/org/apache/sqoop/hive/HiveServer2ConnectionFactory.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 
> b7a25b7809e0d50166966a77161dc8ff603fb2d2 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 
> b02e4fe7fda25c7f8171c7db17d15a7987459687 
>   src/java/org/apache/sqoop/tool/CreateHiveTableTool.java 
> d259566180369a55d490144e6f865e728f4f2e61 
>   src/java/org/apache/sqoop/tool/ImportAllTablesTool.java 
> 18f7a0af48d972d5186e9414475e080f1eb765f3 
>   src/java/org/apache/sqoop/tool/ImportTool.java 
> e9920058858653bec7407bf7992eb6445401e813 
>   src/test/org/apache/sqoop/hive/TestHiveClientFactory.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestHiveMiniCluster.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestHiveServer2Client.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 
> 8bdc3beb3677312ec0ee2e612616358bca4ca838 
>   src/test/org/apache/sqoop/hive/minicluster/AuthenticationConfiguration.java 
> PRE-CREATION 
>   src/test/org/apache/sqoop/hive/minicluster/HiveMiniCluster.java 
> PRE-CREATION 
>   
> src/test/org/apache/sqoop/hive/minicluster/KerberosAuthenticationConfiguration.java
>  PRE-CREATION 
>   
> src/test/org/apache/sqoop/hive/minicluster/NoAuthenticationConfiguration.java 
> PRE-CREATION 
>   
> src/test/org/apache/sqoop/hive/minicluster/PasswordAuthenticationConfiguration.java
>  PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/HiveServer2TestUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/tool/TestHiveServer2OptionValidations.java 
> PRE-CREATION 
>   src/test/org/apache/sqoop/tool/TestImportTool.java 
> 1c0cf4d863692f75bb8831e834fae47fc18b5df5 
> 
> 
> Diff: https://reviews.apache.org/r/66361/diff/5/
> 
> 
> Testing
> ---
> 
> Ran unit and third party tests suite.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



[jira] [Commented] (SQOOP-3312) Can not export column data named `value` from hive to mysql

2018-04-16 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439047#comment-16439047
 ] 

Daniel Voros commented on SQOOP-3312:
-

[~zimmem] I think this is the same as SQOOP-3038, which was fixed in 1.4.7. 
Could you please check if you see the issue with 1.4.7?

> Can not export column data named `value` from hive to mysql
> ---
>
> Key: SQOOP-3312
> URL: https://issues.apache.org/jira/browse/SQOOP-3312
> Project: Sqoop
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 1.4.6
>Reporter: zimmem zhuang
>Priority: Critical
>
> the hive table 
> {code:java}
> CREATE TABLE if not exists `test_table`(
> `id` bigint, 
> `value` double)
> STORED AS parquet
> {code}
> the mysql table
> {code:java}
> CREATE TABLE if not exists `test_table`(
> `id` bigint, 
> `value` double);
> {code}
> the export command
>  
> {code:java}
> sqoop export --connect "${jdbc_connect_url}" --username test --password *** 
> --table test_table --columns id,value --hcatalog-database default 
> --hcatalog-table test_table
> {code}
> The `value` column will be null after running the command above. But if I 
> change the column name to `value_x` (in both Hive and MySQL), it works correctly.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66361: Implement HiveServer2 client

2018-04-13 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66361/#review201101
---



Hey Szabolcs,

I'm trying to run the latest patch on a (non-kerberized) cluster, but I get the 
following:

```
18/04/13 13:50:47 INFO hive.HiveServer2ConnectionFactory: Creating connection 
to HiveServer2 as: hdfs (auth:SIMPLE)
18/04/13 13:50:47 INFO jdbc.Utils: Supplied authorities: hostname:1
18/04/13 13:50:47 INFO jdbc.Utils: Resolved authority: hostname:1
18/04/13 13:50:48 ERROR sqoop.Sqoop: Got exception running Sqoop: 
java.lang.RuntimeException: Error executing Hive import.
java.lang.RuntimeException: Error executing Hive import.
at 
org.apache.sqoop.hive.HiveServer2Client.executeHiveImport(HiveServer2Client.java:85)
at 
org.apache.sqoop.hive.HiveServer2Client.importTable(HiveServer2Client.java:63)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:547)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:632)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:232)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:241)
at org.apache.sqoop.Sqoop.main(Sqoop.java:250)

...


Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:17 Invalid 
path ''hdfs://hostname:8020/user/hdfs/asd''
at 
org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.applyConstraintsAndGetFiles(LoadSemanticAnalyzer.java:160)
at 
org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:225)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
... 26 more
Caused by: org.apache.hadoop.security.AccessControlException: Permission 
denied: user=anonymous, access=EXECUTE, 
inode="/user/hdfs/asd":hdfs:hdfs:drwx--
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:292)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:238)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4142)
...
```

Note that according to the log message, we're running as 'hdfs' user (current 
OS user), but HDFS checks permission for anonymous.

Could it be the result of 
org/apache/sqoop/hive/HiveServer2ConnectionFactory.java:42 (passing 
username=null)?
```
  public HiveServer2ConnectionFactory(String connectionString) {
this(connectionString, null, null);
  }
```

Also, it might make sense to use the --hs2-user parameter in the non-kerberized 
case as well, just like beeline allows you to override the user with `-n username`. 
What do you think?
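
Just to sketch what I mean (not a patch, only the one-arg constructor quoted above 
adjusted to default to the current user instead of null; the helper name is made up):

```
  public HiveServer2ConnectionFactory(String connectionString) {
    this(connectionString, currentUserName(), null);
  }

  private static String currentUserName() {
    try {
      // Same mechanism the "Creating connection to HiveServer2 as: ..." log line uses.
      return org.apache.hadoop.security.UserGroupInformation.getCurrentUser().getShortUserName();
    } catch (java.io.IOException e) {
      return System.getProperty("user.name");
    }
  }
```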

Regards,
Daniel

- daniel voros


On April 12, 2018, 2:10 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66361/
> ---
> 
> (Updated April 12, 2018, 2:10 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3309
> https://issues.apache.org/jira/browse/SQOOP-3309
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This JIRA covers the implementation of the client for HiveServer2 and its 
> integration into the classes which use HiveImport.
> 
> - HiveClient interface is introduced with 2 implementations:
>   - HiveImport: this is the original implementation which uses HiveCLI
>   - HiveServer2Client: the new client which connects to HS2 using a JDBC 
> connection
>   - The common code is extracted to HiveCommon class
> - HiveClient should be instantiated using HiveClientFactory which creates and 
> configures the right HiveC

[jira] [Resolved] (SQOOP-2878) Sqoop import into Hive transactional tables

2018-04-13 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros resolved SQOOP-2878.
-
Resolution: Duplicate

See SQOOP-3311.

> Sqoop import into Hive transactional tables
> ---
>
> Key: SQOOP-2878
> URL: https://issues.apache.org/jira/browse/SQOOP-2878
> Project: Sqoop
>  Issue Type: Improvement
>Affects Versions: 1.4.6
>Reporter: Rohan More
>Priority: Minor
>
> Hive has introduced support for transactions from version 0.13. For 
> transactional support, the hive table should be bucketed and should be in ORC 
> format.
> This improvement is to import data directly into hive transactional table 
> using sqoop. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-2192) SQOOP IMPORT/EXPORT for the ORC file HIVE TABLE Failing

2018-04-13 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436935#comment-16436935
 ] 

Daniel Voros commented on SQOOP-2192:
-

[~Ankush] please refer to SQOOP-3311 for ORC updates.

> SQOOP IMPORT/EXPORT for the ORC file HIVE TABLE Failing
> ---
>
> Key: SQOOP-2192
> URL: https://issues.apache.org/jira/browse/SQOOP-2192
> Project: Sqoop
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 1.4.5
> Environment: Hadoop 2.6.0
> Hive 1.0.0
> Sqoop 1.4.5
>Reporter: Sunil Kumar
>Assignee: Venkat Ranganathan
>Priority: Major
>
> We are trying to export an RDBMS table to a Hive table for running Hive delete and 
> update queries on the exported Hive table. For Hive to support delete and update 
> queries, the following is required:
> 1. The table needs to be declared with the transactional property
> 2. The table must be in ORC format
> 3. The table must be bucketed
> To do that I have created the hive table using hcat:
> create table bookinfo(md5 STRING , isbn STRING , bookid STRING , booktitle 
> STRING , author STRING , yearofpub STRING , publisher STRING , imageurls 
> STRING , imageurlm STRING , imageurll STRING , price DOUBLE , totalrating 
> DOUBLE , totalusers BIGINT , maxrating INT , minrating INT , avgrating DOUBLE 
> , rawscore DOUBLE , norm_score DOUBLE) clustered by (md5) into 10 buckets 
> stored as orc TBLPROPERTIES('transactional'='true');
> then running sqoop import:
> sqoop import --verbose --connect 'RDBMS_JDBC_URL' --driver JDBC_DRIVER 
> --table bookinfo --null-string '\\N' --null-non-string '\\N' --username USER 
> --password PASSWPRD --hcatalog-database hive_test_trans --hcatalog-table 
> bookinfo --hcatalog-storage-stanza "stored as orc" -m 1
> Following exception is comming:
> 15/03/09 16:28:59 ERROR tool.ImportTool: Encountered IOException running 
> import job: org.apache.hive.hcatalog.common.HCatException : 2016 : Error 
> operation not supported : Store into a partition with bucket definition from 
> Pig/Mapreduce is not supported
> at 
> org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:109)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
> at 
> org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:339)
> at 
> org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:753)
> at 
> org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
> at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:240)
> at 
> org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
> at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
> at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> Please let me know if any further details are required.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3311) Importing as ORC file to support full ACID Hive tables

2018-04-11 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433797#comment-16433797
 ] 

Daniel Voros commented on SQOOP-3311:
-

Attached review request.

> Importing as ORC file to support full ACID Hive tables
> --
>
> Key: SQOOP-3311
> URL: https://issues.apache.org/jira/browse/SQOOP-3311
> Project: Sqoop
>  Issue Type: New Feature
>  Components: hive-integration
>    Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
>
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> Currently the only table format supporting full ACID tables is ORC.
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> Workaround could be to create table as textfile (as before) and then CTAS 
> from that. This would push the responsibility of creating ORC format to Hive. 
> However it would result in writing every record twice; in text format and in 
> ORC.
> Note that ORC is only necessary for full ACID tables. Insert-only (aka. 
> micromanaged) ACID tables can use arbitrary file format.
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-04-11 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review200902
---



Patch #1 is an initial patch that contains the most fundamental changes to 
support ORC importing. I'll add documentation and extend the tests with 
thridparty tests etc. but wanted to share to get feedback early on.

- daniel voros


On April 11, 2018, 12:02 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> ---
> 
> (Updated April 11, 2018, 12:02 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
> https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> 
> Workaround could be to create table as textfile (as before) and then CTAS 
> from that. This would push the responsibility of creating ORC format to Hive. 
> However it would result in writing every record twice; in text format and in 
> ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (aka. 
> micromanaged) ACID tables can use arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.
> 
> 
> Diffs
> -
> 
>   ivy.xml 6be4fa2 
>   ivy/libraries.properties c44b50b 
>   src/java/org/apache/sqoop/SqoopOptions.java 651cebd 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java b7a25b7 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b02e4fe 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c0 
>   src/java/org/apache/sqoop/tool/ImportTool.java e992005 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 8bdc3be 
>   src/test/org/apache/sqoop/orm/TestClassWriter.java 0cc07cf 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/1/
> 
> 
> Testing
> ---
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>



Review Request 66548: Importing as ORC file to support full ACID Hive tables

2018-04-11 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/
---

Review request for Sqoop.


Bugs: SQOOP-3311
https://issues.apache.org/jira/browse/SQOOP-3311


Repository: sqoop-trunk


Description
---

Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
by default. This will probably result in increased usage of ACID tables and the 
need to support importing into ACID tables with Sqoop.

Currently the only table format supporting full ACID tables is ORC.

The easiest and most effective way to support importing into these tables would 
be to write out files as ORC and keep using LOAD DATA as we do for all other 
Hive tables (supported since HIVE-17361).

A workaround could be to create the table as a textfile (as before) and then CTAS from 
that. This would push the responsibility of creating ORC format to Hive. 
However it would result in writing every record twice; in text format and in 
ORC.

Note that ORC is only necessary for full ACID tables. Insert-only (aka. 
micromanaged) ACID tables can use arbitrary file format.

Supporting full ACID tables would also be the first step in making 
"lastmodified" incremental imports work with Hive.


Diffs
-

  ivy.xml 6be4fa2 
  ivy/libraries.properties c44b50b 
  src/java/org/apache/sqoop/SqoopOptions.java 651cebd 
  src/java/org/apache/sqoop/hive/TableDefWriter.java b7a25b7 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba 
  src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java b02e4fe 
  src/java/org/apache/sqoop/tool/ExportTool.java 060f2c0 
  src/java/org/apache/sqoop/tool/ImportTool.java e992005 
  src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
  src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
  src/test/org/apache/sqoop/hive/TestTableDefWriter.java 8bdc3be 
  src/test/org/apache/sqoop/orm/TestClassWriter.java 0cc07cf 
  src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 


Diff: https://reviews.apache.org/r/66548/diff/1/


Testing
---

- added some unit tests
- tested basic Hive import scenarios on a cluster


Thanks,

daniel voros



[jira] [Updated] (SQOOP-3305) Upgrade to Hadoop 3, Hive 3, and HBase 2

2018-04-06 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3305:

Summary: Upgrade to Hadoop 3, Hive 3, and HBase 2  (was: Upgrade to Hadoop 
3.0.0)

I'm adding Hive and HBase to the summary, since they need to be handled 
together. See the review request for details.

> Upgrade to Hadoop 3, Hive 3, and HBase 2
> 
>
> Key: SQOOP-3305
> URL: https://issues.apache.org/jira/browse/SQOOP-3305
> Project: Sqoop
>  Issue Type: Task
>    Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
>
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> In this ticket I'll collect the necessary changes to do the upgrade. I'm not 
> setting a fix version yet, since this might require a major release and should 
> be done together with the upgrade of the related components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3311) Importing as ORC file to support full ACID Hive tables

2018-04-06 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3311:
---

 Summary: Importing as ORC file to support full ACID Hive tables
 Key: SQOOP-3311
 URL: https://issues.apache.org/jira/browse/SQOOP-3311
 Project: Sqoop
  Issue Type: New Feature
  Components: hive-integration
Reporter: Daniel Voros
Assignee: Daniel Voros


Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
by default. This will probably result in increased usage of ACID tables and the 
need to support importing into ACID tables with Sqoop.

Currently, ORC is the only file format that supports full ACID tables.

The easiest and most effective way to support importing into these tables would 
be to write the files out as ORC and keep using LOAD DATA, as we do for all other 
Hive tables (supported since HIVE-17361).

A workaround could be to create the table as a textfile (as before) and then CTAS 
from it. That would push the responsibility of producing the ORC format onto 
Hive, but it would also mean writing every record twice: once in text format and 
once in ORC.

Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. 
micromanaged) ACID tables can use an arbitrary file format.

Supporting full ACID tables would also be the first step in making 
"lastmodified" incremental imports work with Hive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66300: Upgrade to Hadoop 3.0.0

2018-04-03 Thread daniel voros


> On March 28, 2018, 3:44 p.m., Szabolcs Vasas wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/DerbyPolicy.java
> > Lines 69 (patched)
> > <https://reviews.apache.org/r/66300/diff/1/?file=1988993#file1988993line69>
> >
> > Can we use List interface and diamond operator here?

Fixed; please note that this file was originally copied from Hive.
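
For reference, the suggested idiom looks something like this (illustrative only; 
the actual field in DerbyPolicy may use a different element type):

import java.util.ArrayList;
import java.util.List;

public class DiamondExample {
  public static void main(String[] args) {
    // Declare against the List interface and let the diamond operator infer the type argument.
    List<String> permissions = new ArrayList<>();
    permissions.add("example-permission");
    System.out.println(permissions);
  }
}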


- daniel


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66300/#review200113
---


On March 27, 2018, 8:50 a.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66300/
> ---
> 
> (Updated March 27, 2018, 8:50 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3305
> https://issues.apache.org/jira/browse/SQOOP-3305
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> 
> 
> Diffs
> -
> 
>   ivy.xml 6be4fa2 
>   ivy/libraries.properties c44b50b 
>   src/java/org/apache/sqoop/SqoopOptions.java 651cebd 
>   src/java/org/apache/sqoop/config/ConfigurationHelper.java e07a699 
>   src/java/org/apache/sqoop/hive/HiveImport.java c272911 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 6d1e049 
>   src/java/org/apache/sqoop/mapreduce/hcat/DerbyPolicy.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java 784b5f2 
>   src/java/org/apache/sqoop/util/SqoopJsonUtil.java adf186b 
>   src/test/org/apache/sqoop/TestSqoopOptions.java bb7c20d 
>   src/test/org/apache/sqoop/util/TestSqoopJsonUtil.java fdf972c 
>   testdata/hcatalog/conf/hive-site.xml edac7aa 
> 
> 
> Diff: https://reviews.apache.org/r/66300/diff/2/
> 
> 
> Testing
> ---
> 
> Normal and third-party unit tests.
> 
> 
> Thanks,
> 
> daniel voros
> 
>



Re: Review Request 66282: Mock ConnManager field in TestTableDefWriter

2018-03-28 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66282/#review200121
---


Ship it!




Ship It!

- daniel voros


On March 27, 2018, noon, Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66282/
> ---
> 
> (Updated March 27, 2018, noon)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3308
> https://issues.apache.org/jira/browse/SQOOP-3308
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This patch removes the externalColTypes field from TableDefWriter since it 
> was only used for testing purposes.
> TestTableDefWriter is fixed to mock the ConnManager object provided to the 
> TableDefWriter constructor and a minor refactoring is done on the class.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java e1424c383 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 496b5add9 
> 
> 
> Diff: https://reviews.apache.org/r/66282/diff/4/
> 
> 
> Testing
> ---
> 
> ant clean test
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



Re: Review Request 66067: SQOOP-3052: Introduce gradle-based build for Sqoop to make it more developer friendly / open

2018-03-28 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66067/#review200102
---



Hey Anna,

I've experimented with running the Gradle build in a clean Dockerized 
environment and found some minor issues:
 1) All dependencies are downloaded from JCenter, despite the central repository 
being defined in build.gradle. This might be a result of my current setup; could 
you please confirm?
 2) Deprecation warnings for the '<<' task syntax ("Deprecation warning: The 
Task.leftShift(Closure) method has been deprecated..."); these should use doLast.
 3) SqoopVersion.java generation happens during task definition and not in the 
task action (missing doLast?).
 4) The `relnotes` task fails for SNAPSHOT versions with: "A problem occurred 
starting process 'command 'cd''".
 5) The `release` task prints the paths of the tar and the rat report, but they 
are incorrect. (I specified the version on the command line with "-Pversion=1.5.0".)
 6) `ant releaseaudit` now lists the Gradle files as errors.

I've corrected 2, 3, 4, 5 and 6 in this commit: 
https://github.com/dvoros/sqoop/commit/47e361829b1004bdedd6f5c223332e3fb8b85696

What's the reasoning behind using Gradle 3.5.1? Shouldn't we use 4.x? (I've 
successfully executed a simple build with 4.6.)

Regards,
Daniel

- daniel voros


On March 23, 2018, 10:28 a.m., Anna Szonyi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66067/
> ---
> 
> (Updated March 23, 2018, 10:28 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: Sqoop-3052
> https://issues.apache.org/jira/browse/Sqoop-3052
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> SQOOP-3052: Introduce gradle based build for Sqoop to make it more developer 
> friendly / open
> 
> 
> Diffs
> -
> 
>   .gitignore 68cbe28 
>   COMPILING.txt 3b82250 
>   build.gradle PRE-CREATION 
>   buildSrc/customUnixStartScript.txt PRE-CREATION 
>   buildSrc/customWindowsStartScript.txt PRE-CREATION 
>   buildSrc/sqoop-package.gradle PRE-CREATION 
>   buildSrc/sqoop-version-gen.gradle PRE-CREATION 
>   config/checkstyle/checkstyle-java-header.txt PRE-CREATION 
>   config/checkstyle/checkstyle-noframes.xsl PRE-CREATION 
>   config/checkstyle/checkstyle.xml PRE-CREATION 
>   gradle.properties PRE-CREATION 
>   gradle/wrapper/gradle-wrapper.jar PRE-CREATION 
>   gradle/wrapper/gradle-wrapper.properties PRE-CREATION 
>   gradlew PRE-CREATION 
>   gradlew.bat PRE-CREATION 
>   settings.gradle PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66067/diff/6/
> 
> 
> Testing
> ---
> 
> ran all new tasks, except for internal maven publishing
> 
> Notes:
> - To try it out you can call ./gradlew tasks --all to see all the tasks and 
> compare them to current tasks/artifacts.
> - Replaced cobertura with jacoco, as it's easier/cleaner to configure, easier 
> to combine all test results into a single report.
> - Generated pom.xml now has correct dependencies/versions
> - Script generation is currently hardcoded and not based on sqoop help, as it 
> was previously - though I've added the possibility of hooking it in later
> 
> 
> Thanks,
> 
> Anna Szonyi
> 
>



Re: Review Request 66300: Upgrade to Hadoop 3.0.0

2018-03-27 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66300/#review200037
---



Patch #1 is the minimal set of changes required to upgrade to Hadoop 3.0.0 that 
passes all unit tests. It also updates:
 - Hive to 3.0.0-SNAPSHOT, since the Hive Hadoop shims were unable to handle Hadoop 3.
 - HBase to 2.0.0-beta2, since Hive 3.0.0-SNAPSHOT currently depends on HBase 
2.0.0-alpha4.

For the list of other changes and some reasoning behind them see 
https://github.com/dvoros/sqoop/pull/4.

- daniel voros


On March 27, 2018, 8:50 a.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66300/
> ---
> 
> (Updated March 27, 2018, 8:50 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3305
> https://issues.apache.org/jira/browse/SQOOP-3305
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> 
> 
> Diffs
> -
> 
>   ivy.xml 6be4fa2 
>   ivy/libraries.properties c44b50b 
>   src/java/org/apache/sqoop/config/ConfigurationHelper.java e07a699 
>   src/java/org/apache/sqoop/hive/HiveImport.java c272911 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 6d1e049 
>   src/java/org/apache/sqoop/mapreduce/hcat/DerbyPolicy.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java 784b5f2 
>   src/java/org/apache/sqoop/util/SqoopJsonUtil.java adf186b 
>   src/test/org/apache/sqoop/TestSqoopOptions.java bb7c20d 
>   testdata/hcatalog/conf/hive-site.xml edac7aa 
> 
> 
> Diff: https://reviews.apache.org/r/66300/diff/1/
> 
> 
> Testing
> ---
> 
> Normal and third-party unit tests.
> 
> 
> Thanks,
> 
> daniel voros
> 
>



[jira] [Commented] (SQOOP-3305) Upgrade to Hadoop 3.0.0

2018-03-27 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415267#comment-16415267
 ] 

Daniel Voros commented on SQOOP-3305:
-

Attached review request.

> Upgrade to Hadoop 3.0.0
> ---
>
> Key: SQOOP-3305
> URL: https://issues.apache.org/jira/browse/SQOOP-3305
> Project: Sqoop
>  Issue Type: Task
>    Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
>
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> In this ticket I'll collect the necessary changes to do the upgrade. I'm not 
> setting a fix version yet, since this might require a major release and should 
> be done together with the upgrade of the related components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66282: Mock ConnManager field in TestTableDefWriter

2018-03-26 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66282/#review199972
---


Ship it!




Nice addition, ship it!

- daniel voros


On March 26, 2018, 2:02 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66282/
> ---
> 
> (Updated March 26, 2018, 2:02 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3308
> https://issues.apache.org/jira/browse/SQOOP-3308
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> This patch removes the externalColTypes field from TableDefWriter since it 
> was only used for testing purposes.
> TestTableDefWriter is fixed to mock the ConnManager object provided to the 
> TableDefWriter constructor and a minor refactoring is done on the class.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java e1424c383 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 496b5add9 
> 
> 
> Diff: https://reviews.apache.org/r/66282/diff/3/
> 
> 
> Testing
> ---
> 
> ant clean test
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>



Review Request 66277: Don't create HTML during Ivy report

2018-03-26 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66277/
---

Review request for Sqoop.


Bugs: SQOOP-3307
https://issues.apache.org/jira/browse/SQOOP-3307


Repository: sqoop-trunk


Description
---

`ant clean report` invokes the ivy:report task and creates both HTML and GraphML 
reports.
Creating the HTML report takes ~7 minutes and results in a ~700 MB HTML file 
that's hard to make use of, while the GraphML reporting is fast and its output is 
easier to read.


Diffs
-

  build.xml d85cf71 


Diff: https://reviews.apache.org/r/66277/diff/1/


Testing
---

`ant clean report`


Thanks,

daniel voros



[jira] [Commented] (SQOOP-3307) Don't create HTML during Ivy report

2018-03-26 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413747#comment-16413747
 ] 

Daniel Voros commented on SQOOP-3307:
-

Attaching review request.

> Don't create HTML during Ivy report
> ---
>
> Key: SQOOP-3307
> URL: https://issues.apache.org/jira/browse/SQOOP-3307
> Project: Sqoop
>  Issue Type: Task
>Affects Versions: 1.4.7
>    Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> {{ant clean report}} invokes the [ivy:report 
> |https://ant.apache.org/ivy/history/2.1.0/use/report.html] task and creates 
> both HTML and GraphML reports.
> Creating the HTML report takes ~7 minutes and results in a ~700 MB HTML file 
> that's hard to make use of, while the GraphML reporting is fast and its output 
> is easier to read.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3307) Don't create HTML during Ivy report

2018-03-26 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3307:
---

 Summary: Don't create HTML during Ivy report
 Key: SQOOP-3307
 URL: https://issues.apache.org/jira/browse/SQOOP-3307
 Project: Sqoop
  Issue Type: Task
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0


{{ant clean report}} invokes the [ivy:report 
|https://ant.apache.org/ivy/history/2.1.0/use/report.html] task and creates 
both HTML and GraphML reports.

Creating the HTML report takes ~7 minutes and results in a ~700 MB HTML file 
that's hard to make use of, while the GraphML reporting is fast and its output is 
easier to read.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3305) Upgrade to Hadoop 3.0.0

2018-03-26 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3305:
---

 Summary: Upgrade to Hadoop 3.0.0
 Key: SQOOP-3305
 URL: https://issues.apache.org/jira/browse/SQOOP-3305
 Project: Sqoop
  Issue Type: Task
Reporter: Daniel Voros
Assignee: Daniel Voros


To be able to eventually support the latest versions of Hive, HBase and 
Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
https://hadoop.apache.org/docs/r3.0.0/index.html

In this ticket I'll collect the necessary changes to do the upgrade. I'm not 
setting a fix version yet, since this might require a major release and should be 
done together with the upgrade of the related components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66195: Implement JDBC and Kerberos tools for HiveServer2 support

2018-03-23 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66195/#review199880
---


Ship it!




Ship It!

- daniel voros


On March 21, 2018, 12:48 p.m., Szabolcs Vasas wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66195/
> ---
> 
> (Updated March 21, 2018, 12:48 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3300
> https://issues.apache.org/jira/browse/SQOOP-3300
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> The idea of the Sqoop HS2 support is to connect to HS2 using JDBC and execute 
> the Hive commands on this connection. Sqoop should also support Kerberos 
> authentication when building this JDBC connection.
> 
> The goal of this JIRA is to implement the necessary classes for building JDBC 
> connections and authenticating with Kerberos.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/authentication/KerberosAuthenticator.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/db/DriverManagerJdbcConnectionFactory.java 
> PRE-CREATION 
>   src/java/org/apache/sqoop/db/JdbcConnectionFactory.java PRE-CREATION 
>   src/java/org/apache/sqoop/db/decorator/JdbcConnectionFactoryDecorator.java 
> PRE-CREATION 
>   
> src/java/org/apache/sqoop/db/decorator/KerberizedConnectionFactoryDecorator.java
>  PRE-CREATION 
>   src/test/org/apache/sqoop/authentication/TestKerberosAuthenticator.java 
> PRE-CREATION 
>   src/test/org/apache/sqoop/db/TestDriverManagerJdbcConnectionFactory.java 
> PRE-CREATION 
>   
> src/test/org/apache/sqoop/db/decorator/TestKerberizedConnectionFactoryDecorator.java
>  PRE-CREATION 
>   src/test/org/apache/sqoop/hbase/HBaseTestCase.java 
> f96b6587ff3756aa5a696df8b7fc12ef0b0f 
>   
> src/test/org/apache/sqoop/infrastructure/kerberos/MiniKdcInfrastructureRule.java
>  a704d0b07282e54e7c19d7a6725d6d026d037073 
> 
> 
> Diff: https://reviews.apache.org/r/66195/diff/1/
> 
> 
> Testing
> ---
> 
> Executed unit and third party tests.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>
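
As context for the quoted description, here is a minimal sketch of how a 
Kerberized JDBC connection to HiveServer2 could be built on top of Hadoop's 
UserGroupInformation. The class and method structure below are illustrative 
only; the decorator classes in the actual patch may differ:

import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedConnectionSketch {
  public static Connection connect(String jdbcUrl, String principal, String keytab)
      throws Exception {
    // Log in from the keytab and open the JDBC connection inside the resulting UGI
    // context, so the HiveServer2 driver can pick up the Kerberos credentials.
    UserGroupInformation ugi =
        UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
    return ugi.doAs((PrivilegedExceptionAction<Connection>) () ->
        DriverManager.getConnection(jdbcUrl));
  }
}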



[jira] [Commented] (SQOOP-3289) Add .travis.yml

2018-03-09 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393199#comment-16393199
 ] 

Daniel Voros commented on SQOOP-3289:
-

Hi [~BoglarkaEgyed],

Thanks for your review! In the meantime I've started experimenting with the 
thirdparty tests in Travis. I thought I'd share the current status so you can 
comment on it early on. For the latest results, please check this build: 
https://travis-ci.org/dvoros/sqoop/builds/351353673

cc [~vasas] [~maugli]



> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3291) SqoopJobDataPublisher is invoked before Hive imports succeed

2018-03-09 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393081#comment-16393081
 ] 

Daniel Voros commented on SQOOP-3291:
-

Thank you [~venkatnrangan]!

> SqoopJobDataPublisher is invoked before Hive imports succeed
> 
>
> Key: SQOOP-3291
> URL: https://issues.apache.org/jira/browse/SQOOP-3291
> Project: Sqoop
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
>
> Job data is published to listeners (defined via sqoop.job.data.publish.class) 
> in case of Hive and HCat imports. Currently this happens before the Hive 
> import completes, so it gets reported even if Hive import fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3289) Add .travis.yml

2018-03-07 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390860#comment-16390860
 ] 

Daniel Voros commented on SQOOP-3289:
-

Thank you both for your comments! I'm convinced; let's give Travis a shot for 
the CI as well!

[~vasas] I'll start experimenting with the thirdparty tests. The first thing that 
came to mind was to run the DB containers on a third-party server and use them 
from Travis. I'm not sure whether that's better or worse from a legal 
perspective, though. (:

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 65884: SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

2018-03-02 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65884/#review198530
---



Patch #1 changes the following:
 - moves the reporting to after the Hive import (see the sketch below)
 - creates an isolated classloader for the Hive import, since Hive was polluting 
its classloader, which caused trouble in Atlas' SqoopHook
 - changes PublishJobData#publishJobData to accept the class name as its first 
argument instead of an unnecessary Configuration object
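
To illustrate the first point, a minimal sketch of the intended ordering (all 
types and method names below are placeholders, not the actual Sqoop classes or 
signatures):

public class PublishAfterHiveImportSketch {

  interface HiveStep { void importTable() throws Exception; }   // hypothetical
  interface Publisher { void publish(String jobDetails); }      // hypothetical

  static void runImport(HiveStep hiveImport, Publisher publisher, boolean doHiveImport)
      throws Exception {
    // The MapReduce import that writes files to the target directory is omitted here.
    // If requested, load the data into Hive; this may throw on failure.
    if (doHiveImport) {
      hiveImport.importTable();
    }
    // Publish job data only once the Hive step has succeeded, so listeners
    // (e.g. the Atlas SqoopHook) never see imports whose Hive load failed.
    publisher.publish("import finished");
  }
}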

- daniel voros


On March 2, 2018, 3:39 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65884/
> ---
> 
> (Updated March 2, 2018, 3:39 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3291
> https://issues.apache.org/jira/browse/SQOOP-3291
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Job data is published to listeners (defined via sqoop.job.data.publish.class) 
> in case of Hive and HCat imports. Currently this happens before the Hive 
> import completes, so it gets reported even if Hive import fails.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/hive/HiveImport.java c272911 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 6529bd2 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java fb5d054 
>   src/java/org/apache/sqoop/mapreduce/PublishJobData.java fc18188 
>   src/java/org/apache/sqoop/tool/ImportTool.java e992005 
>   src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java b3579ac 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java a5f85a0 
> 
> 
> Diff: https://reviews.apache.org/r/65884/diff/1/
> 
> 
> Testing
> ---
> 
> - created unit test
>  - tested on a cluster with Atlas SqoopHook in place
> 
> 
> Thanks,
> 
> daniel voros
> 
>



[jira] [Commented] (SQOOP-3291) SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

2018-03-02 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383703#comment-16383703
 ] 

Daniel Voros commented on SQOOP-3291:
-

Attached review request link.

> SqoopJobDataPublisher is invoked before Hive/HCat imports succeed
> -
>
> Key: SQOOP-3291
> URL: https://issues.apache.org/jira/browse/SQOOP-3291
> Project: Sqoop
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
>
> Job data is published to listeners (defined via sqoop.job.data.publish.class) 
> in case of Hive and HCat imports. Currently this happens before the Hive 
> import completes, so it gets reported even if Hive import fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review Request 65884: SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

2018-03-02 Thread daniel voros

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65884/
---

Review request for Sqoop.


Bugs: SQOOP-3291
https://issues.apache.org/jira/browse/SQOOP-3291


Repository: sqoop-trunk


Description
---

Job data is published to listeners (defined via sqoop.job.data.publish.class) 
in case of Hive and HCat imports. Currently this happens before the Hive import 
completes, so it gets reported even if Hive import fails.


Diffs
-

  src/java/org/apache/sqoop/hive/HiveImport.java c272911 
  src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 6529bd2 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java fb5d054 
  src/java/org/apache/sqoop/mapreduce/PublishJobData.java fc18188 
  src/java/org/apache/sqoop/tool/ImportTool.java e992005 
  src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java b3579ac 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java a5f85a0 


Diff: https://reviews.apache.org/r/65884/diff/1/


Testing
---

- created unit test
 - tested on a cluster with Atlas SqoopHook in place


Thanks,

daniel voros



[jira] [Created] (SQOOP-3291) SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

2018-03-02 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3291:
---

 Summary: SqoopJobDataPublisher is invoked before Hive/HCat imports 
succeed
 Key: SQOOP-3291
 URL: https://issues.apache.org/jira/browse/SQOOP-3291
 Project: Sqoop
  Issue Type: Bug
  Components: hive-integration
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0


Job data is published to listeners (defined via sqoop.job.data.publish.class) 
in case of Hive and HCat imports. Currently this happens before the Hive import 
completes, so it gets reported even if Hive import fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3289) Add .travis.yml

2018-02-26 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377022#comment-16377022
 ] 

Daniel Voros commented on SQOOP-3289:
-

Thanks for your response [~maugli]. I definitely agree with you: we should 
automate all tests (including the thirdparty and manual integration tests) and 
static analysis checks as part of a CI gate.

AFAIK the ASF is pretty flexible in this matter. For example, Spark runs checks 
on a third-party Jenkins triggered by PR hooks, while Hive and Hadoop trigger 
jobs on the builds.apache.org Jenkins via JIRA-attached patches.

None of them do CI via Travis, though. 
[Hive|https://github.com/apache/hive/blob/master/.travis.yml#L45] and 
[Spark|https://github.com/apache/spark/blob/master/.travis.yml#L46] have 
.travis.yml files, but they aren't even running tests there. I guess that's 
because of the 50-minute limit on travis-ci.org runs.

I think we should deal with Travis and CI gatekeeping as separate tasks and 
open a new JIRA for the CI part. What do you think?

BTW, I've just found out that we're already running such a job on JIRA 
attachments, but it seems to have been failing recently. (: 
https://builds.apache.org/job/PreCommit-SQOOP-Build/

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3289) Add .travis.yml

2018-02-23 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3289:

Fix Version/s: (was: 1.4.7)
   1.5.0

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SQOOP-3289) Add .travis.yml

2018-02-23 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374368#comment-16374368
 ] 

Daniel Voros edited comment on SQOOP-3289 at 2/23/18 1:55 PM:
--

Anyone reading this, please drop a note and let me know what you think! I've 
also attached the review board link, if you wish to comment on a specific part 
of the file.


was (Author: dvoros):
Anyone reading this, please drop a note and let me know what you think! I've 
also attached the review board link, if you wish to comment on a specific part 
or the file.

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>    Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.4.7
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

