[jira] [Assigned] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-04 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros reassigned SQOOP-3134:
---

Assignee: Daniel Voros  (was: Eric Lin)

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-04 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809951#comment-16809951
 ] 

Daniel Voros commented on SQOOP-3134:
-

Submitted PR: https://github.com/apache/sqoop/pull/78

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-03 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808645#comment-16808645
 ] 

Daniel Voros commented on SQOOP-3134:
-

Tests have passed for this patch: 
https://travis-ci.org/dvoros/sqoop/builds/515049441

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Eric Lin
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-02 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807898#comment-16807898
 ] 

Daniel Voros commented on SQOOP-3134:
-

[~ericlin] I've attached the change I had in mind. Would you mind if I were to 
take this over?

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Eric Lin
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-02 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3134:

Attachment: SQOOP-3134.1.patch

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Eric Lin
>Priority: Major
> Attachments: SQOOP-3134.1.patch
>
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2019-04-02 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807883#comment-16807883
 ] 

Daniel Voros commented on SQOOP-3134:
-

Just ran into this. Instead of introducing a new option, this could probably 
also be controlled with {{--class-name}}. It would only need a small change in 
the code path changed by SQOOP-2783 to also check for {{className == null}}.
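
For illustration, a rough command-line sketch of how this could look if 
{{--class-name}} also drove the schema file name in the {{--query}} case (the 
class name {{MyRecord}} is only an example for this proposal, not current 
behaviour):
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --query "select * from t1 where \$CONDITIONS" --split-by c1 \
  --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 \
  --as-avrodatafile --class-name MyRecord

# Expected layout under this proposal (today the query import writes
# ./AutoGeneratedSchema.avsc and ./QueryResult.java instead):
# ./MyRecord.avsc
# ./MyRecord.java
{noformat}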

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Markus Kemper
>Assignee: Eric Lin
>Priority: Major
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> ----------------------------------
> | c1  | c2         | c3         | 
> ----------------------------------
> | 1   | 2017-02-13 | some data  | 
> ----------------------------------
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc < want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc < 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (SQOOP-3289) Add .travis.yml

2018-11-23 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros reassigned SQOOP-3289:
---

Assignee: Szabolcs Vasas  (was: Daniel Voros)

Thank you [~vasas] for your effort, this is a lot more than what I had in the 
old review request, so I've closed that one.

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Szabolcs Vasas
>Priority: Minor
> Fix For: 1.5.0, 3.0.0
>
> Attachments: SQOOP-3289.patch
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-10-16 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651992#comment-16651992
 ] 

Daniel Voros commented on SQOOP-3378:
-

Thanks for letting me know [~vasas]! I can confirm this is failing for me on 
trunk as well when running on Linux; it passes on Mac, however. I've opened 
SQOOP-3393 to look into this.
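
For reference, this is roughly how I reproduce it with the Ant build (running a 
single test class via the {{testcase}} property; a sketch, adjust to your setup):
{noformat}
ant clean test -Dtestcase=TestNetezzaExternalTableExportMapper
{noformat}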

> Error during direct Netezza import/export can interrupt process in 
> uncontrolled ways
> 
>
> Key: SQOOP-3378
> URL: https://issues.apache.org/jira/browse/SQOOP-3378
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
> Attachments: SQOOP-3378.2.patch
>
>
> SQLException during JDBC operation in direct Netezza import/export signals 
> parent thread to fail fast by interrupting it (see 
> [here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).
> We're [trying to process the interrupt in the 
> parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
>  (main) thread, but there's no guarantee that we're not in some blocking 
> internal call that will process the interrupted flag and reset it before 
> we're able to check.
> It is also possible that the parent thread has passed the "checking part" 
> when it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} 
> this can interrupt the upload of log files.
> I'd recommend using some other means of communication between the threads 
> than interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3393) TestNetezzaExternalTableExportMapper hangs

2018-10-16 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3393:
---

 Summary: TestNetezzaExternalTableExportMapper hangs
 Key: SQOOP-3393
 URL: https://issues.apache.org/jira/browse/SQOOP-3393
 Project: Sqoop
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0, 3.0.0
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0, 3.0.0


Introduced in SQOOP-3378, spotted by [~vasas].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-10-11 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646190#comment-16646190
 ] 

Daniel Voros commented on SQOOP-3378:
-

Uploaded, thank you [~vasas].

> Error during direct Netezza import/export can interrupt process in 
> uncontrolled ways
> 
>
> Key: SQOOP-3378
> URL: https://issues.apache.org/jira/browse/SQOOP-3378
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
> Attachments: SQOOP-3378.2.patch
>
>
> SQLException during JDBC operation in direct Netezza import/export signals 
> parent thread to fail fast by interrupting it (see 
> [here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).
> We're [trying to process the interrupt in the 
> parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
>  (main) thread, but there's no guarantee that we're not in some blocking 
> internal call that will process the interrupted flag and reset it before 
> we're able to check.
> It is also possible that the parent thread has passed the "checking part" 
> when it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} 
> this can interrupt the upload of log files.
> I'd recommend using some other means of communication between the threads 
> than interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-10-11 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3378:

Attachment: SQOOP-3378.2.patch

> Error during direct Netezza import/export can interrupt process in 
> uncontrolled ways
> 
>
> Key: SQOOP-3378
> URL: https://issues.apache.org/jira/browse/SQOOP-3378
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
> Attachments: SQOOP-3378.2.patch
>
>
> SQLException during JDBC operation in direct Netezza import/export signals 
> parent thread to fail fast by interrupting it (see 
> [here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).
> We're [trying to process the interrupt in the 
> parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
>  (main) thread, but there's no guarantee that we're not in some blocking 
> internal call that will process the interrupted flag and reset it before 
> we're able to check.
> It is also possible that the parent thread has passed the "checking part" 
> when it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} 
> this can interrupt the upload of log files.
> I'd recommend using some other means of communication between the threads 
> than interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3381) Upgrade the Parquet library from 1.6.0 to 1.9.0

2018-10-05 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639659#comment-16639659
 ] 

Daniel Voros commented on SQOOP-3381:
-

With SQOOP-3305 I've decided to hold off until there's an HBase release that 
supports Hadoop 3.x. I don't think Hive 3.1.0 would help in this regard, since 
parquet classes are still shaded in hive-exec:3.1.0.
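
(A quick way to check the shading, assuming you have the hive-exec jar at hand; 
the version in the file name is only an example:)
{noformat}
# a non-zero count means Parquet classes are bundled inside hive-exec
jar tf hive-exec-3.1.0.jar | grep -c 'org/apache/parquet/'
{noformat}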

> Upgrade the Parquet library from 1.6.0 to 1.9.0
> ---
>
> Key: SQOOP-3381
> URL: https://issues.apache.org/jira/browse/SQOOP-3381
> Project: Sqoop
>  Issue Type: Sub-task
>Affects Versions: 1.4.7
>Reporter: Fero Szabo
>Assignee: Fero Szabo
>Priority: Major
> Fix For: 3.0.0
>
>
> As we will need to register a data supplier in the fix for parquet decimal 
> support, we will need a version that contains PARQUET-243.
> We need to upgrade the Parquet library to a version that contains this fix 
> and is compatible with Hadoop. Most probably, the newest version will be 
> adequate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3381) Upgrade the Parquet library from 1.6.0 to 1.9.0

2018-09-12 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612402#comment-16612402
 ] 

Daniel Voros commented on SQOOP-3381:
-

Hey [~fero], thanks for keeping that in mind. What I've seen during the Hadoop 3 
upgrade is that Avro is added to the MR classpath by Hadoop. So where this could 
lead to issues is a conflict between the Avro version in Hadoop and the one used 
by the Parquet library shipped with Sqoop.

Could you try your patch (with the new Parquet jar in lib/) on a cluster with 
current Hadoop versions? I don't think we should bother with testing against 
Hadoop 3, we'll face that in the Hadoop 3 patch.

(One more thing to keep in mind is that parquet-hadoop-bundle is also shaded 
into the hive-exec artifact. However, I think the classes involved in 
PARQUET-243 are not bundled there.)
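
A rough way to spot such a conflict on a cluster (the paths, the SQOOP_HOME 
variable and the grep patterns are illustrative, adjust to your install):
{noformat}
# Avro jar(s) Hadoop puts on the classpath
hadoop classpath | tr ':' '\n' | grep -i 'avro.*\.jar'
# Avro/Parquet jars shipped in Sqoop's lib/ directory
ls $SQOOP_HOME/lib | grep -iE 'avro|parquet'
{noformat}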

> Upgrade the Parquet library from 1.6.0 to 1.9.0
> ---
>
> Key: SQOOP-3381
> URL: https://issues.apache.org/jira/browse/SQOOP-3381
> Project: Sqoop
>  Issue Type: Sub-task
>Affects Versions: 1.4.7
>Reporter: Fero Szabo
>Assignee: Fero Szabo
>Priority: Major
> Fix For: 3.0.0
>
>
> As we will need to register a data supplier in the fix for parquet decimal 
> support, we will need a version that contains PARQUET-243.
> We need to upgrade the Parquet library to a version that contains this fix 
> and is compatible with Hadoop. Most probably, the newest version will be 
> adequate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3374) Assigning HDFS path to --bindir is giving error "java.lang.reflect.InvocationTargetException"

2018-09-04 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602719#comment-16602719
 ] 

Daniel Voros commented on SQOOP-3374:
-

[~amjosh911] setting an HDFS location for {{--bindir}} is not supported at the 
moment. What is your use case that would require doing so? A workaround might be 
to put the generated files on HDFS manually after the Sqoop job finishes.
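
A minimal sketch of that workaround, assuming a local {{--bindir}} and an HDFS 
destination of your choice (both paths are only examples):
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --table t1 --target-dir /user/me/t1 --num-mappers 1 \
  --bindir /tmp/sqoop-gen --outdir /tmp/sqoop-gen

# copy the generated code and jars to HDFS once the job has finished
hdfs dfs -mkdir -p /user/me/sqoop-gen
hdfs dfs -put -f /tmp/sqoop-gen/* /user/me/sqoop-gen/
{noformat}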

> Assigning HDFS path to --bindir is giving error 
> "java.lang.reflect.InvocationTargetException"
> -
>
> Key: SQOOP-3374
> URL: https://issues.apache.org/jira/browse/SQOOP-3374
> Project: Sqoop
>  Issue Type: Wish
>  Components: sqoop2-api
>Reporter: Amit Joshi
>Priority: Blocker
>
> When I am trying to assign the HDFS directory path to --bindir in my sqoop 
> command, it is throwing error "java.lang.reflect.InvocationTargetException".
> My sqoop query looks like this:
> sqoop import -connect connection_string --username username --password-file 
> file_path --query 'select * from EDW_PROD.RXCLM_LINE_FACT_DENIED 
> PARTITION(RXCLM_LINE_FACTP201808) where $CONDITIONS' --as-parquetfile 
> --compression-codec org.apache.hadoop.io.compress.SnappyCodec --append 
> --target-dir target_dir *-bindir hdfs://user/projects/* --split-by RX_ID 
> --null-string '/N' --null-non-string '/N' --fields-terminated-by ',' -m 10
>  
> It is creating folder "hdfs:" in my home directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3058) Sqoop import with Netezza --direct fails properly but also produces NPE

2018-09-03 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602110#comment-16602110
 ] 

Daniel Voros commented on SQOOP-3058:
-

[~kuldeepkulkarn...@gmail.com], I don't think there's a workaround, but please 
note that this issue is only about reporting an extra NPE in case of an error.

I've submitted a patch to throw a more meaningful exception.

> Sqoop import with Netezza --direct fails properly but also produces NPE
> ---
>
> Key: SQOOP-3058
> URL: https://issues.apache.org/jira/browse/SQOOP-3058
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Markus Kemper
>Assignee: Daniel Voros
>Priority: Major
>
> The [error] is expected however the [npe] seems like a defect, see [test 
> case] below
> [error]
> ERROR:  relation does not exist SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
> [npe]
> 16/11/18 09:19:44 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> java.lang.NullPointerException
> [test case]
> {noformat}
> #
> # STEP 01 - Setup Netezza Table and Data
> #
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DROP TABLE SQOOP_SME1.T1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "CREATE TABLE SQOOP_SME1.T1 (C1 INTEGER)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "INSERT INTO SQOOP_SME1.T1 VALUES (1)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> #
> # STEP 02 - Test Import and Export (baseline)
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --export-dir /user/root/t1 --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> ---------
> | C1  | 
> ---------
> | 1   | 
> ---------
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --export-dir /user/root/t1 --num-mappers 1 --direct
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> ---------
> | C1  | 
> ---------
> | 1   | 
> ---------
>   
> #
> # STEP 03 - Test Import and Export (with SCHEMA in --table option AND 
> --direct)
> #
> /* Notes: This failure seems correct however the NPE after the failure seems 
> like a defect  */
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "SQOOP_SME1.T1" --export-dir /user/root/t1 --num-mappers 1 --direct
> 16/11/18 09:19:44 ERROR manager.SqlManager: Error executing statement: 
> org.netezza.error.NzSQLException: ERROR:  relation does not exist 
> SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
> org.netezza.error.NzSQLException: ERROR:  relation does not exist 
> SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
>   at 
> org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:280)
>   at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:76)
>   at org.netezza.sql.NzConnection.execute(NzConnection.java:2869)
>   at 
> org.netezza.sql.NzPreparedStatament._execute(NzPreparedStatament.java:1126)
>   at 
> org.netezza.sql.NzPreparedStatament.prepare(NzPreparedStatament.java:1143)
>   at 
> org.netezza.sql.NzPreparedStatament.<init>(NzPreparedStatament.java:89)
>   at org.netezza.sql.NzConnection.prepareStatement(NzConnection.java:1589)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnNamesForRawQuery(SqlManager.java:151)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnNames(SqlManager.java:116)
>   at 
> org.apache.sqoop.mapreduce.netezza.NetezzaExternalTableExportJob.configureOutputFormat(NetezzaExternalTableExportJob.java:128)
>   at 
> org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433)
>   at 
> org.apache.sqoop.manager.DirectNetezzaManager.exportTable(DirectNetezzaManager.java:209)
>   at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
>   at 

[jira] [Assigned] (SQOOP-3058) Sqoop import with Netezza --direct fails properly but also produces NPE

2018-09-03 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros reassigned SQOOP-3058:
---

Assignee: Daniel Voros

> Sqoop import with Netezza --direct fails properly but also produces NPE
> ---
>
> Key: SQOOP-3058
> URL: https://issues.apache.org/jira/browse/SQOOP-3058
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Markus Kemper
>Assignee: Daniel Voros
>Priority: Major
>
> The [error] is expected however the [npe] seems like a defect, see [test 
> case] below
> [error]
> ERROR:  relation does not exist SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
> [npe]
> 16/11/18 09:19:44 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> java.lang.NullPointerException
> [test case]
> {noformat}
> #
> # STEP 01 - Setup Netezza Table and Data
> #
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DROP TABLE SQOOP_SME1.T1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "CREATE TABLE SQOOP_SME1.T1 (C1 INTEGER)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "INSERT INTO SQOOP_SME1.T1 VALUES (1)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> #
> # STEP 02 - Test Import and Export (baseline)
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --export-dir /user/root/t1 --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> ---------
> | C1  | 
> ---------
> | 1   | 
> ---------
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "T1" --export-dir /user/root/t1 --num-mappers 1 --direct
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "SELECT C1 FROM SQOOP_SME1.T1"
> ---------
> | C1  | 
> ---------
> | 1   | 
> ---------
>   
> #
> # STEP 03 - Test Import and Export (with SCHEMA in --table option AND 
> --direct)
> #
> /* Notes: This failure seems correct however the NPE after the failure seems 
> like a defect  */
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "DELETE FROM SQOOP_SME1.T1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> "SQOOP_SME1.T1" --export-dir /user/root/t1 --num-mappers 1 --direct
> 16/11/18 09:19:44 ERROR manager.SqlManager: Error executing statement: 
> org.netezza.error.NzSQLException: ERROR:  relation does not exist 
> SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
> org.netezza.error.NzSQLException: ERROR:  relation does not exist 
> SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
>   at 
> org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:280)
>   at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:76)
>   at org.netezza.sql.NzConnection.execute(NzConnection.java:2869)
>   at 
> org.netezza.sql.NzPreparedStatament._execute(NzPreparedStatament.java:1126)
>   at 
> org.netezza.sql.NzPreparedStatament.prepare(NzPreparedStatament.java:1143)
>   at 
> org.netezza.sql.NzPreparedStatament.<init>(NzPreparedStatament.java:89)
>   at org.netezza.sql.NzConnection.prepareStatement(NzConnection.java:1589)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
>   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnNamesForRawQuery(SqlManager.java:151)
>   at 
> org.apache.sqoop.manager.SqlManager.getColumnNames(SqlManager.java:116)
>   at 
> org.apache.sqoop.mapreduce.netezza.NetezzaExternalTableExportJob.configureOutputFormat(NetezzaExternalTableExportJob.java:128)
>   at 
> org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433)
>   at 
> org.apache.sqoop.manager.DirectNetezzaManager.exportTable(DirectNetezzaManager.java:209)
>   at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
>   at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>   at 

[jira] [Commented] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-09-03 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602058#comment-16602058
 ] 

Daniel Voros commented on SQOOP-3378:
-

Attached review request.

> Error during direct Netezza import/export can interrupt process in 
> uncontrolled ways
> 
>
> Key: SQOOP-3378
> URL: https://issues.apache.org/jira/browse/SQOOP-3378
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
>
> SQLException during JDBC operation in direct Netezza import/export signals 
> parent thread to fail fast by interrupting it (see 
> [here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).
> We're [trying to process the interrupt in the 
> parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
>  (main) thread, but there's no guarantee that we're not in some blocking 
> internal call that will process the interrupted flag and reset it before 
> we're able to check.
> It is also possible that the parent thread has passed the "checking part" 
> when it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} 
> this can interrupt the upload of log files.
> I'd recommend using some other means of communication between the threads 
> than interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3378) Error during direct Netezza import/export can interrupt process in uncontrolled ways

2018-09-03 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3378:
---

 Summary: Error during direct Netezza import/export can interrupt 
process in uncontrolled ways
 Key: SQOOP-3378
 URL: https://issues.apache.org/jira/browse/SQOOP-3378
 Project: Sqoop
  Issue Type: Bug
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0, 3.0.0


SQLException during JDBC operation in direct Netezza import/export signals 
parent thread to fail fast by interrupting it (see 
[here|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaJDBCStatementRunner.java#L92]).

We're [trying to process the interrupt in the 
parent|https://github.com/apache/sqoop/blob/c814e58348308b05b215db427412cd6c0b21333e/src/java/org/apache/sqoop/mapreduce/db/netezza/NetezzaExternalTableExportMapper.java#L232]
 (main) thread, but there's no guarantee that we're not in some blocking 
internal call that will process the interrupted flag and reset it before we're 
able to check.

It is also possible that the parent thread has passed the "checking part" when 
it gets interrupted. In case of {{NetezzaExternalTableExportMapper}} this can 
interrupt the upload of log files.

I'd recommend using some other means of communication between the threads than 
interrupts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically

2018-08-29 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596026#comment-16596026
 ] 

Daniel Voros commented on SQOOP-3042:
-

[~amjosh911] could you please open a new ticket for that with details?

> Sqoop does not clear compile directory under /tmp/sqoop-/compile 
> automatically
> 
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Critical
>  Labels: patch
> Fix For: 3.0.0
>
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, 
> SQOOP-3042.4.patch, SQOOP-3042.5.patch, SQOOP-3042.6.patch, 
> SQOOP-3042.7.patch, SQOOP-3042.9.patch
>
>
> After running sqoop, all the temp files generated by ClassWriter are left 
> behind on disk, so anyone can check those JAVA files to see the schema of 
> those tables that Sqoop has been interacting with. By default, the directory 
> is under /tmp/sqoop-/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can see 
> that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
> hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
> // We created the directory. Use it.
> // If this directory is not actually filled with files, delete it
> // when the JVM quits.
> hashDir.deleteOnExit();
> break;
>   }
> }
> {code}
> However, I believe it failed to delete due to directory is not empty.
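
Until a release contains the fix, a manual cleanup along these lines can keep the 
compile directory from growing (the path and the one-day age threshold are only 
examples):
{noformat}
# remove per-job compile directories older than one day
find /tmp/sqoop-$USER/compile -mindepth 1 -maxdepth 1 -type d -mtime +1 -exec rm -rf {} +
{noformat}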



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically

2018-08-28 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594827#comment-16594827
 ] 

Daniel Voros commented on SQOOP-3042:
-

[~amjosh911] use the `--bindir` option, see 
[here|https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html].
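
For example (connection details and the output directory are placeholders):
{noformat}
# compiled .class and .jar files end up under the given directory
# (use --outdir to redirect the generated .java files as well)
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --table t1 --target-dir /user/me/t1 --num-mappers 1 \
  --bindir /var/tmp/sqoop-compile
{noformat}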

> Sqoop does not clear compile directory under /tmp/sqoop-/compile 
> automatically
> 
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Critical
>  Labels: patch
> Fix For: 3.0.0
>
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, 
> SQOOP-3042.4.patch, SQOOP-3042.5.patch, SQOOP-3042.6.patch, 
> SQOOP-3042.7.patch, SQOOP-3042.9.patch
>
>
> After running sqoop, all the temp files generated by ClassWriter are left 
> behind on disk, so anyone can check those JAVA files to see the schema of 
> those tables that Sqoop has been interacting with. By default, the directory 
> is under /tmp/sqoop-/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can see 
> that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
> hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
> // We created the directory. Use it.
> // If this directory is not actually filled with files, delete it
> // when the JVM quits.
> hashDir.deleteOnExit();
> break;
>   }
> }
> {code}
> However, I believe it failed to delete due to directory is not empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically

2018-08-28 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594718#comment-16594718
 ] 

Daniel Voros commented on SQOOP-3042:
-

[~amjosh911] it is going to be included in the next release we do from trunk. 
Not sure yet if it's going to be 1.4.8, 1.5.0 or 3.0.0.

> Sqoop does not clear compile directory under /tmp/sqoop-/compile 
> automatically
> 
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Critical
>  Labels: patch
> Fix For: 3.0.0
>
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, 
> SQOOP-3042.4.patch, SQOOP-3042.5.patch, SQOOP-3042.6.patch, 
> SQOOP-3042.7.patch, SQOOP-3042.9.patch
>
>
> After running sqoop, all the temp files generated by ClassWriter are left 
> behind on disk, so anyone can check those JAVA files to see the schema of 
> those tables that Sqoop has been interacting with. By default, the directory 
> is under /tmp/sqoop-/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can see 
> that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
> hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
> // We created the directory. Use it.
> // If this directory is not actually filled with files, delete it
> // when the JVM quits.
> hashDir.deleteOnExit();
> break;
>   }
> }
> {code}
> However, I believe it failed to delete due to directory is not empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3052) Introduce Gradle based build for Sqoop to make it more developer friendly / open

2018-07-23 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3052:

Summary: Introduce Gradle based build for Sqoop to make it more developer 
friendly / open  (was: Introduce Maven/Gradle/etc. based build for Sqoop to 
make it more developer friendly / open)

> Introduce Gradle based build for Sqoop to make it more developer friendly / 
> open
> 
>
> Key: SQOOP-3052
> URL: https://issues.apache.org/jira/browse/SQOOP-3052
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Attila Szabo
>Assignee: Anna Szonyi
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: SQOOP-3052.patch
>
>
> The current trunk version can only be built with the Ant/Ivy combination, 
> which has some painful limitations (resolving is slow and needs to be tweaked 
> to use only caches, the current profile/variable based settings do not work in 
> IDEs out of the box, the current solution does not download the related 
> sources, etc.).
> It would be nice to provide a solution that gives developers the possibility 
> to choose between the widely used build infrastructures of today (e.g. Maven, 
> Gradle, etc.). For such a solution it would also be essential to keep the 
> different build files (if there is more than one) easily synchronized, so that 
> the configuration doesn't diverge over time. Test execution also has to be 
> solved and should cover all the available test cases.
> In this scenario:
> Providing one good working solution is much better than providing three 
> different ones which easily get out of sync.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3346) Upgrade Hadoop version to 2.8.0

2018-07-19 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549011#comment-16549011
 ] 

Daniel Voros commented on SQOOP-3346:
-

Yes, I agree with you. No need to block this until SQOOP-3305 is done!

> Upgrade Hadoop version to 2.8.0
> ---
>
> Key: SQOOP-3346
> URL: https://issues.apache.org/jira/browse/SQOOP-3346
> Project: Sqoop
>  Issue Type: Sub-task
>Reporter: Boglarka Egyed
>Assignee: Boglarka Egyed
>Priority: Major
>
> Support for AWS temporary credentials has been introduced in Hadoop 2.8.0 
> based on HADOOP-12537, and it would make sense to test and support this 
> capability with Sqoop too.
> [SQOOP-3305|https://reviews.apache.org/r/66300/bugs/SQOOP-3305/] is open for 
> upgrading Hadoop to 3.0.0, however it currently has several issues described 
> in [https://reviews.apache.org/r/66300/], thus I would like to proceed with an 
> "intermediate" upgrade to 2.8.0 to enable development on the S3 front. 
> [~dvoros] are you OK with this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (SQOOP-3343) format all DTA.bat SQOOP

2018-07-17 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros resolved SQOOP-3343.
-
Resolution: Invalid

see INFRA-16778

> format all DTA.bat SQOOP
> 
>
> Key: SQOOP-3343
> URL: https://issues.apache.org/jira/browse/SQOOP-3343
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Mohamedvolt 
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (SQOOP-3342) rformat.batalldata:assignee = currentUser() AND resolution = Unresolved order by updated DESC

2018-07-17 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros resolved SQOOP-3342.
-
Resolution: Invalid

see INFRA-16778

> rformat.batalldata:assignee = currentUser() AND resolution = Unresolved order 
> by updated DESC
> -
>
> Key: SQOOP-3342
> URL: https://issues.apache.org/jira/browse/SQOOP-3342
> Project: Sqoop
>  Issue Type: New Feature
>Reporter: Mohamedvolt 
>Priority: Major
>
> rformat.batalldata:assignee = currentUser() AND resolution = Unresolved order 
> by updated DESC



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3323) Use hive executable in (non-JDBC) Hive imports

2018-06-22 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520178#comment-16520178
 ] 

Daniel Voros commented on SQOOP-3323:
-

Attached review request.

> Use hive executable in (non-JDBC) Hive imports
> --
>
> Key: SQOOP-3323
> URL: https://issues.apache.org/jira/browse/SQOOP-3323
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 3.0.0
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> When doing Hive imports the old way (not via JDBC that was introduced in 
> SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall 
> back to the {{hive}} executable (a.k.a. [Hive 
> Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
> that class is not found.
> Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
> [deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
>  (see also HIVE-10511), we should switch to using {{beeline}} to talk to 
> Hive. With recent additions (e.g. HIVE-18963) this should be easier than 
> before.
> As a first step we could switch to using {{hive}} executable. With HIVE-19728 
> it will be possible (in Hive 3.1) to configure hive to actually run beeline 
> when using the {{hive}} executable. This way we could leave it to the user to 
> decide whether to use the deprecated cli or use beeline instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3336) Splitting on integer column can create more splits than necessary

2018-06-22 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520176#comment-16520176
 ] 

Daniel Voros commented on SQOOP-3336:
-

Attached review request.

This also affects splitting on date/timestamp columns, since DateSplitter uses 
the same logic.

> Splitting on integer column can create more splits than necessary
> -
>
> Key: SQOOP-3336
> URL: https://issues.apache.org/jira/browse/SQOOP-3336
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
>
> Running an import with {{-m 2}} will result in three splits if there are only 
> three consecutive integers in the table ({{\{1, 2, 3\}}}).
> Work is (probably) spread more evenly between mappers this way, but ending up 
> with more files than expected could be an issue.
> Split-limit can also result in more values than asked for in the last chunk 
> (due to the closed interval in the end).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3336) Splitting on integer column can create more splits than necessary

2018-06-21 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3336:
---

 Summary: Splitting on integer column can create more splits than 
necessary
 Key: SQOOP-3336
 URL: https://issues.apache.org/jira/browse/SQOOP-3336
 Project: Sqoop
  Issue Type: Bug
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0, 3.0.0


Running an import with {{-m 2}} will result in three splits if there are only 
three consecutive integers in the table ({{\{1, 2, 3\}}}).

Work is (probably) spread more evenly between mappers this way, but ending up 
with more files than expected could be an issue.

Split-limit can also result in more values than asked for in the last chunk 
(due to the closed interval in the end).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3323) Use hive executable in (non-JDBC) Hive imports

2018-06-21 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3323:

Description: 
When doing Hive imports the old way (not via JDBC that was introduced in 
SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall back 
to the {{hive}} executable (a.k.a. [Hive 
Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
that class is not found.

Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
[deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
 (see also HIVE-10511), we should switch to using {{beeline}} to talk to Hive. 
With recent additions (e.g. HIVE-18963) this should be easier than before.

As a first step we could switch to using {{hive}} executable. With HIVE-19728 
it will be possible (in Hive 3.1) to configure hive to actually run beeline 
when using the {{hive}} executable. This way we could leave it to the user to 
decide whether to use the deprecated cli or use beeline instead.

  was:
When doing Hive imports the old way (not via JDBC that was introduced in 
SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall back 
to the {{hive}} executable (a.k.a. [Hive 
Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
that class is not found.

Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
[deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
 (see also HIVE-10511), we should switch to using {{beeline}} to talk to Hive. 
With recent additions (e.g. HIVE-18963) this should be easier than before.

Summary: Use hive executable in (non-JDBC) Hive imports  (was: Use 
beeline in (non-JDBC) Hive imports)

With HIVE-19728 (will be released in Hive 3.1) it will be possible to map the hive 
executable to beeline. I'm updating the goal of this Jira to use the {{hive}} 
executable and let users decide whether they want to use beeline instead.

> Use hive executable in (non-JDBC) Hive imports
> --
>
> Key: SQOOP-3323
> URL: https://issues.apache.org/jira/browse/SQOOP-3323
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 3.0.0
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> When doing Hive imports the old way (not via JDBC that was introduced in 
> SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall 
> back to the {{hive}} executable (a.k.a. [Hive 
> Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
> that class is not found.
> Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
> [deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
>  (see also HIVE-10511), we should switch to using {{beeline}} to talk to 
> Hive. With recent additions (e.g. HIVE-18963) this should be easier than 
> before.
> As a first step we could switch to using {{hive}} executable. With HIVE-19728 
> it will be possible (in Hive 3.1) to configure hive to actually run beeline 
> when using the {{hive}} executable. This way we could leave it to the user to 
> decide whether to use the deprecated cli or use beeline instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (SQOOP-2471) Support arrays and structs datatypes with Sqoop Hcatalog integration

2018-05-30 Thread Daniel Voros (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros resolved SQOOP-2471.
-
Resolution: Duplicate

I believe this has been superseded by SQOOP-2935.

> Support arrays and structs datatypes with Sqoop Hcatalog integration
> 
>
> Key: SQOOP-2471
> URL: https://issues.apache.org/jira/browse/SQOOP-2471
> Project: Sqoop
>  Issue Type: New Feature
>  Components: hive-integration
>Affects Versions: 1.4.6
>Reporter: Pavel Benes
>Priority: Critical
>
> Currently sqoop import is not able to handle any complex type. On the other 
> hand, Hive already has support for the following complex types:
>  - arrays: ARRAY
>  - structs: STRUCT
> Since it is probably not possible to obtain all the necessary information about 
> those types from a general JDBC database, this feature should somehow use 
> external information provided by the arguments --map-column-java and 
> --map-column-hive. 
> For example it could look like this:
>  --map-column-java item='inventory_item(name text, supplier_id integer,price 
> numeric)'
>  --map-column-hive item='STRUCT decimal>'
> In case no additional information is provided, some more general type should 
> be created if possible.
> It should be possible to serialize the complex datatype values into strings 
> when the Hive target column's type is explicitly set to 'STRING'. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3313) Remove Kite dependency

2018-05-10 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3313:

Fix Version/s: 3.0.0

> Remove Kite dependency
> --
>
> Key: SQOOP-3313
> URL: https://issues.apache.org/jira/browse/SQOOP-3313
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> Having Kite as a dependency makes it hard to release a version of Sqoop 
> compatible with Hadoop 3.
> For details see discussion on dev list in [this thread|http://example.com] 
> and also SQOOP-3305.
> Let's use this ticket to gather features that need to be 
> changed/reimplemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3305) Upgrade to Hadoop 3, Hive 3, and HBase 2

2018-05-10 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3305:

Fix Version/s: 3.0.0

> Upgrade to Hadoop 3, Hive 3, and HBase 2
> 
>
> Key: SQOOP-3305
> URL: https://issues.apache.org/jira/browse/SQOOP-3305
> Project: Sqoop
>  Issue Type: Task
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> In this ticket I'll collect the necessary changes to do the upgrade. I'm not 
> setting a fix version yet, since this might mean a major release and to be 
> done together with the upgrade of related components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3322) Version differences between ivy configurations

2018-05-10 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3322:

Fix Version/s: 3.0.0

> Version differences between ivy configurations
> --
>
> Key: SQOOP-3322
> URL: https://issues.apache.org/jira/browse/SQOOP-3322
> Project: Sqoop
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 3.0.0
>
>
> We have multiple ivy configurations defined in ivy.xml.
>  - The {{redist}} configuration is used to select the artifacts that need to 
> be distributed with Sqoop in its tar.gz.
>  - The {{common}} configuration is used to set the classpath during 
> compilation (also referred to as 'hadoop classpath')
>  -  The {{test}} configuration is used to set the classpath during junit 
> execution. It extends the {{common}} config.
> Some artifacts end up having different versions between these three 
> configurations, which means we're using different versions during 
> compilation/testing/runtime.
> Differences:
> ||Artifact||redist||common (compilation)||test||
> |commons-pool|not in redist|1.5.4|*1.6*|
> |commons-codec|1.4|1.9|*1.9*|
> |commons-io|1.4|2.4|*2.4*|
> |commons-logging|1.1.1|1.2|*1.2*|
> |slf4j-api|1.6.1|1.7.7|*1.7.7*|
> I'd suggest using the version *in bold* in all three configurations to use 
> the latest versions.
> To achieve this we should exclude these artifacts from the transitive 
> dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3323) Use beeline in (non-JDBC) Hive imports

2018-05-10 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3323:

Affects Version/s: 3.0.0
Fix Version/s: 3.0.0

Thank you!

> Use beeline in (non-JDBC) Hive imports
> --
>
> Key: SQOOP-3323
> URL: https://issues.apache.org/jira/browse/SQOOP-3323
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 3.0.0
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> When doing Hive imports the old way (not via JDBC that was introduced in 
> SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall 
> back to the {{hive}} executable (a.k.a. [Hive 
> Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
> that class is not found.
> Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
> [deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
>  (see also HIVE-10511), we should switch to using {{beeline}} to talk to 
> Hive. With recent additions (e.g. HIVE-18963) this should be easier than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3321) TestHiveImport is failing on Jenkins

2018-05-10 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470349#comment-16470349
 ] 

Daniel Voros commented on SQOOP-3321:
-

Thank you [~fero]! I've attached a patch on the RB.

> TestHiveImport is failing on Jenkins
> 
>
> Key: SQOOP-3321
> URL: https://issues.apache.org/jira/browse/SQOOP-3321
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Boglarka Egyed
>Priority: Major
> Attachments: TEST-org.apache.sqoop.hive.TestHiveImport.txt
>
>
> org.apache.sqoop.hive.TestHiveImport is failing since 
> [SQOOP-3318|https://reviews.apache.org/r/66761/bugs/SQOOP-3318/] has been 
> committed. This test seems to be failing only in the Jenkins environment, as it 
> passes on several local machines. There may be some difference in the 
> filesystem causing this issue; it shall be investigated. I am 
> attaching the log from a failed run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3323) Use beeline in (non-JDBC) Hive imports

2018-05-10 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3323:
---

 Summary: Use beeline in (non-JDBC) Hive imports
 Key: SQOOP-3323
 URL: https://issues.apache.org/jira/browse/SQOOP-3323
 Project: Sqoop
  Issue Type: Improvement
  Components: hive-integration
Reporter: Daniel Voros
Assignee: Daniel Voros


When doing Hive imports the old way (not via JDBC that was introduced in 
SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall back 
to the {{hive}} executable (a.k.a. [Hive 
Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
that class is not found.

Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
[deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
 (see also HIVE-10511), we should switch to using {{beeline}} to talk to Hive. 
With recent additions (e.g. HIVE-18963) this should be easier than before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3322) Version differences between ivy configurations

2018-05-08 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467438#comment-16467438
 ] 

Daniel Voros commented on SQOOP-3322:
-

Attaching review request.

> Version differences between ivy configurations
> --
>
> Key: SQOOP-3322
> URL: https://issues.apache.org/jira/browse/SQOOP-3322
> Project: Sqoop
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
>
> We have multiple ivy configurations defined in ivy.xml.
>  - The {{redist}} configuration is used to select the artifacts that need to 
> be distributed with Sqoop in its tar.gz.
>  - The {{common}} configuration is used to set the classpath during 
> compilation (also referred to as 'hadoop classpath')
>  -  The {{test}} configuration is used to set the classpath during junit 
> execution. It extends the {{common}} config.
> Some artifacts end up having different versions between these three 
> configurations, which means we're using different versions during 
> compilation/testing/runtime.
> Differences:
> ||Artifact||redist||common (compilation)||test||
> |commons-pool|not in redist|1.5.4|*1.6*|
> |commons-codec|1.4|1.9|*1.9*|
> |commons-io|1.4|2.4|*2.4*|
> |commons-logging|1.1.1|1.2|*1.2*|
> |slf4j-api|1.6.1|1.7.7|*1.7.7*|
> I'd suggest using the version *in bold* in all three configurations to use 
> the latest versions.
> To achieve this we should exclude these artifacts from the transitive 
> dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3322) Version differences between ivy configurations

2018-05-08 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467355#comment-16467355
 ] 

Daniel Voros commented on SQOOP-3322:
-

One more thing I'd include in this ticket is bumping the jackson-databind 
version (defining it explicitly, to be precise, rather than just getting it via 
transitive dependencies) from 2.3.1 to 2.9.5, which isn't affected by CVE-2017-7525.

> Version differences between ivy configurations
> --
>
> Key: SQOOP-3322
> URL: https://issues.apache.org/jira/browse/SQOOP-3322
> Project: Sqoop
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
>
> We have multiple ivy configurations defined in ivy.xml.
>  - The {{redist}} configuration is used to select the artifacts that need to 
> be distributed with Sqoop in its tar.gz.
>  - The {{common}} configuration is used to set the classpath during 
> compilation (also referred to as 'hadoop classpath')
>  -  The {{test}} configuration is used to set the classpath during junit 
> execution. It extends the {{common}} config.
> Some artifacts end up having different versions between these three 
> configurations, which means we're using different versions during 
> compilation/testing/runtime.
> Differences:
> ||Artifact||redist||common (compilation)||test||
> |commons-pool|not in redist|1.5.4|*1.6*|
> |commons-codec|1.4|1.9|*1.9*|
> |commons-io|1.4|2.4|*2.4*|
> |commons-logging|1.1.1|1.2|*1.2*|
> |slf4j-api|1.6.1|1.7.7|*1.7.7*|
> I'd suggest using the version *in bold* in all three configurations to use 
> the latest versions.
> To achieve this we should exclude these artifacts from the transitive 
> dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3322) Version differences between ivy configurations

2018-05-07 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3322:

Description: 
We have multiple ivy configurations defined in ivy.xml.
 - The {{redist}} configuration is used to select the artifacts that need to be 
distributed with Sqoop in its tar.gz.
 - The {{common}} configuration is used to set the classpath during compilation 
(also referred to as 'hadoop classpath')
 -  The {{test}} configuration is used to set the classpath during junit 
execution. It extends the {{common}} config.

Some artifacts end up having different versions between these three 
configurations, which means we're using different versions during 
compilation/testing/runtime.

Differences:
||Artifact||redist||common (compilation)||test||
|commons-pool|not in redist|1.5.4|*1.6*|
|commons-codec|1.4|1.9|*1.9*|
|commons-io|1.4|2.4|*2.4*|
|commons-logging|1.1.1|1.2|*1.2*|
|slf4j-api|1.6.1|1.7.7|*1.7.7*|

I'd suggest using the version *in bold* in all three configurations to use the 
latest versions.

To achieve this we should exclude these artifacts from the transitive 
dependencies and define them explicitly.

  was:
We have multiple ivy configurations defined in ivy.xml.
 - The {{redist}} configuration is used to select the artifacts that need to be 
distributed with Sqoop in its tar.gz.
 - The {{common}} configuration is used to set the classpath during compilation 
(also referred to as 'hadoop classpath')
 -  The {{test}} configuration is used to set the classpath during junit 
execution. It extends the {{common}} config.

Some artifacts end up having different versions between these three 
configurations, which means we're using different versions during 
compilation/testing/runtime.

Differences:
||Artifact||redist||common (compilation)||test||
|commons-pool|not in redist|1.5.4|*1.6*|
|commons-codec|*1.4*|1.9|1.9|
|commons-io|*1.4*|2.4|2.4|
|commons-logging|*1.1.1*|1.2|1.2|
|slf4j-api|*1.6.1*|1.7.7|1.7.7|

I'd suggest using the version *in bold* in all three configurations, based on:
 - keep version from redist (where there is one), since that's the version we 
were shipping with and used in production
 - keep the latest version in case of commons-pool that is not part of the 
redist config

To achieve this we should exclude these artifacts from the transitive 
dependencies and define them explicitly.


Thanks for commenting [~vasas], I agree! I've updated the description.

> Version differences between ivy configurations
> --
>
> Key: SQOOP-3322
> URL: https://issues.apache.org/jira/browse/SQOOP-3322
> Project: Sqoop
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
>
> We have multiple ivy configurations defined in ivy.xml.
>  - The {{redist}} configuration is used to select the artifacts that need to 
> be distributed with Sqoop in its tar.gz.
>  - The {{common}} configuration is used to set the classpath during 
> compilation (also referred to as 'hadoop classpath')
>  -  The {{test}} configuration is used to set the classpath during junit 
> execution. It extends the {{common}} config.
> Some artifacts end up having different versions between these three 
> configurations, which means we're using different versions during 
> compilation/testing/runtime.
> Differences:
> ||Artifact||redist||common (compilation)||test||
> |commons-pool|not in redist|1.5.4|*1.6*|
> |commons-codec|1.4|1.9|*1.9*|
> |commons-io|1.4|2.4|*2.4*|
> |commons-logging|1.1.1|1.2|*1.2*|
> |slf4j-api|1.6.1|1.7.7|*1.7.7*|
> I'd suggest using the version *in bold* in all three configurations to use 
> the latest versions.
> To achieve this we should exclude these artifacts from the transitive 
> dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3322) Version differences between ivy configurations

2018-05-04 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3322:
---

 Summary: Version differences between ivy configurations
 Key: SQOOP-3322
 URL: https://issues.apache.org/jira/browse/SQOOP-3322
 Project: Sqoop
  Issue Type: Bug
  Components: build
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros


We have multiple ivy configurations defined in ivy.xml.
 - The {{redist}} configuration is used to select the artifacts that need to be 
distributed with Sqoop in its tar.gz.
 - The {{common}} configuration is used to set the classpath during compilation 
(also referred to as 'hadoop classpath')
 -  The {{test}} configuration is used to set the classpath during junit 
execution. It extends the {{common}} config.

Some artifacts end up having different versions between these three 
configurations, which means we're using different versions during 
compilation/testing/runtime.

Differences:
||Artifact||redist||common (compilation)||test||
|commons-pool|not in redist|1.5.4|*1.6*|
|commons-codec|*1.4*|1.9|1.9|
|commons-io|*1.4*|2.4|2.4|
|commons-logging|*1.1.1*|1.2|1.2|
|slf4j-api|*1.6.1*|1.7.7|1.7.7|

I'd suggest using the version *in bold* in all three configurations, based on:
 - keep version from redist (where there is one), since that's the version we 
were shipping with and used in production
 - keep the latest version in case of commons-pool that is not part of the 
redist config

To achieve this we should exclude these artifacts from the transitive 
dependencies and define them explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3317) org.apache.sqoop.validation.RowCountValidator in live RDBMS system

2018-05-04 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463723#comment-16463723
 ] 

Daniel Voros commented on SQOOP-3317:
-

Hi [~srikumaran.t], thank you for reporting this!

As far as I can tell, currently the only option for validation is to check for 
an exact match for the number of records. "Percentage tolerant" validation was 
only mentioned in the documentation but is not implemented.

In my opinion this kind of validation (comparing the number of records) doesn't 
make much sense and should only be used as a sanity check, since it doesn't 
guarantee the equality of the contents.

However, we could improve the existing implementation by introducing another 
parameter (margin/threshold) so that an exact match is not required, and we could 
also implement "Percentage tolerant" validation.

> org.apache.sqoop.validation.RowCountValidator in live RDBMS system
> --
>
> Key: SQOOP-3317
> URL: https://issues.apache.org/jira/browse/SQOOP-3317
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Sri Kumaran Thirupathy
>Priority: Major
>
> org.apache.sqoop.validation.RowCountValidator retrieves the count from the source 
> after the MR job completes. This fails in the live RDBMS case.
> org.apache.sqoop.validation.RowCountValidator could retrieve the count during the 
> MR execution phase.  
> Also, how to use Percentage Tolerant? Reference: 
> [https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3321) TestHiveImport is failing on Jenkins

2018-05-04 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463608#comment-16463608
 ] 

Daniel Voros commented on SQOOP-3321:
-

[~BoglarkaEgyed] this is failing for me on Linux as well. I believe this is due 
to case sensitivity of file names there (as opposed to MacOS). The table name 
gets converted to lowercase when importing but we're referring to it with its 
original casing when trying to verify its contents in {{ParquetReader}}.

Tests are passing after converting these three table names to all lowercase in 
TestHiveImport:
 - APPEND_HIVE_IMPORT_AS_PARQUET
 - NORMAL_HIVE_IMPORT_AS_PARQUET
 - CREATE_OVERWRITE_HIVE_IMPORT_AS_PARQUET

Since SQOOP-3318 only changed the tests, I think we should adapt to the 
lowercase names in the tests too. The easiest solution would be to use lowercase 
names. What do you think, [~vasas]?

> TestHiveImport is failing on Jenkins
> 
>
> Key: SQOOP-3321
> URL: https://issues.apache.org/jira/browse/SQOOP-3321
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Boglarka Egyed
>Priority: Major
> Attachments: TEST-org.apache.sqoop.hive.TestHiveImport.txt
>
>
> org.apache.sqoop.hive.TestHiveImport is failing since 
> [SQOOP-3318|https://reviews.apache.org/r/66761/bugs/SQOOP-3318/] has been 
> committed. This test seems to be failing only in the Jenkins environment, as it 
> passes on several local machines. There may be some difference in the 
> filesystem causing this issue; it shall be investigated. I am 
> attaching the log from a failed run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3314) Sqoop doesn't display full log on console

2018-04-17 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440999#comment-16440999
 ] 

Daniel Voros commented on SQOOP-3314:
-

Hi [~shailu.lahar], thank you for reporting this!

The {{... 19 more}} part refers to the previous lines above; it's not 
truncated. I'm afraid {{--verbose}} is your best bet. The "method specified 
in wallet_location is not supported" message suggests you have misconfigured 
your Oracle wallet. Could you please confirm if it's working outside of Sqoop?

> Sqoop doesn't display full log on console
> -
>
> Key: SQOOP-3314
> URL: https://issues.apache.org/jira/browse/SQOOP-3314
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Shailesh Lahariya
>Priority: Major
>
> I am running a sqoop command (using sqoop 1.4.7) and getting an error. I can't 
> see the full error;
> it seems some of the useful information is not being displayed on the console. 
> For example, instead of "...19 more" in the log below, it should give the 
> complete message to help debug the issue.
>  
>  
>  
> 18/04/17 01:59:12 WARN tool.EvalSqlTool: SQL exception executing statement: 
> java.sql.SQLRecoverableException: IO Error: The Network Adapter could not 
> establish the connection
>   at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:774)
>   at 
> oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:688)
>   at 
> oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:39)
>   at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:691)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at 
> org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:329)
>   at 
> org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
>   at org.apache.sqoop.tool.EvalSqlTool.run(EvalSqlTool.java:64)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>   at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> Caused by: oracle.net.ns.NetException: The Network Adapter could not 
> establish the connection
>   at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:523)
>   at 
> oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:521)
>   at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:660)
>   at oracle.net.ns.NSProtocol.connect(NSProtocol.java:286)
>   at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1438)
>   at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:518)
>   ... 14 more
> Caused by: oracle.net.ns.NetException: The method specified in 
> wallet_location is not supported. Location: /home/hadoop/wallet/jnetadmin_c
>   at 
> oracle.net.nt.CustomSSLSocketFactory.getSSLSocketEngine(CustomSSLSocketFactory.java:487)
>   at oracle.net.nt.TcpsNTAdapter.connect(TcpsNTAdapter.java:143)
>   at oracle.net.nt.ConnOption.connect(ConnOption.java:161)
>   at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:470)
>   ... 19 more
>  
>  
> Also, sharing the command that is producing the above error (altered to 
> remove any confidential info):
>  
> sqoop eval -D mapred.map.child.java.opts='-Doracle.net.tns_admin=. 
> -Doracle.net.wallet_location=.' -files 
> /home/hadoop/wallet/jnetadmin_c/ewallet.jks,/home/hadoop/wallet/jnetadmin_c/ewallet.jks,$HOME/wallet/sqlnet.ora,$HOME/wallet/tnsnames.ora
>  --username xx --password xx --connect 
> "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=xx)(PORT=2484))(CONNECT_DATA=(SERVICE_NAME=xx)))"
>   --query "select 1 from dual" --verbose --throw-on-error
>  
> Please let me know if there is any option to get more logging than it is 
> currently producing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3312) Can not export column data named `value` from hive to mysql

2018-04-16 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439047#comment-16439047
 ] 

Daniel Voros commented on SQOOP-3312:
-

[~zimmem] I think this is the same as SQOOP-3038, which was fixed in 1.4.7. 
Could you please check if you see the issue with 1.4.7?

> Can not export column data named `value` from hive to mysql
> ---
>
> Key: SQOOP-3312
> URL: https://issues.apache.org/jira/browse/SQOOP-3312
> Project: Sqoop
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 1.4.6
>Reporter: zimmem zhuang
>Priority: Critical
>
> the hive table 
> {code:java}
> CREATE TABLE if not exists `test_table`(
> `id` bigint, 
> `value` double)
> STORED AS parquet
> {code}
> the mysql table
> {code:java}
> CREATE TABLE if not exists `test_table`(
> `id` bigint, 
> `value` double);
> {code}
> the export command
>  
> {code:java}
> sqoop export --connect "${jdbc_connect_url}" --username test --password *** 
> --table test_table --columns id,value --hcatalog-database default 
> --hcatalog-table test_table
> {code}
> The `value` column will be null after running the command above. But if I 
> change the column name to `value_x` (in both hive and mysql), it works correctly.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (SQOOP-2878) Sqoop import into Hive transactional tables

2018-04-13 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros resolved SQOOP-2878.
-
Resolution: Duplicate

See SQOOP-3311.

> Sqoop import into Hive transactional tables
> ---
>
> Key: SQOOP-2878
> URL: https://issues.apache.org/jira/browse/SQOOP-2878
> Project: Sqoop
>  Issue Type: Improvement
>Affects Versions: 1.4.6
>Reporter: Rohan More
>Priority: Minor
>
> Hive has supported transactions since version 0.13. For 
> transactional support, the hive table should be bucketed and stored in ORC 
> format.
> This improvement is to import data directly into a hive transactional table 
> using sqoop. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-2192) SQOOP IMPORT/EXPORT for the ORC file HIVE TABLE Failing

2018-04-13 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436935#comment-16436935
 ] 

Daniel Voros commented on SQOOP-2192:
-

[~Ankush] please refer to SQOOP-3311 for ORC updates.

> SQOOP IMPORT/EXPORT for the ORC file HIVE TABLE Failing
> ---
>
> Key: SQOOP-2192
> URL: https://issues.apache.org/jira/browse/SQOOP-2192
> Project: Sqoop
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 1.4.5
> Environment: Hadoop 2.6.0
> Hive 1.0.0
> Sqoop 1.4.5
>Reporter: Sunil Kumar
>Assignee: Venkat Ranganathan
>Priority: Major
>
> We are trying to export an RDBMS table to a Hive table in order to run Hive delete 
> and update queries on the exported Hive table. For Hive to support delete and 
> update queries, the following is required:
> 1. The table needs to be declared with the transactional property
> 2. The table must be in ORC format
> 3. The table must be bucketed
> To do that I have created the hive table using hcat:
> create table bookinfo(md5 STRING , isbn STRING , bookid STRING , booktitle 
> STRING , author STRING , yearofpub STRING , publisher STRING , imageurls 
> STRING , imageurlm STRING , imageurll STRING , price DOUBLE , totalrating 
> DOUBLE , totalusers BIGINT , maxrating INT , minrating INT , avgrating DOUBLE 
> , rawscore DOUBLE , norm_score DOUBLE) clustered by (md5) into 10 buckets 
> stored as orc TBLPROPERTIES('transactional'='true');
> then running sqoop import:
> sqoop import --verbose --connect 'RDBMS_JDBC_URL' --driver JDBC_DRIVER 
> --table bookinfo --null-string '\\N' --null-non-string '\\N' --username USER 
> --password PASSWPRD --hcatalog-database hive_test_trans --hcatalog-table 
> bookinfo --hcatalog-storage-stanza "storedas orc" -m 1
> Following exception is comming:
> 15/03/09 16:28:59 ERROR tool.ImportTool: Encountered IOException running 
> import job: org.apache.hive.hcatalog.common.HCatException : 2016 : Error 
> operation not supported : Store into a partition with bucket definition from 
> Pig/Mapreduce is not supported
> at 
> org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:109)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
> at 
> org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:339)
> at 
> org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:753)
> at 
> org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
> at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:240)
> at 
> org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
> at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
> at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> Please let me know if any further details are required.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3311) Importing as ORC file to support full ACID Hive tables

2018-04-11 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433797#comment-16433797
 ] 

Daniel Voros commented on SQOOP-3311:
-

Attached review request.

> Importing as ORC file to support full ACID Hive tables
> --
>
> Key: SQOOP-3311
> URL: https://issues.apache.org/jira/browse/SQOOP-3311
> Project: Sqoop
>  Issue Type: New Feature
>  Components: hive-integration
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
>
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
> by default. This will probably result in increased usage of ACID tables and 
> the need to support importing into ACID tables with Sqoop.
> Currently the only table format supporting full ACID tables is ORC.
> The easiest and most effective way to support importing into these tables 
> would be to write out files as ORC and keep using LOAD DATA as we do for all 
> other Hive tables (supported since HIVE-17361).
> A workaround could be to create the table as textfile (as before) and then CTAS 
> from that. This would push the responsibility of creating ORC format to Hive. 
> However it would result in writing every record twice; in text format and in 
> ORC.
> Note that ORC is only necessary for full ACID tables. Insert-only (aka. 
> micromanaged) ACID tables can use arbitrary file format.
> Supporting full ACID tables would also be the first step in making 
> "lastmodified" incremental imports work with Hive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3305) Upgrade to Hadoop 3, Hive 3, and HBase 2

2018-04-06 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3305:

Summary: Upgrade to Hadoop 3, Hive 3, and HBase 2  (was: Upgrade to Hadoop 
3.0.0)

I'm adding Hive and HBase to the summary, since they need to be handled 
together. See review request for details.

> Upgrade to Hadoop 3, Hive 3, and HBase 2
> 
>
> Key: SQOOP-3305
> URL: https://issues.apache.org/jira/browse/SQOOP-3305
> Project: Sqoop
>  Issue Type: Task
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
>
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> In this ticket I'll collect the necessary changes to do the upgrade. I'm not 
> setting a fix version yet, since this might mean a major release and to be 
> done together with the upgrade of related components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3311) Importing as ORC file to support full ACID Hive tables

2018-04-06 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3311:
---

 Summary: Importing as ORC file to support full ACID Hive tables
 Key: SQOOP-3311
 URL: https://issues.apache.org/jira/browse/SQOOP-3311
 Project: Sqoop
  Issue Type: New Feature
  Components: hive-integration
Reporter: Daniel Voros
Assignee: Daniel Voros


Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID 
by default. This will probably result in increased usage of ACID tables and the 
need to support importing into ACID tables with Sqoop.

Currently the only table format supporting full ACID tables is ORC.

The easiest and most effective way to support importing into these tables would 
be to write out files as ORC and keep using LOAD DATA as we do for all other 
Hive tables (supported since HIVE-17361).

A workaround could be to create the table as textfile (as before) and then CTAS from 
that. This would push the responsibility of creating ORC format to Hive. 
However it would result in writing every record twice; in text format and in 
ORC.

Note that ORC is only necessary for full ACID tables. Insert-only (aka. 
micromanaged) ACID tables can use arbitrary file format.

Supporting full ACID tables would also be the first step in making 
"lastmodified" incremental imports work with Hive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3305) Upgrade to Hadoop 3.0.0

2018-03-27 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415267#comment-16415267
 ] 

Daniel Voros commented on SQOOP-3305:
-

Attached review request.

> Upgrade to Hadoop 3.0.0
> ---
>
> Key: SQOOP-3305
> URL: https://issues.apache.org/jira/browse/SQOOP-3305
> Project: Sqoop
>  Issue Type: Task
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
>
> To be able to eventually support the latest versions of Hive, HBase and 
> Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
> https://hadoop.apache.org/docs/r3.0.0/index.html
> In this ticket I'll collect the necessary changes to do the upgrade. I'm not 
> setting a fix version yet, since this might mean a major release and to be 
> done together with the upgrade of related components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3307) Don't create HTML during Ivy report

2018-03-26 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413747#comment-16413747
 ] 

Daniel Voros commented on SQOOP-3307:
-

Attaching review request.

> Don't create HTML during Ivy report
> ---
>
> Key: SQOOP-3307
> URL: https://issues.apache.org/jira/browse/SQOOP-3307
> Project: Sqoop
>  Issue Type: Task
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> {{ant clean report}} invokes the [ivy:report 
> |https://ant.apache.org/ivy/history/2.1.0/use/report.html] task and creates 
> both HTML and GraphML reports.
> Creation of the HTML reports takes ~7 minutes and results in a ~700MB html 
> that's hard to make use of, while the GraphML reporting is fast and easier 
> to read.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3307) Don't create HTML during Ivy report

2018-03-26 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3307:
---

 Summary: Don't create HTML during Ivy report
 Key: SQOOP-3307
 URL: https://issues.apache.org/jira/browse/SQOOP-3307
 Project: Sqoop
  Issue Type: Task
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0


{{ant clean report}} invokes the [ivy:report 
|https://ant.apache.org/ivy/history/2.1.0/use/report.html] task and creates 
both HTML and GraphML reports.

Creation of the HTML reports takes ~7 minutes and results in a ~700MB html 
that's hard to make use of, while the GraphML reporting is fast and easier 
to read.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3305) Upgrade to Hadoop 3.0.0

2018-03-26 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3305:
---

 Summary: Upgrade to Hadoop 3.0.0
 Key: SQOOP-3305
 URL: https://issues.apache.org/jira/browse/SQOOP-3305
 Project: Sqoop
  Issue Type: Task
Reporter: Daniel Voros
Assignee: Daniel Voros


To be able to eventually support the latest versions of Hive, HBase and 
Accumulo, we should start by upgrading our Hadoop dependencies to 3.0.0. See 
https://hadoop.apache.org/docs/r3.0.0/index.html

In this ticket I'll collect the necessary changes to do the upgrade. I'm not 
setting a fix version yet, since this might mean a major release and to be done 
together with the upgrade of related components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3289) Add .travis.yml

2018-03-09 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393199#comment-16393199
 ] 

Daniel Voros commented on SQOOP-3289:
-

Hi [~BoglarkaEgyed],

Thanks for your review! In the meantime I've started fooling around with 
thirdparty tests in Travis. I thought I'd share the current status so you can 
comment on that early on. For the latest results, please check this build: 
https://travis-ci.org/dvoros/sqoop/builds/351353673

cc [~vasas] [~maugli]



> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3291) SqoopJobDataPublisher is invoked before Hive imports succeed

2018-03-09 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393081#comment-16393081
 ] 

Daniel Voros commented on SQOOP-3291:
-

Thank you [~venkatnrangan]!

> SqoopJobDataPublisher is invoked before Hive imports succeed
> 
>
> Key: SQOOP-3291
> URL: https://issues.apache.org/jira/browse/SQOOP-3291
> Project: Sqoop
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
>
> Job data is published to listeners (defined via sqoop.job.data.publish.class) 
> in case of Hive and HCat imports. Currently this happens before the Hive 
> import completes, so it gets reported even if Hive import fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3289) Add .travis.yml

2018-03-07 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390860#comment-16390860
 ] 

Daniel Voros commented on SQOOP-3289:
-

Thank you both for your comments! I'm convinced, let's give Travis a shot with 
the CI as well!

[~vasas] I'll start experimenting with thirdparty tests. The first thing that came 
to my mind was to run the DB containers on a third-party server and use that 
from Travis. Not sure if that's better or worse from the legal perspective, though. 
(:

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3291) SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

2018-03-02 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383703#comment-16383703
 ] 

Daniel Voros commented on SQOOP-3291:
-

Attached review request link.

> SqoopJobDataPublisher is invoked before Hive/HCat imports succeed
> -
>
> Key: SQOOP-3291
> URL: https://issues.apache.org/jira/browse/SQOOP-3291
> Project: Sqoop
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
>
> Job data is published to listeners (defined via sqoop.job.data.publish.class) 
> in case of Hive and HCat imports. Currently this happens before the Hive 
> import completes, so it gets reported even if Hive import fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3291) SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

2018-03-02 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3291:
---

 Summary: SqoopJobDataPublisher is invoked before Hive/HCat imports 
succeed
 Key: SQOOP-3291
 URL: https://issues.apache.org/jira/browse/SQOOP-3291
 Project: Sqoop
  Issue Type: Bug
  Components: hive-integration
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.5.0


Job data is published to listeners (defined via sqoop.job.data.publish.class) 
in case of Hive and HCat imports. Currently this happens before the Hive import 
completes, so it gets reported even if Hive import fails.
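A minimal, hypothetical sketch of the ordering this implies once fixed: publish only after the Hive/HCat import step has returned successfully. The {{Publisher}} and {{HiveImport}} interfaces below are stand-ins for illustration, not Sqoop's actual API.

{code:java}
// Hypothetical sketch of publishing job data only after a successful Hive
// import; the interfaces are stand-ins, not Sqoop's actual classes.
public class PublishAfterHiveImportSketch {

  interface HiveImport { void importTable(String table) throws Exception; }
  interface Publisher { void publish(String table, long startedAt, long finishedAt); }

  static void runImport(String table, HiveImport hive, Publisher publisher) throws Exception {
    long start = System.currentTimeMillis();
    // If the Hive import throws, we never reach the publish call below, so a
    // failed import is not reported to listeners.
    hive.importTable(table);
    publisher.publish(table, start, System.currentTimeMillis());
  }

  public static void main(String[] args) throws Exception {
    runImport("t1",
        t -> System.out.println("importing " + t + " into Hive"),
        (t, s, f) -> System.out.println("published " + t + " after " + (f - s) + " ms"));
  }
}
{code}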



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3289) Add .travis.yml

2018-02-26 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377022#comment-16377022
 ] 

Daniel Voros commented on SQOOP-3289:
-

Thanks for your response [~maugli]. I definitely agree with you, we should 
automate all tests (including thirdparty+manual integration tests) and static 
analysis checks as part of a CI gate.

AFAIK ASF is pretty flexible in this matter. For example, Spark's running 
checks on a 3rd party Jenkins on PR hooks, while Hive and Hadoop trigger jobs 
in builds.apache.org Jenkins via Jira attached patches.

None of them run their CI via Travis, though. 
[Hive|https://github.com/apache/hive/blob/master/.travis.yml#L45] and 
[Spark|https://github.com/apache/spark/blob/master/.travis.yml#L46] have 
.travis.ymls but they're not even running tests. I guess that's because of the 
50 min limitation on travis-ci.org runs.

I think we should deal with Travis and CI gatekeeping as separate tasks, and 
open a new Jira for the CI part. What do you think?

BTW, I've just found out that we're already running this job on Jira 
attachments, but it seems to have been failing recently. (: 
https://builds.apache.org/job/PreCommit-SQOOP-Build/

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3289) Add .travis.yml

2018-02-23 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3289:

Fix Version/s: (was: 1.4.7)
   1.5.0

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SQOOP-3289) Add .travis.yml

2018-02-23 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374368#comment-16374368
 ] 

Daniel Voros edited comment on SQOOP-3289 at 2/23/18 1:55 PM:
--

Anyone reading this, please drop a note and let me know what you think! I've 
also attached the review board link, if you wish to comment on a specific part 
of the file.


was (Author: dvoros):
Anyone reading this, please drop a note and let me know what you think! I've 
also attached the review board link, if you wish to comment on a specific part 
or the file.

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.4.7
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3289) Add .travis.yml

2018-02-23 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374368#comment-16374368
 ] 

Daniel Voros commented on SQOOP-3289:
-

Anyone reading this, please drop a note and let me know what you think! I've 
also attached the review board link, if you wish to comment on a specific part 
or the file.

> Add .travis.yml
> ---
>
> Key: SQOOP-3289
> URL: https://issues.apache.org/jira/browse/SQOOP-3289
> Project: Sqoop
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.4.7
>
>
> Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
> Currently if you wish to use Travis for testing your changes, you have to 
> manually add a .travis.yml to your branch. Having it committed to trunk would 
> save us this extra step.
> I currently have an example 
> [{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
>  on my travis branch running unit tests for every commit and every pull 
> request: https://travis-ci.org/dvoros/sqoop/builds
> Later we could add the build status to the project readme as well, see: 
> https://github.com/dvoros/sqoop/tree/travis
> Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3289) Add .travis.yml

2018-02-23 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3289:
---

 Summary: Add .travis.yml
 Key: SQOOP-3289
 URL: https://issues.apache.org/jira/browse/SQOOP-3289
 Project: Sqoop
  Issue Type: Task
  Components: build
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros
 Fix For: 1.4.7


Adding a .travis.yml would enable running builds/tests on travis-ci.org. 
Currently if you wish to use Travis for testing your changes, you have to 
manually add a .travis.yml to your branch. Having it committed to trunk would 
save us this extra step.

I currently have an example 
[{{.travis.yml}}|https://github.com/dvoros/sqoop/blob/93a4c06c1a3da1fd5305c99e379484507797b3eb/.travis.yml]
 on my travis branch running unit tests for every commit and every pull 
request: https://travis-ci.org/dvoros/sqoop/builds

Later we could add the build status to the project readme as well, see: 
https://github.com/dvoros/sqoop/tree/travis

Also, an example of a pull request: https://github.com/dvoros/sqoop/pull/1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2018-02-22 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372870#comment-16372870
 ] 

Daniel Voros commented on SQOOP-3267:
-

Thank you all!

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: SQOOP-3267.1.patch, SQOOP-3267.2.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3288) Incremental import's upper bound ignores session time zone in Oracle

2018-02-21 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371674#comment-16371674
 ] 

Daniel Voros commented on SQOOP-3288:
-

Great, thank you [~maugli]!

> Incremental import's upper bound ignores session time zone in Oracle
> 
>
> Key: SQOOP-3288
> URL: https://issues.apache.org/jira/browse/SQOOP-3288
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: SQOOP-3288.1.patch
>
>
> At the moment we're using [{{SELECT SYSDATE FROM 
> dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
>  when getting current time from Oracle.
> SYSDATE returns the underlying operating system's current time, while 
> CURRENT_TIMESTAMP uses the session time zone. This could lead to problems 
> during incremental imports *when Oracle's time zone is different from the OS*.
> Consider the following scenario when Oracle is configured to {{+0:00}}, while 
> the OS is {{+5:00}}:
> ||Oracle time||OS time||Event||
> |2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
> |2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to 
> {{2:30}} *Won't be imported!*|
> |3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|
> This way records updated within 5 hours after the last sqoop import won't get 
> imported.
> Please note, that the example above assumes, that the user/administrator 
> who's updating the Oracle table will use the current session time of Oracle 
> when setting the "last updated" column of the table.
> I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other 
> connection managers, like MySQL or PostgreSQL are using that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (SQOOP-3288) Incremental import's upper bound ignores session time zone in Oracle

2018-02-21 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371594#comment-16371594
 ] 

Daniel Voros edited comment on SQOOP-3288 at 2/21/18 4:11 PM:
--

[~maugli], oh I see, sorry I didn't get your point!

I believe we won't face the problem you've described, because we're not setting 
the time zone based on the machine that's running the import. We're always 
using the time zone set via 'oracle.sessionTimeZone' and fall back to 'GMT', 
see 
[here|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L415-L418].

I hope this answers your question! Also, let me point out, that the part this 
change affects is only run when getting the next last-value for an incremental 
import ({{ImportTool#initIncrementalConstraints() -> 
SqlManager#getCurrentDbTimestamp() -> OracleManager#getCurTimestampQuery()}}). 
It won't affect how we're dealing with date/time fields anywhere else.


was (Author: dvoros):
[~maugli], oh I've see, sorry I didn't get your point!

I believe we won't face the problem you've described, because we're not setting 
the time zone based on the machine that's running the import. We're always 
using the time zone set via 'oracle.sessionTimeZone' and fall back to 'GMT', 
see 
[here|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L415-L418].

I hope this answers your question! Also, let me point out, that the part this 
change affects is only run when getting the next last-value for an incremental 
import ({{ImportTool#initIncrementalConstraints() -> 
SqlManager#getCurrentDbTimestamp() -> OracleManager#getCurTimestampQuery()}}). 
It won't affect how we're dealing with date/time fields anywhere else.

> Incremental import's upper bound ignores session time zone in Oracle
> 
>
> Key: SQOOP-3288
> URL: https://issues.apache.org/jira/browse/SQOOP-3288
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: SQOOP-3288.1.patch
>
>
> At the moment we're using [{{SELECT SYSDATE FROM 
> dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
>  when getting current time from Oracle.
> SYSDATE returns the underlying operating system's current time, while 
> CURRENT_TIMESTAMP uses the session time zone. This could lead to problems 
> during incremental imports *when Oracle's time zone is different from the OS*.
> Consider the following scenario when Oracle is configured to {{+0:00}}, while 
> the OS is {{+5:00}}:
> ||Oracle time||OS time||Event||
> |2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
> |2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to 
> {{2:30}} *Won't be imported!*|
> |3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|
> This way records updated within 5 hours after the last sqoop import won't get 
> imported.
> Please note, that the example above assumes, that the user/administrator 
> who's updating the Oracle table will use the current session time of Oracle 
> when setting the "last updated" column of the table.
> I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other 
> connection managers, like MySQL or PostgreSQL are using that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3288) Incremental import's upper bound ignores session time zone in Oracle

2018-02-21 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371594#comment-16371594
 ] 

Daniel Voros commented on SQOOP-3288:
-

[~maugli], oh I see, sorry I didn't get your point!

I believe we won't face the problem you've described, because we're not setting 
the time zone based on the machine that's running the import. We're always 
using the time zone set via 'oracle.sessionTimeZone' and fall back to 'GMT', 
see 
[here|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L415-L418].

I hope this answers your question! Also, let me point out, that the part this 
change affects is only run when getting the next last-value for an incremental 
import ({{ImportTool#initIncrementalConstraints() -> 
SqlManager#getCurrentDbTimestamp() -> OracleManager#getCurTimestampQuery()}}). 
It won't affect how we're dealing with date/time fields anywhere else.
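
As an illustration of the fallback described above, here is a minimal sketch of 
how a JDBC client might apply an Oracle session time zone, defaulting to GMT 
when nothing is configured. This is not the actual OracleManager code; the 
method and connection handling are simplified assumptions for the example.

{code:java}
// Illustrative sketch only (not the actual OracleManager code): apply a
// session time zone on an Oracle JDBC connection, falling back to GMT when
// the 'oracle.sessionTimeZone' property is not set.
import java.sql.Connection;
import java.sql.Statement;

public class SessionTimeZoneExample {
  static void applySessionTimeZone(Connection conn, String configuredTz) throws Exception {
    // Fall back to GMT when no time zone was configured, mirroring the
    // behavior described in the comment above.
    String tz = (configuredTz != null) ? configuredTz : "GMT";
    try (Statement stmt = conn.createStatement()) {
      stmt.execute("ALTER SESSION SET TIME_ZONE = '" + tz + "'");
    }
  }
}
{code}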

> Incremental import's upper bound ignores session time zone in Oracle
> 
>
> Key: SQOOP-3288
> URL: https://issues.apache.org/jira/browse/SQOOP-3288
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: SQOOP-3288.1.patch
>
>
> At the moment we're using [{{SELECT SYSDATE FROM 
> dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
>  when getting current time from Oracle.
> SYSDATE returns the underlying operating system's current time, while 
> CURRENT_TIMESTAMP uses the session time zone. This could lead to problems 
> during incremental imports *when Oracle's time zone is different from the OS*.
> Consider the following scenario when Oracle is configured to {{+0:00}}, while 
> the OS is {{+5:00}}:
> ||Oracle time||OS time||Event||
> |2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
> |2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to 
> {{2:30}} *Won't be imported!*|
> |3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|
> This way records updated within 5 hours after the last sqoop import won't get 
> imported.
> Please note, that the example above assumes, that the user/administrator 
> who's updating the Oracle table will use the current session time of Oracle 
> when setting the "last updated" column of the table.
> I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other 
> connection managers, like MySQL or PostgreSQL are using that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3288) Incremental import's upper bound ignores session time zone in Oracle

2018-02-21 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371502#comment-16371502
 ] 

Daniel Voros commented on SQOOP-3288:
-

[~maugli], it's the other way around. We were getting the OS time before; that's 
what this patch is supposed to fix. (Also, please note that we were not relying 
on the time of the OS running the sqoop job, but on the OS running Oracle.)

> Incremental import's upper bound ignores session time zone in Oracle
> 
>
> Key: SQOOP-3288
> URL: https://issues.apache.org/jira/browse/SQOOP-3288
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: SQOOP-3288.1.patch
>
>
> At the moment we're using [{{SELECT SYSDATE FROM 
> dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
>  when getting current time from Oracle.
> SYSDATE returns the underlying operating system's current time, while 
> CURRENT_TIMESTAMP uses the session time zone. This could lead to problems 
> during incremental imports *when Oracle's time zone is different from the OS*.
> Consider the following scenario when Oracle is configured to {{+0:00}}, while 
> the OS is {{+5:00}}:
> ||Oracle time||OS time||Event||
> |2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
> |2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to 
> {{2:30}} *Won't be imported!*|
> |3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|
> This way records updated within 5 hours after the last sqoop import won't get 
> imported.
> Please note, that the example above assumes, that the user/administrator 
> who's updating the Oracle table will use the current session time of Oracle 
> when setting the "last updated" column of the table.
> I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other 
> connection managers, like MySQL or PostgreSQL are using that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3288) Incremental import's upper bound ignores session time zone in Oracle

2018-02-21 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371480#comment-16371480
 ] 

Daniel Voros commented on SQOOP-3288:
-

[~maugli], I've added the link to the review request. Sorry about the fix 
version.

The problem isn't time-zone awareness only, but getting the OS time instead of 
the Oracle session time. This leaves us with CURRENT_TIMESTAMP and 
LOCALTIMESTAMP, both of which would suit our needs (since we're dropping the 
time zone when displaying or saving the next last-value). I've decided to go 
with CURRENT_TIMESTAMP only because we're using the function with the same name 
for other SQL DBs.
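
To make the difference between the three candidates concrete, here is a minimal 
sketch that assumes an open Oracle JDBC connection {{conn}}; it simply prints 
the values of the functions discussed above from a single query.

{code:java}
// Minimal sketch (assumes an open Oracle JDBC connection `conn`): compare the
// three current-time functions. SYSDATE tracks the database host's OS clock;
// CURRENT_TIMESTAMP and LOCALTIMESTAMP honor the session time zone.
try (java.sql.Statement stmt = conn.createStatement();
     java.sql.ResultSet rs = stmt.executeQuery(
         "SELECT SYSDATE, CURRENT_TIMESTAMP, LOCALTIMESTAMP FROM dual")) {
  if (rs.next()) {
    System.out.println("SYSDATE           = " + rs.getTimestamp(1));
    System.out.println("CURRENT_TIMESTAMP = " + rs.getTimestamp(2));
    System.out.println("LOCALTIMESTAMP    = " + rs.getTimestamp(3));
  }
}
{code}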

> Incremental import's upper bound ignores session time zone in Oracle
> 
>
> Key: SQOOP-3288
> URL: https://issues.apache.org/jira/browse/SQOOP-3288
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: SQOOP-3288.1.patch
>
>
> At the moment we're using [{{SELECT SYSDATE FROM 
> dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
>  when getting current time from Oracle.
> SYSDATE returns the underlying operating system's current time, while 
> CURRENT_TIMESTAMP uses the session time zone. This could lead to problems 
> during incremental imports *when Oracle's time zone is different from the OS*.
> Consider the following scenario when Oracle is configured to {{+0:00}}, while 
> the OS is {{+5:00}}:
> ||Oracle time||OS time||Event||
> |2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
> |2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to 
> {{2:30}} *Won't be imported!*|
> |3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|
> This way records updated within 5 hours after the last sqoop import won't get 
> imported.
> Please note, that the example above assumes, that the user/administrator 
> who's updating the Oracle table will use the current session time of Oracle 
> when setting the "last updated" column of the table.
> I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other 
> connection managers, like MySQL or PostgreSQL are using that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3288) Incremental import's upper bound ignores session time zone in Oracle

2018-02-20 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3288:

Attachment: SQOOP-3288.1.patch

> Incremental import's upper bound ignores session time zone in Oracle
> 
>
> Key: SQOOP-3288
> URL: https://issues.apache.org/jira/browse/SQOOP-3288
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3288.1.patch
>
>
> At the moment we're using [{{SELECT SYSDATE FROM 
> dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
>  when getting current time from Oracle.
> SYSDATE returns the underlying operating system's current time, while 
> CURRENT_TIMESTAMP uses the session time zone. This could lead to problems 
> during incremental imports *when Oracle's time zone is different from the OS*.
> Consider the following scenario when Oracle is configured to {{+0:00}}, while 
> the OS is {{+5:00}}:
> ||Oracle time||OS time||Event||
> |2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
> |2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to 
> {{2:30}} *Won't be imported!*|
> |3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|
> This way records updated within 5 hours after the last sqoop import won't get 
> imported.
> Please note, that the example above assumes, that the user/administrator 
> who's updating the Oracle table will use the current session time of Oracle 
> when setting the "last updated" column of the table.
> I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other 
> connection managers, like MySQL or PostgreSQL are using that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3288) Incremental import's upper bound ignores session time zone in Oracle

2018-02-19 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3288:
---

 Summary: Incremental import's upper bound ignores session time 
zone in Oracle
 Key: SQOOP-3288
 URL: https://issues.apache.org/jira/browse/SQOOP-3288
 Project: Sqoop
  Issue Type: Bug
  Components: connectors/oracle
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros


At the moment we're using [{{SELECT SYSDATE FROM 
dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
 when getting current time from Oracle.

SYSDATE returns the underlying operating system's current time, while 
CURRENT_TIMESTAMP uses the session time zone. This could lead to problems 
during incremental imports *when Oracle's time zone is different from the OS*.

Consider the following scenario when Oracle is configured to {{+0:00}}, while 
the OS is {{+5:00}}:
||Oracle time||OS time||Event||
|2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
|2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to 
{{2:30}} *Won't be imported!*|
|3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|

This way records updated within 5 hours after the last sqoop import won't get 
imported.

Please note, that the example above assumes, that the user/administrator who's 
updating the Oracle table will use the current session time of Oracle when 
setting the "last updated" column of the table.

I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other 
connection managers, like MySQL or PostgreSQL are using that as well.
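
A rough sketch of the proposed direction (the actual patch may differ): have 
the Oracle manager build its "current timestamp" query from CURRENT_TIMESTAMP 
instead of SYSDATE. The method name comes from OracleManager as referenced 
elsewhere in this thread.

{code:java}
// Rough sketch of the proposed change (the actual patch may differ): return a
// session-time-zone-aware query instead of one based on SYSDATE.
protected String getCurTimestampQuery() {
  return "SELECT CURRENT_TIMESTAMP FROM dual";
}
{code}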



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2018-02-16 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367417#comment-16367417
 ] 

Daniel Voros commented on SQOOP-3267:
-

Attached patch #2. This introduces the --hbase-null-incremental-mode option as 
discussed in the comments above.

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3267.1.patch, SQOOP-3267.2.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2018-02-16 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3267:

Attachment: SQOOP-3267.2.patch

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3267.1.patch, SQOOP-3267.2.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3283) MySQL thirdparty tests hang if there's no USER environment variable

2018-02-15 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365535#comment-16365535
 ] 

Daniel Voros commented on SQOOP-3283:
-

Thanks for the review [~maugli] and [~fero]!

[~maugli] thanks for pointing that out. Knowing it's set to a default in 
build.xml, it makes no sense to keep getCurrentUser(), as its return value 
will never be used. I think I'll open a new ticket. How would you feel about 
dropping the whole env.USER default and using 'sqoop' as the default user? 
That would be in line with the dockerized thirdparty tests.

> MySQL thirdparty tests hang if there's no USER environment variable
> ---
>
> Key: SQOOP-3283
> URL: https://issues.apache.org/jira/browse/SQOOP-3283
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/mysql, test
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Fix For: 1.5.0
>
> Attachments: SQOOP-3283.1.patch, SQOOP-3283.2.patch
>
>
> {{org.apache.sqoop.manager.mysql.MySQLTestUtils#getCurrentUser()}} executes 
> {{whoami}} in a subprocess if there's no USER environment variable (happened 
> to me while running tests from Docker). However, it waits for the Process 
> variable to become null, that never happens:
> {code:java}
> // wait for whoami to exit.
> while (p != null) {
>   try {
> int ret = p.waitFor();
> if (0 != ret) {
>   LOG.error("whoami exited with error status " + ret);
>   // suppress original return value from this method.
>   return null;
> }
>   } catch (InterruptedException ie) {
> continue; // loop around.
>   }
> }
> {code}
> We could get rid of the while loop since {{Process#waitFor()}} blocks while 
> it completes.
> Note, that it's easy to workaround the issue by setting the USER environment 
> variable when running the tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3283) MySQL thirdparty tests hang if there's no USER environment variable

2018-02-13 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362431#comment-16362431
 ] 

Daniel Voros commented on SQOOP-3283:
-

Attached patch #2 after review from [~fero]. This falls back to the user.name 
system property instead of whoami and throws if it's unable to find the user.
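
A minimal sketch of that approach (not the exact patch): read the USER 
environment variable, fall back to the user.name system property, and fail 
loudly if neither is available, instead of shelling out to whoami and waiting 
on the Process.

{code:java}
// Minimal sketch of the described fallback (not the exact patch).
static String getCurrentUser() {
  String user = System.getenv("USER");
  if (user == null || user.isEmpty()) {
    user = System.getProperty("user.name");
  }
  if (user == null || user.isEmpty()) {
    throw new RuntimeException(
        "Unable to determine the current user: set the USER environment "
        + "variable or the user.name system property.");
  }
  return user;
}
{code}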

> MySQL thirdparty tests hang if there's no USER environment variable
> ---
>
> Key: SQOOP-3283
> URL: https://issues.apache.org/jira/browse/SQOOP-3283
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/mysql, test
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Attachments: SQOOP-3283.1.patch, SQOOP-3283.2.patch
>
>
> {{org.apache.sqoop.manager.mysql.MySQLTestUtils#getCurrentUser()}} executes 
> {{whoami}} in a subprocess if there's no USER environment variable (happened 
> to me while running tests from Docker). However, it waits for the Process 
> variable to become null, that never happens:
> {code:java}
> // wait for whoami to exit.
> while (p != null) {
>   try {
> int ret = p.waitFor();
> if (0 != ret) {
>   LOG.error("whoami exited with error status " + ret);
>   // suppress original return value from this method.
>   return null;
> }
>   } catch (InterruptedException ie) {
> continue; // loop around.
>   }
> }
> {code}
> We could get rid of the while loop since {{Process#waitFor()}} blocks while 
> it completes.
> Note, that it's easy to workaround the issue by setting the USER environment 
> variable when running the tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3283) MySQL thirdparty tests hang if there's no USER environment variable

2018-02-13 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3283:

Attachment: SQOOP-3283.2.patch

> MySQL thirdparty tests hang if there's no USER environment variable
> ---
>
> Key: SQOOP-3283
> URL: https://issues.apache.org/jira/browse/SQOOP-3283
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/mysql, test
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Attachments: SQOOP-3283.1.patch, SQOOP-3283.2.patch
>
>
> {{org.apache.sqoop.manager.mysql.MySQLTestUtils#getCurrentUser()}} executes 
> {{whoami}} in a subprocess if there's no USER environment variable (happened 
> to me while running tests from Docker). However, it waits for the Process 
> variable to become null, that never happens:
> {code:java}
> // wait for whoami to exit.
> while (p != null) {
>   try {
> int ret = p.waitFor();
> if (0 != ret) {
>   LOG.error("whoami exited with error status " + ret);
>   // suppress original return value from this method.
>   return null;
> }
>   } catch (InterruptedException ie) {
> continue; // loop around.
>   }
> }
> {code}
> We could get rid of the while loop since {{Process#waitFor()}} blocks while 
> it completes.
> Note, that it's easy to workaround the issue by setting the USER environment 
> variable when running the tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3283) MySQL thirdparty tests hang if there's no USER environment variable

2018-02-06 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353790#comment-16353790
 ] 

Daniel Voros commented on SQOOP-3283:
-

Adding review request link.

> MySQL thirdparty tests hang if there's no USER environment variable
> ---
>
> Key: SQOOP-3283
> URL: https://issues.apache.org/jira/browse/SQOOP-3283
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/mysql, test
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Attachments: SQOOP-3283.1.patch
>
>
> {{org.apache.sqoop.manager.mysql.MySQLTestUtils#getCurrentUser()}} executes 
> {{whoami}} in a subprocess if there's no USER environment variable (happened 
> to me while running tests from Docker). However, it waits for the Process 
> variable to become null, that never happens:
> {code:java}
> // wait for whoami to exit.
> while (p != null) {
>   try {
> int ret = p.waitFor();
> if (0 != ret) {
>   LOG.error("whoami exited with error status " + ret);
>   // suppress original return value from this method.
>   return null;
> }
>   } catch (InterruptedException ie) {
> continue; // loop around.
>   }
> }
> {code}
> We could get rid of the while loop since {{Process#waitFor()}} blocks while 
> it completes.
> Note, that it's easy to workaround the issue by setting the USER environment 
> variable when running the tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3283) MySQL thirdparty tests hang if there's no USER environment variable

2018-02-06 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353780#comment-16353780
 ] 

Daniel Voros commented on SQOOP-3283:
-

Attached patch #1. This uses try-with-resources instead of try-catch and 
removes the {{waitFor()}} call altogether.

> MySQL thirdparty tests hang if there's no USER environment variable
> ---
>
> Key: SQOOP-3283
> URL: https://issues.apache.org/jira/browse/SQOOP-3283
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/mysql, test
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Attachments: SQOOP-3283.1.patch
>
>
> {{org.apache.sqoop.manager.mysql.MySQLTestUtils#getCurrentUser()}} executes 
> {{whoami}} in a subprocess if there's no USER environment variable (happened 
> to me while running tests from Docker). However, it waits for the Process 
> variable to become null, that never happens:
> {code:java}
> // wait for whoami to exit.
> while (p != null) {
>   try {
> int ret = p.waitFor();
> if (0 != ret) {
>   LOG.error("whoami exited with error status " + ret);
>   // suppress original return value from this method.
>   return null;
> }
>   } catch (InterruptedException ie) {
> continue; // loop around.
>   }
> }
> {code}
> We could get rid of the while loop since {{Process#waitFor()}} blocks while 
> it completes.
> Note, that it's easy to workaround the issue by setting the USER environment 
> variable when running the tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3283) MySQL thirdparty tests hang if there's no USER environment variable

2018-02-06 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3283:

Attachment: SQOOP-3283.1.patch

> MySQL thirdparty tests hang if there's no USER environment variable
> ---
>
> Key: SQOOP-3283
> URL: https://issues.apache.org/jira/browse/SQOOP-3283
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/mysql, test
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Minor
> Attachments: SQOOP-3283.1.patch
>
>
> {{org.apache.sqoop.manager.mysql.MySQLTestUtils#getCurrentUser()}} executes 
> {{whoami}} in a subprocess if there's no USER environment variable (happened 
> to me while running tests from Docker). However, it waits for the Process 
> variable to become null, that never happens:
> {code:java}
> // wait for whoami to exit.
> while (p != null) {
>   try {
> int ret = p.waitFor();
> if (0 != ret) {
>   LOG.error("whoami exited with error status " + ret);
>   // suppress original return value from this method.
>   return null;
> }
>   } catch (InterruptedException ie) {
> continue; // loop around.
>   }
> }
> {code}
> We could get rid of the while loop since {{Process#waitFor()}} blocks while 
> it completes.
> Note, that it's easy to workaround the issue by setting the USER environment 
> variable when running the tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3283) MySQL thirdparty tests hang if there's no USER environment variable

2018-02-06 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3283:
---

 Summary: MySQL thirdparty tests hang if there's no USER 
environment variable
 Key: SQOOP-3283
 URL: https://issues.apache.org/jira/browse/SQOOP-3283
 Project: Sqoop
  Issue Type: Bug
  Components: connectors/mysql, test
Affects Versions: 1.4.7
Reporter: Daniel Voros
Assignee: Daniel Voros


{{org.apache.sqoop.manager.mysql.MySQLTestUtils#getCurrentUser()}} executes 
{{whoami}} in a subprocess if there's no USER environment variable (happened to 
me while running tests from Docker). However, it waits for the Process variable 
to become null, that never happens:
{code:java}
// wait for whoami to exit.
while (p != null) {
  try {
int ret = p.waitFor();
if (0 != ret) {
  LOG.error("whoami exited with error status " + ret);
  // suppress original return value from this method.
  return null;
}
  } catch (InterruptedException ie) {
continue; // loop around.
  }
}
{code}
We could get rid of the while loop since {{Process#waitFor()}} blocks while it 
completes.

Note, that it's easy to workaround the issue by setting the USER environment 
variable when running the tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2018-01-24 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337614#comment-16337614
 ] 

Daniel Voros commented on SQOOP-3267:
-

I'm linking this to SQOOP-3276, where we can implement the null-string part.

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (SQOOP-3276) Enable import of null columns to HBase

2018-01-24 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros reassigned SQOOP-3276:
---

Assignee: Daniel Voros

Hi [~elkarel]!

Thanks for reporting this!

We've just discussed the need for such an option in SQOOP-3267. We've agreed on 
adding a new option (--hbase-null-string) for this, so as not to overload the 
existing two (--null-string and --null-non-string).

However, nothing's set in stone, please let me know if you have any concerns 
with this approach!

> Enable import of null columns to HBase
> --
>
> Key: SQOOP-3276
> URL: https://issues.apache.org/jira/browse/SQOOP-3276
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hbase-integration
>Reporter: Karel
>Assignee: Daniel Voros
>Priority: Major
>  Labels: hbase, null-values, sqoop
>
> It would be very useful to to have these options for hbase import:
> --null-string '' 
> --null-non-string '0' 
> The workaround is using coalesce in the query, but that's hard if you import 
> many tables with many columns. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2018-01-24 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337485#comment-16337485
 ] 

Daniel Voros commented on SQOOP-3267:
-

All right, [~vasas], agreed. (: I'll submit a patch with the 
--hbase-null-incremental-mode option shortly.

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3280) Sqoop import with time data type from mysql is not working as expected.

2018-01-24 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337400#comment-16337400
 ] 

Daniel Voros commented on SQOOP-3280:
-

Hi [~nitish.khanna]!

Thank you for reporting this and for the very detailed description!

I'm afraid the issue here is (at least in part) in MySQL Connector/J. Being a 
JDBC driver, it uses {{java.sql.Time}}, which was designed to hold time-of-day 
values and not durations (as MySQL TIME does). The parsing of TIME columns into 
{{java.sql.Time}} has gone through some changes in recent versions of the 
MySQL Connector/J:

Here's a little comparison of different versions:
||MySQL Connector/J version||Max TIME value to parse into java.sql.Time||
|5.0.3|99:59:59|
|5.1.6|23:59:59|
|5.1.45|24:59:59|
|6.0.2|23:59:59|

Note that the exception thrown for unsupported values has changed in 6.x to:
{quote}The value '24:0:0' is an invalid TIME value. JDBC Time objects represent 
a wall-clock time and not a duration as MySQL treats them. If you are treating 
this type as a duration, consider retrieving this value as a string and dealing 
with it according to your requirements.
{quote}
It is possible to retrieve TIME columns as Strings in 6.x, but it isn't (for 
bigger values) in 5.x. Knowing this, we could move forward with the 6.x 
connector and retrieve TIME columns as Strings. I'm afraid that would have some 
unexpected side effects; I'll try to take a look.

In the meantime, there seem to be two possible workarounds for the issue:
 - moving back to 5.0.3, if supporting two-digit hour values is enough for you. 
(Of course, be careful: this might break something else, as I'm not familiar 
with the differences between 5.x versions)
 - using [Free-form 
import|https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports]
 for this table and casting the TIME column to string as described in the [last 
comment here|https://bugs.mysql.com/bug.php?id=36051]: {{select concat('', 
timevalue) as timevalue from repro_time;}}
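
For reference, here is a hedged sketch of that second workaround applied at the 
JDBC level. It assumes an open Connector/J connection {{conn}}; the table and 
column names come from the reproduction steps in the issue. Casting TIME to a 
string on the MySQL side avoids mapping values outside the {{java.sql.Time}} 
range.

{code:java}
// Hedged sketch of the CONCAT workaround (assumes an open Connector/J
// connection `conn`; table/column names are from the reproduction steps).
try (java.sql.Statement stmt = conn.createStatement();
     java.sql.ResultSet rs = stmt.executeQuery(
         "SELECT CONCAT('', timevalue) AS timevalue FROM repro_time")) {
  while (rs.next()) {
    // Values beyond 24 hours, e.g. "838:59:59", come back as plain strings.
    System.out.println(rs.getString("timevalue"));
  }
}
{code}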

> Sqoop import with time data type from mysql is not working as expected.
> ---
>
> Key: SQOOP-3280
> URL: https://issues.apache.org/jira/browse/SQOOP-3280
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Niitsh Khanna
>Priority: Major
>
> Hi Team,
> Hope you are doing good !!!
> ##
> Problem Statement
> ##
> Sqoop import with time data type from mysql is not working as expected.
> ##
> Detailed Problem Statement
> ##
> We are trying to import the time datatype from Mysql via Sqoop and it is not 
> working as expected and as mentioned in the Mysql document the value time 
> data type can import.
> If we set the time(hour) more than 24 then it doesn't work fine but if we set 
> the hour less than 24 then it imports well.
> Now if we see the mysql 
> document(https://dev.mysql.com/doc/refman/5.7/en/time.html) for Time data 
> range which is '-838:59:59' to '838:59:59' but Sqoop is not working as per 
> this range set.
> Note:- I am creating 2 scenarios (working and non-working) to give more 
> details on this with replication steps that will help you to replicate this 
> in house.
>  
> ##
> Replication Steps > Working Scenario
> ##
> Step 1:- Create table in Mysql.
> --
> mysql> create table repro_time( timevalue time);
> Query OK, 0 rows affected (0.08 sec)
> mysql> insert into repro_time values('24:24:24');
> Query OK, 1 row affected (0.06 sec)
> mysql> select * from repro_time;
> +---+
> | timevalue |
> +---+
> | 24:24:24 |
> +---+
> 1 row in set (0.01 sec)
> Step 2:- Sqoop import into HDFS
> -
> [root@host-10-17-101-232 ~]# export 
> MYCONN=jdbc:mysql://host-10-17-101-231.coe.cloudera.com/test
> [root@host-10-17-101-232 ~]# export MYUSER=*
> [root@host-10-17-101-232 ~]# export MYPSWD=*
> [root@host-10-17-101-232 ~]# sqoop import --connect $MYCONN --username 
> $MYUSER --password $MYPSWD --table repro_time --target-dir 
> '/user/root/repro_time' --delete-target-dir -m 1
> Bytes Read=0
>  File Output Format Counters 
>  Bytes Written=9
> 18/01/22 21:41:57 INFO mapreduce.ImportJobBase: Transferred 9 bytes in 
> 17.5695 seconds (0.5123 bytes/sec)
> 18/01/22 21:41:57 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> [root@host-10-17-101-232 ~]# hadoop fs -cat repro_time/p*
> 00:24:24
> Note:- We set the hour as 24 so that's why it has sett 00 over here which is 
> normal behaviour.
> ##
> Replication Steps > Non-Working Scenario
> ##
> Step1:- Create table in Mysql
> --
> mysql> create table repro_time( 

[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2018-01-23 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335982#comment-16335982
 ] 

Daniel Voros commented on SQOOP-3267:
-

[~vasas], thanks for your reply!

The current implementation does in fact issue a separate Put command for every 
column, but I've already addressed this in [^SQOOP-3267.1.patch] (see the 
first comment on this issue). There is still a slight overhead in adding all 
columns to the Put command, but it's way better than one Put per column.

Despite this I still agree with you on providing the option to configure the 
behavior (ignore/delete/null-string). However, knowing it's only a small 
performance overhead, we could as well make null-string the default.

All right, let's introduce --hbase-null-string and do not overload 
--null-string.

If you agree with making it the default, I think I'd implement the null-string 
mode in this ticket and open a new Jira for the configurability of 
ignore/delete/null-string and --hbase-null-string.

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2018-01-21 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333491#comment-16333491
 ] 

Daniel Voros commented on SQOOP-3267:
-

[~vasas], [~maugli] thank you both for your replies!

I agree with you on keeping the history. I see two ways to do that.

Option A) is a sort-of workaround. Let users know that Sqoop will delete all 
previous versions of columns when updating them to NULL in the source table and 
ask them to set KEEP_DELETED_CELLS on the underlying HBase table if they still 
want to preserve history.

Option B) is inserting an empty string for NULL values. In detail:
 - Whenever we're importing a NULL value (whether it's from a new row or an 
updated one) we insert a special string (let's call it NULL_STRING). Note that 
it would be better to insert NULL, but HBase lacks the notion of NULL.
 - The value of NULL_STRING is "" (empty string) by default but is 
configurable. (Probably via the already existing {{--null-string}} argument.)
 - This behavior should NOT depend on the incremental mode ("append" or 
"lastmodified").

Two notes:
 - Option B) is similar to what was introduced in Phoenix in PHOENIX-1578. 
(When using the STORE_NULLS=true table option, there's no way to tell a NULL 
value from an empty string in Phoenix.)
 - I've also tested how Hive treats NULL values when storing a table in HBase. 
Empty (or deleted) columns are displayed as NULL when reading the table, but 
updating a column to NULL in Hive fails (see HIVE-3336).
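
An illustrative sketch of option B (not the actual patch): for a NULL source 
value, write a configurable NULL_STRING ("" by default) instead of issuing a 
Delete, so earlier cell versions in HBase are preserved. The method and 
parameter names below are hypothetical, invented for the example.

{code:java}
// Illustrative sketch of option B (hypothetical helper, not the actual patch).
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NullStringPutExample {
  static Put columnToPut(byte[] row, byte[] family, String column, String value,
                         String nullString /* "" by default, configurable */) {
    Put put = new Put(row);
    // Write the configured null string for NULL values instead of deleting.
    byte[] cell = (value != null) ? Bytes.toBytes(value) : Bytes.toBytes(nullString);
    put.addColumn(family, Bytes.toBytes(column), cell);
    return put;
  }
}
{code}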

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2017-12-06 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280029#comment-16280029
 ] 

Daniel Voros commented on SQOOP-3267:
-

[~maugli] thanks for your response. Is the intention behind append mode to keep 
the history? I thought it's the mode to use when importing an append-only table 
where you're only creating new records but never change the existing ones. Thus 
I thought changes (and so deletes) never happen when you're using append mode 
with the correct last-value. Am I missing something here?

I think it's usually a bad idea to delete only the last version of a column, 
since then a simple "get" in hbase might return an inconsistent state (one that 
never existed on the source side). If we are to keep history we should probably 
put null (or empty string) values instead of deleting.

Please let me know what you think!

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2017-12-05 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278356#comment-16278356
 ] 

Daniel Voros commented on SQOOP-3267:
-

Thank you [~BoglarkaEgyed]!

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2017-12-05 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278274#comment-16278274
 ] 

Daniel Voros commented on SQOOP-3267:
-

Hey [~BoglarkaEgyed],

Thanks for your response. I've created the review request and linked it to the 
issue.

Yeah, I was wondering why I couldn't assign this. Please add me to the list.

Regards,
Daniel

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2017-12-04 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated SQOOP-3267:

Attachment: SQOOP-3267.1.patch

Attaching patch #1. This uses {{Delete#deleteColumns()}} instead of 
{{Delete#deleteColumn()}} to delete all revisions of a column and uses a single 
Put per row.
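
A short sketch of the HBase client calls involved (HBase 1.x API; the row key, 
column family, qualifiers and values below are placeholders for what the import 
builds from the source record): {{Delete#deleteColumn()}} removes only the 
newest cell version, {{Delete#deleteColumns()}} removes every version, and 
grouping all non-null columns into one Put keeps it to a single Put per row.

{code:java}
// Sketch of the HBase client calls discussed above (HBase 1.x client API;
// names and values are placeholders for what the import derives from a record).
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDeleteExample {
  public static void main(String[] args) {
    byte[] row = Bytes.toBytes("row1");
    byte[] family = Bytes.toBytes("cf");

    Delete delete = new Delete(row);
    // deleteColumn() would only remove the latest cell version;
    // deleteColumns() removes every version of the column.
    delete.deleteColumns(family, Bytes.toBytes("nulled_col"));

    // All non-null columns of the row go into one Put, i.e. a single Put per row.
    Put put = new Put(row);
    put.addColumn(family, Bytes.toBytes("c1"), Bytes.toBytes("v1"));
    put.addColumn(family, Bytes.toBytes("c3"), Bytes.toBytes("v3"));
    // The delete and put would then be applied to the target HBase table.
  }
}
{code}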

> Incremental import to HBase deletes only last version of column
> ---
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
>  Issue Type: Bug
>  Components: hbase-integration
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SQOOP-3267) Incremental import to HBase deletes only last version of column

2017-12-01 Thread Daniel Voros (JIRA)
Daniel Voros created SQOOP-3267:
---

 Summary: Incremental import to HBase deletes only last version of 
column
 Key: SQOOP-3267
 URL: https://issues.apache.org/jira/browse/SQOOP-3267
 Project: Sqoop
  Issue Type: Bug
  Components: hbase-integration
Affects Versions: 1.4.7
Reporter: Daniel Voros


Deletes are supported since SQOOP-3149, but we're only deleting the last 
version of a column when the corresponding cell was set to NULL in the source 
table.

This can lead to unexpected and misleading results if the row has been 
transferred multiple times, which can easily happen if it's being modified on 
the source side.

Also SQOOP-3149 is using a new Put command for every column instead of a single 
Put per row as before. This could probably lead to a performance drop for wide 
tables (for which HBase is otherwise usually recommended).

[~jilani], [~anna.szonyi] could you please comment on what you think would be 
the expected behavior here?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)