Re: [ANNOUNCE] New Sqoop PMC member - Szabolcs Vasas

2018-11-06 Thread Markus Kemper
Awesome news!

Thanks, Markus

> On Nov 6, 2018, at 04:48, Boglarka Egyed  wrote:
> 
> On behalf of the Apache Sqoop PMC, I am pleased to welcome Szabolcs Vasas as
> a new Sqoop PMC Member. Please join me in congratulating him!
> 
> Szabolcs has made countless code contributions[1] and constantly provides
> thorough reviews for others[2]. He continuously offers help to new
> contributors, enabling the project to grow, and he has also shown a strong
> interest in shaping the project by helping with and scoping previous and
> upcoming releases.
> 
> Szabolcs's hard work is much appreciated and we look forward to his
> continued contributions!
> 
> 1: https://s.apache.org/nzgU
> 2: https://reviews.apache.org/users/vasas/reviews/
> 
> Kind Regards,
> Bogi


Re: [ANNOUNCE] Apache Sqoop 1.4.7 released

2018-01-31 Thread Markus Kemper
Thank you Attila!


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Wed, Jan 31, 2018 at 8:20 AM, Attila Szabó <mau...@apache.org> wrote:

> The Apache Sqoop team is pleased to announce the release of Sqoop 1.4.7.
>
> This is the sixth release of Apache Sqoop as a TLP project. Sqoop is a tool 
> designed
> for efficiently transferring bulk data between Apache Hadoop and structured 
> datastores,
> such as relational databases.
>
> The release is available here: http://www.apache.org/dyn/closer.cgi/sqoop/
>
> The full change log is available here:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311320&version=12329683
>
> Your help and feedback are more than welcome. For more information on how to
> report problems and to get involved, visit the project website at
> http://sqoop.apache.org/.
>
> The Apache Sqoop Team
>
>
>


[jira] [Commented] (SQOOP-3014) Sqoop with HCatalog import loose precision for large numbers that does not fit into double

2017-08-31 Thread Markus Kemper (JIRA)

[ https://issues.apache.org/jira/browse/SQOOP-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149309#comment-16149309 ]

Markus Kemper commented on SQOOP-3014:
--

Adding an additional test case (below) with the following comments:
* Import into HDFS seems to be working as expected
* The hcatalog load phase is not working as noted by others
** The issue does not appear to be specific to HDFS ORC files

*Test Case*
{noformat}
###
# STEP 01 - CREATE SQL TABLE AND DATA
###

export MYCONN=jdbc:oracle:thin:@sqoop.apache.com:1521/db11g
export MYUSER=sqoop
export MYPSWD=sqoop
sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table sqoop_3014"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table sqoop_3014 (c1 integer, c2 decimal(22,5), c3 varchar(40))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into sqoop_3014 values (1, 454018528782.42006329, 
'454018528782.42006329')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into sqoop_3014 values (2, 87658675864540185.123456789123456789, 
'87658675864540185.123456789123456789')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from sqoop_3014"

-------------------------------------------------------------------------------
| C1  | C2                       | C3                                    | 
-------------------------------------------------------------------------------
| 1   | 454018528782.42006      | 454018528782.42006329                 | 
| 2   | 87658675864540185.12346 | 87658675864540185.123456789123456789  | 
-------------------------------------------------------------------------------

###
# STEP 02 - IMPORT DATA INTO HDFS (--as-textfile)
###

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
SQOOP_3014 --target-dir /user/root/sqoop_3014 --delete-target-dir --num-mappers 
1 --as-textfile --verbose
hdfs dfs -cat /user/root/sqoop_3014/part*

Output:
1,454018528782.42006,454018528782.42006329
2,87658675864540185.12346,87658675864540185.123456789123456789

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from sqoop_3014 where \$CONDITIONS" --target-dir 
/user/root/sqoop_3014 --delete-target-dir --num-mappers 1 --as-textfile 
--verbose
hdfs dfs -cat /user/root/sqoop_3014/part*

Output:
1,454018528782.42006,454018528782.42006329
2,87658675864540185.12346,87658675864540185.123456789123456789

###
# STEP 03 - IMPORT DATA INTO HIVE (--hcatalog-storage-stanza "stored as 
textfile")
###

beeline -u jdbc:hive2:// -e "use db1; drop table sqoop_3014_text purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from sqoop_3014 where \$CONDITIONS" --hcatalog-database db1 
--hcatalog-table sqoop_3014_text --create-hcatalog-table 
--hcatalog-storage-stanza "stored as textfile" --num-mappers 1 --verbose
beeline -u jdbc:hive2:// -e "use db1; select * from sqoop_3014_text; show 
create table sqoop_3014_text;"

Output:
+---------------------+---------------------+----------------------------------------+
| sqoop_3014_text.c1  | sqoop_3014_text.c2  | sqoop_3014_text.c3                     |
+---------------------+---------------------+----------------------------------------+
| 1                   | 454018528782.42004  | 454018528782.42006329                  |
| 2                   | 87658675864540192   | 87658675864540185.123456789123456789   |
+---------------------+---------------------+----------------------------------------+

###
# STEP 04 - IMPORT DATA INTO HIVE (--hcatalog-storage-stanza "stored as orc")
###

beeline -u jdbc:hive2:// -e "use db1; drop table sqoop_3014_orc purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from sqoop_3014 where \$CONDITIONS" --hcatalog-database db1 
--hcatalog-table sqoop_3014_orc --create-hcatalog-table 
--hcatalog-storage-stanza "stored as orc" --num-mappers 1 --verbose
beeline -u jdbc:hive2:// -e "use db1; select * from sqoop_3014_orc; show create 
table sqoop_3014_orc;"

Output:
+---------------------+---------------------+----------------------------------------+
| sqoop_3014_text.c1  | sqoop_3014_text.c2  | sqoop_3014_text.c3                     |
+---------------------+---------------------+----------------------------------------+
| 1   

[jira] [Commented] (SQOOP-3066) Introduce an option + env variable to enable/disable SQOOP-2737 feature

2017-08-29 Thread Markus Kemper (JIRA)

[ https://issues.apache.org/jira/browse/SQOOP-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145336#comment-16145336 ]

Markus Kemper commented on SQOOP-3066:
--

The option added for this is: \-\-oracle-escaping-disabled true|false
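For context, a minimal sketch of where the flag fits in an import command (the connection string, table name, and split expression below are hypothetical; the MOD() example comes from the description quoted below):

{noformat}
# Hypothetical sketch: disable the SQOOP-2737 escaping as a backward-compatible
# fallback, so that an expression such as MOD(C1,4) can again be used with
# --split-by instead of being treated as an escaped column name.
sqoop import \
  --connect jdbc:oracle:thin:@oracle.example.com:1521/db11g \
  --username sqoop --password sqoop \
  --table T1 --target-dir /user/root/t1 \
  --split-by "MOD(C1,4)" \
  --oracle-escaping-disabled true
{noformat}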

> Introduce an option + env variable to enable/disable SQOOP-2737 feature
> ---
>
> Key: SQOOP-3066
> URL: https://issues.apache.org/jira/browse/SQOOP-3066
> Project: Sqoop
>  Issue Type: Improvement
>Affects Versions: 1.4.6
>Reporter: Attila Szabo
>Assignee: Attila Szabo
>Priority: Critical
> Fix For: 1.4.7
>
> Attachments: SQOOP-3066.patch
>
>
> After [SQOOP-2737] several users found that their Sqoop commands were failing
> because the original commands had not been phrased with case-sensitive
> table/column/schema names in mind.
> Another outcome of this feature is that the "--split-by" option no longer
> accepts Oracle functions (e.g. MOD(col_name,4)), because after correct
> escaping+quoting the expression is handled by Oracle as a db column name
> instead of an expression to evaluate.
> My goal here is to introduce an option to turn on/off the (fully proper and
> industry-standard) escaping implemented in [SQOOP-2737], and also to add
> environment variable support for it, so that users have a backward-compatible
> fallback plan without changing their commands.
> I also plan to implement an argument for supporting split-by with database
> functions/expressions, so that the escaping and split-by expression features
> become independent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Getting upper bound in --incremental mode

2017-07-20 Thread Markus Kemper
Hey Jagrut,

Are you using the Sqoop1 Metastore job tool (assuming yes)?
Are you wanting to override the current stored --last-value when executing
the Sqoop job?
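If so, a minimal sketch of a saved incremental job in the Sqoop1 metastore looks like the following (the connection string, table, and column names are made up):

sqoop job --create t1_incr -- import \
  --connect jdbc:mysql://db.example.com:3306/db1 --username sqoop -P \
  --table t1 --target-dir /user/root/t1 \
  --incremental append --check-column mod_ts --last-value '2017-01-01 00:00:00.0'

sqoop job --exec t1_incr
# after each run the metastore records the new upper bound as the job's --last-value
sqoop job --show t1_incr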



Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <jagrutsha...@gmail.com>
wrote:

> Hi Markus - The question was that --incremental with --lastmodified option
> always takes the current time as the upper bound, and this gets stored as
> the --last-value for the next run.
>
> In certain cases, it is desirable that the upper bound should come from
> the actual column values, and that should get set for the --last-value for
> next run.
> -
> Jagrut
>
>
>
> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <mar...@cloudera.com>
> wrote:
>
>> Hey Jagrut,
>>
>> Can you elaborate more about the problem you are facing and what you mean
>> by (Is this possible to set while running sqoop?).
>>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>>
>>
>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <jagrutsha...@gmail.com>
>> wrote:
>>
>> > Hi Tony - I was under the assumption that append mode will not work for
>> > timestamp column. But I gave it a try after your reply, and it works.
>> And
>> > it gets the upper bound from the database itself. Thanks.
>> >
>> > --
>> > Jagrut
>> >
>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <t...@phdata.io> wrote:
>> >
>> >> Does `--incremental append` work for you?
>> >>
>> >> > You should specify append mode when importing a table where new rows
>> >> are continually being added with increasing row id values
>> >>
>> >> Tony
>> >>
>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <jagrutsha...@gmail.com>
>> >> wrote:
>> >> >
>> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
>> >> 1.4.2)
>> >> > generates a query like:
>> >> > WHERE column >= last_modified_time and column < current_time
>> >> >
>> >> > The --last-value is set to the current_time and gets used for the
>> next
>> >> run.
>> >> >
>> >> > Here, the upper bound is always set to the current_time. In some
>> cases,
>> >> > this upper bound is required to be taken from the database table
>> column
>> >> > itself. So, the query is required of the form:
>> >> > WHERE column >= last_modified_time and column <
>> >> max_time_in_db_table_column
>> >> >
>> >> > And the --last-value for next run needs to be set as
>> >> > the max_time_in_db_table_column (and not the current_time).
>> >> >
>> >> > Is this possible to set while running sqoop?  If no, is there any
>> >> > workaround suggested for this?
>> >> >
>> >> > Thanks a lot.
>> >> > --
>> >> > Jagrut
>> >>
>> >>
>> >
>>
>
>
>
> --
> Jagrut
>
>
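Regarding the question quoted above about taking the upper bound from the table itself: one possible workaround (a sketch only; the connection details and column names below are hypothetical) is to manage the window manually with a free-form --query import instead of --incremental lastmodified:

sqoop eval --connect jdbc:mysql://db.example.com:3306/db1 --username sqoop --password sqoop \
  --query "select max(mod_ts) from t1"
# note the returned value, e.g. 2017-07-20 12:00:00

sqoop import --connect jdbc:mysql://db.example.com:3306/db1 --username sqoop --password sqoop \
  --query "select * from t1 where mod_ts >= '2017-07-01 00:00:00' and mod_ts < '2017-07-20 12:00:00' and \$CONDITIONS" \
  --target-dir /user/root/t1_delta --num-mappers 1

# the max(mod_ts) value recorded above then becomes the lower bound of the next
# run, replacing the current_time-based --last-value bookkeeping.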


[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically

2017-07-20 Thread Markus Kemper (JIRA)

[ https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094664#comment-16094664 ]

Markus Kemper commented on SQOOP-3042:
--

As a possible workaround, the Sqoop environment script (sqoop-env.sh) could be 
used to manage the contents of the compile directory.

*Sample Code*
{noformat}
cat /etc/sqoop/conf/sqoop-env.sh 
echo
echo SQOOP COMPILE DIR DIRECTORY COUNT
ls -ltd /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR FILES COUNT
ls -ltR /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR OLDEST DIRECTORY
ls -ltd /tmp/sqoop*/compile/* | tail -1
echo SQOOP COMPILE DIR TMPWATCH 60d DELETE TEST
tmpwatch --test 60d /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR TMPWATCH 60d DELETE
tmpwatch 60d /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR DIRECTORY COUNT
ls -ltd /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR FILES COUNT
ls -ltR /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR OLDEST DIRECTORY
ls -ltd /tmp/sqoop*/compile/* | tail -1
echo
{noformat}

*Sample Output*
{noformat}
$ sqoop version
SQOOP COMPILE DIR DIRECTORY COUNT
   1410   12690  142458
SQOOP COMPILE DIR FILES COUNT
   6853   27702  247866
SQOOP COMPILE DIR OLDEST DIRECTORY
drwxrwxr-x 2 user1 user1  4096 Dec  6  2016 
/tmp/sqoop-user1/compile/734bf9f8090d24a8757303c44faf2825
SQOOP COMPILE DIR TMPWATCH 60d DELETE TEST
  0   0   0
SQOOP COMPILE DIR TMPWATCH 60d DELETE
  0   0   0
SQOOP COMPILE DIR DIRECTORY COUNT
   1410   12690  142458
SQOOP COMPILE DIR FILES COUNT
   6853   27702  247866
SQOOP COMPILE DIR OLDEST DIRECTORY
drwxrwxr-x 2 user1 user1  4096 Dec  6  2016 
/tmp/sqoop-user1/compile/734bf9f8090d24a8757303c44faf2825

Warning: 
/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/bin/../lib/sqoop/../accumulo 
does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/07/20 06:16:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.1
Sqoop 1.4.6-cdh5.10.1
git commit id 
Compiled by jenkins on Mon Mar 20 02:34:12 PDT 2017
{noformat}

> Sqoop does not clear compile directory under /tmp/sqoop-/compile 
> automatically
> 
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Critical
>  Labels: patch
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, 
> SQOOP-3042.4.patch
>
>
> After running sqoop, all the temp files generated by ClassWriter are left 
> behind on disk, so anyone can check those JAVA files to see the schema of 
> those tables that Sqoop has been interacting with. By default, the directory 
> is under /tmp/sqoop-/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can see 
> that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
>     hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
>     // We created the directory. Use it.
>     // If this directory is not actually filled with files, delete it
>     // when the JVM quits.
>     hashDir.deleteOnExit();
>     break;
>   }
> }
> {code}
> However, I believe the deletion fails because the directory is not empty when
> the JVM exits (deleteOnExit() can only remove an empty directory).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Getting upper bound in --incremental mode

2017-07-19 Thread Markus Kemper
Hey Jagrut,

Can you elaborate more about the problem you are facing and what you mean
by (Is this possible to set while running sqoop?).


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <jagrutsha...@gmail.com>
wrote:

> Hi Tony - I was under the assumption that append mode will not work for
> timestamp column. But I gave it a try after your reply, and it works. And
> it gets the upper bound from the database itself. Thanks.
>
> --
> Jagrut
>
> On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <t...@phdata.io> wrote:
>
>> Does `--incremental append` work for you?
>>
>> > You should specify append mode when importing a table where new rows
>> are continually being added with increasing row id values
>>
>> Tony
>>
>> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <jagrutsha...@gmail.com>
>> wrote:
>> >
>> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
>> 1.4.2)
>> > generates a query like:
>> > WHERE column >= last_modified_time and column < current_time
>> >
>> > The --last-value is set to the current_time and gets used for the next
>> run.
>> >
>> > Here, the upper bound is always set to the current_time. In some cases,
>> > this upper bound is required to be taken from the database table column
>> > itself. So, the query is required of the form:
>> > WHERE column >= last_modified_time and column <
>> max_time_in_db_table_column
>> >
>> > And the --last-value for next run needs to be set as
>> > the max_time_in_db_table_column (and not the current_time).
>> >
>> > Is this possible to set while running sqoop?  If no, is there any
>> > workaround suggested for this?
>> >
>> > Thanks a lot.
>> > --
>> > Jagrut
>>
>>
>


[jira] [Created] (SQOOP-3212) Sqoop1 (export + --export-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)

2017-07-17 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3212:


 Summary: Sqoop1 (export + --export-dir + --table) using DB2 with 
case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, 
SQLSTATE=42704)
 Key: SQOOP-3212
 URL: https://issues.apache.org/jira/browse/SQOOP-3212
 Project: Sqoop
  Issue Type: Bug
 Environment: $ sqoop version
17/07/17 15:22:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.1
Reporter: Markus Kemper


Sqoop1 (export + --export-dir + --table) using DB2 with case-sensitive-table 
fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704), test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:db2://host.domain.com:5/SQOOP
export MYUSER=sqoop;
export MYPSWD=sqoop;

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_default"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_default (c1_default int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_default values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select c1_default from t1_default"

---
| C1_DEFAULT  | 
---
| 1   | 
---

#
# STEP 02 - Import and Export Data (baseline) using (--as-textfile, 
--as-avrodatafile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-textfile
hdfs dfs -cat /user/root/t1_default/part*
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --export-dir /user/root/t1_default --num-mappers 1 
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select c1_default from t1_default"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_default"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_default values (1)"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-avrodatafile
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --export-dir /user/root/t1_default --num-mappers 1 
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select c1_default from t1_default"

Output
17/07/17 14:57:22 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 22.3116 
seconds (0.0896 bytes/sec)
17/07/17 14:57:22 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1
~
17/07/17 14:58:37 INFO mapreduce.ExportJobBase: Transferred 138 bytes in 
21.4796 seconds (6.4247 bytes/sec)
17/07/17 14:58:37 INFO mapreduce.ExportJobBase: Exported 1 records.
~
---
| C1_DEFAULT  | 
---
| 1   | 
| 1   | 
---
~
17/07/17 15:04:18 INFO mapreduce.ImportJobBase: Transferred 263 bytes in 
24.5847 seconds (10.6977 bytes/sec)
17/07/17 15:04:18 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
17/07/17 15:06:18 INFO mapreduce.ExportJobBase: Transferred 653 bytes in 
22.1808 seconds (29.4398 bytes/sec)
17/07/17 15:06:18 INFO mapreduce.ExportJobBase: Exported 1 records.
~
---
| C1_DEFAULT  | 
---
| 1   | 
| 1   | 
---

#
# STEP 03 - Create Table and Data with case-sensitive DB2 table name
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table \"t1_lower\""
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table \"t1_lower\" (\"c1_lower\" int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into \"t1_lower\" values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select \"c1_lower\" from \"t1_lower\""
sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD | 
egrep -i "t1"

Output:
---
| c1_lower| 
---
| 1   | 
---
~
T1_DEFAULT
t1_lower

#
# STEP 04 - Import and Export Data with case-sensitive DB2 table name using 
(--as-textfile, --as-avrodatafile
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from \"t1_lower\" where \$CONDITIONS" --target-dir 
/user/root/t1_lower --delete-target-dir --num-mappers 1 --as-textfile
hdfs dfs -cat /user/root/t1_lower/part

[jira] [Updated] (SQOOP-3211) Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)

2017-07-17 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3211:
-
Description: 
Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table 
fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704), see test case 
below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:db2://host.domain.com:5/SQOOP
export MYUSER=sqoop;
export MYPSWD=sqoop;

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_default"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_default (c1_default int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_default values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select c1_default from t1_default"

---
| C1_DEFAULT  | 
---
| 1   | 
---

#
# STEP 02 - Import Data (baseline) using (--as-textfile, --as-avrodatafile, 
--as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-textfile
hdfs dfs -cat /user/root/t1_default/part*
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-avrodatafile
avro-tools tojson --pretty 
'hdfs://host.domain.com/user/root/t1_default/part-m-0.avro'
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-parquetfile
hdfs dfs -ls /user/root/t1_default/*.parquet
parquet-tools cat --json 
'hdfs://host.domain.com/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet'

Output:
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 30.2579 
seconds (0.0661 bytes/sec)
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1
~
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Transferred 263 bytes in 
22.8116 seconds (11.5292 bytes/sec)
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
{
  "C1_DEFAULT" : {
"int" : 1
  }
}
~
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Transferred 1.2578 KB in 
29.3071 seconds (43.9484 bytes/sec)
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
-rw-r--r--   3 root root449 2017-07-17 14:06 
/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet
~
{"C1_DEFAULT":1}

#
# STEP 03 - Create Table and Data with case-sensitive DB2 table name
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table \"t1_lower\""
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table \"t1_lower\" (\"c1_lower\" int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into \"t1_lower\" values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select \"c1_lower\" from \"t1_lower\""
sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD | 
egrep -i "t1"

Output:
---
| c1_lower| 
---
| 1   | 
---
~
T1_DEFAULT
t1_lower

#
# STEP 04 - Import Data with case-sensitive DB2 table name using 
(--as-textfile, --as-avrodatafile, --as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-textfile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-avrodatafile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-parquetfile

Output:
17/07/17 14:13:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM t1_lower AS t WHERE 1=0
17/07/17 14:13:59 ERROR manager.SqlManager: Error executing statement: 
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, 
SQLSTATE=42704, SQLERRMC=DB2INST1.T1_LOWER, DRIVER=4.16.53
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, 
SQLSTATE=42704, SQLERRMC=DB2INST1.T1_LOWER, DRIVER=4.16.53
at com.ibm.db2.jcc.am.fd.a(f

[jira] [Updated] (SQOOP-3211) Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)

2017-07-17 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3211:
-
Description: 
Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table 
fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704), see test case 
below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:db2://host.domain.com:5/SQOOP
export MYUSER=sqoop;
export MYPSWD=sqoop;

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select c1 from t1"

---
| C1  | 
---
| 1   | 
---

#
# STEP 02 - Import Data (baseline) using (--as-textfile, --as-avrodatafile, 
--as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-textfile
hdfs dfs -cat /user/root/t1_default/part*
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-avrodatafile
avro-tools tojson --pretty 
'hdfs://host.domain.com/user/root/t1_default/part-m-0.avro'
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-parquetfile
hdfs dfs -ls /user/root/t1_default/*.parquet
parquet-tools cat --json 
'hdfs://host.domain.com/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet'

Output:
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 30.2579 
seconds (0.0661 bytes/sec)
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1
~
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Transferred 263 bytes in 
22.8116 seconds (11.5292 bytes/sec)
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
{
  "C1_DEFAULT" : {
"int" : 1
  }
}
~
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Transferred 1.2578 KB in 
29.3071 seconds (43.9484 bytes/sec)
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
-rw-r--r--   3 root root449 2017-07-17 14:06 
/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet
~
{"C1_DEFAULT":1}

#
# STEP 03 - Create Table and Data with case-sensitive DB2 table name
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table \"t1_lower\""
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table \"t1_lower\" (\"c1_lower\" int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into \"t1_lower\" values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select \"c1_lower\" from \"t1_lower\""
sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD | 
egrep -i "t1"

Output:
---
| c1_lower| 
---
| 1   | 
---
~
T1_DEFAULT
t1_lower

#
# STEP 04 - Import Data with case-sensitive DB2 table name using 
(--as-textfile, --as-avrodatafile, --as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-textfile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-avrodatafile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-parquetfile

Output:
17/07/17 14:13:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM t1_lower AS t WHERE 1=0
17/07/17 14:13:59 ERROR manager.SqlManager: Error executing statement: 
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, 
SQLSTATE=42704, SQLERRMC=DB2INST1.T1_LOWER, DRIVER=4.16.53
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, 
SQLSTATE=42704, SQLERRMC=DB2INST1.T1_LOWER, DRIVER=4.16.53
at com.ibm.db2.jcc.am.fd.a(fd.java:739)
~
17/07/17 14:14:47 INFO manager.SqlManager:

[jira] [Created] (SQOOP-3211) Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)

2017-07-17 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3211:


 Summary: Sqoop1 (import + --target-dir + --table) using DB2 with 
case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, 
SQLSTATE=42704)
 Key: SQOOP-3211
 URL: https://issues.apache.org/jira/browse/SQOOP-3211
 Project: Sqoop
  Issue Type: Bug
 Environment: $ sqoop version
17/07/17 14:22:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.1
Reporter: Markus Kemper


Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table 
fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704), see test case 
below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:db2://host.domain.com:5/SQOOP
export MYUSER=sqoop;
export MYPSWD=sqoop;

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select c1 from t1"

---
| C1  | 
---
| 1   | 
---

#
# STEP 02 - Import Data (baseline) using (--as-textfile, --as-avrodatafile, 
--as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
--as-textfile
hdfs dfs -cat /user/root/t1/part*

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-avrodatafile
avro-tools tojson --pretty 
'hdfs://host.domain.com/user/root/t1_default/part-m-0.avro'
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 
1 --as-parquetfile
hdfs dfs -ls /user/root/t1_default/*.parquet
parquet-tools cat --json 
'hdfs://host.domain.com/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet'


Output:
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 30.2579 
seconds (0.0661 bytes/sec)
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1
~
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Transferred 263 bytes in 
22.8116 seconds (11.5292 bytes/sec)
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
{
  "C1_DEFAULT" : {
"int" : 1
  }
}
~
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Transferred 1.2578 KB in 
29.3071 seconds (43.9484 bytes/sec)
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
-rw-r--r--   3 root root449 2017-07-17 14:06 
/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet
~
{"C1_DEFAULT":1}

#
# STEP 03 - Create Table and Data with case-sensitive DB2 table name
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table \"t1_lower\""
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table \"t1_lower\" (\"c1_lower\" int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into \"t1_lower\" values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select \"c1_lower\" from \"t1_lower\""
sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD | 
egrep -i "t1"

Output:
---
| c1_lower| 
---
| 1   | 
---
~
T1_DEFAULT
t1_lower

#
# STEP 04 - Import Data with case-sensitive DB2 table name using 
(--as-textfile, --as-avrodatafile, --as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-textfile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-avrodatafile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
\"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 
1 --as-parquetfile

Output:
17/07/17 14:13:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM t1_lower AS t WHERE 1=0
17/07/17 14:13:59 ERROR manager.SqlManager: Error executing statement: 
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2

[jira] [Created] (SQOOP-3210) Sqoop1 (import + --hive-import + --map-column-hive) using column aliases fails with error (Import failed: No column by the name)

2017-07-17 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3210:


 Summary: Sqoop1 (import + --hive-import + --map-column-hive) using 
column aliases fails with error (Import failed: No column by the name)
 Key: SQOOP-3210
 URL: https://issues.apache.org/jira/browse/SQOOP-3210
 Project: Sqoop
  Issue Type: Improvement
 Environment: $ sqoop version
17/07/17 12:28:51 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.1
Reporter: Markus Kemper


Sqoop1 (import + --hive-import + --map-column-hive) using column aliases fails 
with error (Import failed: No column by the name), see test case below.  

Please enable the ability to use column aliases.
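For clarity, a sketch of the kind of command this improvement asks to support (the alias names are hypothetical; it uses the same $MYCONN/$MYUSER/$MYPSWD placeholders as the test case below):

{noformat}
# Hypothetical sketch: alias the columns in the free-form query and reference
# the aliases in --map-column-hive.
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --query "select c1, c2 as c2_alias, c3 as c3_alias from t1 where \$CONDITIONS" \
  --num-mappers 1 --target-dir /data/dbs/db1/t1 --delete-target-dir \
  --hive-import --hive-database db1 --hive-table t1 \
  --map-column-hive "c2_alias=date,c3_alias=varchar(10)"
{noformat}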

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:mysql://host.domain.com:3306/db1
export MYUSER=sqoop
export MYPSWD=sqoop

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
-
| c1  | c2 | c3 | 
-
| 1   | 2017-07-17 | some data  | 
-

#
# STEP 02 - Verify Hive Import (baseline)
#

beeline -u jdbc:hive2:// -e "use db1; drop table t1 purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 where \$CONDITIONS" --num-mappers 1 --target-dir 
/data/dbs/db1/t1 --delete-target-dir --fields-terminated-by ',' 
--lines-terminated-by '\n' --as-textfile --hive-import --hive-database db1 
--hive-table t1 
beeline -u jdbc:hive2:// -e "use db1; desc t1; select * from t1"

Output:
+---++--+--+
| col_name  | data_type  | comment  |
+---++--+--+
| c1| int|  |
| c2| string |  |
| c3| string |  |
+---++--+--+
++-++--+
| t1.c1  |t1.c2|   t1.c3|
++-++--+
| 1  | 2017-07-17  | some data  |
++-++--+

#
# STEP 03 - Verify Hive Import with (--map-column-hive)
#

beeline -u jdbc:hive2:// -e "use db1; drop table t1 purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 where \$CONDITIONS" --num-mappers 1 --target-dir 
/data/dbs/db1/t1 --delete-target-dir --fields-terminated-by ',' 
--lines-terminated-by '\n' --as-textfile --hive-import --hive-database db1 
--hive-table t1 --map-column-hive "c2=date,c3=varchar(10)"  
beeline -u jdbc:hive2:// -e "use db1; desc t1; select * from t1"

+---+--+--+--+
| col_name  |  data_type   | comment  |
+---+--+--+--+
| c1| int  |  |
| c2| date |  |
| c3| varchar(10)  |  |
+---+--+--+--+
++-++--+
| t1.c1  |t1.c2|   t1.c3|
++-++--+
| 1  | 2017-07-17  | some data  |
++-++--+

#
# STEP 04 - Verify Hive Import with (--map-column-hive) and (database.table) 
notation
#

beeline -u jdbc:hive2:// -e "use db1; drop table t1 purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 where \$CONDITIONS" --num-mappers 1 --target-dir 
/data/dbs/db1/t1 --delete-target-dir --fields-terminated-by ',' 
--lines-terminated-by '\n' --as-textfile --hive-import --hive-database db1 
--hive-table t1 --map-column-hive "db1.c2=date,db1.c3=varchar(10)"  
beeline -u jdbc:hive2:// -e "use db1; desc t1; select * from t1"

Output:
17/07/17 12:10:47 INFO mapreduce.ImportJobBase: Transferred 23 bytes in 20.602 
seconds (1.1164 bytes/sec)
17/07/17 12:10:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.
17/07/17 12:10:47 INFO manager.SqlManager: Executing SQL statement: select * 
from t1 where  (1 = 0) 
17/07/17 12:10:47 INFO manager.SqlManager: Executing SQL statement: select * 
from t1 where  (1 = 0) 
17/07/17 12:10:47 ERROR tool.ImportTool: Import failed: No column by the name 
db1.c3found while importing data

#
# STEP 05 - Verify Hive Import with (--map-column-hive) and (co

[jira] [Created] (SQOOP-3188) Sqoop1 (import + --target-dir) with empty directory (/usr/lib/hive) fails with error (java.lang.NoClassDefFoundError: org/json/JSONObject)

2017-05-10 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3188:


 Summary: Sqoop1 (import + --target-dir) with empty directory 
(/usr/lib/hive) fails with error (java.lang.NoClassDefFoundError: 
org/json/JSONObject)
 Key: SQOOP-3188
 URL: https://issues.apache.org/jira/browse/SQOOP-3188
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Sqoop1 (import + --target-dir) with empty directory (/usr/lib/hive) fails with 
error (java.lang.NoClassDefFoundError: org/json/JSONObject), see test case 
below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:mysql://mysql.sqoop.com:3306/sqoop
export MYUSER=sqoop
export MYPSWD=sqoop

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
-
| c1  | c2 | c3 | 
-
| 1   | 2017-05-10 | some data  | 
-

#
# STEP 02 - Import Data into HDFS 
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/root/t1 --delete-target-dir --num-mappers 1
hdfs dfs -cat /user/root/t1/part*

Output:
17/05/10 13:46:24 INFO mapreduce.ImportJobBase: Transferred 23 bytes in 22.65 
seconds (1.0155 bytes/sec)
17/05/10 13:46:24 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1,2017-05-10,some data

#
# STEP 03 - Create Bogus Hive Directory and Attempt to Import into HDFS
#

mkdir /usr/lib/hive
chmod 777 /usr/lib/hive
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/root/t1 --delete-target-dir --num-mappers 1

Output:
17/05/10 13:47:44 INFO mapreduce.ImportJobBase: Beginning import of t1
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
at 
org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:776)
at 
org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:388)
at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:374)
at 
org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:256)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at 
org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:513)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 15 more

#
# STEP 04 - Remove Bogus Hive Directory and Attempt to Import into HDFS 
#

rm -rf /usr/lib/hive
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/root/t1 --delete-target-dir --num-mappers 1
hdfs dfs -cat /user/root/t1/part*

Output:
17/05/10 13:52:30 INFO mapreduce.ImportJobBase: Transferred 23 bytes in 22.6361 
seconds (1.0161 bytes/sec)
17/05/10 13:52:30 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1,2017-05-10,some data
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SQOOP-3186) Add Sqoop1 (import + --incremental + --check-column) support for functions/expressions

2017-05-08 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3186:


 Summary: Add Sqoop1 (import + --incremental + --check-column) 
support for functions/expressions
 Key: SQOOP-3186
 URL: https://issues.apache.org/jira/browse/SQOOP-3186
 Project: Sqoop
  Issue Type: Improvement
Reporter: Markus Kemper


Add Sqoop1 (import + --incremental + --check-column) support for 
functions/expressions, for example:

*Example*
{noformat}
sqoop import \
--connect $MYCONN --username $MYUSER --password $MYPSWD \
--table T1 --target-dir /path/directory --merge-key C1 \
--incremental lastmodified  --last-value '2017-01-01 00:00:00.0' \
--check-column "nvl(C4,to_date('2017-01-01 00:00:00'))"
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (SQOOP-3184) Sqoop1 (doc import + --incremental) does not cover HDFS file formats supported and/or limitations

2017-05-06 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3184:
-
Description: 
Sqoop1 (doc import + --incremental) does not cover HDFS file formats supported 
and/or limitations

===
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports
7.2.10. Incremental Imports
===

  was:
Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key 
option

===
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports
7.2.10. Incremental Imports
Table 5. Incremental import arguments:

Sqoop supports two types of incremental imports: append and lastmodified. You 
can use the --incremental argument to specify the type of incremental import to 
perform.
--check-column (col)
--incremental (mode)
--last-value (value)

An alternate table update strategy supported by Sqoop is called lastmodified 
mode. You should use this when rows of the source table may be updated, and 
each such update will set the value of a last-modified column to the current 
timestamp. Rows where the check column holds a timestamp more recent than the 
timestamp specified with --last-value are imported.
===


> Sqoop1 (doc import + --incremental) does not cover HDFS file formats 
> supported and/or limitations
> -
>
> Key: SQOOP-3184
> URL: https://issues.apache.org/jira/browse/SQOOP-3184
> Project: Sqoop
>  Issue Type: Bug
>    Reporter: Markus Kemper
>
> Sqoop1 (doc import + --incremental) does not cover HDFS file formats 
> supported and/or limitations
> ===
> https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports
> 7.2.10. Incremental Imports
> ===



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SQOOP-3184) Sqoop1 (doc import + --incremental) does not cover HDFS file formats supported and/or limitations

2017-05-06 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3184:


 Summary: Sqoop1 (doc import + --incremental) does not cover HDFS 
file formats supported and/or limitations
 Key: SQOOP-3184
 URL: https://issues.apache.org/jira/browse/SQOOP-3184
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key 
option

===
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports
7.2.10. Incremental Imports
Table 5. Incremental import arguments:

Sqoop supports two types of incremental imports: append and lastmodified. You 
can use the --incremental argument to specify the type of incremental import to 
perform.
--check-column (col)
--incremental (mode)
--last-value (value)

An alternate table update strategy supported by Sqoop is called lastmodified 
mode. You should use this when rows of the source table may be updated, and 
each such update will set the value of a last-modified column to the current 
timestamp. Rows where the check column holds a timestamp more recent than the 
timestamp specified with --last-value are imported.
===



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SQOOP-3183) Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key option

2017-05-06 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3183:


 Summary: Sqoop1 (doc import + --incremental lastmodified) does not 
cover the --merge-key option
 Key: SQOOP-3183
 URL: https://issues.apache.org/jira/browse/SQOOP-3183
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key 
option

===
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports
7.2.10. Incremental Imports
Table 5. Incremental import arguments:

Sqoop supports two types of incremental imports: append and lastmodified. You 
can use the --incremental argument to specify the type of incremental import to 
perform.
--check-column (col)
--incremental (mode)
--last-value (value)

An alternate table update strategy supported by Sqoop is called lastmodified 
mode. You should use this when rows of the source table may be updated, and 
each such update will set the value of a last-modified column to the current 
timestamp. Rows where the check column holds a timestamp more recent than the 
timestamp specified with --last-value are imported.
===
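
A sketch of the kind of example the documentation could add here (the connection placeholders and column names are hypothetical; it mirrors the SQOOP-3182 test case later in this digest):

{noformat}
# Hypothetical sketch: lastmodified incremental import that uses --merge-key to
# reconcile updated rows with the existing target directory.
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --table T1 --target-dir /user/root/t1 \
  --incremental lastmodified --check-column C4 --last-value '2017-01-01 00:00:00.0' \
  --merge-key C1 --num-mappers 1
{noformat}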



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (SQOOP-3182) Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Can't parse input data: 'PAR1')

2017-05-06 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3182:
-
Summary: Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) 
fails with (Can't parse input data: 'PAR1')  (was: Sqoop1 (import + 
--incremental + --merge-key + --as-parquet) fails with (Can't parse input data: 
'PAR1'))

> Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with 
> (Can't parse input data: 'PAR1')
> 
>
> Key: SQOOP-3182
> URL: https://issues.apache.org/jira/browse/SQOOP-3182
> Project: Sqoop
>  Issue Type: Bug
>    Reporter: Markus Kemper
>
> Sqoop1 (import + --incremental + --merge-key + --as-parquet) fails with 
> (Can't parse input data: 'PAR1').  See test case below.
> *Test Case*
> {noformat}
> #
> # STEP 01 - Create Table and Data
> #
> export MYCONN=jdbc:oracle:thin:@oracle.sqoop.com:1521/db11g;
> export MYUSER=sqoop
> export MYPSWD=sqoop
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10), c4 timestamp)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, sysdate, 'NEW ROW 1', sysdate)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> Output:
> -
> | C1   | C2  | C3 | C4  | 
> -
> | 1| 2017-05-06 06:59:02.0 | NEW ROW 1  | 2017-05-06 
> 06:59:02 | 
> -
> #
> # STEP 02 - Import Data into HDFS 
> #
> hdfs dfs -rm -r /user/root/t1
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 
> --merge-key C1 --last-value '2017-01-01 00:00:00.0' --as-parquetfile 
> --map-column-java C2=String,C4=String --num-mappers 1 --verbose 
> hdfs dfs -ls /user/root/t1/*.parquet
> parquet-tools cat --json 
> 'hdfs://namenode/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet'
> Output:
> 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Transferred 2.627 KB in 
> 23.6174 seconds (113.8988 bytes/sec)
> 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 17/05/06 07:01:34 INFO tool.ImportTool:   --last-value 2017-05-06 07:01:09.0
> ~
> -rw-r--r--   3 root root   1144 2017-05-06 07:01 
> /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet
> ~
> {"C1":"1","C2":"2017-05-06 06:59:02.0","C3":"NEW ROW 1","C4":"2017-05-06 
> 06:59:02"}
> #
> # STEP 03 - Insert New Row and Update Existing Row
> #
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (2, sysdate, 'NEW ROW 2', sysdate)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "update t1 set c3 = 'UPDATE 1', c4 = sysdate where c1 = 1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 order by c1"
> Output:
> -
> | C1   | C2  | C3 | C4  | 
> -
> | 1| 2017-05-06 06:59:02.0 | UPDATE 1   | 2017-05-06 
> 07:04:40 | 
> | 2| 2017-05-06 07:04:38.0 | NEW ROW 2  | 2017-05-06 
> 07:04:38 | 
> -
> #
> # STEP 04 - Import Data into HDFS and Merge changes 
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 
> --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile 
> --map-column-java C2=String,C4=String --num-mappers 1 --verbose 
> Output:
> 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 
> 27.4934 seconds (99.1148 bytes/sec)
> 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Retri

[jira] [Created] (SQOOP-3182) Sqoop1 (import + --incremental + --merge-key + --as-parquet) fails with (Can't parse input data: 'PAR1')

2017-05-06 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3182:


 Summary: Sqoop1 (import + --incremental + --merge-key + 
--as-parquet) fails with (Can't parse input data: 'PAR1')
 Key: SQOOP-3182
 URL: https://issues.apache.org/jira/browse/SQOOP-3182
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Sqoop1 (import + --incremental + --merge-key + --as-parquet) fails with (Can't 
parse input data: 'PAR1').  See test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:oracle:thin:@oracle.sqoop.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=sqoop

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10), c4 timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, sysdate, 'NEW ROW 1', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
-
| C1   | C2  | C3 | C4  | 
-
| 1| 2017-05-06 06:59:02.0 | NEW ROW 1  | 2017-05-06 
06:59:02 | 
-

#
# STEP 02 - Import Data into HDFS 
#

hdfs dfs -rm -r /user/root/t1
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/root/t1 --incremental lastmodified --check-column C4 
--merge-key C1 --last-value '2017-01-01 00:00:00.0' --as-parquetfile 
--map-column-java C2=String,C4=String --num-mappers 1 --verbose 
hdfs dfs -ls /user/root/t1/*.parquet
parquet-tools cat --json 
'hdfs://namenode/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet'

Output:
17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Transferred 2.627 KB in 23.6174 
seconds (113.8988 bytes/sec)
17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Retrieved 1 records.
17/05/06 07:01:34 INFO tool.ImportTool:   --last-value 2017-05-06 07:01:09.0
~
-rw-r--r--   3 root root   1144 2017-05-06 07:01 
/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet
~
{"C1":"1","C2":"2017-05-06 06:59:02.0","C3":"NEW ROW 1","C4":"2017-05-06 
06:59:02"}

#
# STEP 03 - Insert New Row and Update Existing Row
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (2, sysdate, 'NEW ROW 2', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"update t1 set c3 = 'UPDATE 1', c4 = sysdate where c1 = 1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 order by c1"

Output:
-
| C1   | C2  | C3 | C4  | 
-
| 1| 2017-05-06 06:59:02.0 | UPDATE 1   | 2017-05-06 
07:04:40 | 
| 2| 2017-05-06 07:04:38.0 | NEW ROW 2  | 2017-05-06 
07:04:38 | 
-

#
# STEP 04 - Import Data into HDFS and Merge changes 
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/root/t1 --incremental lastmodified --check-column C4 
--merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile 
--map-column-java C2=String,C4=String --num-mappers 1 --verbose 

Output:
17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 
27.4934 seconds (99.1148 bytes/sec)
17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Retrieved 2 records.
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Restoring classloader: 
java.net.FactoryURLClassLoader@121fdcee
17/05/06 07:06:43 INFO tool.ImportTool: Final destination exists, will run 
merge job.
17/05/06 07:06:43 DEBUG tool.ImportTool: Using temporary folder: 
4bc6b65cd0194b81938f4660974ee392_T1
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Checking for existing class: T1
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Attempting to load jar through 
URL: 
jar:file:/tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar!/
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Previous classloader is 
java.net.FactoryURLClassLoader@121fdcee
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Testing class in jar: T1
17/05/06 07:06:43 ERROR tool.ImportTool: Import failed:

[jira] [Created] (SQOOP-3181) Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Could not find class .)

2017-05-06 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3181:


 Summary: Sqoop1 (import + --incremental + --merge-key + 
--as-parquetfile) fails with (Could not find class .)
 Key: SQOOP-3181
 URL: https://issues.apache.org/jira/browse/SQOOP-3181
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with 
(Could not find class .). See the test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:oracle:thin:@oracle.sqoop.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=sqoop

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10), c4 timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, sysdate, 'NEW ROW 1', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
-
| C1   | C2  | C3 | C4  | 
-
| 1| 2017-05-06 06:59:02.0 | NEW ROW 1  | 2017-05-06 
06:59:02 | 
-

#
# STEP 02 - Import Data into HDFS 
#

hdfs dfs -rm -r /user/root/t1
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/root/t1 --incremental lastmodified --check-column C4 
--merge-key C1 --last-value '2017-01-01 00:00:00.0' --as-parquetfile 
--map-column-java C2=String,C4=String --num-mappers 1 --verbose 
hdfs dfs -ls /user/root/t1/*.parquet
parquet-tools cat --json 
'hdfs://namenode/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet'

Output:
17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Transferred 2.627 KB in 23.6174 
seconds (113.8988 bytes/sec)
17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Retrieved 1 records.
17/05/06 07:01:34 INFO tool.ImportTool:   --last-value 2017-05-06 07:01:09.0
~
-rw-r--r--   3 root root   1144 2017-05-06 07:01 
/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet
~
{"C1":"1","C2":"2017-05-06 06:59:02.0","C3":"NEW ROW 1","C4":"2017-05-06 
06:59:02"}

#
# STEP 03 - Insert New Row and Update Existing Row
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (2, sysdate, 'NEW ROW 2', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"update t1 set c3 = 'UPDATE 1', c4 = sysdate where c1 = 1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 order by c1"

Output:
-
| C1   | C2  | C3 | C4  | 
-
| 1| 2017-05-06 06:59:02.0 | UPDATE 1   | 2017-05-06 
07:04:40 | 
| 2| 2017-05-06 07:04:38.0 | NEW ROW 2  | 2017-05-06 
07:04:38 | 
-

#
# STEP 04 - Import Data into HDFS and Merge changes 
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/root/t1 --incremental lastmodified --check-column C4 
--merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile 
--map-column-java C2=String,C4=String --num-mappers 1 --verbose 

Output:
17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 
27.4934 seconds (99.1148 bytes/sec)
17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Retrieved 2 records.
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Restoring classloader: 
java.net.FactoryURLClassLoader@121fdcee
17/05/06 07:06:43 INFO tool.ImportTool: Final destination exists, will run 
merge job.
17/05/06 07:06:43 DEBUG tool.ImportTool: Using temporary folder: 
4bc6b65cd0194b81938f4660974ee392_T1
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Checking for existing class: T1
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Attempting to load jar through 
URL: 
jar:file:/tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar!/
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Previous classloader is 
java.net.FactoryURLClassLoader@121fdcee
17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Testing class in jar: T1
17/05/06 07:06:43 ERROR tool.ImportTool: Import failed: java.io.IOException: 
Could not load jar 
/tmp/sqoop-root/compi
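
# A possible workaround sketch, not verified here: pre-generate the record
# class with sqoop codegen and hand the resulting jar/class to the import, on
# the assumption that the merge phase only fails because it cannot reload the
# freshly generated codegen jar. Paths below are illustrative.
sqoop codegen --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 \
  --outdir /tmp/sqoop-codegen --bindir /tmp/sqoop-codegen
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 \
  --target-dir /user/root/t1 --incremental lastmodified --check-column C4 \
  --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile \
  --map-column-java C2=String,C4=String --num-mappers 1 \
  --class-name T1 --jar-file /tmp/sqoop-codegen/T1.jar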

[jira] [Updated] (SQOOP-3179) Add Sqoop1 (import + --incremental lastmodified + --check-column) support for NULLs

2017-05-05 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3179:
-
Description: 
When using Sqoop (import + --incremental lastmodified + --check-column), rows 
whose check column contains NULL are ignored.

Please consider adding the ability to include check column NULL values.
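
For context, a minimal sketch of why such rows are skipped, assuming the
incremental filter Sqoop generates is of the form (check-column >= last-value
AND check-column < current time); a NULL check column satisfies neither
comparison, so the row never qualifies:

{noformat}
# Illustrative only; this mirrors the assumed incremental predicate, not the
# exact SQL Sqoop generates.
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query \
  "select * from t1 where c4 >= to_date('2017-01-01','YYYY-MM-DD') and c4 < systimestamp"
# Rows whose c4 is NULL (for example the one inserted in STEP 04 below) are
# not returned.
{noformat}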

{noformat}
#
# STEP 01 - Create RDBMS Table and Data
#

export MYCONN=jdbc:oracle:thin:@myoracle.mydomain.com:1521/db11g
export MYUSER=sqoop
export MYPSWD=sqoop

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(15), c4 timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, sysdate, 'data row 1', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 order by c1"

--
| C1   | C2  | C3  | C4  | 
--
| 1| 2017-05-04 15:17:33.0 | data row 1  | 2017-05-04 
15:17:33 | 
--

#
# STEP 02 - Create Sqoop Incremental Import Job
#

sqoop job --create inc_import_hdfs -- import --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental 
lastmodified --check-column C4 --merge-key C1 --last-value '2017-01-01 
00:00:00' --as-textfile --map-column-java C2=String,C4=String 
--fields-terminated-by '\001' --lines-terminated-by '\n' --num-mappers 1 
--verbose --hive-drop-import-delims

#
# STEP 03 - Execute Job and Verify data in HDFS
#

hdfs dfs -rm -r /user/root/t1
sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER 
--password $MYPSWD
hdfs dfs -cat /user/root/t1/part*

Output:
17/05/04 15:25:20 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 24.6352 
seconds (2.2326 bytes/sec)
17/05/04 15:25:20 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33

#
# STEP 04 - Insert New Rows (one with NULL --check-column)
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (2, sysdate, 'data row 2', NULL)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (3, sysdate, 'data row 3', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 order by c1"

Output:
--
| C1   | C2  | C3  | C4  | 
--
| 1| 2017-05-04 15:17:33.0 | data row 1  | 2017-05-04 
15:17:33 | 
| 2| 2017-05-04 15:27:19.0 | data row 2  | (null)  
| 
| 3| 2017-05-04 15:27:22.0 | data row 3  | 2017-05-04 
15:27:22 | 
--

#
# STEP 05 - Execute Job and Verify data in HDFS (row with NULL --check-column 
is not imported)
#

sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER 
--password $MYPSWD
hdfs dfs -cat /user/root/t1/part*

Output:
17/05/04 15:28:22 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 21.6227 
seconds (2.5436 bytes/sec)
17/05/04 15:28:22 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33
32017-05-04 15:27:22.0data row 32017-05-04 15:27:22

#
# STEP 06 - Update Row with NULL --check-column, Execute Job and Verify data in 
HDFS
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"update t1 set c4 = sysdate where c1 = 2"
sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER 
--password $MYPSWD
hdfs dfs -cat /user/root/t1/part*

Output:
17/05/04 15:32:47 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 27.2132 
seconds (2.0211 bytes/sec)
17/05/04 15:32:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33
22017-05-04 15:27:19.0data row 22017-05-04 15:31:55 <===
32017-05-04 15:27:22.0data row 32017-05-04 15:27:22
{noformat}

  was:
When using Sqoop (import + --incremental lastmodified + --check-column) if the 
check column contains NULLs the rows are ignored.

Please consider adding the ability to include ch

[jira] [Created] (SQOOP-3179) Add Sqoop1 (import + --incremental lastmodified + --check-column) support for NULLs

2017-05-04 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3179:


 Summary: Add Sqoop1 (import + --incremental lastmodified + 
--check-column) support for NULLs
 Key: SQOOP-3179
 URL: https://issues.apache.org/jira/browse/SQOOP-3179
 Project: Sqoop
  Issue Type: Improvement
Reporter: Markus Kemper


When using Sqoop (import + --incremental lastmodified + --check-column), rows 
whose check column contains NULL are ignored.

Please consider adding the ability to include check column NULL values.

{noformat}
#
# STEP 01 - Create RDBMS Table and Data
#

export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/db11g
export MYUSER=sqoop
export MYPSWD=welcome1

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(15), c4 timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, sysdate, 'data row 1', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 order by c1"

--
| C1   | C2  | C3  | C4  | 
--
| 1| 2017-05-04 15:17:33.0 | data row 1  | 2017-05-04 
15:17:33 | 
--

#
# STEP 02 - Create Sqoop Incremental Import Job
#

sqoop job --create inc_import_hdfs -- import --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental 
lastmodified --check-column C4 --merge-key C1 --last-value '2017-01-01 
00:00:00' --as-textfile --map-column-java C2=String,C4=String 
--fields-terminated-by '\001' --lines-terminated-by '\n' --num-mappers 1 
--verbose --hive-drop-import-delims

#
# STEP 03 - Execute Job and Verify data in HDFS
#

hdfs dfs -rm -r /user/root/t1
sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER 
--password $MYPSWD
hdfs dfs -cat /user/root/t1/part*

Output:
17/05/04 15:25:20 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 24.6352 
seconds (2.2326 bytes/sec)
17/05/04 15:25:20 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33

#
# STEP 04 - Insert New Rows (one with NULL --check-column)
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (2, sysdate, 'data row 2', NULL)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (3, sysdate, 'data row 3', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 order by c1"

Output:
--
| C1   | C2  | C3  | C4  | 
--
| 1| 2017-05-04 15:17:33.0 | data row 1  | 2017-05-04 
15:17:33 | 
| 2| 2017-05-04 15:27:19.0 | data row 2  | (null)  
| 
| 3| 2017-05-04 15:27:22.0 | data row 3  | 2017-05-04 
15:27:22 | 
--

#
# STEP 05 - Execute Job and Verify data in HDFS (row with NULL --check-column 
is not imported)
#

sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER 
--password $MYPSWD
hdfs dfs -cat /user/root/t1/part*

Output:
17/05/04 15:28:22 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 21.6227 
seconds (2.5436 bytes/sec)
17/05/04 15:28:22 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33
32017-05-04 15:27:22.0data row 32017-05-04 15:27:22

#
# STEP 06 - Update Row with NULL --check-column, Execute Job and Verify data in 
HDFS
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"update t1 set c4 = sysdate where c1 = 2"
sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER 
--password $MYPSWD
hdfs dfs -cat /user/root/t1/part*

Output:
17/05/04 15:32:47 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 27.2132 
seconds (2.0211 bytes/sec)
17/05/04 15:32:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33
22017-05-04 15:27:19.0data row 22017-05-04 15:31:55 <===
32017-05-04 15:27:22.0data row 32017-05-04 15:27:22
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SQOOP-3175) Sqoop1 (import + --as-parquetfile) writes data to wrong Hive table if same table name exists in Hive default database

2017-04-22 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3175:


 Summary: Sqoop1 (import + --as-parquetfile) writes data to wrong 
Hive table if same table name exists in Hive default database
 Key: SQOOP-3175
 URL: https://issues.apache.org/jira/browse/SQOOP-3175
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Sqoop1 (import + --as-parquetfile) writes data to the wrong Hive table if the 
same table name exists in the Hive default database.  The test case below 
demonstrates this issue.

*Test Case*
{noformat}

# Issue: Data files written to the wrong table with Parquet


#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/sqoop
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date, 'new row 1')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

-
| c1  | c2 | c3 | 
-
| 1   | 2017-04-22 | new row 1  | 
-

#
# STEP 02 - Create HDFS Structures
#

beeline -u jdbc:hive2:// -e "drop database db1 cascade; drop database db2 
cascade; drop database db3 cascade;"
sudo -u hdfs hdfs dfs -rm -r /data/tmp
sudo -u hdfs hdfs dfs -rm -r /data/dbs
sudo -u hdfs hdfs dfs -mkdir -p /data/tmp
sudo -u hdfs hdfs dfs -chmod 777 /data/tmp
sudo -u hdfs hdfs dfs -mkdir -p /data/dbs
sudo -u hdfs hdfs dfs -chmod 777 /data/dbs

#
# STEP 03 - Create Hive Databases
#

beeline -u jdbc:hive2:// -e "create database db1 location '/data/dbs/db1'; 
create database db2 location '/data/dbs/db2';"
beeline -u jdbc:hive2:// -e "show databases; describe database default; 
describe database db1; describe database db2;"
beeline -u jdbc:hive2:// -e "use default; show tables; use db1; show tables; 
use db2; show tables;"
hdfs dfs -ls -R /user/hive/warehouse /data

+--++--+-+-+-+--+
| db_name  |comment | location 
| owner_name  | owner_type  | parameters  |
+--++--+-+-+-+--+
| default  | Default Hive database  | hdfs://nameservice1/user/hive/warehouse  
| public  | ROLE| |
+--++--+-+-+-+--+
+--+--+---+-+-+-+--+
| db_name  | comment  | location  | owner_name  | 
owner_type  | parameters  |
+--+--+---+-+-+-+--+
| db1  |  | hdfs://nameservice1/data/dbs/db1  | root| USER  
  | |
+--+--+---+-+-+-+--+
+--+--+---+-+-+-+--+
| db_name  | comment  | location  | owner_name  | 
owner_type  | parameters  |
+--+--+---+-+-+-+--+
| db2  |  | hdfs://nameservice1/data/dbs/db2  | root| USER  
  | |
+--+--+---+-+-+-+--+
~
+---+--+
| tab_name  |
+---+--+
+---+--+
+---+--+
| tab_name  |
+---+--+
+---+--+
+---+--+
| tab_name  |
+---+--+
+---+--+
~
drwxrwxrwx   - root supergroup  0 2017-04-22 16:22 /data/dbs/db1
drwxrwxrwx   - root supergroup  0 2017-04-22 16:22 /data/dbs/db2

#
# STEP 04 - Import RDBMS Table, Create Hive Table in Database (default) and 
Load Data (using --as-textfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /data/tmp/t1 --hive-import --hive-database default --hive-table t1 
--as-textfile --num-mappers 1
hdfs dfs -ls -R /user/hive/

[jira] [Created] (SQOOP-3166) Sqoop1 (import + --query + aggregate function + --split-by -m >=2) fails with error (unknown column)

2017-04-03 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3166:


 Summary: Sqoop1 (import + --query + aggregate function + 
--split-by -m >=2) fails with error (unknown column)
 Key: SQOOP-3166
 URL: https://issues.apache.org/jira/browse/SQOOP-3166
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


This issue appears to be RDBMS generic.

*Test Case*
{noformat}

# Description:
# 
# 1. Sqoop import fails with Unknown Column error with the following conditions
# 1.1. Using --query + sql aggregate function() + --split-by + --num-mappers >1 
fails
# 2. The Sqoop documentation does not seem to clarify requirements for "select 
list" and "--split-by" 
#
# Documentation
# 
http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.10.0/SqoopUserGuide.html#_controlling_parallelism
# 7.2.4. Controlling Parallelism


#
# STEP 01 - [ORACLE] Create Data
#

export MYCONN=jdbc:oracle:thin:@oracle1.cloudera.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop view v1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create view v1 as select c1 as \"ID\", c2, c3 from t1" 
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(c1) from t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(id) from v1"


| COUNT(C1)| 

| 1| 

~

| COUNT(ID)| 

| 1| 


#
# STEP 02 - [ORACLE] Import from Table/View with --num-mappers 2 (reproduction)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(c1) from t1 where \$CONDITIONS" --target-dir /user/root/t1 
--delete-target-dir --num-mappers 2 --split-by C1 --verbose
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(id) from v1 where \$CONDITIONS" --target-dir /user/root/t1 
--delete-target-dir --num-mappers 2 --split-by ID --verbose

Output:
17/04/03 09:09:11 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT 
MIN(C1), MAX(C1) FROM (select count(c1) from t1 where  (1 = 1) ) t1
17/04/03 09:09:11 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/root/.staging/job_1490976836761_0069
17/04/03 09:09:11 WARN security.UserGroupInformation: 
PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: 
java.sql.SQLSyntaxErrorException: ORA-00904: "C1": invalid identifier
~
17/04/03 09:10:01 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT 
MIN(ID), MAX(ID) FROM (select count(id) from v1 where  (1 = 1) ) t1
17/04/03 09:10:01 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/root/.staging/job_1490976836761_0070
17/04/03 09:10:01 WARN security.UserGroupInformation: 
PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: 
java.sql.SQLSyntaxErrorException: ORA-00904: "ID": invalid identifier
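
# For context, a sketch that runs the failing BoundingValsQuery from the log
# above directly (assumption: this mirrors the statement Sqoop builds); the
# inline view exposes no column named C1 or ID, hence the ORA-00904. This is
# expected to fail with the same error:
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query \
  "SELECT MIN(C1), MAX(C1) FROM (select count(c1) from t1 where (1 = 1)) t1"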

#
# STEP 03 - [ORACLE] Import from Table/View with --num-mappers 1 (workaround)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(c1) from t1 where \$CONDITIONS" --target-dir /user/root/t1 
--delete-target-dir --num-mappers 1 --split-by C1 --verbose
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(id) from v1 where \$CONDITIONS" --target-dir /user/root/t1 
--delete-target-dir --num-mappers 1 --split-by ID --verbose

Output:
17/04/03 09:07:11 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 21.5799 
seconds (0.0927 bytes/sec)
17/04/03 09:07:11 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
17/04/03 09:08:13 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 20.4732 
seconds (0.0977 bytes/sec)
17/04/03 09:08:13 INFO mapreduce.ImportJobBase: Retrieved 1 records.

#
# STEP 04 - [MYSQL] Create Data
#

export MYCONN=jdbc:mysql://mysql1.cloudera.com:3306/db_coe
export MYUSER=sqoop
export M

[jira] [Updated] (SQOOP-3160) Sqoop (import + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError)

2017-03-22 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3160:
-
Description: 
Sqoop (import + --table) with Oracle table_name having '$' fails with error 
(ORA-00942 or java.lang.NoClassDefFoundError), see test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Data
#

export MYCONN=jdbc:oracle:thin:@oracle1.cloudera.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t\$1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t\$1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t\$1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t\$1"

Output:
---
| C1   | C2  | C3 | 
---
| 1| 2017-03-21 13:31:18.0 | some data  | 
---

#
# STEP 02 - Import with '$' (--table T$1)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T$1 --target-dir /user/root/t$1 --delete-target-dir --num-mappers 1

Output:
17/03/21 07:22:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM "T" t WHERE 1=0
17/03/21 07:22:30 ERROR manager.SqlManager: Error executing statement: 
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
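
# Shell note (assuming the command above was run from bash): unquoted $1 is
# expanded by the shell as an empty positional parameter before Sqoop starts,
# so Sqoop only receives --table T. A quick check of the expansions:
echo T$1 'T$1' T\$1
# prints (when $1 is unset): T T$1 T$1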

#
# STEP 03 - Import with '$' (--table T\$1)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T\$1 --target-dir /user/root/t$1 --delete-target-dir --num-mappers 1

Output:
17/03/22 07:30:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM "T$1" t WHERE 1=0
17/03/22 07:30:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is 
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce

17/03/22 07:30:38 INFO mapreduce.Job: Task Id : 
attempt_1488818259270_0124_m_00_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
at 
org.apache.sqoop.mapreduce.db.DBRecordReader.createValue(DBRecordReader.java:197)
at 
org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:230)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at 
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
... 14 more
Caused by: java.lang.NoClassDefFoundError: T$1$1
at T$1.init0(T$1.java:43)
at T$1.(T$1.java:63)
... 19 more
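
# A possible workaround sketch, not verified here (assumption: the remaining
# failure is only the '$' carried into the generated record class name, per
# the NoClassDefFoundError: T$1$1 above); --class-name gives the generated
# class a '$'-free name. The class and target directory names are illustrative.
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T\$1 \
  --class-name T1_DOLLAR --target-dir /user/root/t1_dollar --delete-target-dir \
  --num-mappers 1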

#
# STEP 04 - Import with '$' (--table "T$1")
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
"T$1" --target-dir /user/root/t$1 --delete-target-dir --num-mappers 1

Output:
17/03/21 07:25:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM "T" t WHERE 1=0
17/03/21 07:25:08 ERROR manager.SqlManager: Error executing statement: 
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

#
# STEP 05 - Import with '$' (--table "T\$1")

[jira] [Created] (SQOOP-3160) Sqoop (import + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError)

2017-03-22 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3160:


 Summary: Sqoop (import + --table) with Oracle table_name having 
'$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError)
 Key: SQOOP-3160
 URL: https://issues.apache.org/jira/browse/SQOOP-3160
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Sqoop (export + --table) with Oracle table_name having '$' fails with error 
(ORA-00942 or java.lang.NoClassDefFoundError), see test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Data
#

export MYCONN=jdbc:oracle:thin:@oracle1.cloudera.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
---
| C1   | C2  | C3 | 
---
| 1| 2017-03-21 13:30:02.0 | some data  | 
---

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t\$1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t\$1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t\$1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t\$1"

Output:
---
| C1   | C2  | C3 | 
---
| 1| 2017-03-21 13:31:18.0 | some data  | 
---

#
# STEP 02 - Import and Export without '$' (baseline)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
"T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
hdfs dfs -cat /user/root/t1/part*

Output:
17/03/21 06:39:31 INFO mapreduce.ImportJobBase: Transferred 34 bytes in 21.6129 
seconds (1.5731 bytes/sec)
17/03/21 06:39:31 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1,2017-03-21 13:30:02.0,some data

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--export-dir /user/root/t1 --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
17/03/21 06:46:07 INFO mapreduce.ExportJobBase: Transferred 163 bytes in 
21.4938 seconds (7.5836 bytes/sec)
17/03/21 06:46:07 INFO mapreduce.ExportJobBase: Exported 1 records.
~
---
| C1   | C2  | C3 | 
---
| 1| 2017-03-21 13:30:02.0 | some data  | 
| 1| 2017-03-21 13:30:02.0 | some data  | 
---

#
# STEP 03 - Export with '$' (--table T$1)
#

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T$1 --export-dir /user/root/t1 --num-mappers 1

Output:
17/03/21 06:49:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM "T" t WHERE 1=0
17/03/21 06:49:28 ERROR manager.SqlManager: Error executing statement: 
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

#
# STEP 04 - Export with '$' (--table T\$1)
#

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T\$1 --export-dir /user/root/t1 --num-mappers 1

Output:
17/03/22 07:27:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM "T$1" t WHERE 1=0
17/03/22 07:27:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is 
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
17/03/22 07:27:27 INFO mapreduce.ExportJobBase: Beginning export of T$1

2017-03-22 07:27:54,215 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.RuntimeException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
at 
org.apache.sqoop.mapreduce.TextExportMapper.setup(TextExportMapper.java:72)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at 
org.apache.sqoop.mapreduce.Auto

[jira] [Created] (SQOOP-3159) Sqoop (export + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError)

2017-03-22 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3159:


 Summary: Sqoop (export + --table) with Oracle table_name having 
'$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError)
 Key: SQOOP-3159
 URL: https://issues.apache.org/jira/browse/SQOOP-3159
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Sqoop (export + --table) with Oracle table_name having '$' fails with error 
(ORA-00942 or java.lang.NoClassDefFoundError), see test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Data
#

export MYCONN=jdbc:oracle:thin:@oracle1.cloudera.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
---
| C1   | C2  | C3 | 
---
| 1| 2017-03-21 13:30:02.0 | some data  | 
---

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t\$1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t\$1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t\$1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t\$1"

Output:
---
| C1   | C2  | C3 | 
---
| 1| 2017-03-21 13:31:18.0 | some data  | 
---

#
# STEP 02 - Import and Export without '$' (baseline)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
"T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
hdfs dfs -cat /user/root/t1/part*

Output:
17/03/21 06:39:31 INFO mapreduce.ImportJobBase: Transferred 34 bytes in 21.6129 
seconds (1.5731 bytes/sec)
17/03/21 06:39:31 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1,2017-03-21 13:30:02.0,some data

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--export-dir /user/root/t1 --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
17/03/21 06:46:07 INFO mapreduce.ExportJobBase: Transferred 163 bytes in 
21.4938 seconds (7.5836 bytes/sec)
17/03/21 06:46:07 INFO mapreduce.ExportJobBase: Exported 1 records.
~
---
| C1   | C2  | C3 | 
---
| 1| 2017-03-21 13:30:02.0 | some data  | 
| 1| 2017-03-21 13:30:02.0 | some data  | 
---

#
# STEP 03 - Export with '$' (--table T$1)
#

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T$1 --export-dir /user/root/t1 --num-mappers 1

Output:
17/03/21 06:49:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM "T" t WHERE 1=0
17/03/21 06:49:28 ERROR manager.SqlManager: Error executing statement: 
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

#
# STEP 04 - Export with '$' (--table T\$1)
#

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T\$1 --export-dir /user/root/t1 --num-mappers 1

Output:
17/03/22 07:27:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM "T$1" t WHERE 1=0
17/03/22 07:27:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is 
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
17/03/22 07:27:27 INFO mapreduce.ExportJobBase: Beginning export of T$1

2017-03-22 07:27:54,215 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.RuntimeException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
at 
org.apache.sqoop.mapreduce.TextExportMapper.setup(TextExportMapper.java:72)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at 
org.apache.sqoop.mapreduce.Auto

Re: New Sqoop Committer - Anna Szonyi

2017-03-16 Thread Markus Kemper
Awesome news, congrats Anna!


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Thu, Mar 16, 2017 at 2:50 PM, Attila Szabó <mau...@apache.org> wrote:

> Nice job Anna!
>
>  Very much well deserved!!!
>
> Congrats,
>
> Attila
>
> On Mar 16, 2017 7:47 PM, "Abraham Fine" <af...@apache.org> wrote:
>
>> On behalf of the Apache Sqoop PMC, I am pleased to welcome Anna Szonyi
>> as a new committer to Apache Sqoop. Please join me in congratulating her
>> for this accomplishment!
>>
>> Anna has made a number of contributions to the Sqoop test and build code
>> in addition to many bug fixes. Anna has also contributed to our
>> community by performing code reviews and helping users on our mailing
>> list. You can see many of her contributions here:
>> https://s.apache.org/PfGo
>>
>> We are excited to see what Anna contributes next!
>>
>> Thanks,
>> Abraham Fine
>>
>


[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2017-02-13 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864368#comment-15864368
 ] 

Markus Kemper commented on SQOOP-3134:
--

Linking SQOOP-2783

> Add option to configure Avro schema output file name with (import + 
> --as-avrodatafile) 
> ---
>
> Key: SQOOP-3134
> URL: https://issues.apache.org/jira/browse/SQOOP-3134
> Project: Sqoop
>  Issue Type: Improvement
>    Reporter: Markus Kemper
>
> Please consider adding an option to configure the Avro schema output file 
> name that is created with Sqoop (import + --as-avrodatafile), example cases 
> below.
> {noformat}
> #
> # STEP 01 - Create Data
> #
> export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 date, c3 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> -
> | c1  | c2 | c3 | 
> -
> | 1   | 2017-02-13 | some data  | 
> -
> #
> # STEP 02 - Import + --table + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
> --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
> 20.6988 seconds (19.9529 bytes/sec)
> 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 
> -rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc <=== want option to 
> configure this file name
> -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
> #
> # STEP 03 - Import + --query + --as-avrodatafile
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir 
> /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile 
> ls -l ./*
> Output:
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
> 25.2757 seconds (17.7245 bytes/sec)
> 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ~
> -rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc <=== 
> want option to configure this file name
> -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)

2017-02-13 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3134:


 Summary: Add option to configure Avro schema output file name with 
(import + --as-avrodatafile) 
 Key: SQOOP-3134
 URL: https://issues.apache.org/jira/browse/SQOOP-3134
 Project: Sqoop
  Issue Type: Improvement
Reporter: Markus Kemper


Please consider adding an option to configure the Avro schema output file name 
that is created with Sqoop (import + --as-avrodatafile), example cases below.

{noformat}
#
# STEP 01 - Create Data
#

export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

-
| c1  | c2 | c3 | 
-
| 1   | 2017-02-13 | some data  | 
-

#
# STEP 02 - Import + --table + --as-avrodatafile
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/root/t1 --delete-target-dir --num-mappers 1 
--as-avrodatafile 
ls -l ./*

Output:
17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 
20.6988 seconds (19.9529 bytes/sec)
17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.

-rw-r--r-- 1 root root   492 Feb 13 12:14 ./t1.avsc <=== want option to 
configure this file name
-rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java

#
# STEP 03 - Import + --query + --as-avrodatafile
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1 where \$CONDITIONS" --split-by c1 --target-dir /user/root/t1 
--delete-target-dir --num-mappers 1 --as-avrodatafile 
ls -l ./*

Output:
17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 
25.2757 seconds (17.7245 bytes/sec)
17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
-rw-r--r-- 1 root root   527 Feb 13 12:16 ./AutoGeneratedSchema.avsc <=== want 
option to configure this file name
-rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
{noformat}
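
Pending a dedicated option, a possible stop-gap sketch, assuming the .avsc is
written to the code-generation output directory alongside the generated .java
(as the listings above suggest): point --outdir at a known location and rename
the schema file afterwards. Paths and names below are illustrative.

{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query \
  "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir /user/root/t1 \
  --delete-target-dir --num-mappers 1 --as-avrodatafile --outdir /tmp/sqoop-gen
mv /tmp/sqoop-gen/AutoGeneratedSchema.avsc /tmp/sqoop-gen/t1_query.avsc
{noformat}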



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SQOOP-3133) Sqoop (export + --non-resilient) with SQLServer ignores Sqoop option (-Dsqoop.export.statements.per.transaction)

2017-02-13 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3133:


 Summary: Sqoop (export + --non-resilient) with SQLServer ignores 
Sqoop option (-Dsqoop.export.statements.per.transaction)
 Key: SQOOP-3133
 URL: https://issues.apache.org/jira/browse/SQOOP-3133
 Project: Sqoop
  Issue Type: Bug
  Components: docs
Reporter: Markus Kemper


Please consider updating the doc to clarify this limitation.  

When using Sqoop (export + --non-resilient) with SQLServer, the Sqoop option 
(-Dsqoop.export.statements.per.transaction) is ignored, which means that each 
export task will execute within a single RDBMS transaction instead of 
committing at intervals as it proceeds.

This could be considered desirable behavior for rolling back a failure during 
the export, however it may not be ideal for performance.
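
For reference, a hedged sketch of the option combination described above
(connection string, credentials and table name are placeholders; the
--non-resilient flag is passed as a connector-specific argument after --):

{noformat}
sqoop export -Dsqoop.export.statements.per.transaction=100 \
  --connect "jdbc:sqlserver://sqlserver.example.com:1433;database=db1" \
  --username sqoop --password sqoop --table T1 \
  --export-dir /user/root/t1 --num-mappers 4 \
  -- --non-resilient
{noformat}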



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SQOOP-3130) Sqoop (export + --export-dir + Avro files) is not obeying --num-mappers requested

2017-02-06 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854178#comment-15854178
 ] 

Markus Kemper commented on SQOOP-3130:
--

From reviewing this issue further, another possible workaround is to set 
(mapreduce.input.fileinputformat.split.minsize) to help control the count of 
map tasks used when exporting Avro data files.  This method is not likely to 
be 100% deterministic if the volume of data is larger than the boundaries of 
the "minsize" and the count of map tasks requested, however it is preferable 
to one map task per Avro data file.  The example below demonstrates this 
workaround.

*Use (mapreduce.input.fileinputformat.split.minsize) without \--num-mappers*
{noformat}
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_text"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(*) from t1_text"
sqoop export -Dmapreduce.input.fileinputformat.split.minsize=134217728 
--connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_TEXT 
--export-dir /user/root/external/t1_text --input-fields-terminated-by ','

17/02/06 05:56:25 INFO input.FileInputFormat: Total input paths to process : 10
17/02/06 05:56:25 INFO mapreduce.JobSubmitter: number of splits:4 <=== 
uses default --num-mappers value (4)

17/02/06 05:57:11 INFO mapreduce.ExportJobBase: Transferred 3.519 MB in 47.9893 
seconds (75.0887 KB/sec)
17/02/06 05:57:11 INFO mapreduce.ExportJobBase: Exported 10 records.
{noformat}

*Use (mapreduce.input.fileinputformat.split.minsize) with \--num-mappers*
{noformat}
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_text"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(*) from t1_text"
sqoop export -Dmapreduce.input.fileinputformat.split.minsize=134217728 
--connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_TEXT 
--export-dir /user/root/external/t1_text --input-fields-terminated-by ',' 
--num-mappers 2

17/02/06 05:59:12 INFO input.FileInputFormat: Total input paths to process : 10
17/02/06 05:59:12 INFO mapreduce.JobSubmitter: number of splits:2 <=== 
uses requested --num-mappers value (2) 

17/02/06 05:59:42 INFO mapreduce.ExportJobBase: Transferred 3.5189 MB in 
32.3481 seconds (111.3925 KB/sec)
17/02/06 05:59:42 INFO mapreduce.ExportJobBase: Exported 10 records.
{noformat}

> Sqoop (export + --export-dir + Avro files) is not obeying --num-mappers 
> requested
> -
>
> Key: SQOOP-3130
> URL: https://issues.apache.org/jira/browse/SQOOP-3130
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Markus Kemper
>
> When using Sqoop (export + --export-dir + Avro files) it is not obeying 
> --num-mappers requested, instead it is creating a map task per Avro data 
> file.  
> One known workaround for this issue is to use the Sqoop --hcatalog options.
> Please see the test case below demonstrating the issue and workaround.
> *Test Case*
> {noformat}
> #
> # STEP 01 - Create Data
> #
> for i in {1..10}; do d=`date +"%Y-%m-%d %H:%M:%S" --date="+$i days"`; 
> echo "$i,$d,row data" >> ./data.csv; done
> ls -l ./*;
> wc data.csv 
> hdfs dfs -mkdir -p /user/root/external/t1
> hdfs dfs -put ./data.csv /user/root/external/t1/data.csv
> hdfs dfs -ls -R /user/root/external/t1/
> Output:
> -rw-r--r-- 1 root root 3488895 Feb  1 11:20 ./data.csv
> ~
> 10  30 3488895 data.csv
> ~
> -rw-r--r--   3 root root3488895 2017-02-01 11:26 
> /user/root/external/t1/data.csv
> #
> # STEP 02 - Create RDBMS Table and Export Data
> #
> export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/db11g;
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1_text"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1_text (c1 int, c2 date, c3 varchar(10))"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1_TEXT --export-dir /user/root/external/t1 --input-fields-terminated-by ',' 
> --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select count(*) from t1_text"
> Output:
> 17/02/01 11:33:31 INFO mapreduce.ExportJobBase: Transferred 3.3274 MB in 
> 24.8037 seconds (137.3688 KB/sec)
> 17/02/01 11:33:31 INF

[jira] [Created] (SQOOP-3130) Sqoop (export + --export-dir + Avro files) is not obeying --num-mappers requested

2017-02-01 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3130:


 Summary: Sqoop (export + --export-dir + Avro files) is not obeying 
--num-mappers requested
 Key: SQOOP-3130
 URL: https://issues.apache.org/jira/browse/SQOOP-3130
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


When using Sqoop (export + --export-dir + Avro files), it does not obey the 
--num-mappers requested; instead it creates a map task per Avro data file.

One known workaround for this issue is to use the Sqoop --hcatalog options.
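
For that --hcatalog route, a minimal sketch (assuming an Avro-backed
Hive/HCatalog table, here default.t1_avro, already exists over the files being
exported; the database and table names are illustrative):

{noformat}
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --table T1_TEXT --hcatalog-database default --hcatalog-table t1_avro \
  --num-mappers 2
{noformat}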

Please see the test case below demonstrating the issue and workaround.

*Test Case*
{noformat}
#
# STEP 01 - Create Data
#

for i in {1..10}; do d=`date +"%Y-%m-%d %H:%M:%S" --date="+$i days"`; echo 
"$i,$d,row data" >> ./data.csv; done
ls -l ./*;
wc data.csv 
hdfs dfs -mkdir -p /user/root/external/t1
hdfs dfs -put ./data.csv /user/root/external/t1/data.csv
hdfs dfs -ls -R /user/root/external/t1/

Output:
-rw-r--r-- 1 root root 3488895 Feb  1 11:20 ./data.csv
~
10  30 3488895 data.csv
~
-rw-r--r--   3 root root3488895 2017-02-01 11:26 
/user/root/external/t1/data.csv

#
# STEP 02 - Create RDBMS Table and Export Data
#

export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_text"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_text (c1 int, c2 date, c3 varchar(10))"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_TEXT --export-dir /user/root/external/t1 --input-fields-terminated-by ',' 
--num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(*) from t1_text"

Output:
17/02/01 11:33:31 INFO mapreduce.ExportJobBase: Transferred 3.3274 MB in 
24.8037 seconds (137.3688 KB/sec)
17/02/01 11:33:31 INFO mapreduce.ExportJobBase: Exported 10 records.
~~

| COUNT(*) | 

| 10   | 


#
# STEP 03 - Import Data as Text Creating 10 HDFS Files
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_TEXT --target-dir /user/root/external/t1_text --delete-target-dir 
--num-mappers 10 --split-by C1 --as-textfile
hdfs dfs -ls /user/root/external/t1_text/part*

Output:
17/02/01 11:38:26 INFO mapreduce.ImportJobBase: Transferred 3.518 MB in 57.0517 
seconds (63.1434 KB/sec)
17/02/01 11:38:26 INFO mapreduce.ImportJobBase: Retrieved 10 records.
~
-rw-r--r--   3 root root 358894 2017-02-01 11:38 
/user/root/external/t1_text/part-m-0
-rw-r--r--   3 root root 37 2017-02-01 11:38 
/user/root/external/t1_text/part-m-1
-rw-r--r--   3 root root 37 2017-02-01 11:38 
/user/root/external/t1_text/part-m-2
-rw-r--r--   3 root root 37 2017-02-01 11:38 
/user/root/external/t1_text/part-m-3
-rw-r--r--   3 root root 37 2017-02-01 11:38 
/user/root/external/t1_text/part-m-4
-rw-r--r--   3 root root 37 2017-02-01 11:38 
/user/root/external/t1_text/part-m-5
-rw-r--r--   3 root root 37 2017-02-01 11:38 
/user/root/external/t1_text/part-m-6
-rw-r--r--   3 root root 37 2017-02-01 11:38 
/user/root/external/t1_text/part-m-7
-rw-r--r--   3 root root 37 2017-02-01 11:38 
/user/root/external/t1_text/part-m-8
-rw-r--r--   3 root root 370001 2017-02-01 11:38 
/user/root/external/t1_text/part-m-9

#
# STEP 04 - Export 10 Text Formatted Data Using 2 Splits
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_text"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(*) from t1_text"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_TEXT --export-dir /user/root/external/t1_text --input-fields-terminated-by 
',' --num-mappers 2
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select count(*) from t1_text"

Output:

| COUNT(*) | 

| 0| 

~
17/02/01 11:47:26 INFO input.FileInputFormat: Total input paths to process : 10
17/02/01 11:47:26 INFO mapreduce.JobSubmitter: number of splits:2

17/02/01 11:47:55 INFO mapreduce.ExportJobBase: Transferred 3.5189 MB in 
31.7104 seconds (113.6324 KB/sec)
17/02/01 11:47:55 INFO mapreduce.ExportJobBase: Exported 10 records.
~

| COUNT(*) | 

| 10  

[jira] [Created] (SQOOP-3129) Sqoop Metastore job command does not honor -D generic arguments

2017-01-26 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3129:


 Summary: Sqoop Metastore job command does not honor -D generic 
arguments
 Key: SQOOP-3129
 URL: https://issues.apache.org/jira/browse/SQOOP-3129
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Not 100% sure that this is a defect, but it seems like one to me.

The Sqoop job help suggests that generic args are supported; however, I am not 
able to get them to work. See the test case below.

*Sqoop Help*
{noformat}
$ sqoop job -help

Output:
17/01/17 08:13:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.3
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [] [TOOL-ARGS]]
{noformat}

*Test Case*
{noformat}
#
# STEP 01 - Setup Table and Data
#

export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"


| c1  | c2 | 

| 1   | some data  | 


#
# STEP 02 - Import Data Baseline (default)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/root/t1 --delete-target-dir --num-mappers 1
yarn application -status application_1481911879790_0093 | head -6

17/01/02 07:03:46 INFO client.RMProxy: Connecting to ResourceManager at 
host1.cloudera.com/NN.NN.NNN.NNN:8032
Application Report : 
Application-Id : application_1481911879790_0093
Application-Name : t1.jar
Application-Type : MAPREDUCE
User : root
Queue : root.users.root

#
# STEP 03 - Import Data (-Dmapreduce.job.queuename=MY_SQOOP_QUEUE 
-Dmapreduce.job.name=MY_SQOOP_JOB)
#

sqoop import -Dmapreduce.job.queuename=MY_SQOOP_QUEUE 
-Dmapreduce.job.name=MY_SQOOP_JOB --connect $MYCONN --username $MYUSER 
--password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir 
--num-mappers 1
yarn application -status application_1481911879790_0094 | head -6

17/01/02 07:07:00 INFO client.RMProxy: Connecting to ResourceManager at 
host1.cloudera.com/NN.NN.NNN.NNN:8032
Application Report : 
Application-Id : application_1481911879790_0094
Application-Name : MY_SQOOP_JOB
Application-Type : MAPREDUCE
User : root
Queue : root.MY_SQOOP_QUEUE

#
# STEP 04 - Import Data using Sqoop1 Metastore Job (default)
#

sqoop job --create my_sqoop_meta_job -- import --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 
--delete-target-dir --num-mappers 1
sqoop job --list
Available jobs:
  my_sqoop_meta_job

sqoop job --exec my_sqoop_meta_job
yarn application -status application_1481911879790_0095 | head -6
17/01/02 07:16:13 INFO client.RMProxy: Connecting to ResourceManager at 
host1.cloudera.com/NN.NN.NNN.NNN:8032
Application Report : 
Application-Id : application_1481911879790_0095
Application-Name : t1.jar
Application-Type : MAPREDUCE
User : root
Queue : root.users.root
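
# For reference, a sketch of the placement implied by the usage line shown
# above (generic -D arguments before the job arguments); per this report,
# generic arguments could not be made to take effect when executing the
# saved job:
sqoop job -Dmapreduce.job.queuename=MY_SQOOP_QUEUE \
  -Dmapreduce.job.name=MY_SQOOP_JOB --exec my_sqoop_meta_job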

#
# STEP 06 -  Review Sqoop Job Help
#

$ sqoop job -help

Output:
17/01/17 08:13:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.3
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [] [TOOL-ARGS]]

Job management arguments:
   --create Create a new saved job
   --delete Delete a saved job
   --exec   Run a saved job
   --help   Print usage instructions
   --list   List saved jobs
   --meta-connect Specify JDBC connect string for the
metastore
   --show   Show the parameters for a saved job
   --verbosePrint more information while working

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf  specify an application configuration file
-D 

[jira] [Commented] (SQOOP-3089) Sqoop import + --as-parquetfile + Oracle + BINARY_DOUBLE fails with insufficient message

2016-12-27 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781033#comment-15781033
 ] 

Markus Kemper commented on SQOOP-3089:
--

Linking SQOOP-3095

> Sqoop import + --as-parquetfile + Oracle + BINARY_DOUBLE fails with 
> insufficient message
> 
>
> Key: SQOOP-3089
> URL: https://issues.apache.org/jira/browse/SQOOP-3089
> Project: Sqoop
>  Issue Type: Bug
>    Reporter: Markus Kemper
>
> The ask in this JIRA is to enhance the debug message to be more informative 
> about how to map the data and successfully perform the operation.  Full test 
> case below
> *Sqoop Debug (current)*
> 16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
> type 101
> *Sqoop Debug (requested)*
> 16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
> type 101, please try using --map-column-java =
> {noformat}
> #
> # STEP 01 - Setup Table and Data
> #
> export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1_oracle"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1_oracle (c1 int, c2 binary_double)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1_oracle values (1, 1.1)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1_oracle"
> Output:
> ---
> | C1   | C2   | 
> ---
> | 1| 1.1  | 
> ---
> #
> # STEP 02 - Import Data as Parquet (reproduction)
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
> --num-mappers 1 --as-parquetfile --verbose
> Output:
> 16/12/21 12:16:08 INFO manager.SqlManager: Executing SQL statement: SELECT 
> t.* FROM T1_ORACLE t WHERE 1=0
> 16/12/21 12:16:08 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
> 16/12/21 12:16:08 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 
> 0]
> 16/12/21 12:16:08 DEBUG manager.OracleManager$ConnCache: Caching released 
> connection for 
> jdbc:oracle:thin:@host-10-17-101-252.coe.cloudera.com:1521/orcl12c/sqoop
> 16/12/21 12:16:08 DEBUG util.ClassLoaderStack: Restoring classloader: 
> java.net.FactoryURLClassLoader@55465b1f
> 16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
> type 101
> #
> # STEP 03 - Import Data as Parquet using option (--map-column-java C2=Double)
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
> --num-mappers 1 --as-parquetfile --verbose --map-column-java C2=Double
> Output:
> 16/12/21 12:34:36 INFO mapreduce.ImportJobBase: Transferred 1.6406 KB in 
> 77.989 seconds (21.5415 bytes/sec)
> 16/12/21 12:34:36 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> #
> # STEP 04 - View Data using Parquet-tools
> #
> hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet
> parquet-tools schema -d 
> hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet
> parquet-tools cat --json 
> hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet
> Output:
> -rw-r--r--   3 user1 user1612 2016-12-21 12:34 
> /user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet
> ---
> message T1_ORACLE {
>   optional binary C1 (UTF8);
>   optional double C2;
> }
> creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber})
> extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop 
> import of 
> T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","double"],"default":null,"columnName":"C2","

[jira] [Commented] (SQOOP-3094) Add (import + --as-avrodatafile) with Oracle BINARY_DOUBLE

2016-12-27 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781030#comment-15781030
 ] 

Markus Kemper commented on SQOOP-3094:
--

Linking SQOOP-3095 and SQOOP-3089

> Add (import + --as-avrodatafile) with Oracle BINARY_DOUBLE
> --
>
> Key: SQOOP-3094
> URL: https://issues.apache.org/jira/browse/SQOOP-3094
> Project: Sqoop
>  Issue Type: Improvement
>    Reporter: Markus Kemper
>
> Some users are not able to easily use the Sqoop option 
> (--map-column-java, Example 1 below).  
> The ask here is to find a way to support the Oracle BINARY_DOUBLE 
> type in Sqoop natively, without using --map-column-java (see Example 2 below).
> *Example 1*
> {noformat}
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 
> --as-avrodatafile --map-column-java C2=String
> hdfs dfs -ls /user/user1/t1/*.avro
> avro-tools tojson --pretty 
> 'hdfs://namenode.cloudera.com/user/user1/t1/part-m-0.avro'
> Output:
> 16/12/27 10:28:03 INFO mapreduce.ImportJobBase: Transferred 320 bytes in 
> 39.1563 seconds (8.1724 bytes/sec)
> 16/12/27 10:28:03 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ---
> -rw-r--r--   3 user1 user1320 2016-12-27 10:28 
> /user/user1/t1/part-m-0.avro
> ---
> {
>   "C1" : {
> "string" : "1"
>   },
>   "C2" : {
> "string" : "1.1"
>   }
> }
> {noformat}
> *Example 2*
> {noformat}
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 
> --as-parquetfile 
> Output:
> 16/12/27 10:05:43 INFO manager.SqlManager: Executing SQL statement: SELECT 
> t.* FROM T1 t WHERE 1=0
> 16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
> 16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 
> 0]
> 16/12/27 10:05:43 DEBUG util.ClassLoaderStack: Restoring classloader: 
> java.net.FactoryURLClassLoader@55465b1f
> 16/12/27 10:05:43 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
> type 101
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 2 
> --as-parquetfile  --direct
> Output:
> 16/12/27 10:08:50 DEBUG oracle.OraOopUtilities: The Oracle table context has 
> been derived from:
>   oracleConnectionUserName = sqoop
>   tableStr = T1
>   as:
>   owner : SQOOP
>   table : T1
> 16/12/27 10:08:50 INFO oracle.OraOopManagerFactory: 
> **
> *** Using Data Connector for Oracle and Hadoop ***
> **
> 
> 16/12/27 10:08:53 INFO manager.SqlManager: Executing SQL statement: SELECT 
> C1,C2 FROM T1 WHERE 1=0
> 16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C1
> 16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C2
> 16/12/27 10:08:53 DEBUG util.ClassLoaderStack: Restoring classloader: 
> java.net.FactoryURLClassLoader@7e087bf5
> 16/12/27 10:08:53 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
> type 101
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SQOOP-3095) Add (import + --as-parquetfile) with Oracle BINARY_DOUBLE

2016-12-27 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3095:


 Summary: Add (import + --as-parquetfile) with Oracle BINARY_DOUBLE
 Key: SQOOP-3095
 URL: https://issues.apache.org/jira/browse/SQOOP-3095
 Project: Sqoop
  Issue Type: Improvement
Reporter: Markus Kemper


Some users are not able to easily use the Sqoop option 
(--map-column-java, Example 1 below).

The ask here is to find a way to support the Oracle BINARY_DOUBLE type 
in Sqoop natively, without using --map-column-java (see Example 2 below).

*Example 1*
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 
--as-parquetfile --map-column-java C2=String
hdfs dfs -ls /user/user1/t1/*.parquet
parquet-tools cat --json 
'hdfs://namenode.cloudera.com/user/user1/t1/9c4742a1-68e2-4fd7-8264-6e84110ac409.parquet'

Output:
16/12/27 10:34:05 INFO mapreduce.ImportJobBase: Transferred 1.5537 KB in 
42.1369 seconds (37.7579 bytes/sec)
16/12/27 10:34:05 INFO mapreduce.ImportJobBase: Retrieved 1 records.
---
-rw-r--r--   3 user1 user1565 2016-12-27 10:34 
/user/user1/t1/9c4742a1-68e2-4fd7-8264-6e84110ac409.parquet
---
{"C1":"1","C2":"1.1"}
{noformat}

*Example 2*
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 
--as-parquetfile 

Output:
16/12/27 10:05:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM T1 t WHERE 1=0
16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 0]
16/12/27 10:05:43 DEBUG util.ClassLoaderStack: Restoring classloader: 
java.net.FactoryURLClassLoader@55465b1f
16/12/27 10:05:43 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
type 101

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 2 
--as-parquetfile  --direct

Output:
16/12/27 10:08:50 DEBUG oracle.OraOopUtilities: The Oracle table context has 
been derived from:
oracleConnectionUserName = sqoop
tableStr = T1
as:
owner : SQOOP
table : T1
16/12/27 10:08:50 INFO oracle.OraOopManagerFactory: 
**
*** Using Data Connector for Oracle and Hadoop ***
**

16/12/27 10:08:53 INFO manager.SqlManager: Executing SQL statement: SELECT 
C1,C2 FROM T1 WHERE 1=0
16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C1
16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C2
16/12/27 10:08:53 DEBUG util.ClassLoaderStack: Restoring classloader: 
java.net.FactoryURLClassLoader@7e087bf5
16/12/27 10:08:53 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
type 101
{noformat}
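
As an interim workaround (a hedged sketch only, assuming the BINARY_DOUBLE column is 
named C2 as in the examples above), mapping the column to a Java Double keeps the 
value numeric in the generated Parquet schema instead of falling back to a string:

{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 --as-parquetfile --map-column-java C2=Double
{noformat}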



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SQOOP-3094) Add (import + --as-avrodatafile) with Oracle BINARY_DOUBLE

2016-12-27 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3094:


 Summary: Add (import + --as-avrodatafile) with Oracle BINARY_DOUBLE
 Key: SQOOP-3094
 URL: https://issues.apache.org/jira/browse/SQOOP-3094
 Project: Sqoop
  Issue Type: Improvement
Reporter: Markus Kemper


Some users are not able to easily use the Sqoop option 
(--map-column-java, Example 1 below).  

The ask here is to find a way to support the Oracle BINARY_DOUBLE type 
in Sqoop natively, without using --map-column-java (see Example 2 below).

*Example 1*
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 
--as-avrodatafile --map-column-java C2=String
hdfs dfs -ls /user/user1/t1/*.avro
avro-tools tojson --pretty 
'hdfs://namenode.cloudera.com/user/user1/t1/part-m-0.avro'

Output:
16/12/27 10:28:03 INFO mapreduce.ImportJobBase: Transferred 320 bytes in 
39.1563 seconds (8.1724 bytes/sec)
16/12/27 10:28:03 INFO mapreduce.ImportJobBase: Retrieved 1 records.
---
-rw-r--r--   3 user1 user1320 2016-12-27 10:28 
/user/user1/t1/part-m-0.avro
---
{
  "C1" : {
"string" : "1"
  },
  "C2" : {
"string" : "1.1"
  }
}
{noformat}

*Example 2*
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 
--as-parquetfile 

Output:
16/12/27 10:05:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM T1 t WHERE 1=0
16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 0]
16/12/27 10:05:43 DEBUG util.ClassLoaderStack: Restoring classloader: 
java.net.FactoryURLClassLoader@55465b1f
16/12/27 10:05:43 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
type 101

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 2 
--as-parquetfile  --direct

Output:
16/12/27 10:08:50 DEBUG oracle.OraOopUtilities: The Oracle table context has 
been derived from:
oracleConnectionUserName = sqoop
tableStr = T1
as:
owner : SQOOP
table : T1
16/12/27 10:08:50 INFO oracle.OraOopManagerFactory: 
**
*** Using Data Connector for Oracle and Hadoop ***
**

16/12/27 10:08:53 INFO manager.SqlManager: Executing SQL statement: SELECT 
C1,C2 FROM T1 WHERE 1=0
16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C1
16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C2
16/12/27 10:08:53 DEBUG util.ClassLoaderStack: Restoring classloader: 
java.net.FactoryURLClassLoader@7e087bf5
16/12/27 10:08:53 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
type 101
{noformat}
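
A similar mapping may help on the Avro side as well (a hedged, untested sketch here; 
it assumes the same column name C2 and that the Avro code path honors 
--map-column-java the same way the Parquet path does), so that the Avro field 
becomes a double rather than a string:

{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 --as-avrodatafile --map-column-java C2=Double
{noformat}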



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SQOOP-3089) Sqoop import + --as-parquetfile + Oracle + BINARY_DOUBLE fails with insufficient message

2016-12-21 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3089:


 Summary: Sqoop import + --as-parquetfile + Oracle + BINARY_DOUBLE 
fails with insufficient message
 Key: SQOOP-3089
 URL: https://issues.apache.org/jira/browse/SQOOP-3089
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


The ask in this JIRA is to enhance the debug message to be more informative 
about how to map the data and successfully perform the operation. A full test 
case is below.

*Sqoop Debug (current)*
16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
type 101

*Sqoop Debug (requested)*
16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
type 101, please try using --map-column-java =

{noformat}
#
# STEP 01 - Setup Table and Data
#

export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_oracle (c1 int, c2 binary_double)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_oracle values (1, 1.1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"

Output:
---
| C1   | C2   | 
---
| 1| 1.1  | 
---

#
# STEP 02 - Import Data as Parquet (reproduction)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
--num-mappers 1 --as-parquetfile --verbose

Output:
16/12/21 12:16:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM T1_ORACLE t WHERE 1=0
16/12/21 12:16:08 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
16/12/21 12:16:08 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 0]
16/12/21 12:16:08 DEBUG manager.OracleManager$ConnCache: Caching released 
connection for 
jdbc:oracle:thin:@host-10-17-101-252.coe.cloudera.com:1521/orcl12c/sqoop
16/12/21 12:16:08 DEBUG util.ClassLoaderStack: Restoring classloader: 
java.net.FactoryURLClassLoader@55465b1f
16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL 
type 101

#
# STEP 03 - Import Data as Parquet using option (--map-column-java C2=Double)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
--num-mappers 1 --as-parquetfile --verbose --map-column-java C2=Double

Output:
16/12/21 12:34:36 INFO mapreduce.ImportJobBase: Transferred 1.6406 KB in 77.989 
seconds (21.5415 bytes/sec)
16/12/21 12:34:36 INFO mapreduce.ImportJobBase: Retrieved 1 records.

#
# STEP 04 - View Data using Parquet-tools
#

hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet
parquet-tools schema -d 
hdfs://host-10-17-103-169.coe.cloudera.com/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet
parquet-tools cat --json 
hdfs://host-10-17-103-169.coe.cloudera.com/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet

Output:
-rw-r--r--   3 user1 user1612 2016-12-21 12:34 
/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet
---
message T1_ORACLE {
  optional binary C1 (UTF8);
  optional double C2;
}
creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber})
extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop 
import of 
T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","double"],"default":null,"columnName":"C2","sqlType":"101"}],"tableName":"T1_ORACLE"}
---
{"C1":"1","C2":1.1}

#
# STEP 05 - Import Data as Parquet using option (--map-column-java C2=String)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
--num-mappers 1 --as-parquetfile --verbose --map-column-java C2=String

Output:
16/12/21 12:24:36 INFO mapreduce.ImportJobBase: Transferred 1.6221 KB in 
48.9326 seconds (33.9447 bytes/sec)
16/12/21 12:24:36 INFO mapreduce.ImportJobBase: Retrieved

[jira] [Created] (SQOOP-3088) Sqoop export with Parquet data failure does not contain the MapTask error

2016-12-21 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3088:


 Summary: Sqoop export with Parquet data failure does not contain 
the MapTask error
 Key: SQOOP-3088
 URL: https://issues.apache.org/jira/browse/SQOOP-3088
 Project: Sqoop
  Issue Type: Bug
  Components: tools
Reporter: Markus Kemper


*Test Case*
{noformat}
#
# STEP 01 - Setup Table and Data
#

export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_oracle (c1 int, c2 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_oracle values (1, 'data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"

Output:
-
| C1   | C2 | 
-
| 1| data   | 
-

#
# STEP 02 - Import Data as Parquet
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
--num-mappers 1 --as-parquetfile 

Output:
16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Transferred 1.624 KB in 50.1693 
seconds (33.1478 bytes/sec)
16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.

#
# STEP 03 - Verify Parquet Data
#

hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet
parquet-tools schema -d 
hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet

Output:
-rw-r--r--   3 user1 user1597 2016-12-21 07:11 
/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
---
message T1_ORACLE {
  optional binary C1 (UTF8);
  optional binary C2 (UTF8);
}
creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber})
extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop 
import of 
T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],"tableName":"T1_ORACLE"}
file schema: T1_ORACLE

C1: OPTIONAL BINARY O:UTF8 R:0 D:1
C2: OPTIONAL BINARY O:UTF8 R:0 D:1
row group 1: RC:1 TS:85

C1:  BINARY SNAPPY DO:0 FPO:4 SZ:40/38/0.95 VC:1 ENC:PLAIN,RLE,BIT_PACKED
C2:  BINARY SNAPPY DO:0 FPO:44 SZ:49/47/0.96 VC:1 ENC:PLAIN,RLE,BIT_PACKED

#
# STEP 04 - Export Parquet Data
#

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --export-dir /user/user1/t1_oracle_parquet --num-mappers 1 --verbose

Output:
[sqoop debug]
16/12/21 07:15:06 INFO mapreduce.Job:  map 0% reduce 0%
16/12/21 07:15:40 INFO mapreduce.Job:  map 100% reduce 0%
16/12/21 07:15:40 INFO mapreduce.Job: Job job_1481911879790_0026 failed with 
state FAILED due to: Task failed task_1481911879790_0026_m_00
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/12/21 07:15:40 INFO mapreduce.Job: Counters: 8
Job Counters 
Failed map tasks=1
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=32125
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=32125
Total vcore-seconds taken by all map tasks=32125
Total megabyte-seconds taken by all map tasks=32896000
16/12/21 07:15:40 WARN mapreduce.Counters: Group FileSystemCounters is 
deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.8304 
seconds (0 bytes/sec)
16/12/21 07:15:40 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Exported 0 records.
16/12/21 07:15:40 DEBUG util.ClassLoaderSt
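
# Note (hedged suggestion, not part of the original log): until the export failure
# surfaces the map-task error directly, the underlying exception can usually be
# retrieved from the aggregated YARN logs of the failed job, e.g.:
#   yarn logs -applicationId application_1481911879790_0026 | grep -B 2 -A 20 "Exception"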

[jira] [Updated] (SQOOP-3086) Sqoop export with Parquet data fails with error (parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file)

2016-12-21 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3086:
-
Description: 
*Test Case*
{noformat}
#
# STEP 01 - Setup Table and Data
#

export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_oracle (c1 int, c2 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_oracle values (1, 'data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"

Output:
-
| C1   | C2 | 
-
| 1| data   | 
-

#
# STEP 02 - Import Data as Parquet
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
--num-mappers 1 --as-parquetfile 

Output:
16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Transferred 1.624 KB in 50.1693 
seconds (33.1478 bytes/sec)
16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.

#
# STEP 03 - Verify Parquet Data
#

hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet
parquet-tools schema -d 
hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet

Output:
-rw-r--r--   3 user1 user1597 2016-12-21 07:11 
/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
---
message T1_ORACLE {
  optional binary C1 (UTF8);
  optional binary C2 (UTF8);
}
creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber})
extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop 
import of 
T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],"tableName":"T1_ORACLE"}
file schema: T1_ORACLE

C1: OPTIONAL BINARY O:UTF8 R:0 D:1
C2: OPTIONAL BINARY O:UTF8 R:0 D:1
row group 1: RC:1 TS:85

C1:  BINARY SNAPPY DO:0 FPO:4 SZ:40/38/0.95 VC:1 ENC:PLAIN,RLE,BIT_PACKED
C2:  BINARY SNAPPY DO:0 FPO:44 SZ:49/47/0.96 VC:1 ENC:PLAIN,RLE,BIT_PACKED

#
# STEP 04 - Export Parquet Data
#

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --export-dir /user/user1/t1_oracle_parquet --num-mappers 1 --verbose

Output:
[sqoop debug]
16/12/21 07:15:06 INFO mapreduce.Job:  map 0% reduce 0%
16/12/21 07:15:40 INFO mapreduce.Job:  map 100% reduce 0%
16/12/21 07:15:40 INFO mapreduce.Job: Job job_1481911879790_0026 failed with 
state FAILED due to: Task failed task_1481911879790_0026_m_00
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/12/21 07:15:40 INFO mapreduce.Job: Counters: 8
Job Counters 
Failed map tasks=1
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=32125
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=32125
Total vcore-seconds taken by all map tasks=32125
Total megabyte-seconds taken by all map tasks=32896000
16/12/21 07:15:40 WARN mapreduce.Counters: Group FileSystemCounters is 
deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.8304 
seconds (0 bytes/sec)
16/12/21 07:15:40 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Exported 0 records.
16/12/21 07:15:40 DEBUG util.ClassLoaderStack: Restoring classloader: 
java.net.FactoryURLClassLoader@577cfae6
16/12/21 07:15:40 ERROR tool.ExportTool: Error during export: Export job failed!

[yarn debug]
2016-12-21 07:15:38,911 DEBUG [

[jira] [Created] (SQOOP-3086) Sqoop export with Parquet data fails with error (parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file)

2016-12-21 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3086:


 Summary: Sqoop export with Parquet data fails with error 
(parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in 
file)
 Key: SQOOP-3086
 URL: https://issues.apache.org/jira/browse/SQOOP-3086
 Project: Sqoop
  Issue Type: Bug
  Components: tools
Reporter: Markus Kemper


*Test Case*
{noformat}
#
# STEP 01 - Setup Table and Data
#

export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
export MYUSER=sqoop
export MYPSWD=cloudera

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_oracle (c1 int, c2 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_oracle values (1, 'data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"

Output:
-
| C1   | C2 | 
-
| 1| data   | 
-

#
# STEP 02 - Import Data as Parquet
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
--num-mappers 1 --as-parquetfile 

Output:
16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Transferred 1.624 KB in 50.1693 
seconds (33.1478 bytes/sec)
16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.

#
# STEP 03 - Verify Parquet Data
#

hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet
parquet-tools schema -d 
hdfs://host-10-17-103-169.coe.cloudera.com/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet

Output:
-rw-r--r--   3 user1 user1597 2016-12-21 07:11 
/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
---
message T1_ORACLE {
  optional binary C1 (UTF8);
  optional binary C2 (UTF8);
}
creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber})
extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop 
import of 
T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],"tableName":"T1_ORACLE"}
file schema: T1_ORACLE

C1: OPTIONAL BINARY O:UTF8 R:0 D:1
C2: OPTIONAL BINARY O:UTF8 R:0 D:1
row group 1: RC:1 TS:85

C1:  BINARY SNAPPY DO:0 FPO:4 SZ:40/38/0.95 VC:1 ENC:PLAIN,RLE,BIT_PACKED
C2:  BINARY SNAPPY DO:0 FPO:44 SZ:49/47/0.96 VC:1 ENC:PLAIN,RLE,BIT_PACKED

#
# STEP 04 - Export Parquet Data
#

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --export-dir /user/user1/t1_oracle_parquet --num-mappers 1 --verbose

Output:
[sqoop debug]
16/12/21 07:15:06 INFO mapreduce.Job:  map 0% reduce 0%
16/12/21 07:15:40 INFO mapreduce.Job:  map 100% reduce 0%
16/12/21 07:15:40 INFO mapreduce.Job: Job job_1481911879790_0026 failed with 
state FAILED due to: Task failed task_1481911879790_0026_m_00
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/12/21 07:15:40 INFO mapreduce.Job: Counters: 8
Job Counters 
Failed map tasks=1
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=32125
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=32125
Total vcore-seconds taken by all map tasks=32125
Total megabyte-seconds taken by all map tasks=32896000
16/12/21 07:15:40 WARN mapreduce.Counters: Group FileSystemCounters is 
deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.8304 
seconds (0 bytes/sec)
16/12/21 07:15:40 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
16/12/21 07:15:40 INFO mapreduce.ExportJobBase

[jira] [Commented] (SQOOP-3078) Sqoop (export + --hcatalog) with Oracle BINARY_DOUBLE fails with NPE

2016-12-21 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767115#comment-15767115
 ] 

Markus Kemper commented on SQOOP-3078:
--

Additional Test Case
* This issue seems to be present with Parquet as well
* Please review the test case below to make sure that it is a valid reproduction

Test Case (parquet)
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/root/t1_oracle_parquet --delete-target-dir 
--num-mappers 1 --as-parquetfile --map-column-java C2=Double,C3=Double 
--hive-import --hive-database default --hive-table t1_oracle_parquet
beeline -u jdbc:hive2:// -e "use default; select * from t1_oracle_parquet;"

Output:
16/12/20 13:49:06 INFO mapreduce.ImportJobBase: Transferred 2.584 KB in 52.0475 
seconds (50.8382 bytes/sec)
16/12/20 13:49:06 INFO mapreduce.ImportJobBase: Retrieved 1 records.
---
+---+---+---+---+--+
| t1_oracle_parquet.c1  | t1_oracle_parquet.c2  | t1_oracle_parquet.c3  | 
t1_oracle_parquet.c4  |
+---+---+---+---+--+
| one.one   | 1.1   | 1.1   | 
one.one   |
+---+---+---+---+--+

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --hcatalog-database default --hcatalog-table t1_oracle --num-mappers 
1 --map-column-java C2=Double,C3=Double

Output:
-
| C1 | C2   | C3   | C4 | 
-
-
16/12/20 13:51:17 ERROR sqoop.Sqoop: Got exception running Sqoop: 
java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:450)
{noformat}
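
For completeness, a hedged workaround sketch (it simply reuses the text-based path 
that STEP 02/03 of the issue description shows working, with the same table and 
directory names): export the delimited HDFS copy directly instead of going through 
--hcatalog:

{noformat}
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --export-dir /user/root/t1_oracle --num-mappers 1 --input-fields-terminated-by ','
{noformat}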

> Sqoop (export + --hcatalog) with Oracle BINARY_DOUBLE fails with NPE
> 
>
> Key: SQOOP-3078
> URL: https://issues.apache.org/jira/browse/SQOOP-3078
> Project: Sqoop
>  Issue Type: Bug
>  Components: hive-integration
>Reporter: Markus Kemper
>
> According to the documentation, BINARY_DOUBLE should be supported, so this 
> seems like a possible defect.
> {noformat}
> #
> # DOC - Type Binary Double is supported
> #
> http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.8.0/SqoopUserGuide.html#_supported_data_types
> 25.8.2.4. Supported Data Types
> The following Oracle data types are supported by the Data Connector for 
> Oracle and Hadoop:
> BINARY_DOUBLE 
> #
> # STEP 01 - Setup Oracle Table and Data
> #
> export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
> export MYUSER=sqoop
> export MYPSWD=sqoop
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1_oracle"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1_oracle (c1 varchar(10), c2 binary_double, c3 binary_double, 
> c4 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "delete from t1_oracle"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1_oracle values ('one.one', 1.1, 1.1, 'one.one')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1_oracle"
> Output:
> -
> | C1 | C2   | C3   | C4 | 
> -
> | one.one| 1.1  | 1.1  | one.one| 
> -
> #
> # STEP 02 - Verify Import (--target-dir) and Export (--export-dir)
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1_ORACLE --target-dir /user/root/t1_oracle --delete-target-dir --num-mappers 
> 1 --fields-terminated-by ','
> hdfs dfs -cat /user/root/t1_oracle/part*
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1_ORACLE --export-dir /user/root/t1_oracle --num-mappers 1 
> --input-fields-termin

[jira] [Commented] (SQOOP-2943) Make sqoop able to import to Parquet file format in case of HDFS encryption zones are turned on

2016-12-12 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742649#comment-15742649
 ] 

Markus Kemper commented on SQOOP-2943:
--

Linking SQOOP-2880

> Make sqoop able to import to Parquet file format in case of HDFS encryption 
> zones are turned on
> ---
>
> Key: SQOOP-2943
> URL: https://issues.apache.org/jira/browse/SQOOP-2943
> Project: Sqoop
>  Issue Type: Improvement
>Affects Versions: 1.4.7
>Reporter: Attila Szabo
> Fix For: 1.4.7
>
>
> If HDFS encryption zones are turned on and the user tries to import into 
> Parquet format where the target location is in a different encryption zone 
> than the /tmp/ location (a typical case being an encrypted Hive warehouse 
> directory), the mapper jobs execute successfully and the partial results are 
> written to the temp location correctly, but the MergeOutputMapper class of the 
> Kite SDK dies with an HDFS-related exception ("can't be moved into an 
> encryption zone").
> The problem does not appear with clear-text output formats.
> Please make Sqoop able to handle this scenario as well!
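
To see which encryption zones are involved in such a failure (a hedged sketch; the 
paths below are examples only, and listing zones requires HDFS superuser privileges):

{noformat}
# list the configured encryption zones
hdfs crypto -listZones

# check where the temporary output and the final target actually live
hdfs dfs -ls /tmp
hdfs dfs -ls /user/hive/warehouse
{noformat}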



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SQOOP-3078) Sqoop (export + --hcatalog) with Oracle BINARY_DOUBLE fails with NPE

2016-12-12 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3078:


 Summary: Sqoop (export + --hcatalog) with Oracle BINARY_DOUBLE 
fails with NPE
 Key: SQOOP-3078
 URL: https://issues.apache.org/jira/browse/SQOOP-3078
 Project: Sqoop
  Issue Type: Bug
  Components: hive-integration
Reporter: Markus Kemper


According to the documentation, BINARY_DOUBLE should be supported, so this 
seems like a possible defect.

{noformat}
#
# DOC - Type Binary Double is supported
#

http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.8.0/SqoopUserGuide.html#_supported_data_types
25.8.2.4. Supported Data Types
The following Oracle data types are supported by the Data Connector for Oracle 
and Hadoop:
BINARY_DOUBLE 

#
# STEP 01 - Setup Oracle Table and Data
#

export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
export MYUSER=sqoop
export MYPSWD=sqoop

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_oracle (c1 varchar(10), c2 binary_double, c3 binary_double, c4 
varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_oracle values ('one.one', 1.1, 1.1, 'one.one')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"

Output:
-
| C1 | C2   | C3   | C4 | 
-
| one.one| 1.1  | 1.1  | one.one| 
-

#
# STEP 02 - Verify Import (--target-dir) and Export (--export-dir)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/root/t1_oracle --delete-target-dir --num-mappers 1 
--fields-terminated-by ','
hdfs dfs -cat /user/root/t1_oracle/part*
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --export-dir /user/root/t1_oracle --num-mappers 1 
--input-fields-terminated-by ','
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"

Output:
one.one,1.1,1.1,one.one
---
-
| C1 | C2   | C3   | C4 | 
-
| one.one| 1.1  | 1.1  | one.one| 
| one.one| 1.1  | 1.1  | one.one| 
-

#
# STEP 03 - Verify Import (--hive-import) and Export (--export-dir)
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_oracle values ('one.one', 1.1, 1.1, 'one.one')"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --target-dir /user/root/t1_oracle --delete-target-dir --num-mappers 1 
--hive-import --hive-database default --hive-table t1_oracle 
--fields-terminated-by ','
beeline -u jdbc:hive2:// -e "use default; select * from t1_oracle;"
hdfs dfs -cat /user/hive/warehouse/t1_oracle/part*
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_ORACLE --export-dir /user/hive/warehouse/t1_oracle --num-mappers 1 
--input-fields-terminated-by ','
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"

Output:
+---+---+---+---+--+
| t1_oracle.c1  | t1_oracle.c2  | t1_oracle.c3  | t1_oracle.c4  |
+---+---+---+---+--+
| one.one   | 1.1   | 1.1   | one.one   |
+---+---+---+---+--+
---
one.one,1.1,1.1,one.one
---
-
| C1 | C2   | C3   | C4 | 
-
| one.one| 1.1  | 1.1  | one.one| 
| one.one| 1.1  | 1.1  | one.one| 
-

#
# STEP 04 - Verify Import (--hive-import) and Export (--hcatalog)
# This is the defect reproduction
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_oracle"
sqoop export

[jira] [Created] (SQOOP-3077) Add support for (import + --hcatalog + --as-avrodatafile) with RDBMS type TIMESTAMP

2016-12-10 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3077:


 Summary: Add support for (import + --hcatalog + --as-avrodatafile) 
with RDBMS type TIMESTAMP
 Key: SQOOP-3077
 URL: https://issues.apache.org/jira/browse/SQOOP-3077
 Project: Sqoop
  Issue Type: Improvement
Reporter: Markus Kemper


Please consider adding support for --hcatalog import with TIMESTAMP columns; the 
Avro specification suggests that logical types support timestamps.

Avro Doc:
https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types

{noformat}
#
# STEP 01 - Setup Table and Data
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_dates"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_dates (c1_int integer, c2_date date, c3_timestamp timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select dbms_metadata.get_ddl('TABLE', 'T1_DATES', 'SQOOP') from dual"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_dates values (1, current_date, current_timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_dates"

Output:

| DBMS_METADATA.GET_DDL('TABLE','T1_DATES','SQOOP') | 

| 
  CREATE TABLE "SQOOP"."T1_DATES" 
   ("C1_INT" NUMBER(*,0), 
"C2_DATE" DATE, 
"C3_TIMESTAMP" TIMESTAMP (6)
   ) SEGMENT CREATION DEFERRED 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 
 NOCOMPRESS LOGGING
  TABLESPACE "SQOOP"  | 

---

| C1_INT   | C2_DATE | C3_TIMESTAMP | 

| 1| 2016-12-10 15:48:23.0 | 2016-12-10 15:48:23.707327 | 


#
# STEP 02 - Import with Text Format
#

beeline -u jdbc:hive2:// -e "use default; drop table t1_dates_text;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_DATES --hcatalog-database default --hcatalog-table t1_dates_text 
--create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' 
--num-mappers 1 --map-column-hive c2_date=date,c3_timestamp=timestamp
beeline -u jdbc:hive2:// -e "use default; describe t1_dates_text; select * from 
t1_dates_text;"

+-+--+
| createtab_stmt  |
+-+--+
| CREATE TABLE `t1_dates_text`(   |
|   `c1_int` decimal(38,0),   |
|   `c2_date` date,   |
|   `c3_timestamp` timestamp) |
| ROW FORMAT SERDE|
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
| STORED AS INPUTFORMAT   |
|   'org.apache.hadoop.mapred.TextInputFormat'|
| OUTPUTFORMAT|
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
| LOCATION|
|   'hdfs://nameservice1/user/hive/warehouse/t1_dates_text'   |
| TBLPROPERTIES ( |
|   'transient_lastDdlTime'='1481386391') |
+-+--+
--
+---++-+--+
| t1_dates_text.c1_int  | t1_dates_text.c2_date  | t1_dates_text.c3_timestamp  |
+---++-+--+
| 1 | 2016-12-10 | 2016-12-10 15:48:23.707327  |
+---++-+--+

#
# STEP 03 - Import with Avro Format (default)
#

beeline -u jdbc:hive2:// -e "use default; drop table t1_dates_text;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_DATES --hcatalog-database default --hcatalog-table t1_dates_avro 
--create-hcatalog-table --hcatalog-storage-stanza 'stored as avro' 
--num-mappers 1
beeline -u jdbc:hive2:// -e "use default; show create table t1_dates_avro; 
select * from t1_dates_avro;"

+--+--+
|  createtab_stmt  |
+

[jira] [Created] (SQOOP-3076) Modify default support with (import + --hcatalog + --as-textfile) and (DATE,TIMESTAMP)

2016-12-10 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3076:


 Summary: Modify default support with (import + --hcatalog + 
--as-textfile) and (DATE,TIMESTAMP)
 Key: SQOOP-3076
 URL: https://issues.apache.org/jira/browse/SQOOP-3076
 Project: Sqoop
  Issue Type: Improvement
Reporter: Markus Kemper


Please consider modifying the default type mapping when the RDBMS types are 
natively supported in Hive, as seen in the test case below for DATE and TIMESTAMP.

Test Case
{noformat}
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_dates"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_dates (c1_int integer, c2_date date, c3_timestamp timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select dbms_metadata.get_ddl('TABLE', 'T1_DATES', 'SQOOP') from dual"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_dates values (1, current_date, current_timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_dates"

Output:

| DBMS_METADATA.GET_DDL('TABLE','T1_DATES','SQOOP') | 

| 
  CREATE TABLE "SQOOP"."T1_DATES" 
   ("C1_INT" NUMBER(*,0), 
"C2_DATE" DATE, 
"C3_TIMESTAMP" TIMESTAMP (6)
   ) SEGMENT CREATION DEFERRED 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 
 NOCOMPRESS LOGGING
  TABLESPACE "SQOOP"  | 

---

| C1_INT   | C2_DATE | C3_TIMESTAMP | 

| 1| 2016-12-10 15:48:23.0 | 2016-12-10 15:48:23.707327 | 


#
# STEP 02 - Demonstrate Current Default Behavior
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_DATES --hcatalog-database default --hcatalog-table t1_dates_text 
--create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' 
--num-mappers 1
beeline -u jdbc:hive2:// -e "use default; describe t1_dates_text; select * from 
t1_dates_text;"

+---++--+--+
|   col_name|   data_type| comment  |
+---++--+--+
| c1_int| decimal(38,0)  |  |
| c2_date   | string |  |
| c3_timestamp  | string |  |
+---++--+--+
---
+---++-+--+
| t1_dates_text.c1_int  | t1_dates_text.c2_date  | t1_dates_text.c3_timestamp  |
+---++-+--+
| 1 | 2016-12-10 15:48:23.0  | 2016-12-10 15:48:23.707327  |
+---++-+--+

#
# STEP 03 - Demonstrate Suggested Default Behavior
#

beeline -u jdbc:hive2:// -e "use default; drop table t1_dates_text;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1_DATES --hcatalog-database default --hcatalog-table t1_dates_text 
--create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' 
--num-mappers 1 --map-column-hive c2_date=date,c3_timestamp=timestamp
beeline -u jdbc:hive2:// -e "use default; describe t1_dates_text; select * from 
t1_dates_text;"

+---++--+--+
|   col_name|   data_type| comment  |
+---++--+--+
| c1_int| decimal(38,0)  |  |
| c2_date   | date   |  |
| c3_timestamp  | timestamp  |  |
+---++--+--+
--
+---++-+--+
| t1_dates_text.c1_int  | t1_dates_text.c2_date  | t1_dates_text.c3_timestamp  |
+---++-+--+
| 1 | 2016-12-10 | 2016-12-10 15:48:23.707327  |
+---++-+--+
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-3068) Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java)

2016-12-05 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721860#comment-15721860
 ] 

Markus Kemper commented on SQOOP-3068:
--

Created as a result of investigations in SQOOP-3047

> Enhance error (tool.ImportTool: Encountered IOException running import job: 
> java.io.IOException: Expected schema) to suggest workaround 
> (--map-column-java)
> ---
>
> Key: SQOOP-3068
> URL: https://issues.apache.org/jira/browse/SQOOP-3068
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 1.4.6
>Reporter: Markus Kemper
>
> Please consider enhancing the error to include more detail and suggest the 
> workaround (--map-column-java).  
> Sqoop (import + --hive-import + --as-parquetfile) can fail due to a mismatch 
> between the JSON schema that Hive produces and the JSON schema that Sqoop 
> generates.  The test case below demonstrates how to reproduce the issue as 
> well as how to work around it.
> SETUP (create parquet table with Sqoop import and Beeline CTAS)
> {noformat}
> STEP 01 - Create MySQL Tables
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "describe t1"
> -
> | Field| Type | Null | Key | Default  
> | Extra| 
> -
> | c_int| int(11)  | YES | | (null)
>|  | 
> | c_date   | date | YES | | (null)
>|  | 
> | c_timestamp  | timestamp| NO  | | CURRENT_TIMESTAMP 
>| on update CURRENT_TIMESTAMP | 
> -
> STEP 02 : Insert and Select Row
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date(), current_timestamp())"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> --
> | c_int   | c_date | c_timestamp | 
> --
> | 1   | 2016-10-26 | 2016-10-26 14:30:33.0 | 
> --
> STEP 03 : Create Hive Tables
> beeline -u jdbc:hive2:// -e "use default; drop table t1"
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username 
> $MYUSER --password $MYPSWD --table t1 --hcatalog-database default 
> --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 
> 'stored as parquet' --num-mappers 1
> beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet 
> as select * from t1_text;show create table t1;"
> ++--+
> |   createtab_stmt   |
> ++--+
> | CREATE TABLE `t1`( |
> |   `c_int` int, |
> |   `c_date` string, |
> |   `c_timestamp` string)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
> | LOCATION   |
> |   'hdfs://:8020/user/hive/warehouse/t1' |
> | TBLPROPERTIES (|
> |   'COLUMN_STATS_ACCURATE'='true',  |
> |   'numFiles'='1',  |
> |   'numRows'='2',   |
> |   'rawDataSize'='6',  

[jira] [Created] (SQOOP-3068) Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java)

2016-12-05 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3068:


 Summary: Enhance error (tool.ImportTool: Encountered IOException 
running import job: java.io.IOException: Expected schema) to suggest workaround 
(--map-column-java)
 Key: SQOOP-3068
 URL: https://issues.apache.org/jira/browse/SQOOP-3068
 Project: Sqoop
  Issue Type: Improvement
  Components: hive-integration
Affects Versions: 1.4.6
Reporter: Markus Kemper


Please consider enhancing the error to include more detail and suggest the 
workaround (--map-column-java).  

Sqoop (import + --hive-import + --as-parquetfile) can fail due to a mismatch 
between the JSON schema that Hive produces and the JSON schema that Sqoop 
generates.  The test case below demonstrates how to reproduce the issue as well 
as how to work around it.

SETUP (create parquet table with Sqoop import and Beeline CTAS)
{noformat}
STEP 01 - Create MySQL Tables

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c_int int, c_date date, c_timestamp timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"describe t1"
-
| Field| Type | Null | Key | Default
  | Extra| 
-
| c_int| int(11)  | YES | | (null)  
 |  | 
| c_date   | date | YES | | (null)  
 |  | 
| c_timestamp  | timestamp| NO  | | CURRENT_TIMESTAMP   
 | on update CURRENT_TIMESTAMP | 
-

STEP 02 : Insert and Select Row

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date(), current_timestamp())"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"
--
| c_int   | c_date | c_timestamp | 
--
| 1   | 2016-10-26 | 2016-10-26 14:30:33.0 | 
--

STEP 03 : Create Hive Tables

beeline -u jdbc:hive2:// -e "use default; drop table t1"
sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table t1 --hcatalog-database default 
--hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 
'stored as parquet' --num-mappers 1
beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet as 
select * from t1_text;show create table t1;"

++--+
|   createtab_stmt   |
++--+
| CREATE TABLE `t1`( |
|   `c_int` int, |
|   `c_date` string, |
|   `c_timestamp` string)|
| ROW FORMAT SERDE   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| STORED AS INPUTFORMAT  |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
| OUTPUTFORMAT   |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION   |
|   'hdfs://:8020/user/hive/warehouse/t1' |
| TBLPROPERTIES (|
|   'COLUMN_STATS_ACCURATE'='true',  |
|   'numFiles'='1',  |
|   'numRows'='2',   |
|   'rawDataSize'='6',   |
|   'totalSize'='605',   |
|   'transient_lastDdlTime'='1478298298')|
++--+
{noformat}

REPRODUCE ISSUE (import --hive-import append and overwrite)
{noformat}

STEP 01: Attempt --hive-import --append

sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default 
--hive-table t1 --as-parquetfile --num-mappers 1 --append

16/11/02 20:54:23 WARN mapreduce.DataDrivenImportJob: Target Hive table 't1' 
exists! Sqoop will append data into the existing Hive table. Consider using 
--hive-overwrite, if you do NOT intend to do appen
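
# (hedged note, not part of the original log) Per the issue title the workaround is
# --map-column-java; one plausible form, assuming the existing Hive table keeps its
# string columns from the CTAS above, is to force Sqoop's generated schema to match:
#   sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 \
#     --hive-import --hive-database default --hive-table t1 --as-parquetfile \
#     --num-mappers 1 --append --map-column-java c_date=String,c_timestamp=String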

[jira] [Created] (SQOOP-3058) Sqoop import with Netezza --direct fails properly but also produces NPE

2016-11-23 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3058:


 Summary: Sqoop import with Netezza --direct fails properly but 
also produces NPE
 Key: SQOOP-3058
 URL: https://issues.apache.org/jira/browse/SQOOP-3058
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


The [error] is expected; however, the [npe] seems like a defect. See the [test case] 
below.

[error]
ERROR:  relation does not exist SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1

[npe]
16/11/18 09:19:44 ERROR sqoop.Sqoop: Got exception running Sqoop: 
java.lang.NullPointerException

[test case]
{noformat}
#
# STEP 01 - Setup Netezza Table and Data
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"DROP TABLE SQOOP_SME1.T1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"CREATE TABLE SQOOP_SME1.T1 (C1 INTEGER)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"INSERT INTO SQOOP_SME1.T1 VALUES (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"SELECT C1 FROM SQOOP_SME1.T1"

#
# STEP 02 - Test Import and Export (baseline)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
"T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"DELETE FROM SQOOP_SME1.T1"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
"T1" --export-dir /user/root/t1 --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"SELECT C1 FROM SQOOP_SME1.T1"

---
| C1  | 
---
| 1   | 
---

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"DELETE FROM SQOOP_SME1.T1"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
"T1" --export-dir /user/root/t1 --num-mappers 1 --direct
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"SELECT C1 FROM SQOOP_SME1.T1"

---
| C1  | 
---
| 1   | 
---

#
# STEP 03 - Test Import and Export (with SCHEMA in --table option AND --direct)
#

/* Notes: This failure seems correct; however, the NPE after the failure seems 
like a defect */

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"DELETE FROM SQOOP_SME1.T1"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
"SQOOP_SME1.T1" --export-dir /user/root/t1 --num-mappers 1 --direct

16/11/18 09:19:44 ERROR manager.SqlManager: Error executing statement: 
org.netezza.error.NzSQLException: ERROR:  relation does not exist 
SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
org.netezza.error.NzSQLException: ERROR:  relation does not exist 
SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1
at 
org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:280)
at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:76)
at org.netezza.sql.NzConnection.execute(NzConnection.java:2869)
at 
org.netezza.sql.NzPreparedStatament._execute(NzPreparedStatament.java:1126)
at 
org.netezza.sql.NzPreparedStatament.prepare(NzPreparedStatament.java:1143)
at 
org.netezza.sql.NzPreparedStatament.(NzPreparedStatament.java:89)
at org.netezza.sql.NzConnection.prepareStatement(NzConnection.java:1589)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at 
org.apache.sqoop.manager.SqlManager.getColumnNamesForRawQuery(SqlManager.java:151)
at 
org.apache.sqoop.manager.SqlManager.getColumnNames(SqlManager.java:116)
at 
org.apache.sqoop.mapreduce.netezza.NetezzaExternalTableExportJob.configureOutputFormat(NetezzaExternalTableExportJob.java:128)
at 
org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433)
at 
org.apache.sqoop.manager.DirectNetezzaManager.exportTable(DirectNetezzaManager.java:209)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/11/18 09:19:44 ERROR sqoop.Sqoop: Got exception running Sqoop: 
java.lang.NullPo
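
# (hedged note, not from the original output) The error shows the relation being looked
# up as SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1, i.e. the direct connector appears to
# prepend the session schema to the value of --table, double-qualifying it. The
# non-qualified form already used with --direct in STEP 02 avoids this:
#   sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table "T1" \
#     --export-dir /user/root/t1 --num-mappers 1 --direct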

[jira] [Assigned] (SQOOP-2905) Improve documentation for Oracle connector

2016-11-11 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper reassigned SQOOP-2905:


Assignee: Markus Kemper

> Improve documentation for Oracle connector
> --
>
> Key: SQOOP-2905
> URL: https://issues.apache.org/jira/browse/SQOOP-2905
> Project: Sqoop
>  Issue Type: Task
>  Components: connectors/oracle
>Affects Versions: 1.4.6
>Reporter: Miklos Szurap
>    Assignee: Markus Kemper
>
> In the Sqoop user guide [1] there is a section on how to handle 
> "java.net.SocketException: Connection reset" errors.
> This needs to be improved because the Sqoop client itself can suffer from 
> this error, not only the mappers.
> The solution is to specify the system property for the sqoop client too, so 
> the sqoop command will look like the following (for example, for a sqoop import):
> sqoop import -Djava.security.egd="file:///dev/../dev/urandom" -D 
> mapreduce.map.java.opts="-Djava.security.egd=file:///dev/../dev/urandom" 
> 
> [1] 
> https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oracle_connection_reset_errors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-3036) Update Doc (import + --hbase-bulkload) should fail fast when not using (--as-textfile), see SQOOP-3034

2016-11-08 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3036:
-
Summary: Update Doc (import + --hbase-bulkload) should fail fast when not 
using (--as-textfile), see SQOOP-3034  (was: Documentation update needed: 
importing into HBase ignores any format option other than as-textfile)

> Update Doc (import + --hbase-bulkload) should fail fast when not using 
> (--as-textfile), see SQOOP-3034
> --
>
> Key: SQOOP-3036
> URL: https://issues.apache.org/jira/browse/SQOOP-3036
> Project: Sqoop
>  Issue Type: Task
>  Components: docs
>Reporter: Boglarka Egyed
>    Assignee: Markus Kemper
>Priority: Minor
>
> Sqoop import into HBase ignores any format option other than as-textfile 
> in the current implementation. This should be noted in the 
> documentation too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (SQOOP-3036) Documentation update needed: importing into HBase ignores any format option other than as-textfile

2016-11-08 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper reassigned SQOOP-3036:


Assignee: Markus Kemper  (was: Boglarka Egyed)

> Documentation update needed: importing into HBase ignores any format option 
> other than as-textfile
> --
>
> Key: SQOOP-3036
> URL: https://issues.apache.org/jira/browse/SQOOP-3036
> Project: Sqoop
>  Issue Type: Task
>  Components: docs
>Reporter: Boglarka Egyed
>    Assignee: Markus Kemper
>Priority: Minor
>
> Sqoop import into HBase ignores any format option other than as-textfile 
> in the current implementation. This should be noted in the 
> documentation too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-3047) Add support for (import + --hive-import + --as-parquet) when Parquet table already exists

2016-11-04 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637951#comment-15637951
 ] 

Markus Kemper commented on SQOOP-3047:
--

Linking SQOOP-3046

> Add support for (import + --hive-import + --as-parquet) when Parquet table 
> already exists
> -
>
> Key: SQOOP-3047
> URL: https://issues.apache.org/jira/browse/SQOOP-3047
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>    Reporter: Markus Kemper
>
> When a parquet table already exists, (import + --hive-import + 
> --as-parquetfile) will fail due to a mismatch between the json schema that 
> Hive produces and the json schema that Sqoop generates.  The test case below 
> demonstrates how to reproduce the issue as well as how to work around it.
> SETUP (create parquet table with Sqoop import and Beeline CTAS)
> {noformat}
> STEP 01 - Create MySQL Tables
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "describe t1"
> -
> | Field| Type | Null | Key | Default  
> | Extra| 
> -
> | c_int| int(11)  | YES | | (null)
>|  | 
> | c_date   | date | YES | | (null)
>|  | 
> | c_timestamp  | timestamp| NO  | | CURRENT_TIMESTAMP 
>| on update CURRENT_TIMESTAMP | 
> -
> STEP 02 : Insert and Select Row
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date(), current_timestamp())"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> --
> | c_int   | c_date | c_timestamp | 
> --
> | 1   | 2016-10-26 | 2016-10-26 14:30:33.0 | 
> --
> STEP 03 : Create Hive Tables
> beeline -u jdbc:hive2:// -e "use default; drop table t1"
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username 
> $MYUSER --password $MYPSWD --table t1 --hcatalog-database default 
> --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 
> 'stored as parquet' --num-mappers 1
> beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet 
> as select * from t1_text;show create table t1;"
> ++--+
> |   createtab_stmt   |
> ++--+
> | CREATE TABLE `t1`( |
> |   `c_int` int, |
> |   `c_date` string, |
> |   `c_timestamp` string)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
> | LOCATION   |
> |   'hdfs://:8020/user/hive/warehouse/t1' |
> | TBLPROPERTIES (|
> |   'COLUMN_STATS_ACCURATE'='true',  |
> |   'numFiles'='1',  |
> |   'numRows'='2',   |
> |   'rawDataSize'='6',   |
> |   'totalSize'='605',   |
> |   'transient_lastDdlTime'='1478298298')|
> ++--+
> {noformat}
> REPRODUCE ISSUE (import --hive-import append and overwrite)

[jira] [Created] (SQOOP-3047) Add support for (import + --hive-import + --as-parquet) when Parquet table already exists

2016-11-04 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3047:


 Summary: Add support for (import + --hive-import + --as-parquet) 
when Parquet table already exists
 Key: SQOOP-3047
 URL: https://issues.apache.org/jira/browse/SQOOP-3047
 Project: Sqoop
  Issue Type: Improvement
  Components: hive-integration
Reporter: Markus Kemper


When a parquet table already exists, (import + --hive-import + --as-parquetfile) 
will fail due to a mismatch between the json schema that Hive produces and the 
json schema that Sqoop generates.  The test case below demonstrates how to 
reproduce the issue as well as how to work around it.

SETUP (create parquet table with Sqoop import and Beeline CTAS)
{noformat}
STEP 01 - Create MySQL Tables

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c_int int, c_date date, c_timestamp timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"describe t1"
-
| Field| Type | Null | Key | Default
  | Extra| 
-
| c_int| int(11)  | YES | | (null)  
 |  | 
| c_date   | date | YES | | (null)  
 |  | 
| c_timestamp  | timestamp| NO  | | CURRENT_TIMESTAMP   
 | on update CURRENT_TIMESTAMP | 
-

STEP 02 : Insert and Select Row

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date(), current_timestamp())"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"
--
| c_int   | c_date | c_timestamp | 
--
| 1   | 2016-10-26 | 2016-10-26 14:30:33.0 | 
--

STEP 03 : Create Hive Tables

beeline -u jdbc:hive2:// -e "use default; drop table t1"
sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table t1 --hcatalog-database default 
--hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 
'stored as parquet' --num-mappers 1
beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet as 
select * from t1_text;show create table t1;"

++--+
|   createtab_stmt   |
++--+
| CREATE TABLE `t1`( |
|   `c_int` int, |
|   `c_date` string, |
|   `c_timestamp` string)|
| ROW FORMAT SERDE   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| STORED AS INPUTFORMAT  |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
| OUTPUTFORMAT   |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION   |
|   'hdfs://:8020/user/hive/warehouse/t1' |
| TBLPROPERTIES (|
|   'COLUMN_STATS_ACCURATE'='true',  |
|   'numFiles'='1',  |
|   'numRows'='2',   |
|   'rawDataSize'='6',   |
|   'totalSize'='605',   |
|   'transient_lastDdlTime'='1478298298')|
++--+
{noformat}

REPRODUCE ISSUE (import --hive-import append and overwrite)
{noformat}

STEP 01: Attempt --hive-import --append

sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default 
--hive-table t1 --as-parquetfile --num-mappers 1 --append

16/11/02 20:54:23 WARN mapreduce.DataDrivenImportJob: Target Hive table 't1' 
exists! Sqoop will append data into the existing Hive table. Consider using 
--hive-overwrite, if you do NOT intend to do appending.
16/11/02 20:54:24 ERROR tool.ImportTool: Encountered IOException running import 
job: java.io.IOException: Expected schema: 
{"type":"record"

[jira] [Commented] (SQOOP-3046) Add support for (import + --hcatalog* + --as-parquetfile)

2016-11-04 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637836#comment-15637836
 ] 

Markus Kemper commented on SQOOP-3046:
--

Linking SQOOP-3010 in the event that this capability is not technically 
possible.

> Add support for (import + --hcatalog* + --as-parquetfile) 
> --
>
> Key: SQOOP-3046
> URL: https://issues.apache.org/jira/browse/SQOOP-3046
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>    Reporter: Markus Kemper
>
> This is a request to identify a way to support Sqoop import with --hcatalog 
> options when writing Parquet data files.   The test case below demonstrates 
> the issue.
> CODE SNIP
> {noformat}
> ../MapredParquetOutputFormat.java 
> 69  @Override
> 70  public RecordWriter<Void, ParquetHiveRecord> getRecordWriter(
> 71  final FileSystem ignored,
> 72  final JobConf job,
> 73  final String name,
> 74  final Progressable progress
> 75  ) throws IOException {
> 76throw new RuntimeException("Should never be used");
> 77  }
> {noformat}
> TEST CASE:
> {noformat}
> STEP 01 - Create MySQL Tables
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "describe t1"
> -
> | Field| Type | Null | Key | Default  
> | Extra| 
> -
> | c_int| int(11)  | YES | | (null)
>|  | 
> | c_date   | date | YES | | (null)
>|  | 
> | c_timestamp  | timestamp| NO  | | CURRENT_TIMESTAMP 
>| on update CURRENT_TIMESTAMP | 
> -
> STEP 02 : Insert and Select Row
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, current_date(), current_timestamp())"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> --
> | c_int   | c_date | c_timestamp | 
> --
> | 1   | 2016-10-26 | 2016-10-26 14:30:33.0 | 
> --
> beeline -u jdbc:hive2:// -e "use default; drop table t1"
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username 
> $MYUSER --password $MYPSWD --table t1 --hcatalog-database default 
> --hcatalog-table t1 --create-hcatalog-table --hcatalog-storage-stanza 'stored 
> as parquet' --num-mappers 1
> [sqoop console debug]
> 16/11/02 20:25:15 INFO mapreduce.Job: Task Id : 
> attempt_1478089149450_0046_m_00_0, Status : FAILED
> Error: java.lang.RuntimeException: Should never be used
>   at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
>   at 
> org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.(MapTask.java:647)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> [yarn maptask debug]  
> 2016-11-02 20:25:15,565 INFO [main] org.apache.hadoop.mapred.MapTask: 
> Processing split: 1=1 AND 1=1
> 2016-11-02 20:25:15,583 DEBUG [main] 
> org.apache

[jira] [Created] (SQOOP-3046) Add support for (import + --hcatalog* + --as-parquetfile)

2016-11-04 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3046:


 Summary: Add support for (import + --hcatalog* + --as-parquetfile) 
 Key: SQOOP-3046
 URL: https://issues.apache.org/jira/browse/SQOOP-3046
 Project: Sqoop
  Issue Type: Improvement
  Components: hive-integration
Reporter: Markus Kemper


This is a request to identify a way to support Sqoop import with --hcatalog 
options when writing Parquet data files.   The test case below demonstrates the 
issue.

CODE SNIP
{noformat}
../MapredParquetOutputFormat.java   
69  @Override
70  public RecordWriter<Void, ParquetHiveRecord> getRecordWriter(
71  final FileSystem ignored,
72  final JobConf job,
73  final String name,
74  final Progressable progress
75  ) throws IOException {
76throw new RuntimeException("Should never be used");
77  }
{noformat}
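
Until the HCatalog Parquet path is supported, one possible alternative is the
Kite-based --hive-import route; this is only a sketch, and it assumes the target
Hive table does not already exist (see SQOOP-3047 for the pre-existing table
case):

  sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
    --table t1 --hive-import --hive-database default --hive-table t1_parquet \
    --as-parquetfile --num-mappers 1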

TEST CASE:
{noformat}
STEP 01 - Create MySQL Tables

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c_int int, c_date date, c_timestamp timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"describe t1"
-
| Field| Type | Null | Key | Default
  | Extra| 
-
| c_int| int(11)  | YES | | (null)  
 |  | 
| c_date   | date | YES | | (null)  
 |  | 
| c_timestamp  | timestamp| NO  | | CURRENT_TIMESTAMP   
 | on update CURRENT_TIMESTAMP | 
-

STEP 02 : Insert and Select Row

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, current_date(), current_timestamp())"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"
--
| c_int   | c_date | c_timestamp | 
--
| 1   | 2016-10-26 | 2016-10-26 14:30:33.0 | 
--

beeline -u jdbc:hive2:// -e "use default; drop table t1"
sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table t1 --hcatalog-database default 
--hcatalog-table t1 --create-hcatalog-table --hcatalog-storage-stanza 'stored 
as parquet' --num-mappers 1

[sqoop console debug]
16/11/02 20:25:15 INFO mapreduce.Job: Task Id : 
attempt_1478089149450_0046_m_00_0, Status : FAILED
Error: java.lang.RuntimeException: Should never be used
at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
at 
org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
at 
org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

[yarn maptask debug]
2016-11-02 20:25:15,565 INFO [main] org.apache.hadoop.mapred.MapTask: 
Processing split: 1=1 AND 1=1
2016-11-02 20:25:15,583 DEBUG [main] 
org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat: Creating db record 
reader for db product: MYSQL
2016-11-02 20:25:15,613 INFO [main] 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output 
Committer Algorithm version is 1
2016-11-02 20:25:15,614 INFO [main] 
org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is 
deprecated. Instead, use mapreduce.job.output.key.class
2016-11-02 20:25:15,620 INFO [main] 
org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is 
deprecated. Instead, use mapreduce.job.output.value.class
2016-11-02 20:25:15,633 WARN [main] org.apache.hadoop.mapred.YarnChild:

[jira] [Updated] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE

2016-10-29 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3030:
-
Description: 
It appears that if the case of the columns in the Oracle RDBMS is not matched by 
the Sqoop --columns option with (export + --hcatalog + --direct), the Sqoop 
map task will fail with an NPE, whereas the non-direct path fails correctly.  My 
test case is below, please review and let me know if you have any questions.

Additionally it would be nice if we could detect this column case mis-match 
during compile time and not submit the job only to have it fail in YARN.

{noformat}
TEST CASE:

STEP 01 : Create Oracle Source/Target Table

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"
-
| C1   | C2 | 
-
| 1| some data  | 
-

STEP 02 : Import Oracle Table using Sqoop --hcatalog options

beeline -u jdbc:hive2:// -e "use default; drop table t1_text;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--num-mappers 1 --hcatalog-database default --hcatalog-table t1_text 
--create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile'
beeline -u jdbc:hive2:// -e "use default; select * from t1_text;"

Output:

16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names projected 
: [c1, c2]
16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info map 
:
c1 : [Type : 2,Precision : 38,Scale : 0]
c2 : [Type : 12,Precision : 10,Scale : 0]
16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table 
default.t1_text for import
16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table 
statement: 
create table `default`.`t1_text` (
`c1` decimal(38),
`c2` varchar(10))
stored as textfile

16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 34.4222 
seconds (0.3486 bytes/sec)
16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records.
---
+-+-+--+
| t1_text.c1  | t1_text.c2  |
+-+-+--+
| 1   | some data   |
+-+-+--+

STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 
--num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns 
"c1,c2"

Output: (failure = correct)

16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names projected 
: [c1, c2]
16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info map 
:
c1 : [Type : 2,Precision : 38,Scale : 0]
c2 : [Type : 12,Precision : 10,Scale : 0]

16/10/18 09:13:48 INFO hive.metastore: Connected to metastore.
16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema 
fields = [c1, c2]
16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is 
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning key 
fields = []
16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema 
fields = [c1, c2]

16/10/18 09:14:22 INFO mapreduce.ExportJobBase: Exported 0 records.
16/10/18 09:14:22 ERROR tool.ExportTool: Error during export: Export job failed!
---
2016-10-18 09:14:19,418 INFO [main] 
org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties 
{transient_lastDdlTime=1476807097, name=default.t1_text, 
serialization.null.format=\N, columns=c1,c2, 
serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
serialization.format=1, columns.types=decimal(38,0),varchar(10)}
2016-10-18 09:14:19,660 INFO [Thread-12] 
org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is 
finished. keepGoing=false
2016-10-18 09:14:19,952 ERROR [Thread-11] 
org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update 
thread: java.sql.SQLSyntaxErrorException: ORA-00904: "c2": invalid identifier

STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options 
and Oracle --direct

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 
--num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns 
"c1,c2" --direct

[jira] [Commented] (SQOOP-3038) Sqoop export using --hcatalog with RDBMS reserved word column name results in "null" value

2016-10-26 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608476#comment-15608476
 ] 

Markus Kemper commented on SQOOP-3038:
--

Workaround:
1. Use (export --export-dir) instead of (export --hcatalog)
2. Use non-reserved word as column name
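
A minimal sketch of workaround 1, reusing the --export-dir form already shown in
STEP 02 of the description below (for a Hive-managed table the directory would
instead be the table's warehouse location, and --input-fields-terminated-by
would need to match the table's delimiter; both of those are assumptions here):

  sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD \
    --table t1_value --export-dir /user/root/t1 --num-mappers 1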


> Sqoop export using --hcatalog with RDBMS reserved word column name results in 
> "null" value
> --
>
> Key: SQOOP-3038
> URL: https://issues.apache.org/jira/browse/SQOOP-3038
> Project: Sqoop
>  Issue Type: Bug
>  Components: hive-integration
>Reporter: Markus Kemper
>
> ---
> | c1  | value |  <=== RDBMS reserved word "value"
> ---
> | 1   | (null) | <=== null?
> ---
> TEST CASE:
> STEP 01 - Create MySQL Tables
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 varchar(5))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, 'one')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "show create table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1_value"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1_value (c1 int, \`value\` varchar(5))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1_value select * from t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "show create table t1_value"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1_value"
> Output:
> ---
> | Table| Create Table | 
> ---
> | t1   | CREATE TABLE `t1` (
>   `c1` int(11) DEFAULT NULL,
>   `c2` varchar(5) DEFAULT NULL
> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | 
> ---
> ---
> | c1  | c2| 
> ---
> | 1   | one   | 
> ---
> ---
> | Table| Create Table | 
> ---
> | t1_value | CREATE TABLE `t1_value` (
>   `c1` int(11) DEFAULT NULL,
>   `value` varchar(5) DEFAULT NULL
> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | 
> ---
> ---
> | c1  | value | 
> ---
> | 1   | one   | 
> ---
> STEP 02 - Verify Import/Export Using --target-dir and --export-dir
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "delete from t1"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1 --export-dir /user/root/t1 --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1_value --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "delete from t1_value"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> t1_value --export-dir /user/root/t1 --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1_value"
> Output:
> ---
> | c1  | c2| 
> ---
> | 1   | one   | 
> ---
> ---
> | c1  | value | 
> ---
> | 1   | one   | 
> ---
> STEP 03 - Verify import with --hive-import and export with --hcatalog
> beeline -u jdbc:hive2:// -e "use default; drop tabl

[jira] [Created] (SQOOP-3038) Sqoop export using --hcatalog with RDBMS reserved word column name results in "null" value

2016-10-25 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3038:


 Summary: Sqoop export using --hcatalog with RDBMS reserved word 
column name results in "null" value
 Key: SQOOP-3038
 URL: https://issues.apache.org/jira/browse/SQOOP-3038
 Project: Sqoop
  Issue Type: Bug
  Components: hive-integration
Reporter: Markus Kemper


---
| c1  | value |  <=== RDBMS reserved word "value"
---
| 1   | (null) | <=== null?
---

TEST CASE:

STEP 01 - Create MySQL Tables

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 varchar(5))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, 'one')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"show create table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1_value"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1_value (c1 int, \`value\` varchar(5))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1_value select * from t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"show create table t1_value"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_value"

Output:
---
| Table| Create Table | 
---
| t1   | CREATE TABLE `t1` (
  `c1` int(11) DEFAULT NULL,
  `c2` varchar(5) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 | 
---
---
| c1  | c2| 
---
| 1   | one   | 
---
---
| Table| Create Table | 
---
| t1_value | CREATE TABLE `t1_value` (
  `c1` int(11) DEFAULT NULL,
  `value` varchar(5) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 | 
---
---
| c1  | value | 
---
| 1   | one   | 
---

STEP 02 - Verify Import/Export Using --target-dir and --export-dir

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/root/t1 --delete-target-dir --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--export-dir /user/root/t1 --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_value --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1_value"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
t1_value --export-dir /user/root/t1 --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1_value"

Output:
---
| c1  | c2| 
---
| 1   | one   | 
---
---
| c1  | value | 
---
| 1   | one   | 
---

STEP 03 - Verify import with --hive-import and export with --hcatalog

beeline -u jdbc:hive2:// -e "use default; drop table t1;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/root/t1 --delete-target-dir --hive-import --hive-database 
default --hive-table t1 --num-mappers 1
beeline -u jdbc:hive2:// -e "use default; select * from t1;"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"delete from t1"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--hcatalog-database default --hcatalog-table t1 --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
+++--+
| t1.c1  | t1.c2  |
+++--+
| 1  | one|
+++--+
---
| c1  | c2| 

[jira] [Comment Edited] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs

2016-10-19 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589895#comment-15589895
 ] 

Markus Kemper edited comment on SQOOP-3022 at 10/19/16 9:25 PM:


Additional Clarification:

With OraOop and CDH <5.3.0 you need to manually install OraOop, meaning:

If you do not create the Driver Manager for OraOop in /var/lib/sqoop/managers.d 
you need to specify 
(-Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory).

You can also enable or disable OraOop using (-Doraoop.disabled=true|false).

With OraOop and CDH >=5.3.0 the OraOop classes are included in the CDH Sqoop 
jar, meaning:

All references (e.g. managers.d, oraoop.jar) of a previous manual install 
should be removed and the only required option to enable OraOop is to use the 
--direct option.  We do however recommend also deploying the Sqoop1 Client 
Gateway on any host where the Sqoop1 CLI command is executed.  Any global 
OraOop configurations can also be set in the Gateway's sqoop-site.xml safety 
valve.

In the end we resolved [~Tagar]'s issue by doing the following
1. Confirmed Sqoop1 Client Gateway was deployed
2. Confirmed OraOop managers.d file was not present
3. Confirmed that oraoop.jar was not present
4. Removed options:
4.1. 
-Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory
4.2. -Doraoop.disabled=false
5. Added options:
5.1. --direct
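
As a sketch only (the table and HCatalog names are placeholders), the resulting
invocation takes this general shape, with no connection-factory or
oraoop.disabled overrides on the command line:

  sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD \
    --table T1 --hcatalog-database default --hcatalog-table t1_parquet \
    --num-mappers 2 --direct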

Hope this helps.
[~markuskem...@me.com] 


was (Author: markuskem...@me.com):
Additional Clarification:

With OraOop and CDH <5.3.0 you need to manually install OraOop, meaning:

If you do not create the Driver Manager for OraOop in /var/lib/sqoop/managers.d 
you need to specify 
(-Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory).

You can also enable or disable OraOop using (-Doraoop.disabled=true|false).

With OraOop and CDH >=5.3.0 the OraOop classes are included in the CDH Sqoop 
jar, meaning:

All references (e.g. managers.d, oraoop.jar) of a previous manual install 
should be removed and the only required option to enable OraOop is to use the 
--direct option.  We do however recommend also deploying the Sqoop1 Client 
Gateway on any host where the Sqoop1 CLI command is executed.  Any global 
OraOop configurations can also be set in the Gateway's sqoop-site.xml safety 
valve.

Hope this helps.
[~markuskem...@me.com] 

> sqoop export for Oracle generates tremendous amounts of redo logs
> -
>
> Key: SQOOP-3022
> URL: https://issues.apache.org/jira/browse/SQOOP-3022
> Project: Sqoop
>  Issue Type: Bug
>  Components: codegen, connectors, connectors/oracle
>Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6
>Reporter: Ruslan Dautkhanov
>  Labels: export, oracle
>
> Sqoop export for Oracle generates tremendous amounts of redo logs (comparable 
> to export size or more).
> We have put target tables in nologging mode, but Oracle will still generate 
> redo logs unless +APPEND Oracle insert hint is used.
> See https://oracle-base.com/articles/misc/append-hint for examples.
> Please add an option for sqoop to generate insert statements in Oracle with 
> APPEND statement. Our databases are swamped with redo/archived logs whenever 
> we sqoop data to them. This is easily avoidable. And from a business 
> perspective, sqooping to staging tables in nologging mode is totally fine.
> Thank you.
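
For illustration only (table names are placeholders), the kind of direct-path
statement being requested can already be exercised by hand through sqoop eval;
the ask above is for the export code path itself to generate inserts of this
shape:

  sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD \
    --query "insert /*+ APPEND */ into t1_target select * from t1_staging"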



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs

2016-10-19 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589895#comment-15589895
 ] 

Markus Kemper commented on SQOOP-3022:
--

Additional Clarification:

With OraOop and CDH <5.3.0 you need to manually install OraOop, meaning:

If you do not create the Driver Manager for OraOop in /var/lib/sqoop/managers.d 
you need to specify 
(-Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory).

You can also enable or disable OraOop using (-Doraoop.disabled=true|false).

With OraOop and CDH >=5.3.0 the OraOop classes are included in the CDH Sqoop 
jar, meaning:

All references (e.g. managers.d, oraoop.jar) of a previous manual install 
should be removed and the only required option to enable OraOop is to use the 
--direct option.  We do however recommend also deploying the Sqoop1 Client 
Gateway on any host where the Sqoop1 CLI command is executed.  Any global 
OraOop configurations can also be set in the Gateway's sqoop-site.xml safety 
valve.

Hope this helps.
[~markuskem...@me.com] 

> sqoop export for Oracle generates tremendous amounts of redo logs
> -
>
> Key: SQOOP-3022
> URL: https://issues.apache.org/jira/browse/SQOOP-3022
> Project: Sqoop
>  Issue Type: Bug
>  Components: codegen, connectors, connectors/oracle
>Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6
>Reporter: Ruslan Dautkhanov
>  Labels: export, oracle
>
> Sqoop export for Oracle generates tremendous amounts of redo logs (comparable 
> to export size or more).
> We have put target tables in nologging mode, but Oracle will still generate 
> redo logs unless +APPEND Oracle insert hint is used.
> See https://oracle-base.com/articles/misc/append-hint for examples.
> Please add an option for sqoop to generate insert statements in Oracle with 
> APPEND statement. Our databases are swamped with redo/archived logs whenever 
> we sqoop data to them. This is easily avoidable. And from a business 
> perspective, sqooping to staging tables in nologging mode is totally fine.
> Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-3029) Add an option for uppercase/lowercase column name mapping between HCatalog and RDBMS column name list

2016-10-18 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586798#comment-15586798
 ] 

Markus Kemper commented on SQOOP-3029:
--

Hey [~maugli],

I am not sure I follow you completely here.

Are you suggesting matching lower or upper or case-sensitive?

Thanks, [~markuskem...@me.com]

> Add an option for uppercase/lowercase column name mapping between HCatalog 
> and RDBMS column name list
> -
>
> Key: SQOOP-3029
> URL: https://issues.apache.org/jira/browse/SQOOP-3029
> Project: Sqoop
>  Issue Type: Improvement
>Reporter: Attila Szabo
> Fix For: 1.4.7
>
>
> The current implementation of HCatalog related export works out of the box 
> with RDBMS systems (e.g. Oracle) if the column names match case-sensitively, 
> although in several cases the column names are received as an UPPER_CASE list 
> from the RDBMS, and thus the export fails within the ClassWriter.
> Although users have the possibility to define the column names explicitly 
> with the --columns option, in the case of 100+ columns this is not practical.
> It would be great to have an option which could do the matching in lower and 
> UPPER case as well. The starting point for this implementation would be the 
> SqoopHCatUtilities#configureHCat method, taking care to ensure the HCatalog 
> API is able to handle the differently cased column names.
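
Until such an option exists, the workaround mentioned above is an explicit,
correctly cased --columns list; a minimal sketch (table and column names are
illustrative) looks like:

  sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD \
    --table T1 --hcatalog-database default --hcatalog-table t1_text \
    --columns "C1,C2" --num-mappers 1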



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE

2016-10-18 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-3030:
-
Comment: was deleted

(was: You are welcome [~Tagar]. 

[~maugli] I have not tried to reproduce the issue outside of --hcatalog yet, 
primarily because the use case required export of Parquet files which 
unfortunately --export-dir does not support due to a KiteSDK limitation.

Thanks, [~markuskem...@me.com] )

> Export to Oracle using (--direct + --hcatalog + --columns) with columns 
> having the wrong case sensitivity fails with NPE
> 
>
> Key: SQOOP-3030
> URL: https://issues.apache.org/jira/browse/SQOOP-3030
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle, hive-integration
>    Reporter: Markus Kemper
>
> It appears that if the case of the columns in the Oracle RDBMS is not matched 
> by the Sqoop --columns option with (export + --hcatalog + --direct), the 
> Sqoop map task will fail with an NPE, whereas the non-direct path fails 
> correctly.  My test case is below, please review and let me know if you have 
> any questions.
> Additionally it would be nice if we could detect this column case mis-match 
> during compile time and not submit the job only to have it fail in YARN.
> TEST CASE:
> STEP 01 : Create Oracle Source/Target Table
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> -
> | C1   | C2 | 
> -
> | 1| some data  | 
> -
> STEP 02 : Import Oracle Table using Sqoop --hcatalog options
> beeline -u jdbc:hive2:// -e "use default; drop table t1_text;"
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text 
> --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile'
> beeline -u jdbc:hive2:// -e "use default; select * from t1_text;"
> Output:
> 
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names 
> projected : [c1, c2]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info 
> map :
>   c1 : [Type : 2,Precision : 38,Scale : 0]
>   c2 : [Type : 12,Precision : 10,Scale : 0]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table 
> default.t1_text for import
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table 
> statement: 
> create table `default`.`t1_text` (
>   `c1` decimal(38),
>   `c2` varchar(10))
> stored as textfile
> 
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 
> 34.4222 seconds (0.3486 bytes/sec)
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ---
> +-+-+--+
> | t1_text.c1  | t1_text.c2  |
> +-+-+--+
> | 1   | some data   |
> +-+-+--+
> STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns 
> options
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text 
> --columns "c1,c2"
> Output: (failure = correct)
> 
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names 
> projected : [c1, c2]
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info 
> map :
>   c1 : [Type : 2,Precision : 38,Scale : 0]
>   c2 : [Type : 12,Precision : 10,Scale : 0]
> 
> 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore.
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema 
> fields = [c1, c2]
> 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is 
> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning 
> key fields = []
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema 
> field

[jira] [Commented] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE

2016-10-18 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586786#comment-15586786
 ] 

Markus Kemper commented on SQOOP-3030:
--

You are welcome [~Tagar]. 

[~maugli] I have not tried to reproduce the issue outside of --hcatalog yet, 
primarily because the use case required export of Parquet files which 
unfortunately --export-dir does not support due to a KiteSDK limitation.

Thanks, [~markuskem...@me.com] 

> Export to Oracle using (--direct + --hcatalog + --columns) with columns 
> having the wrong case sensitivity fails with NPE
> 
>
> Key: SQOOP-3030
> URL: https://issues.apache.org/jira/browse/SQOOP-3030
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle, hive-integration
>    Reporter: Markus Kemper
>
> It appears that if the case of the columns in the Oracle RDBMS is not matched 
> by the Sqoop --columns option with (export + --hcatalog + --direct), the 
> Sqoop map task will fail with an NPE, whereas the non-direct path fails 
> correctly.  My test case is below, please review and let me know if you have 
> any questions.
> Additionally it would be nice if we could detect this column case mis-match 
> during compile time and not submit the job only to have it fail in YARN.
> TEST CASE:
> STEP 01 : Create Oracle Source/Target Table
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> -
> | C1   | C2 | 
> -
> | 1| some data  | 
> -
> STEP 02 : Import Oracle Table using Sqoop --hcatalog options
> beeline -u jdbc:hive2:// -e "use default; drop table t1_text;"
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text 
> --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile'
> beeline -u jdbc:hive2:// -e "use default; select * from t1_text;"
> Output:
> 
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names 
> projected : [c1, c2]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info 
> map :
>   c1 : [Type : 2,Precision : 38,Scale : 0]
>   c2 : [Type : 12,Precision : 10,Scale : 0]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table 
> default.t1_text for import
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table 
> statement: 
> create table `default`.`t1_text` (
>   `c1` decimal(38),
>   `c2` varchar(10))
> stored as textfile
> 
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 
> 34.4222 seconds (0.3486 bytes/sec)
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ---
> +-+-+--+
> | t1_text.c1  | t1_text.c2  |
> +-+-+--+
> | 1   | some data   |
> +-+-+--+
> STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns 
> options
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text 
> --columns "c1,c2"
> Output: (failure = correct)
> 
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names 
> projected : [c1, c2]
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info 
> map :
>   c1 : [Type : 2,Precision : 38,Scale : 0]
>   c2 : [Type : 12,Precision : 10,Scale : 0]
> 
> 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore.
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema 
> fields = [c1, c2]
> 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is 
> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning 
> key fields = []
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema 
> field

[jira] [Commented] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE

2016-10-18 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586785#comment-15586785
 ] 

Markus Kemper commented on SQOOP-3030:
--

You are welcome [~Tagar]. 

[~maugli] I have not tried to reproduce the issue outside of --hcatalog yet, 
primarily because the use case required export of Parquet files which 
unfortunately --export-dir does not support due to a KiteSDK limitation.

Thanks, [~markuskem...@me.com] 

> Export to Oracle using (--direct + --hcatalog + --columns) with columns 
> having the wrong case sensitivity fails with NPE
> 
>
> Key: SQOOP-3030
> URL: https://issues.apache.org/jira/browse/SQOOP-3030
> Project: Sqoop
>  Issue Type: Bug
>  Components: connectors/oracle, hive-integration
>    Reporter: Markus Kemper
>
> It appears that if the case of the columns in the Oracle RDBMS is not matched 
> by the Sqoop --columns option with (export + --hcatalog + --direct), the 
> Sqoop map task will fail with an NPE, whereas the non-direct path fails 
> correctly.  My test case is below, please review and let me know if you have 
> any questions.
> Additionally it would be nice if we could detect this column case mis-match 
> during compile time and not submit the job only to have it fail in YARN.
> TEST CASE:
> STEP 01 : Create Oracle Source/Target Table
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1 (c1 int, c2 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1 values (1, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1"
> -
> | C1   | C2 | 
> -
> | 1| some data  | 
> -
> STEP 02 : Import Oracle Table using Sqoop --hcatalog options
> beeline -u jdbc:hive2:// -e "use default; drop table t1_text;"
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text 
> --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile'
> beeline -u jdbc:hive2:// -e "use default; select * from t1_text;"
> Output:
> 
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names 
> projected : [c1, c2]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info 
> map :
>   c1 : [Type : 2,Precision : 38,Scale : 0]
>   c2 : [Type : 12,Precision : 10,Scale : 0]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table 
> default.t1_text for import
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table 
> statement: 
> create table `default`.`t1_text` (
>   `c1` decimal(38),
>   `c2` varchar(10))
> stored as textfile
> 
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 
> 34.4222 seconds (0.3486 bytes/sec)
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ---
> +-+-+--+
> | t1_text.c1  | t1_text.c2  |
> +-+-+--+
> | 1   | some data   |
> +-+-+--+
> STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns 
> options
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text 
> --columns "c1,c2"
> Output: (failure = correct)
> 
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names 
> projected : [c1, c2]
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info 
> map :
>   c1 : [Type : 2,Precision : 38,Scale : 0]
>   c2 : [Type : 12,Precision : 10,Scale : 0]
> 
> 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore.
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema 
> fields = [c1, c2]
> 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is 
> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning 
> key fields = []
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema 
> field

[jira] [Created] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE

2016-10-18 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-3030:


 Summary: Export to Oracle using (--direct + --hcatalog + 
--columns) with columns having the wrong case sensitivity fails with NPE
 Key: SQOOP-3030
 URL: https://issues.apache.org/jira/browse/SQOOP-3030
 Project: Sqoop
  Issue Type: Bug
  Components: connectors/oracle, hive-integration
Reporter: Markus Kemper


It appears that if the case of the columns in the Oracle RDBMS is not matched by 
the Sqoop --columns option with (export + --hcatalog + --direct), the Sqoop 
map task will fail with an NPE, whereas the non-direct path fails correctly.  My 
test case is below, please review and let me know if you have any questions.

Additionally it would be nice if we could detect this column case mis-match 
during compile time and not submit the job only to have it fail in YARN.
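
For reference, the expected working form of the command in the test case below
(not re-verified here) uses the upper-case column names that Oracle reports:

  sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD \
    --table T2 --num-mappers 2 --hcatalog-database default \
    --hcatalog-table t1_text --columns "C1,C2" --direct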

TEST CASE:

STEP 01 : Create Oracle Source/Target Table

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"create table t1 (c1 int, c2 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"insert into t1 values (1, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"
-
| C1   | C2 | 
-
| 1| some data  | 
-

STEP 02 : Import Oracle Table using Sqoop --hcatalog options

beeline -u jdbc:hive2:// -e "use default; drop table t1_text;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 
--num-mappers 1 --hcatalog-database default --hcatalog-table t1_text 
--create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile'
beeline -u jdbc:hive2:// -e "use default; select * from t1_text;"

Output:

16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names projected 
: [c1, c2]
16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info map 
:
c1 : [Type : 2,Precision : 38,Scale : 0]
c2 : [Type : 12,Precision : 10,Scale : 0]
16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table 
default.t1_text for import
16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table 
statement: 
create table `default`.`t1_text` (
`c1` decimal(38),
`c2` varchar(10))
stored as textfile

16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 34.4222 
seconds (0.3486 bytes/sec)
16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records.
---
+-+-+--+
| t1_text.c1  | t1_text.c2  |
+-+-+--+
| 1   | some data   |
+-+-+--+

STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options

sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 
--num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns 
"c1,c2"

Output: (failure = correct)

16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names projected 
: [c1, c2]
16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info map 
:
c1 : [Type : 2,Precision : 38,Scale : 0]
c2 : [Type : 12,Precision : 10,Scale : 0]

16/10/18 09:13:48 INFO hive.metastore: Connected to metastore.
16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema 
fields = [c1, c2]
16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is 
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning key 
fields = []
16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema 
fields = [c1, c2]

16/10/18 09:14:22 INFO mapreduce.ExportJobBase: Exported 0 records.
16/10/18 09:14:22 ERROR tool.ExportTool: Error during export: Export job failed!
---
2016-10-18 09:14:19,418 INFO [main] 
org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties 
{transient_lastDdlTime=1476807097, name=default.t1_text, 
serialization.null.format=\N, columns=c1,c2, 
serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
serialization.format=1, columns.types=decimal(38,0),varchar(10)}
2016-10-18 09:14:19,660 INFO [Thread-12] 
org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is 
finished. keepGoing=false
2016-10-18 09:14:19,952 ERROR [Thread-11] 
org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update 
thread: java.sql.SQLSyntaxErrorException: ORA-00904: "c2": invalid identifier

STEP 03 : Attempt Export to Oracle using Sqo

[jira] [Commented] (SQOOP-3025) Oracle Direct export from HCatalog parquet table gives NPE in kerberized cluster

2016-10-17 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582749#comment-15582749
 ] 

Markus Kemper commented on SQOOP-3025:
--

Hey Ruslan,

Are you able to provide the following:
1. On the host executing the Sqoop CLI command what are your security settings 
for [hive conf] and [hadoop conf]
2. Our [sqoop test] appears to be working without error

[hive conf]
$ cat /etc/hive/conf/*.xml | egrep -B 1 -A 1 -i 
"security.authentication|hadoop.security.authorization<"
  
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
  <name>hadoop.security.authorization</name>
  <value>true</value>

[hadoop conf]
$ cat /etc/hadoop/conf/*.xml | egrep -B 1 -A 1 -i 
"security.authentication|hadoop.security.authorization<"
  
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
  <name>hadoop.security.authorization</name>
  <value>true</value>

[sqoop test]
$ sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
---
| C1   | C2   | 
---
| 1| one  | 
---

$ sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1 --target-dir /user/xxx/t1_text --delete-target-dir --num-mappers 1 
--as-textfile --hive-import --hive-database default --hive-table t1_text 
--verbose

$ hive -e "use default; create table t1_parquet stored as parquet as select * 
from t1_text;"
$ sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T1 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_parquet 
--direct

Output:
16/10/17 09:01:16 INFO sqoop.Sqoop: Running Sqoop version: 
1.4.6-cdh5.8.4-SNAPSHOT
16/10/17 09:01:16 WARN tool.BaseSqoopTool: Setting your password on the 
command-line is insecure. Consider using -P instead.
16/10/17 09:01:16 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/17 09:01:18 INFO oracle.OraOopOracleQueries: Current schema is: SQOOP
16/10/17 09:01:18 INFO oracle.OraOopManagerFactory: 
**
*** Using Data Connector for Oracle and Hadoop ***
**

16/10/17 09:01:24 INFO client.RMProxy: Connecting to ResourceManager at 
xxx.cloudera.com/xxx.xx.xx.xxx:8032
16/10/17 09:01:25 INFO hdfs.DFSClient: Created token for admin: 
HDFS_DELEGATION_TOKEN owner=x...@xxx.cloudera.com, renewer=yarn, realUser=, 
issueDate=1476720085263, maxDate=1477324885263, sequenceNumber=8, masterKeyId=6 
on ha-hdfs:ns1
16/10/17 09:01:25 INFO security.TokenCache: Got dt for hdfs://ns1; Kind: 
HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns1, Ident: (token for admin: 
HDFS_DELEGATION_TOKEN owner=x...@xxx.cloudera.com, renewer=yarn, realUser=, 
issueDate=1476720085263, maxDate=1477324885263, sequenceNumber=8, masterKeyId=6)

16/10/17 09:02:04 INFO mapreduce.Job:  map 0% reduce 0%
16/10/17 09:02:22 INFO mapreduce.Job:  map 100% reduce 0%
16/10/17 09:02:22 INFO mapreduce.Job: Job job_1476708184468_0005 completed 
successfully

16/10/17 09:02:22 INFO mapreduce.ExportJobBase: Transferred 8.1641 KB in 
57.6968 seconds (144.8954 bytes/sec)
16/10/17 09:02:22 INFO mapreduce.ExportJobBase: Exported 1 records.

$ sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
"select * from t1"

Output:
---
| C1   | C2   | 
---
| 1| one  | 
| 1| one  | 
---

Thanks, Markus

> Oracle Direct export from HCatalog parquet table gives NPE in kerberized 
> cluster
> 
>
> Key: SQOOP-3025
> URL: https://issues.apache.org/jira/browse/SQOOP-3025
> Project: Sqoop
>  Issue Type: Bug
>  Components: build, codegen, connectors/oracle, hive-integration, 
> metastore
>Affects Versions: 1.4.6
> Environment: Hadoop 2.6; Sqoop 1.4.6; Oracle 12c
>Reporter: Ruslan Dautkhanov
> Attachments: scoop_output.log, 
> sqoop_OracleDirect_mapper_nullPointer_exception.txt
>
>
> Getting following exception in one of the mappers stack when try to export to 
> Oracle from Hive Parquet 
> {noformat}
> 2016-10-14 23:14:01,027 ERROR [main] 
> org.apache.sqoop.manager.oracle.OraOopOutputFormatInsert: The following error 
> occurred during configurePreparedStatement()
> java.lang.NullPointerException
>   at 
> org.apache.sqoop.manager.oracle.OraOopOutputFormatBase$OraOopDBRec

[jira] [Comment Edited] (SQOOP-1906) Export support for mixed update/insert against db2

2016-08-29 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446704#comment-15446704
 ] 

Markus Kemper edited comment on SQOOP-1906 at 8/29/16 7:12 PM:
---

Hello [~xieshiju] and [~jarcec],

Is the blocking issue on this Jira the creation of a review board entry for the 
patch?

The IBM URL below suggests that IBM supports this capability, however I have not 
found any commit or code for IBM in Db2Manager.java that has the 
"upsertTable()" method.  Any assistance in understanding whether this capability is 
supported with IBM would be appreciated.

https://www.ibm.com/support/knowledgecenter/SSPT3X_4.1.0/com.ibm.swg.im.infosphere.biginsights.product.doc/doc/bi_sqoop.html
IBM-specific backported patches and changes
Export support for mixed update/insert against DB2.

Thanks, [~markuskem...@me.com]


was (Author: markuskem...@me.com):
Hello [~xieshiju] and [~jarcec],

Is the blocking issue on this Jira the creation of a review board entry for the 
patch?

Thanks, [~markuskem...@me.com]

>  Export support for mixed  update/insert against db2
> 
>
> Key: SQOOP-1906
> URL: https://issues.apache.org/jira/browse/SQOOP-1906
> Project: Sqoop
>  Issue Type: Improvement
>  Components: connectors
>Affects Versions: 1.4.5
> Environment: RedHat6.4+Sqoop1.4.5+Hadoop2.4.1+Hive0.13.0+DB2 10.5
>Reporter: xieshiju
>Assignee: xieshiju
>  Labels: features
> Fix For: 1.4.7
>
> Attachments: SQOOP-1906.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The DB2 connector needs to implement the option --update-mode allowinsert, 
> which can be used when loading into a table 
> that is not empty. 
> In allowinsert mode the export will try to update the row, but if the row does 
> not exist it will insert it. Without this option only rows that already exist 
> in the table will be updated and any new rows are ignored.
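
For reference, a minimal sketch of the requested behavior, mirroring the upsert
syntax already supported by other connectors (the DB2 connection string, table
and column names below are hypothetical), could look like:

sqoop export --connect jdbc:db2://db2host:50000/testdb --username db2user --password db2pswd --table T1_EXPORT --export-dir /user/sqoop/t1_export --update-key C1 --update-mode allowinsert --num-mappers 1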



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-1906) Export support for mixed update/insert against db2

2016-08-29 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446704#comment-15446704
 ] 

Markus Kemper commented on SQOOP-1906:
--

Hello [~xieshiju] and [~jarcec],

Is the blocking issue on this Jira the creation of a review board entry for the 
patch?

Thanks, [~markuskem...@me.com]

>  Export support for mixed  update/insert against db2
> 
>
> Key: SQOOP-1906
> URL: https://issues.apache.org/jira/browse/SQOOP-1906
> Project: Sqoop
>  Issue Type: Improvement
>  Components: connectors
>Affects Versions: 1.4.5
> Environment: RedHat6.4+Sqoop1.4.5+Hadoop2.4.1+Hive0.13.0+DB2 10.5
>Reporter: xieshiju
>Assignee: xieshiju
>  Labels: features
> Fix For: 1.4.7
>
> Attachments: SQOOP-1906.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The DB2 connector needs to implement the option --update-mode allowinsert, 
> which can be used when loading into a table 
> that is not empty. 
> In allowinsert mode the export will try to update the row, but if the row does 
> not exist it will insert it. Without this option only rows that already exist 
> in the table will be updated and any new rows are ignored.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2748) Remove parentheses from WHERE clause

2016-08-25 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436758#comment-15436758
 ] 

Markus Kemper commented on SQOOP-2748:
--

Hello Warren,

I will defer to others on how to best address this for the IBM IMS use case, 
however to the best of my knowledge the SQL is valid.  If the IMS JDBC driver 
does not allow this syntax, perhaps an IBM IMS specific option via a --direct 
path would be a good approach.

As a workaround, does using >1 map task allow your job to run?

The snips below suggest that Sqoop may generate different SQL with >1 map tasks, 
possibly avoiding the condition that is impacting you.

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/user1/t1 --delete-target-dir --as-textfile  --split-by c1 
--num-mappers 1 --verbose

2016-08-24 05:12:08,403 INFO [main] org.apache.hadoop.mapred.MapTask: 
Processing split: 1=1 AND 1=1
2016-08-24 05:12:08,520 INFO [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: 1=1 AND 1=1
2016-08-24 05:12:08,539 INFO [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT `c1`, 
`c2` FROM `t1` AS `t1` WHERE ( 1=1 ) AND ( 1=1 )

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 
--target-dir /user/user1/t1 --delete-target-dir --as-textfile  --split-by c1 
--num-mappers 2 --verbose

2016-08-24 05:03:12,669 INFO [main] org.apache.hadoop.mapred.MapTask: 
Processing split: `c1` >= 1 AND `c1` < 5001
2016-08-24 05:03:12,756 INFO [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: `c1` >= 1 AND 
`c1` < 5001
2016-08-24 05:03:12,774 INFO [main] 
org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT `c1`, 
`c2` FROM `t1` AS `t1` WHERE ( `c1` >= 1 ) AND ( `c1` < 5001 )

From your output, I am also curious whether you know why the IBM IMS JDBC Driver 
generates "SELECT ... FROM MYIMSTABLE AS MYIMSTABLE WHERE ..." instead of 
"SELECT ... FROM MYIMSTABLE WHERE ..."?

> Remove parentheses from WHERE clause
> 
>
> Key: SQOOP-2748
> URL: https://issues.apache.org/jira/browse/SQOOP-2748
> Project: Sqoop
>  Issue Type: Wish
>  Components: codegen
>Affects Versions: 1.4.6
> Environment: Redhat 6
> Cloudera 5.5.0
>Reporter: Warren Chisholm
>Priority: Minor
>  Labels: import, ims, oneequalone, parentheses, sqoop
>
> When performing a sqoop import with option "-m 1" the following where clause 
> is added by sqoop:
> WHERE ( 1=1 ) AND ( 1=1 )
> This where clause is redundant and creates an incompatibility with the IBM 
> IMS Universal JDBC driver.
> The IMS JDBC Driver does not allow the use of parentheses in the where 
> clause. The below quote was taken from 2nd dot point here: 
> https://www-01.ibm.com/support/knowledgecenter/SSEPH2_13.1.0/com.ibm.ims13.doc.apg/ims_odbjdbcusagewhere.htm.
> "Do not use parentheses. Qualification statements are evaluated from left to 
> right. The order of evaluation for operators is the IMS evaluation order for 
> segment search arguments."
> I have modified the below commands and output to anonymize the details.
> Sample sqoop command:
> {code}
> sqoop import --connect 
> 'jdbc:ims://myserver:/MYDATABASE:currentSchema=MYSCHEMA;' -username 
>  -password  --null-string '\\N' --null-non-string '\\N' 
> --hive-import --create-hive-table --table MYIMSTABLE --hive-table MYHIVETABLE 
> --hive-drop-import-delims --driver com.ibm.ims.jdbc.IMSDriver 
> --delete-target-dir -m 1
> {code}
> Extract of log produced:
> {code}
> ...
> 15/12/16 17:20:05 INFO mapreduce.Job:  map 0% reduce 0%
> 15/12/16 17:20:16 INFO mapreduce.Job: Task Id : 
> attempt_1449625205240_0003_m_00_0, Status : FAILED
> Error: java.io.IOException: SQLException in nextKeyValue
> at 
> org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at 
> org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.map

Re: Review Request 50794: SQOOP-2884

2016-08-09 Thread Markus Kemper

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50794/
---

(Updated Aug. 10, 2016, 12:21 a.m.)


Review request for Sqoop, Kathleen Ting and Attila Szabo.


Changes
---

New patch to remove whitespace in /src/doc/man/import-args.txt per comments


Repository: sqoop-trunk


Description
---

Added man and html related changes for SQOOP-2884 (--temporary-rootdir)


Diffs (updated)
-

  src/docs/man/import-args.txt 49855ce 
  src/docs/user/import.txt fbb32f5 

Diff: https://reviews.apache.org/r/50794/diff/


Testing
---

1. Updated the following files:
1.1. /src/doc/man/import-args.txt < add command option
1.2. /src/doc/user/import.txt < add command details
2. Reviewed changes with browser and man.


Thanks,

Markus Kemper



Re: Review Request 50794: SQOOP-2884

2016-08-09 Thread Markus Kemper

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50794/
---

(Updated Aug. 10, 2016, 12:01 a.m.)


Review request for Sqoop, Kathleen Ting and Attila Szabo.


Changes
---

Updated /src/doc/man/import-args.txt to remove whitespace per comments


Repository: sqoop-trunk


Description
---

Added man and html related changes for SQOOP-2884 (--temporary-rootdir)


Diffs
-

  src/docs/man/import-args.txt 49855ce 
  src/docs/user/import.txt fbb32f5 

Diff: https://reviews.apache.org/r/50794/diff/


Testing
---

1. Updated the following files:
1.1. /src/doc/man/import-args.txt < add command option
1.2. /src/doc/user/import.txt < add command details
2. Reviewed changes with browser and man.


File Attachments (updated)


Updated Patch per comments
  
https://reviews.apache.org/media/uploaded/files/2016/08/10/118c9164-ced0-4d75-9c84-8b021b9b770e__SQOOP-2884-1.patch


Thanks,

Markus Kemper



Re: Review Request 50794: SQOOP-2884

2016-08-09 Thread Markus Kemper


> On Aug. 4, 2016, 2:28 p.m., Erzsebet Szilagyi wrote:
> > src/docs/man/import-args.txt, line 53
> > <https://reviews.apache.org/r/50794/diff/1/?file=1462854#file1462854line53>
> >
> > In the future, please make sure you trim excess whitespaces (they show 
> > up with red in the Diff View of the Review Board).

Updated /src/doc/man/import-args.txt and removed white space
Please see new patch SQOOP-2884-1.patch


- Markus


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50794/#review144763
-------


On Aug. 4, 2016, 1:50 p.m., Markus Kemper wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50794/
> ---
> 
> (Updated Aug. 4, 2016, 1:50 p.m.)
> 
> 
> Review request for Sqoop, Kathleen Ting and Attila Szabo.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Added man and html related changes for SQOOP-2884 (--temporary-rootdir)
> 
> 
> Diffs
> -
> 
>   src/docs/man/import-args.txt 49855ce 
>   src/docs/user/import.txt fbb32f5 
> 
> Diff: https://reviews.apache.org/r/50794/diff/
> 
> 
> Testing
> ---
> 
> 1. Updated the following files:
> 1.1. /src/doc/man/import-args.txt < add command option
> 1.2. /src/doc/user/import.txt < add command details
> 2. Reviewed changes with browser and man.
> 
> 
> Thanks,
> 
> Markus Kemper
> 
>



[jira] [Comment Edited] (SQOOP-2884) Document --temporary-rootdir

2016-08-09 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414462#comment-15414462
 ] 

Markus Kemper edited comment on SQOOP-2884 at 8/9/16 11:51 PM:
---

Updated patch to remove white space per comments in:
https://reviews.apache.org/r/50794/

Changed File(s):
/src/doc/man/import-args.txt

Verified changes using:
1. mc
2. browser


was (Author: markuskem...@me.com):
Updated patch to remove white space per comments in:
https://reviews.apache.org/r/50794/

Verified changes using:
1. mc
2. browser

> Document --temporary-rootdir
> 
>
> Key: SQOOP-2884
> URL: https://issues.apache.org/jira/browse/SQOOP-2884
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Attila Szabo
>    Assignee: Markus Kemper
> Attachments: SQOOP-2884-1.patch, SQOOP-2884.patch
>
>
> In SQOOP-2880 we've implemented this new option, but it needs an updated 
> documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-2884) Document --temporary-rootdir

2016-08-09 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-2884:
-
Attachment: SQOOP-2884-1.patch

Updated patch to remove white space per comments in:
https://reviews.apache.org/r/50794/

Verified changes using:
1. mc
2. browser

> Document --temporary-rootdir
> 
>
> Key: SQOOP-2884
> URL: https://issues.apache.org/jira/browse/SQOOP-2884
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Attila Szabo
>    Assignee: Markus Kemper
> Attachments: SQOOP-2884-1.patch, SQOOP-2884.patch
>
>
> In SQOOP-2880 we've implemented this new option, but it needs an updated 
> documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: New Sqoop Committer - Attila Szabo

2016-08-09 Thread Markus Kemper
Awesome news Attila, congrats!


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Tue, Aug 9, 2016 at 2:46 PM, Boglarka Egyed <b...@cloudera.com> wrote:

> Congratulations, Attila! You rock! :)
>
> On Tue, Aug 9, 2016 at 8:40 PM, Kathleen Ting <kathl...@apache.org> wrote:
>
>> On behalf of the Apache Sqoop PMC, I am excited to welcome Attila
>> Szabo as a new committer on Apache Sqoop. Please join me in
>> congratulating him!
>>
>> Attila has added several new features, most notably improving Sqoop
>> export performance with wide tables, improving usability by adding a
>> parameter to handle the temporary directory during Sqoop imports, and
>> improving quality by fixing unit tests. You can see all of his
>> contributions at [1].
>>
>> We appreciate all of Attila's hard work thus far, and look forward to
>> his continued contributions.
>>
>> Best,
>> Kate
>> (on behalf of the Apache Sqoop PMC)
>>
>>
>> Links:
>> 1: https://s.apache.org/xgYm
>>
>
>


Review Request 50794: SQOOP-2884

2016-08-04 Thread Markus Kemper

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50794/
---

Review request for Sqoop, Kathleen Ting and Attila Szabo.


Repository: sqoop-trunk


Description
---

Added man and html related changes for SQOOP-2884 (--temporary-rootdir)
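
For context, a minimal usage sketch of the option being documented (connection
variables and paths follow the conventions used in the other test cases here and
are hypothetical) is:

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/sqoop/t1 --temporary-rootdir /user/sqoop/_sqoop_staging --num-mappers 1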


Diffs
-

  src/docs/man/import-args.txt 49855ce 
  src/docs/user/import.txt fbb32f5 

Diff: https://reviews.apache.org/r/50794/diff/


Testing
---

1. Updated the following files:
1.1. /src/doc/man/import-args.txt < add command option
1.2. /src/doc/user/import.txt < add command details
2. Reviewed changes with browser and man.


Thanks,

Markus Kemper



[jira] [Updated] (SQOOP-2884) Document --temporary-rootdir

2016-08-04 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-2884:
-
Attachment: SQOOP-2884.patch

1. Updated the following files:
1.1. /src/doc/man/import-args.txt < add command option
1.2. /src/doc/user/import.txt < add command details
2. Reviewed changes with browser and man.

> Document --temporary-rootdir
> 
>
> Key: SQOOP-2884
> URL: https://issues.apache.org/jira/browse/SQOOP-2884
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Attila Szabo
>Assignee: Markus Kemper
> Attachments: SQOOP-2884.patch
>
>
> In SQOOP-2880 we've implemented this new option, but it needs an updated 
> documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (SQOOP-2874) Highlight Sqoop import with --as-parquetfile use cases (Dataset name is not alphanumeric (plus '_'))

2016-08-03 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper reassigned SQOOP-2874:


Assignee: Markus Kemper

> Highlight Sqoop import with --as-parquetfile use cases (Dataset name  
> is not alphanumeric (plus '_'))
> ---
>
> Key: SQOOP-2874
> URL: https://issues.apache.org/jira/browse/SQOOP-2874
> Project: Sqoop
>  Issue Type: Improvement
>  Components: docs
>    Reporter: Markus Kemper
>    Assignee: Markus Kemper
> Attachments: Jira_SQOOP-2874_TestCases.txt
>
>
> Hello Sqoop Community,
> Would it be possible to request some documentation enhancements?
> The ask here is to proactively help raise awareness and improve user 
> experience with a few specific use cases [1] where some Sqoop commands have 
> restricted character options when using import with --as-parquetfile.  
> My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet 
> files.  From the Kite documentation [3] we see that to ensure compatibility 
> (with Hive, etc.), Kite imposes some restrictions on Names and Namespaces 
> which bubble up in Sqoop.
> The following Sqoop use cases when using import with --as-parquetfile result 
> in the error [2] below.  Full test cases for each scenario are attached.  If 
> it is an option to enhance the Sqoop documentation for these use cases I am 
> happy to provide proposed changes, let me know.
> [1] Use Cases:
> 1. sqoop import --as-parquetfile + --target-dir 
> //.
> 1.1. The '.' is not allowed
> 2. sqoop import --as-parquetfile + --table .  + (no 
> --target-dir)
> 2.1. The '.' is not allowed, this is essentially the same as (1)
> 3. sqoop import --as-parquetfile + --hive-import --table 
> . 
> 3.1. The proper usage is to use --hive-database with --hive-table; however, 
> with --as-textfile, --hive-table does work with .
> [2] Kite Error:
> 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not 
> alphanumeric (plus '_')
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not 
> alphanumeric (plus '_')
>   at 
> org.kitesdk.data.ValidationException.check(ValidationException.java:55)
>   at 
> org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
>   at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
>   at 
> org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
>   at 
> org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
>   at org.kitesdk.data.Datasets.create(Datasets.java:239)
>   at org.kitesdk.data.Datasets.create(Datasets.java:307)
>   at 
> org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141)
>   at 
> org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119)
>   at 
> org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
>   at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>   at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>   at 
> org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
>   at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>   at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>   at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> [3] Kite Documentation:
> http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html
> Names and Namespaces
> URIs also define a name and namespace for your dataset. Kite uses these 
> values when the underlying system has the same concept (for example, Hive). 
> The name and namespace are typically the last two values in a URI. For 
> example, if you create a dataset using the URI 
> dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the 
> fact_tables Hive database. If you create a dataset using the URI 
> dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset 
> named ratings in the fact_tables namespace.  To ensure compatibility with 
> Hive and other underlying systems, names and namespaces in URIs must be made 
> of alphanumeric or underscore (_) characters and cannot start with a number.
> Thanks, Markus



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (SQOOP-2884) Document --temporary-rootdir

2016-08-02 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper reassigned SQOOP-2884:


Assignee: Markus Kemper  (was: Attila Szabo)

> Document --temporary-rootdir
> 
>
> Key: SQOOP-2884
> URL: https://issues.apache.org/jira/browse/SQOOP-2884
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Attila Szabo
>    Assignee: Markus Kemper
>
> In SQOOP-2880 we've implemented this new option, but it needs an updated 
> documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (SQOOP-2988) Update Sqoop1 Docs (warning about parallel jobs and "local" Sqoop1 Metastore)

2016-08-02 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper reassigned SQOOP-2988:


Assignee: Markus Kemper  (was: Boglarka Egyed)

> Update Sqoop1 Docs (warning about parallel jobs and "local" Sqoop1 Metastore)
> -
>
> Key: SQOOP-2988
> URL: https://issues.apache.org/jira/browse/SQOOP-2988
> Project: Sqoop
>  Issue Type: Sub-task
>Reporter: Boglarka Egyed
>Assignee: Markus Kemper
>  Labels: doc, incremental, job, metastore, parallel
>
> An update of the Sqoop1 Documentation is needed. A warning should be added 
> that, in case of parallel incremental workloads, using the "local" 
> metastore can result in unexpected behavior, such as INSERT/UPDATE related 
> information (e.g. incremental.last.value) not getting updated properly. A 
> recommendation should also be added to use a shared metastore instead. More 
> details can be found in the parent issue.
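
For background, a minimal sketch of the shared metastore pattern the warning would
point users toward (the metastore host name is hypothetical; 16000 is the default
metastore port) is:

sqoop metastore &
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop --create t1_incr -- import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/sqoop/t1 --incremental append --check-column C1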



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2971) OraOop does not close connections properly

2016-06-24 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348391#comment-15348391
 ] 

Markus Kemper commented on SQOOP-2971:
--

Thank you [~vasas],

In the field we have seen this issue result in the following Oracle exception, 
thus limiting the degree of parallelism possible when SESSIONS are capped 
(a quick way to check the configured cap is shown after the stack trace below).

16/04/12 13:26:22 INFO mapreduce.Job: Task Id : 
attempt_1459758835311_12642_m_07_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: 
java.sql.SQLException: ORA-02391: exceeded simultaneous SESSIONS_PER_USER limit
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.sql.SQLException: ORA-02391: 
exceeded simultaneous SESSIONS_PER_USER limit
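
In that case, a quick way to check the configured cap on the Oracle side (assuming
the Sqoop user can read the DBA views; connection variables as used elsewhere in
these tests) is:

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select profile, limit from dba_profiles where resource_name = 'SESSIONS_PER_USER'"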

> OraOop does not close connections properly
> --
>
> Key: SQOOP-2971
> URL: https://issues.apache.org/jira/browse/SQOOP-2971
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Attila Szabo
>Assignee: Szabolcs Vasas
> Fix For: 1.4.6, 1.4.7
>
> Attachments: SQOOP-2971.patch
>
>
> When importing with OraOop, two connections are created for every mapper task, 
> and those connections are not closed until the end of the import process, 
> thus Sqoop uses twice the necessary amount and effectively leaks these expensive 
> resources. This can cause a problem when the number of connections is 
> limited on the DB server side but the user would like to leverage 
> more mappers (e.g. right now with 10 available connections the user cannot 
> use 10 mappers, but instead only 4).
> Please provide a fix for this issue either by sharing the connection, or 
> closing the connections when they are not needed anymore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-2874) Highlight Sqoop import with --as-parquetfile use cases (Dataset name is not alphanumeric (plus '_'))

2016-03-07 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-2874:
-
Attachment: Jira_SQOOP-2874_TestCases.txt

Attaching test cases.

> Highlight Sqoop import with --as-parquetfile use cases (Dataset name  
> is not alphanumeric (plus '_'))
> ---
>
> Key: SQOOP-2874
> URL: https://issues.apache.org/jira/browse/SQOOP-2874
> Project: Sqoop
>  Issue Type: Improvement
>  Components: docs
>    Reporter: Markus Kemper
> Attachments: Jira_SQOOP-2874_TestCases.txt
>
>
> Hello Sqoop Community,
> Would it be possible to request some documentation enhancements?
> The ask here is to proactively help raise awareness and improve user 
> experience with a few specific use cases [1] where some Sqoop commands have 
> restricted character options when using import with --as-parquetfile.  
> My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet 
> files.  From the Kite documentation [3] we see that to ensure compatibility 
> (with Hive, etc.), Kite imposes some restrictions on Names and Namespaces 
> which bubble up in Sqoop.
> The following Sqoop use cases when using import with --as-parquetfile result 
> in the error [2] below.  Full test cases for each scenario are attached.  If 
> it is an option to enhance the Sqoop documentation for these use cases I am 
> happy to provide proposed changes, let me know.
> [1] Use Cases:
> 1. sqoop import --as-parquetfile + --target-dir 
> //.
> 1.1. The '.' is not allowed
> 2. sqoop import --as-parquetfile + --table .  + (no 
> --target-dir)
> 2.1. The '.' is not allowed, this is essentially the same as (1)
> 3. sqoop import --as-parquetfile + --hive-import --table 
> . 
> 3.1. The proper usage is to use --hive-database with --hive-table; however, 
> with --as-textfile, --hive-table does work with .
> [2] Kite Error:
> 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not 
> alphanumeric (plus '_')
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not 
> alphanumeric (plus '_')
>   at 
> org.kitesdk.data.ValidationException.check(ValidationException.java:55)
>   at 
> org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
>   at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
>   at 
> org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
>   at 
> org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
>   at org.kitesdk.data.Datasets.create(Datasets.java:239)
>   at org.kitesdk.data.Datasets.create(Datasets.java:307)
>   at 
> org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141)
>   at 
> org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119)
>   at 
> org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
>   at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>   at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>   at 
> org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
>   at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>   at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>   at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> [3] Kite Documentation:
> http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html
> Names and Namespaces
> URIs also define a name and namespace for your dataset. Kite uses these 
> values when the underlying system has the same concept (for example, Hive). 
> The name and namespace are typically the last two values in a URI. For 
> example, if you create a dataset using the URI 
> dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the 
> fact_tables Hive database. If you create a dataset using the URI 
> dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset 
> named ratings in the fact_tables namespace.  To ensure compatibility with 
> Hive and other underlying systems, names and namespaces in URIs must be made 
> of alphanumeric or underscore (_) characters and cannot start with a number.
> Thanks, Markus



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SQOOP-2874) Highlight Sqoop import with --as-parquetfile use cases (Dataset name is not alphanumeric (plus '_'))

2016-03-07 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-2874:


 Summary: Highlight Sqoop import with --as-parquetfile use cases 
(Dataset name  is not alphanumeric (plus '_'))
 Key: SQOOP-2874
 URL: https://issues.apache.org/jira/browse/SQOOP-2874
 Project: Sqoop
  Issue Type: Improvement
  Components: docs
Reporter: Markus Kemper


Hello Sqoop Community,

Would it be possible to request some documentation enhancements?

The ask here is to proactively help raise awareness and improve user 
experience with a few specific use cases [1] where some Sqoop commands have 
restricted character options when using import with --as-parquetfile.  

My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet 
files.  From the Kite documentation [3] we see that to ensure compatibility 
(with Hive, etc.), Kite imposes some restrictions on Names and Namespaces which 
bubble up in Sqoop.

The following Sqoop use cases when using import with --as-parquetfile result in 
the error [2] below.  Full test cases for each scenario are attached.  If it 
is an option to enhance the Sqoop documentation for these use cases I am happy 
to provide proposed changes, let me know.

[1] Use Cases:
1. sqoop import --as-parquetfile + --target-dir //.
1.1. The '.' is not allowed
2. sqoop import --as-parquetfile + --table .  + (no 
--target-dir)
2.1. The '.' is not allowed, this is essentially the same as (1)
3. sqoop import --as-parquetfile + --hive-import --table 
. 
3.1. The proper usage is to use --hive-database with --hive-table (see the example 
command after section [3] below); however, with --as-textfile, --hive-table does work with .

[2] Kite Error:
16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: 
org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not 
alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not 
alphanumeric (plus '_')
at 
org.kitesdk.data.ValidationException.check(ValidationException.java:55)
at 
org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
at 
org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
at 
org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
at org.kitesdk.data.Datasets.create(Datasets.java:239)
at org.kitesdk.data.Datasets.create(Datasets.java:307)
at 
org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141)
at 
org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119)
at 
org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
at 
org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at 
org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

[3] Kite Documentation:
http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html
Names and Namespaces
URIs also define a name and namespace for your dataset. Kite uses these values 
when the underlying system has the same concept (for example, Hive). The name 
and namespace are typically the last two values in a URI. For example, if you 
create a dataset using the URI dataset:hive:fact_tables/ratings, Kite stores a 
Hive table ratings in the fact_tables Hive database. If you create a dataset 
using the URI dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an 
HDFS dataset named ratings in the fact_tables namespace.  To ensure 
compatibility with Hive and other underlying systems, names and namespaces in 
URIs must be made of alphanumeric or underscore (_) characters and cannot start 
with a number.
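
As noted in use case 3.1 above, a pattern that avoids the restriction is to keep
the dataset name free of '.' by passing the database separately. A minimal sketch
(database and table names are hypothetical):

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --as-parquetfile --hive-import --hive-database db1 --hive-table t1_parquet --num-mappers 1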

Thanks, Markus



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-2858) Sqoop export with Avro data using (--update-key and --update-mode allowinsert) fails

2016-02-29 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-2858:
-
Attachment: SQOOP-2858_TestCase.txt

Attaching test cases.
See tests 10, 13, 16 and 17 for failure scenarios.


> Sqoop export with Avro data using (--update-key  and --update-mode 
> allowinsert) fails
> --
>
> Key: SQOOP-2858
> URL: https://issues.apache.org/jira/browse/SQOOP-2858
> Project: Sqoop
>  Issue Type: Bug
>    Reporter: Markus Kemper
> Attachments: SQOOP-2858_TestCase.txt
>
>
> Summary:
> 1. sqoop export (--export-dir ) fails with error [1]
> 2. sqoop export (--export-dir  --update-key --update-mode 
> allowinsert) fails with error [2]
> 3. sqoop export (--hcatalog-database --hcatalog-table  
> --update-key --update-mode allowinsert) with error [3]
> See attachment for full test cases.
> [1]
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T2_EXPORT --export-dir /user/root/t1_avro --num-mappers 1
> 16/02/24 13:29:51 INFO mapreduce.Job: Task Id : 
> attempt_1456318803987_0015_m_00_0, Status : FAILED
> Error: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.math.BigDecimal
>   at T2_EXPORT.setField(T2_EXPORT.java:288)
> [2]
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T2_EXPORT --export-dir /user/root/t1_avro --num-mappers 1 --map-column-java 
> C1_INT=Integer --update-key C1_INT --update-mode allowinsert
> 16/02/24 13:58:29 INFO mapreduce.Job: Task Id : 
> attempt_1456318803987_0022_m_00_0, Status : FAILED
> Error: java.io.IOException: Can't export data, please check failed map task 
> logs
>   at 
> org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
>   at 
> org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at 
> org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.RuntimeException: Can't parse input data: 
> 'Objavro.schema�{"type":"record"'
> [3]
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T2_EXPORT --hcatalog-database db1 --hcatalog-table t1_avro --num-mappers 1 
> --update-key C1_INT --update-mode allowinsert
> 16/02/24 13:35:06 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.sqoop.mapreduce.ExportJobBase.getFileType(ExportJobBase.java:127)
>   at 
> org.apache.sqoop.mapreduce.ExportJobBase.isSequenceFiles(ExportJobBase.java:118)
>   at 
> org.apache.sqoop.mapreduce.ExportJobBase.inputIsSequenceFiles(ExportJobBase.java:492)
>   at 
> org.apache.sqoop.mapreduce.JdbcUpdateExportJob.getMapperClass(JdbcUpdateExportJob.java:69)
>   at 
> org.apache.sqoop.mapreduce.ExportJobBase.configureMapper(ExportJobBase.java:268)
>   at 
> org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:426)
>   at 
> org.apache.sqoop.manager.OracleManager.upsertTable(OracleManager.java:467)
>   at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:74)
>   at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>   at org.apache.sqoop.Sqoop.main(Sqoop.java:236)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SQOOP-2858) Sqoop export with Avro data using (--update-key and --update-mode allowinsert) fails

2016-02-29 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-2858:


 Summary: Sqoop export with Avro data using (--update-key  and 
--update-mode allowinsert) fails
 Key: SQOOP-2858
 URL: https://issues.apache.org/jira/browse/SQOOP-2858
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


Summary:
1. sqoop export (--export-dir ) fails with error [1]
2. sqoop export (--export-dir  --update-key --update-mode 
allowinsert) fails with error [2]
3. sqoop export (--hcatalog-database --hcatalog-table  --update-key 
--update-mode allowinsert) with error [3]

See attachment for full test cases.

[1]
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T2_EXPORT --export-dir /user/root/t1_avro --num-mappers 1
16/02/24 13:29:51 INFO mapreduce.Job: Task Id : 
attempt_1456318803987_0015_m_00_0, Status : FAILED
Error: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.math.BigDecimal
at T2_EXPORT.setField(T2_EXPORT.java:288)

[2]
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T2_EXPORT --export-dir /user/root/t1_avro --num-mappers 1 --map-column-java 
C1_INT=Integer --update-key C1_INT --update-mode allowinsert
16/02/24 13:58:29 INFO mapreduce.Job: Task Id : 
attempt_1456318803987_0022_m_00_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at 
org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at 
org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at 
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: 
'Objavro.schema�{"type":"record"'

[3]
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
T2_EXPORT --hcatalog-database db1 --hcatalog-table t1_avro --num-mappers 1 
--update-key C1_INT --update-mode allowinsert
16/02/24 13:35:06 ERROR sqoop.Sqoop: Got exception running Sqoop: 
java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.sqoop.mapreduce.ExportJobBase.getFileType(ExportJobBase.java:127)
at 
org.apache.sqoop.mapreduce.ExportJobBase.isSequenceFiles(ExportJobBase.java:118)
at 
org.apache.sqoop.mapreduce.ExportJobBase.inputIsSequenceFiles(ExportJobBase.java:492)
at 
org.apache.sqoop.mapreduce.JdbcUpdateExportJob.getMapperClass(JdbcUpdateExportJob.java:69)
at 
org.apache.sqoop.mapreduce.ExportJobBase.configureMapper(ExportJobBase.java:268)
at 
org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:426)
at 
org.apache.sqoop.manager.OracleManager.upsertTable(OracleManager.java:467)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:74)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SQOOP-2847) Sqoop --incremental + missing parent --target-dir reports success with no data

2016-02-19 Thread Markus Kemper (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154773#comment-15154773
 ] 

Markus Kemper commented on SQOOP-2847:
--

Thank you [~jarcec].

> Sqoop --incremental + missing parent --target-dir reports success with no data
> --
>
> Key: SQOOP-2847
> URL: https://issues.apache.org/jira/browse/SQOOP-2847
> Project: Sqoop
>  Issue Type: Bug
>    Reporter: Markus Kemper
>Assignee: Jarek Jarcec Cecho
> Fix For: 1.4.7
>
> Attachments: SQOOP-2847.patch, SQOOP-2847_TestCase.txt
>
>
> The test case attached demonstrates inconsistent behavior between import 
> (incremental and non-incremental) when the path for --target-dir contains 
> parent directories that do not exist.
> 1. Incremental reports import success however the data is not found in HDFS
> 2. Non-incremental reports success, creates the missing directories and 
> writes the data to HDFS 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SQOOP-2847) Sqoop --incremental + missing parent --target-dir reports success with no data

2016-02-19 Thread Markus Kemper (JIRA)

 [ 
https://issues.apache.org/jira/browse/SQOOP-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Kemper updated SQOOP-2847:
-
Attachment: SQOOP-2847_TestCase.txt

Test case

> Sqoop --incremental + missing parent --target-dir reports success with no data
> --
>
> Key: SQOOP-2847
> URL: https://issues.apache.org/jira/browse/SQOOP-2847
> Project: Sqoop
>  Issue Type: Bug
>    Reporter: Markus Kemper
> Attachments: SQOOP-2847_TestCase.txt
>
>
> The test case attached demonstrates inconsistent behavior between import 
> (incremental and non-incremental) when the path for --target-dir contains 
> parent directories that do not exist.
> 1. Incremental reports import success however the data is not found in HDFS
> 2. Non-incremental reports success, creates the missing directories and 
> writes the data to HDFS 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SQOOP-2847) Sqoop --incremental + missing parent --target-dir reports success with no data

2016-02-19 Thread Markus Kemper (JIRA)
Markus Kemper created SQOOP-2847:


 Summary: Sqoop --incremental + missing parent --target-dir reports 
success with no data
 Key: SQOOP-2847
 URL: https://issues.apache.org/jira/browse/SQOOP-2847
 Project: Sqoop
  Issue Type: Bug
Reporter: Markus Kemper


The test case attached demonstrates inconsistent behavior between import 
(incremental and non-incremental) when the path for --target-dir contains 
parent directories that do not exist.
1. Incremental reports import success, however the data is not found in HDFS
2. Non-incremental reports success, creates the missing directories, and writes 
the data to HDFS (a workaround sketch is shown below)
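
Until this is addressed, a simple workaround sketch is to pre-create the parent
directories before the incremental run (paths are hypothetical):

hdfs dfs -mkdir -p /user/sqoop/parent_dir
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/sqoop/parent_dir/t1 --incremental append --check-column C1 --num-mappers 1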



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)