Re: [ANNOUNCE] New Sqoop PMC member - Szabolcs Vasas
Awesome news! Thanks,
Markus

> On Nov 6, 2018, at 04:48, Boglarka Egyed wrote:
>
> On behalf of the Apache Sqoop PMC, I am pleased to welcome Szabolcs Vasas as
> a new Sqoop PMC Member. Please join me in congratulating him!
>
> Szabolcs has countless code contributions[1] and constantly provides
> thorough reviews for others[2]. He continuously offers help to new
> contributors, enabling the project to grow, and he has also shown a strong
> interest in shaping the project by helping with and scoping previous and
> upcoming releases.
>
> Szabolcs's hard work is much appreciated and we look forward to his
> continued contributions!
>
> 1: https://s.apache.org/nzgU
> 2: https://reviews.apache.org/users/vasas/reviews/
>
> Kind Regards,
> Bogi
Re: [ANNOUNCE] Apache Sqoop 1.4.7 released
Thank you Attila!

Markus Kemper
Customer Operations Engineer
www.cloudera.com <http://www.cloudera.com>

On Wed, Jan 31, 2018 at 8:20 AM, Attila Szabó <mau...@apache.org> wrote:
> The Apache Sqoop team is pleased to announce the release of Sqoop 1.4.7.
>
> This is the sixth release of Apache Sqoop as a TLP project. Sqoop is a tool
> designed for efficiently transferring bulk data between Apache Hadoop and
> structured datastores, such as relational databases.
>
> The release is available here: http://www.apache.org/dyn/closer.cgi/sqoop/
>
> The full change log is available here:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311320&version=12329683
>
> Your help and feedback is more than welcome. For more information on how to
> report problems and to get involved, visit the project website at
> http://sqoop.apache.org/.
>
> The Apache Sqoop Team
[jira] [Commented] (SQOOP-3014) Sqoop with HCatalog import loses precision for large numbers that do not fit into double
[ https://issues.apache.org/jira/browse/SQOOP-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149309#comment-16149309 ]

Markus Kemper commented on SQOOP-3014:
--------------------------------------

Adding an additional test case (below) with the following comments:
* Import into HDFS seems to be working as expected
* The hcatalog load phase is not working, as noted by others
** The issue does not appear to be specific to HDFS ORC files

*Test Case*
{noformat}
###
# STEP 01 - CREATE SQL TABLE AND DATA
###

export MYCONN=jdbc:oracle:thin:@sqoop.apache.com:1521/db11g
export MYUSER=sqoop
export MYPSWD=sqoop

sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table sqoop_3014"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table sqoop_3014 (c1 integer, c2 decimal(22,5), c3 varchar(40))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into sqoop_3014 values (1, 454018528782.42006329, '454018528782.42006329')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into sqoop_3014 values (2, 87658675864540185.123456789123456789, '87658675864540185.123456789123456789')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from sqoop_3014"

---------------------------------------------------------------------------
| C1 | C2                      | C3                                       |
---------------------------------------------------------------------------
| 1  | 454018528782.42006      | 454018528782.42006329                    |
| 2  | 87658675864540185.12346 | 87658675864540185.123456789123456789     |
---------------------------------------------------------------------------

###
# STEP 02 - IMPORT DATA INTO HDFS (--as-textfile)
###

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table SQOOP_3014 --target-dir /user/root/sqoop_3014 --delete-target-dir --num-mappers 1 --as-textfile --verbose
hdfs dfs -cat /user/root/sqoop_3014/part*

Output:
1,454018528782.42006,454018528782.42006329
2,87658675864540185.12346,87658675864540185.123456789123456789

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from sqoop_3014 where \$CONDITIONS" --target-dir /user/root/sqoop_3014 --delete-target-dir --num-mappers 1 --as-textfile --verbose
hdfs dfs -cat /user/root/sqoop_3014/part*

Output:
1,454018528782.42006,454018528782.42006329
2,87658675864540185.12346,87658675864540185.123456789123456789

###
# STEP 03 - IMPORT DATA INTO HIVE (--hcatalog-storage-stanza "stored as textfile")
###

beeline -u jdbc:hive2:// -e "use db1; drop table sqoop_3014_text purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from sqoop_3014 where \$CONDITIONS" --hcatalog-database db1 --hcatalog-table sqoop_3014_text --create-hcatalog-table --hcatalog-storage-stanza "stored as textfile" --num-mappers 1 --verbose
beeline -u jdbc:hive2:// -e "use db1; select * from sqoop_3014_text; show create table sqoop_3014_text;"

Output:
+---------------------+---------------------+---------------------------------------+
| sqoop_3014_text.c1  | sqoop_3014_text.c2  | sqoop_3014_text.c3                    |
+---------------------+---------------------+---------------------------------------+
| 1                   | 454018528782.42004  | 454018528782.42006329                 |
| 2                   | 87658675864540192   | 87658675864540185.123456789123456789  |
+---------------------+---------------------+---------------------------------------+

###
# STEP 04 - IMPORT DATA INTO HIVE (--hcatalog-storage-stanza "stored as orc")
###

beeline -u jdbc:hive2:// -e "use db1; drop table sqoop_3014_orc purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from sqoop_3014 where \$CONDITIONS" --hcatalog-database db1 --hcatalog-table sqoop_3014_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orc" --num-mappers 1 --verbose
beeline -u jdbc:hive2:// -e "use db1; select * from sqoop_3014_orc; show create table sqoop_3014_orc;"

Output:
+---------------------+---------------------+---------------------------------------+
| sqoop_3014_text.c1  | sqoop_3014_text.c2  | sqoop_3014_text.c3                    |
+---------------------+---------------------+---------------------------------------+
| 1
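The corrupted value in the Hive output is exactly what IEEE-754 double rounding produces. A minimal sketch (using awk, which does its arithmetic in the same 64-bit C double that a Java `double` uses, purely as an illustration):

```shell
# 87658675864540185.123456789123456789 does not fit in a double's 53-bit
# mantissa; the nearest representable double is 87658675864540192 -- the same
# corrupted value the Hive tables show for row 2.
awk 'BEGIN { printf "%.0f\n", 87658675864540185.123456789123456789 }'
# prints 87658675864540192
```

This supports the comment's conclusion: the text-file path preserves the decimal as a string, while any path that round-trips through a double loses the low-order digits.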
[jira] [Commented] (SQOOP-3066) Introduce an option + env variable to enable/disable SQOOP-2737 feature
[ https://issues.apache.org/jira/browse/SQOOP-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145336#comment-16145336 ]

Markus Kemper commented on SQOOP-3066:
--------------------------------------

The option added for this is: \-\-oracle-escaping-disabled true|false

> Introduce an option + env variable to enable/disable SQOOP-2737 feature
> -----------------------------------------------------------------------
>
>                 Key: SQOOP-3066
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3066
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.4.6
>            Reporter: Attila Szabo
>            Assignee: Attila Szabo
>            Priority: Critical
>             Fix For: 1.4.7
>
>         Attachments: SQOOP-3066.patch
>
>
> After [SQOOP-2737], several users found that their Sqoop commands were
> failing because the original commands had not been phrased with
> case-sensitive table/column/schema names in mind.
> Another outcome of this feature is that the "--split-by" option no longer
> accepts Oracle functions (e.g. MOD(col_name,4)): after correct
> escaping+quoting, the value is handled by Oracle as a db column name
> instead of an expression to evaluate.
> My goal here is to introduce an option to turn on/off the (fully proper and
> industry standard) escaping implemented in [SQOOP-2737], and also add
> environment variable support for it, so that users have a
> backward-compatible fallback without changing their commands.
> I also plan to implement an argument for supporting split-by with database
> functions/expressions, so that the escaping and split-by-expression
> features become independent.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
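The split-by breakage described in the issue is purely a quoting effect. A toy stand-in for the escaping (the real logic lives in Sqoop's Oracle connection manager; this one-line shell function is only illustrative):

```shell
# Illustrative stand-in for SQOOP-2737-style identifier escaping: once the
# --split-by value is wrapped in double quotes, Oracle treats it as a
# (nonexistent) column name rather than an expression to evaluate.
escape_identifier() { printf '"%s"\n' "$1"; }

escape_identifier 'MOD(col_name,4)'
# prints "MOD(col_name,4)" -- now a quoted identifier, no longer a function call
```

This is why a disable switch (\-\-oracle-escaping-disabled) restores the old behavior: it skips the quoting step entirely.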
Re: Getting upper bound in --incremental mode
Hey Jagrut,

Are you using the Sqoop1 Metastore job tool (assuming yes)? Are you wanting
to override the currently stored --last-value when executing the Sqoop job?

Markus Kemper
Customer Operations Engineer
www.cloudera.com <http://www.cloudera.com>

On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <jagrutsha...@gmail.com> wrote:
> Hi Markus - The question was that --incremental with --lastmodified option
> always takes the current time as the upper bound, and this gets stored as
> the --last-value for the next run.
>
> In certain cases, it is desirable that the upper bound should come from
> the actual column values, and that should get set as the --last-value for
> the next run.
> --
> Jagrut
>
> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <mar...@cloudera.com> wrote:
>> Hey Jagrut,
>>
>> Can you elaborate more about the problem you are facing and what you mean
>> by "Is this possible to set while running sqoop?".
>>
>> Markus Kemper
>> Customer Operations Engineer
>> www.cloudera.com <http://www.cloudera.com>
>>
>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <jagrutsha...@gmail.com> wrote:
>>> Hi Tony - I was under the assumption that append mode will not work for
>>> a timestamp column. But I gave it a try after your reply, and it works.
>>> And it gets the upper bound from the database itself. Thanks.
>>>
>>> --
>>> Jagrut
>>>
>>> On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <t...@phdata.io> wrote:
>>>> Does `--incremental append` work for you?
>>>>
>>>> > You should specify append mode when importing a table where new rows
>>>> > are continually being added with increasing row id values
>>>>
>>>> Tony
>>>>
>>>> On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <jagrutsha...@gmail.com> wrote:
>>>> > Hi all - For --incremental mode with the 'lastmodified' option, Sqoop
>>>> > (v 1.4.2) generates a query like:
>>>> > WHERE column >= last_modified_time and column < current_time
>>>> >
>>>> > The --last-value is set to the current_time and gets used for the
>>>> > next run.
>>>> >
>>>> > Here, the upper bound is always set to the current_time. In some
>>>> > cases, this upper bound is required to be taken from the database
>>>> > table column itself. So, the query is required of the form:
>>>> > WHERE column >= last_modified_time and column < max_time_in_db_table_column
>>>> >
>>>> > And the --last-value for the next run needs to be set as
>>>> > the max_time_in_db_table_column (and not the current_time).
>>>> >
>>>> > Is this possible to set while running sqoop? If no, is there any
>>>> > workaround suggested for this?
>>>> >
>>>> > Thanks a lot.
>>>> > --
>>>> > Jagrut
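The thread's core point can be sketched by constructing the predicate that `--incremental lastmodified` builds, per the description quoted above. The column name `ts` and the last value are hypothetical examples:

```shell
# Sketch of the WHERE clause --incremental lastmodified produces: the upper
# bound is the client's current time, not max(column) from the table.
# 'ts' and last_value below are made-up illustration values.
last_value='2017-07-19 14:02:00'
upper_bound=$(date -u '+%Y-%m-%d %H:%M:%S')   # Sqoop substitutes "now" here

echo "WHERE ts >= '$last_value' AND ts < '$upper_bound'"
```

Because the upper bound is wall-clock time, rows committed after the query starts but timestamped before `upper_bound` can be silently skipped or double-counted; `--incremental append` on a monotonically increasing column sidesteps this, which is the resolution the thread arrives at.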
[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-&lt;username&gt;/compile automatically
[ https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094664#comment-16094664 ]

Markus Kemper commented on SQOOP-3042:
--------------------------------------

As a possible workaround, the Sqoop environment script could be used to manage the contents of the compile dir.

*Sample Code*
{noformat}
cat /etc/sqoop/conf/sqoop-env.sh

echo
echo SQOOP COMPILE DIR DIRECTORY COUNT
ls -ltd /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR FILES COUNT
ls -ltR /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR OLDEST DIRECTORY
ls -ltd /tmp/sqoop*/compile/* | tail -1
echo SQOOP COMPILE DIR TMPWATCH 60d DELETE TEST
tmpwatch --test 60d /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR TMPWATCH 60d DELETE
tmpwatch 60d /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR DIRECTORY COUNT
ls -ltd /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR FILES COUNT
ls -ltR /tmp/sqoop*/compile/* | wc
echo SQOOP COMPILE DIR OLDEST DIRECTORY
ls -ltd /tmp/sqoop*/compile/* | tail -1
echo
{noformat}

*Sample Output*
{noformat}
$ sqoop version
SQOOP COMPILE DIR DIRECTORY COUNT
   1410   12690  142458
SQOOP COMPILE DIR FILES COUNT
   6853   27702  247866
SQOOP COMPILE DIR OLDEST DIRECTORY
drwxrwxr-x 2 user1 user1 4096 Dec 6 2016 /tmp/sqoop-user1/compile/734bf9f8090d24a8757303c44faf2825
SQOOP COMPILE DIR TMPWATCH 60d DELETE TEST
      0       0       0
SQOOP COMPILE DIR TMPWATCH 60d DELETE
      0       0       0
SQOOP COMPILE DIR DIRECTORY COUNT
   1410   12690  142458
SQOOP COMPILE DIR FILES COUNT
   6853   27702  247866
SQOOP COMPILE DIR OLDEST DIRECTORY
drwxrwxr-x 2 user1 user1 4096 Dec 6 2016 /tmp/sqoop-user1/compile/734bf9f8090d24a8757303c44faf2825
Warning: /opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/07/20 06:16:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.1
Sqoop 1.4.6-cdh5.10.1
git commit id
Compiled by jenkins on Mon Mar 20 02:34:12 PDT 2017
{noformat}

> Sqoop does not clear compile directory under /tmp/sqoop-<username>/compile
> automatically
> --------------------------------------------------------------------------
>
>                 Key: SQOOP-3042
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3042
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Eric Lin
>            Assignee: Eric Lin
>            Priority: Critical
>              Labels: patch
>         Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, SQOOP-3042.4.patch
>
>
> After running sqoop, all the temp files generated by ClassWriter are left
> behind on disk, so anyone can check those JAVA files to see the schema of
> those tables that Sqoop has been interacting with. By default, the
> directory is under /tmp/sqoop-<username>/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can
> see that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
>     hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
>     // We created the directory. Use it.
>     // If this directory is not actually filled with files, delete it
>     // when the JVM quits.
>     hashDir.deleteOnExit();
>     break;
>   }
> }
> {code}
> However, I believe it failed to delete because the directory was not empty.
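Where tmpwatch is not installed, the same cleanup can be sketched with plain find. The scratch directory below is a stand-in for /tmp/sqoop-&lt;username&gt;/compile, and the 60-day threshold mirrors the tmpwatch example above (GNU coreutils/findutils assumed):

```shell
# Sketch: delete compile dirs untouched for 60+ days, using find instead of
# tmpwatch. A mktemp scratch area stands in for /tmp/sqoop-<username>/compile.
base=$(mktemp -d)
mkdir "$base/stale_job" "$base/fresh_job"
touch -d "90 days ago" "$base/stale_job"   # simulate an old compile dir (GNU touch)

# -mtime +60 matches dirs whose mtime is more than 60 days old
find "$base" -mindepth 1 -maxdepth 1 -type d -mtime +60 -exec rm -rf {} +

ls "$base"   # only fresh_job remains
```

Run from cron (or the sqoop-env.sh hook shown above) against the real compile path, this would keep the directory count from growing unbounded.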
Re: Getting upper bound in --incremental mode
Hey Jagrut,

Can you elaborate more about the problem you are facing and what you mean by
"Is this possible to set while running sqoop?".

Markus Kemper
Customer Operations Engineer
www.cloudera.com <http://www.cloudera.com>

On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <jagrutsha...@gmail.com> wrote:
> Hi Tony - I was under the assumption that append mode will not work for a
> timestamp column. But I gave it a try after your reply, and it works. And
> it gets the upper bound from the database itself. Thanks.
>
> --
> Jagrut
>
> On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <t...@phdata.io> wrote:
>> Does `--incremental append` work for you?
>>
>> > You should specify append mode when importing a table where new rows
>> > are continually being added with increasing row id values
>>
>> Tony
>>
>> On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <jagrutsha...@gmail.com> wrote:
>> > Hi all - For --incremental mode with the 'lastmodified' option, Sqoop
>> > (v 1.4.2) generates a query like:
>> > WHERE column >= last_modified_time and column < current_time
>> >
>> > The --last-value is set to the current_time and gets used for the next run.
>> >
>> > Here, the upper bound is always set to the current_time. In some cases,
>> > this upper bound is required to be taken from the database table column
>> > itself. So, the query is required of the form:
>> > WHERE column >= last_modified_time and column < max_time_in_db_table_column
>> >
>> > And the --last-value for the next run needs to be set as
>> > the max_time_in_db_table_column (and not the current_time).
>> >
>> > Is this possible to set while running sqoop? If no, is there any
>> > workaround suggested for this?
>> >
>> > Thanks a lot.
>> > --
>> > Jagrut
[jira] [Created] (SQOOP-3212) Sqoop1 (export + --export-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)
Markus Kemper created SQOOP-3212:
------------------------------------

             Summary: Sqoop1 (export + --export-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)
                 Key: SQOOP-3212
                 URL: https://issues.apache.org/jira/browse/SQOOP-3212
             Project: Sqoop
          Issue Type: Bug
         Environment: $ sqoop version
17/07/17 15:22:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.1
            Reporter: Markus Kemper

Sqoop1 (export + --export-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704), test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:db2://host.domain.com:5/SQOOP
export MYUSER=sqoop;
export MYPSWD=sqoop;

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_default"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_default (c1_default int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_default values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select c1_default from t1_default"

--------------
| C1_DEFAULT |
--------------
| 1          |
--------------

#
# STEP 02 - Import and Export Data (baseline) using (--as-textfile, --as-avrodatafile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-textfile
hdfs dfs -cat /user/root/t1_default/part*
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --export-dir /user/root/t1_default --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select c1_default from t1_default"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_default"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_default values (1)"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-avrodatafile
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --export-dir /user/root/t1_default --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select c1_default from t1_default"

Output:
17/07/17 14:57:22 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 22.3116 seconds (0.0896 bytes/sec)
17/07/17 14:57:22 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1
~
17/07/17 14:58:37 INFO mapreduce.ExportJobBase: Transferred 138 bytes in 21.4796 seconds (6.4247 bytes/sec)
17/07/17 14:58:37 INFO mapreduce.ExportJobBase: Exported 1 records.
~
--------------
| C1_DEFAULT |
--------------
| 1          |
| 1          |
--------------
~
17/07/17 15:04:18 INFO mapreduce.ImportJobBase: Transferred 263 bytes in 24.5847 seconds (10.6977 bytes/sec)
17/07/17 15:04:18 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
17/07/17 15:06:18 INFO mapreduce.ExportJobBase: Transferred 653 bytes in 22.1808 seconds (29.4398 bytes/sec)
17/07/17 15:06:18 INFO mapreduce.ExportJobBase: Exported 1 records.
~
--------------
| C1_DEFAULT |
--------------
| 1          |
| 1          |
--------------

#
# STEP 03 - Create Table and Data with case-sensitive DB2 table name
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table \"t1_lower\""
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table \"t1_lower\" (\"c1_lower\" int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into \"t1_lower\" values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select \"c1_lower\" from \"t1_lower\""
sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD | egrep -i "t1"

Output:
--------------
| c1_lower   |
--------------
| 1          |
--------------
~
T1_DEFAULT
t1_lower

#
# STEP 04 - Import and Export Data with case-sensitive DB2 table name using (--as-textfile, --as-avrodatafile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from \"t1_lower\" where \$CONDITIONS" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-textfile
hdfs dfs -cat /user/root/t1_lower/part
[jira] [Updated] (SQOOP-3211) Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)
[ https://issues.apache.org/jira/browse/SQOOP-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Kemper updated SQOOP-3211:
---------------------------------
    Description:

Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704), see test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:db2://host.domain.com:5/SQOOP
export MYUSER=sqoop;
export MYPSWD=sqoop;

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_default"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_default (c1_default int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_default values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select c1_default from t1_default"

--------------
| C1_DEFAULT |
--------------
| 1          |
--------------

#
# STEP 02 - Import Data (baseline) using (--as-textfile, --as-avrodatafile, --as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-textfile
hdfs dfs -cat /user/root/t1_default/part*
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-avrodatafile
avro-tools tojson --pretty 'hdfs://host.domain.com/user/root/t1_default/part-m-0.avro'
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-parquetfile
hdfs dfs -ls /user/root/t1_default/*.parquet
parquet-tools cat --json 'hdfs://host.domain.com/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet'

Output:
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 30.2579 seconds (0.0661 bytes/sec)
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1
~
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Transferred 263 bytes in 22.8116 seconds (11.5292 bytes/sec)
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
{ "C1_DEFAULT" : { "int" : 1 } }
~
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Transferred 1.2578 KB in 29.3071 seconds (43.9484 bytes/sec)
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
-rw-r--r-- 3 root root 449 2017-07-17 14:06 /user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet
~
{"C1_DEFAULT":1}

#
# STEP 03 - Create Table and Data with case-sensitive DB2 table name
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table \"t1_lower\""
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table \"t1_lower\" (\"c1_lower\" int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into \"t1_lower\" values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select \"c1_lower\" from \"t1_lower\""
sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD | egrep -i "t1"

Output:
--------------
| c1_lower   |
--------------
| 1          |
--------------
~
T1_DEFAULT
t1_lower

#
# STEP 04 - Import Data with case-sensitive DB2 table name using (--as-textfile, --as-avrodatafile, --as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-textfile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-avrodatafile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-parquetfile

Output:
17/07/17 14:13:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t1_lower AS t WHERE 1=0
17/07/17 14:13:59 ERROR manager.SqlManager: Error executing statement: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=DB2INST1.T1_LOWER, DRIVER=4.16.53
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=DB2INST1.T1_LOWER, DRIVER=4.16.53
	at com.ibm.db2.jcc.am.fd.a(f
[jira] [Updated] (SQOOP-3211) Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)
[ https://issues.apache.org/jira/browse/SQOOP-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Kemper updated SQOOP-3211:
---------------------------------
    Description:

Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704), see test case below.

*Test Case*
{noformat}
#
# STEP 01 - Create Table and Data
#

export MYCONN=jdbc:db2://host.domain.com:5/SQOOP
export MYUSER=sqoop;
export MYPSWD=sqoop;

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select c1 from t1"

------
| C1 |
------
| 1  |
------

#
# STEP 02 - Import Data (baseline) using (--as-textfile, --as-avrodatafile, --as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-textfile
hdfs dfs -cat /user/root/t1_default/part*
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-avrodatafile
avro-tools tojson --pretty 'hdfs://host.domain.com/user/root/t1_default/part-m-0.avro'
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-parquetfile
hdfs dfs -ls /user/root/t1_default/*.parquet
parquet-tools cat --json 'hdfs://host.domain.com/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet'

Output:
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 30.2579 seconds (0.0661 bytes/sec)
17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
1
~
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Transferred 263 bytes in 22.8116 seconds (11.5292 bytes/sec)
17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
{ "C1_DEFAULT" : { "int" : 1 } }
~
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Transferred 1.2578 KB in 29.3071 seconds (43.9484 bytes/sec)
17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
-rw-r--r-- 3 root root 449 2017-07-17 14:06 /user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet
~
{"C1_DEFAULT":1}

#
# STEP 03 - Create Table and Data with case-sensitive DB2 table name
#

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table \"t1_lower\""
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table \"t1_lower\" (\"c1_lower\" int)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into \"t1_lower\" values (1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select \"c1_lower\" from \"t1_lower\""
sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD | egrep -i "t1"

Output:
--------------
| c1_lower   |
--------------
| 1          |
--------------
~
T1_DEFAULT
t1_lower

#
# STEP 04 - Import Data with case-sensitive DB2 table name using (--as-textfile, --as-avrodatafile, --as-parquetfile)
#

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-textfile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-avrodatafile
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-parquetfile

Output:
17/07/17 14:13:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t1_lower AS t WHERE 1=0
17/07/17 14:13:59 ERROR manager.SqlManager: Error executing statement: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=DB2INST1.T1_LOWER, DRIVER=4.16.53
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=DB2INST1.T1_LOWER, DRIVER=4.16.53
	at com.ibm.db2.jcc.am.fd.a(fd.java:739)
~
17/07/17 14:14:47 INFO manager.SqlManager:
[jira] [Created] (SQOOP-3211) Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704)
Markus Kemper created SQOOP-3211: Summary: Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704) Key: SQOOP-3211 URL: https://issues.apache.org/jira/browse/SQOOP-3211 Project: Sqoop Issue Type: Bug Environment: $ sqoop version 17/07/17 14:22:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.1 Reporter: Markus Kemper Sqoop1 (import + --target-dir + --table) using DB2 with case-sensitive-table fails with error (DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704), see test case below. *Test Case* {noformat} # # STEP 01 - Create Table and Data # export MYCONN=jdbc:db2://host.domain.com:5/SQOOP export MYUSER=sqoop; export MYPSWD=sqoop; sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select c1 from t1" --- | C1 | --- | 1 | --- # # STEP 02 - Import Data (baseline) using (--as-textfile, --as-avrodatafile, --as-parquetfile) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 --as-textfile hdfs dfs -cat /user/root/t1/part* sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-avrodatafile avro-tools tojson --pretty 'hdfs://host.domain.com/user/root/t1_default/part-m-0.avro' sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_default --target-dir /user/root/t1_default --delete-target-dir --num-mappers 1 --as-parquetfile hdfs dfs -ls /user/root/t1_default/*.parquet parquet-tools cat --json 
'hdfs://host.domain.com/user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet' Output: 17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 30.2579 seconds (0.0661 bytes/sec) 17/07/17 14:00:41 INFO mapreduce.ImportJobBase: Retrieved 1 records. ~ 1 ~ 17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Transferred 263 bytes in 22.8116 seconds (11.5292 bytes/sec) 17/07/17 14:02:05 INFO mapreduce.ImportJobBase: Retrieved 1 records. ~ { "C1_DEFAULT" : { "int" : 1 } } ~ 17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Transferred 1.2578 KB in 29.3071 seconds (43.9484 bytes/sec) 17/07/17 14:06:29 INFO mapreduce.ImportJobBase: Retrieved 1 records. ~ -rw-r--r-- 3 root root449 2017-07-17 14:06 /user/root/t1_default/c2a7687c-b2bd-40d6-8959-d8ce9c240ae6.parquet ~ {"C1_DEFAULT":1} # # STEP 03 - Create Table and Data with case-sensitive DB2 table name # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table \"t1_lower\"" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table \"t1_lower\" (\"c1_lower\" int)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into \"t1_lower\" values (1)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select \"c1_lower\" from \"t1_lower\"" sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD | egrep -i "t1" Output: --- | c1_lower| --- | 1 | --- ~ T1_DEFAULT t1_lower # # STEP 04 - Import Data with case-sensitive DB2 table name using (--as-textfile, --as-avrodatafile, --as-parquetfile) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-textfile sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-avrodatafile sqoop import --connect $MYCONN --username $MYUSER 
--password $MYPSWD --table \"t1_lower\" --target-dir /user/root/t1_lower --delete-target-dir --num-mappers 1 --as-parquetfile Output: 17/07/17 14:13:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t1_lower AS t WHERE 1=0 17/07/17 14:13:59 ERROR manager.SqlManager: Error executing statement: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2
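The failure above comes down to identifier case folding: DB2 folds unquoted identifiers to upper case, so the generated probe query `SELECT t.* FROM t1_lower AS t WHERE 1=0` resolves to `T1_LOWER`, which does not exist. A minimal Python sketch (illustrative only, not Sqoop source code) of the difference between the unquoted and quoted probe:

```python
# Illustrative sketch (not Sqoop source): DB2 folds unquoted identifiers
# to upper case, so only a double-quoted name preserves "t1_lower".
def probe_query(table, quoted=False):
    """Build the metadata probe query (SELECT t.* ... WHERE 1=0)."""
    ident = '"%s"' % table if quoted else table
    return 'SELECT t.* FROM %s AS t WHERE 1=0' % ident

# Unquoted: DB2 resolves this as T1_LOWER, which does not exist.
assert probe_query('t1_lower') == 'SELECT t.* FROM t1_lower AS t WHERE 1=0'
# Quoted: the lower-case name is preserved, so the probe can succeed.
assert probe_query('t1_lower', quoted=True) == \
    'SELECT t.* FROM "t1_lower" AS t WHERE 1=0'
```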
[jira] [Created] (SQOOP-3210) Sqoop1 (import + --hive-import + --map-column-hive) using column aliases fails with error (Import failed: No column by the name)
Markus Kemper created SQOOP-3210: Summary: Sqoop1 (import + --hive-import + --map-column-hive) using column aliases fails with error (Import failed: No column by the name) Key: SQOOP-3210 URL: https://issues.apache.org/jira/browse/SQOOP-3210 Project: Sqoop Issue Type: Improvement Environment: $ sqoop version 17/07/17 12:28:51 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.1 Reporter: Markus Kemper Sqoop1 (import + --hive-import + --map-column-hive) using column aliases fails with error (Import failed: No column by the name), see test case below. Please enable the ability to use column aliases. *Test Case* {noformat} # # STEP 01 - Create Table and Data # [example] export MYCONN=jdbc:mysql://host.domain.com:3306/db1 export MYUSER=sqoop export MYPSWD=sqoop sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: - | c1 | c2 | c3 | - | 1 | 2017-07-17 | some data | - # # STEP 02 - Verify Hive Import (baseline) # beeline -u jdbc:hive2:// -e "use db1; drop table t1 purge;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 where \$CONDITIONS" --num-mappers 1 --target-dir /data/dbs/db1/t1 --delete-target-dir --fields-terminated-by ',' --lines-terminated-by '\n' --as-textfile --hive-import --hive-database db1 --hive-table t1 beeline -u jdbc:hive2:// -e "use db1; desc t1; select * from t1" Output: +---++--+--+ | col_name | data_type | comment | +---++--+--+ | c1| int| | | c2| string | | | c3| string | | +---++--+--+ ++-++--+ | t1.c1 |t1.c2| t1.c3| ++-++--+ | 1 | 2017-07-17 | some data | ++-++--+ # # STEP 03 - Verify Hive Import with 
(--map-column-hive) # beeline -u jdbc:hive2:// -e "use db1; drop table t1 purge;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 where \$CONDITIONS" --num-mappers 1 --target-dir /data/dbs/db1/t1 --delete-target-dir --fields-terminated-by ',' --lines-terminated-by '\n' --as-textfile --hive-import --hive-database db1 --hive-table t1 --map-column-hive "c2=date,c3=varchar(10)" beeline -u jdbc:hive2:// -e "use db1; desc t1; select * from t1" +---+--+--+--+ | col_name | data_type | comment | +---+--+--+--+ | c1| int | | | c2| date | | | c3| varchar(10) | | +---+--+--+--+ ++-++--+ | t1.c1 |t1.c2| t1.c3| ++-++--+ | 1 | 2017-07-17 | some data | ++-++--+ # # STEP 04 - Verify Hive Import with (--map-column-hive) and (database.table) notation # beeline -u jdbc:hive2:// -e "use db1; drop table t1 purge;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 where \$CONDITIONS" --num-mappers 1 --target-dir /data/dbs/db1/t1 --delete-target-dir --fields-terminated-by ',' --lines-terminated-by '\n' --as-textfile --hive-import --hive-database db1 --hive-table t1 --map-column-hive "db1.c2=date,db1.c3=varchar(10)" beeline -u jdbc:hive2:// -e "use db1; desc t1; select * from t1" Output: 17/07/17 12:10:47 INFO mapreduce.ImportJobBase: Transferred 23 bytes in 20.602 seconds (1.1164 bytes/sec) 17/07/17 12:10:47 INFO mapreduce.ImportJobBase: Retrieved 1 records. 17/07/17 12:10:47 INFO manager.SqlManager: Executing SQL statement: select * from t1 where (1 = 0) 17/07/17 12:10:47 INFO manager.SqlManager: Executing SQL statement: select * from t1 where (1 = 0) 17/07/17 12:10:47 ERROR tool.ImportTool: Import failed: No column by the name db1.c3found while importing data # # STEP 05 - Verify Hive Import with (--map-column-hive) and (co
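The STEP 04 error is a name-lookup failure: the `--map-column-hive` keys are matched against the bare column names returned by the metadata query, so a database-qualified key such as `db1.c3` never matches and the import aborts. A hedged Python sketch (hypothetical helper, not Sqoop's actual validation code) of that lookup:

```python
# Illustrative sketch (not Sqoop source): --map-column-hive keys are
# compared against the bare imported column names, so qualified keys
# like "db1.c3" find no match and trigger "No column by the name ...".
def unmatched_hive_mappings(mapping, columns):
    """Return the mapping keys that do not match an imported column."""
    return [col for col in mapping if col not in columns]

columns = ['c1', 'c2', 'c3']
ok = {'c2': 'date', 'c3': 'varchar(10)'}
bad = {'db1.c2': 'date', 'db1.c3': 'varchar(10)'}
assert unmatched_hive_mappings(ok, columns) == []
assert unmatched_hive_mappings(bad, columns) == ['db1.c2', 'db1.c3']
```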
[jira] [Created] (SQOOP-3188) Sqoop1 (import + --target-dir) with empty directory (/usr/lib/hive) fails with error (java.lang.NoClassDefFoundError: org/json/JSONObject)
Markus Kemper created SQOOP-3188: Summary: Sqoop1 (import + --target-dir) with empty directory (/usr/lib/hive) fails with error (java.lang.NoClassDefFoundError: org/json/JSONObject) Key: SQOOP-3188 URL: https://issues.apache.org/jira/browse/SQOOP-3188 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Sqoop1 (import + --target-dir) with empty directory (/usr/lib/hive) fails with error (java.lang.NoClassDefFoundError: org/json/JSONObject), see test case below. *Test Case* {noformat} # # STEP 01 - Create Table and Data # export MYCONN=jdbc:mysql://mysql.sqoop.com:3306/sqoop export MYUSER=sqoop export MYPSWD=sqoop sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: - | c1 | c2 | c3 | - | 1 | 2017-05-10 | some data | - # # STEP 02 - Import Data into HDFS # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 hdfs dfs -cat /user/root/t1/part* Output: 17/05/10 13:46:24 INFO mapreduce.ImportJobBase: Transferred 23 bytes in 22.65 seconds (1.0155 bytes/sec) 17/05/10 13:46:24 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
~ 1,2017-05-10,some data # # STEP 03 - Create Bogus Hive Directory and Attempt to Import into HDFS # mkdir /usr/lib/hive chmod 777 /usr/lib/hive sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 Output: 17/05/10 13:47:44 INFO mapreduce.ImportJobBase: Beginning import of t1 Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43) at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:776) at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:388) at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:374) at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:256) at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692) at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:513) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at org.apache.sqoop.Sqoop.main(Sqoop.java:252) Caused by: java.lang.ClassNotFoundException: org.json.JSONObject at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 
15 more # # STEP 04 - Remove Bogus Hive Directory and Attempt to Import into HDFS # rm -rf /usr/lib/hive sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 hdfs dfs -cat /user/root/t1/part* Output: 17/05/10 13:52:30 INFO mapreduce.ImportJobBase: Transferred 23 bytes in 22.6361 seconds (1.0161 bytes/sec) 17/05/10 13:52:30 INFO mapreduce.ImportJobBase: Retrieved 1 records. ~ 1,2017-05-10,some data {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
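The trigger here is that the mere existence of an (empty) `/usr/lib/hive` directory flips Sqoop onto a code path that needs extra jars (`org.json`), which then cannot be loaded. A hedged Python sketch (illustrative only, not Sqoop source) of a check that requires the jars to actually be present rather than just the directory:

```python
# Illustrative sketch (not Sqoop source): an "is Hive installed" test
# that only checks for the directory is fooled by an empty /usr/lib/hive.
# Requiring actual jars under HIVE_HOME/lib avoids the trap.
import os
import tempfile

def hive_jars(hive_home):
    """List the jars under hive_home/lib, or [] if none are present."""
    lib = os.path.join(hive_home, 'lib')
    if not os.path.isdir(lib):
        return []
    return [f for f in os.listdir(lib) if f.endswith('.jar')]

empty_home = tempfile.mkdtemp()       # stands in for the bogus /usr/lib/hive
assert os.path.isdir(empty_home)      # "Hive is installed" by the naive test...
assert hive_jars(empty_home) == []    # ...but there is nothing to load
```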
[jira] [Created] (SQOOP-3186) Add Sqoop1 (import + --incremental + --check-column) support for functions/expressions
Markus Kemper created SQOOP-3186: Summary: Add Sqoop1 (import + --incremental + --check-column) support for functions/expressions Key: SQOOP-3186 URL: https://issues.apache.org/jira/browse/SQOOP-3186 Project: Sqoop Issue Type: Improvement Reporter: Markus Kemper Add Sqoop1 (import + --incremental + --check-column) support for functions/expressions, for example: *Example* {noformat} sqoop import \ --connect $MYCONN --username $MYUSER --password $MYPSWD \ --table T1 --target-dir /path/directory --merge-key C1 \ --incremental lastmodified --last-value '2017-01-01 00:00:00.0' \ --check-column "nvl(C4,to_date('2017-01-01 00:00:00'))" {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
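What the requested improvement amounts to is letting the incremental predicate be built around an arbitrary SQL expression instead of only a bare column name. A hedged Python sketch (hypothetical helper, not a Sqoop API; the `TO_TIMESTAMP` wrapping is an assumed Oracle-style rendering):

```python
# Illustrative sketch (hypothetical helper, not Sqoop code): build the
# lastmodified-mode filter around either a column name or an expression.
def incremental_where(check_expr, last_value):
    """Render the incremental-import predicate for a column or expression."""
    return "%s > TO_TIMESTAMP('%s')" % (check_expr, last_value)

# Plain column (supported today) vs. an expression (the requested feature).
assert incremental_where('C4', '2017-01-01 00:00:00.0') == \
    "C4 > TO_TIMESTAMP('2017-01-01 00:00:00.0')"
assert incremental_where("NVL(C4, DATE '2017-01-01')", '2017-01-01 00:00:00.0') == \
    "NVL(C4, DATE '2017-01-01') > TO_TIMESTAMP('2017-01-01 00:00:00.0')"
```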
[jira] [Updated] (SQOOP-3184) Sqoop1 (doc import + --incremental) does not cover HDFS file formats supported and/or limitations
[ https://issues.apache.org/jira/browse/SQOOP-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-3184: - Description: Sqoop1 (doc import + --incremental) does not cover HDFS file formats supported and/or limitations === https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports 7.2.10. Incremental Imports === was: Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key option === https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports 7.2.10. Incremental Imports Table 5. Incremental import arguments: Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform. --check-column (col) --incremental (mode) --last-value (value) An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported. === > Sqoop1 (doc import + --incremental) does not cover HDFS file formats > supported and/or limitations > - > > Key: SQOOP-3184 > URL: https://issues.apache.org/jira/browse/SQOOP-3184 > Project: Sqoop > Issue Type: Bug > Reporter: Markus Kemper > > Sqoop1 (doc import + --incremental) does not cover HDFS file formats > supported and/or limitations > === > https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports > 7.2.10. Incremental Imports > === -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (SQOOP-3184) Sqoop1 (doc import + --incremental) does not cover HDFS file formats supported and/or limitations
Markus Kemper created SQOOP-3184: Summary: Sqoop1 (doc import + --incremental) does not cover HDFS file formats supported and/or limitations Key: SQOOP-3184 URL: https://issues.apache.org/jira/browse/SQOOP-3184 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key option === https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports 7.2.10. Incremental Imports Table 5. Incremental import arguments: Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform. --check-column (col) --incremental (mode) --last-value (value) An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported. === -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (SQOOP-3183) Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key option
Markus Kemper created SQOOP-3183: Summary: Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key option Key: SQOOP-3183 URL: https://issues.apache.org/jira/browse/SQOOP-3183 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Sqoop1 (doc import + --incremental lastmodified) does not cover the --merge-key option === https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports 7.2.10. Incremental Imports Table 5. Incremental import arguments: Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform. --check-column (col) --incremental (mode) --last-value (value) An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported. === -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (SQOOP-3182) Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Can't parse input data: 'PAR1')
[ https://issues.apache.org/jira/browse/SQOOP-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-3182: - Summary: Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Can't parse input data: 'PAR1') (was: Sqoop1 (import + --incremental + --merge-key + --as-parquet) fails with (Can't parse input data: 'PAR1')) > Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with > (Can't parse input data: 'PAR1') > > > Key: SQOOP-3182 > URL: https://issues.apache.org/jira/browse/SQOOP-3182 > Project: Sqoop > Issue Type: Bug > Reporter: Markus Kemper > > Sqoop1 (import + --incremental + --merge-key + --as-parquet) fails with > (Can't parse input data: 'PAR1'). See test case below. > *Test Case* > {noformat} > # > # STEP 01 - Create Table and Data > # > export MYCONN=jdbc:oracle:thin:@oracle.sqoop.com:1521/db11g; > export MYUSER=sqoop > export MYPSWD=sqoop > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1 (c1 int, c2 date, c3 varchar(10), c4 timestamp)" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (1, sysdate, 'NEW ROW 1', sysdate)" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > Output: > - > | C1 | C2 | C3 | C4 | > - > | 1| 2017-05-06 06:59:02.0 | NEW ROW 1 | 2017-05-06 > 06:59:02 | > - > # > # STEP 02 - Import Data into HDFS > # > hdfs dfs -rm -r /user/root/t1 > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 > --merge-key C1 --last-value '2017-01-01 00:00:00.0' --as-parquetfile > --map-column-java C2=String,C4=String --num-mappers 1 --verbose > hdfs dfs -ls /user/root/t1/*.parquet > parquet-tools cat --json > 
'hdfs://namenode/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet' > Output: > 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Transferred 2.627 KB in > 23.6174 seconds (113.8988 bytes/sec) > 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Retrieved 1 records. > 17/05/06 07:01:34 INFO tool.ImportTool: --last-value 2017-05-06 07:01:09.0 > ~ > -rw-r--r-- 3 root root 1144 2017-05-06 07:01 > /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet > ~ > {"C1":"1","C2":"2017-05-06 06:59:02.0","C3":"NEW ROW 1","C4":"2017-05-06 > 06:59:02"} > # > # STEP 03 - Insert New Row and Update Existing Row > # > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (2, sysdate, 'NEW ROW 2', sysdate)" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "update t1 set c3 = 'UPDATE 1', c4 = sysdate where c1 = 1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1 order by c1" > Output: > - > | C1 | C2 | C3 | C4 | > - > | 1| 2017-05-06 06:59:02.0 | UPDATE 1 | 2017-05-06 > 07:04:40 | > | 2| 2017-05-06 07:04:38.0 | NEW ROW 2 | 2017-05-06 > 07:04:38 | > - > # > # STEP 04 - Import Data into HDFS and Merge changes > # > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 > --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile > --map-column-java C2=String,C4=String --num-mappers 1 --verbose > Output: > 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in > 27.4934 seconds (99.1148 bytes/sec) > 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Retri
[jira] [Created] (SQOOP-3182) Sqoop1 (import + --incremental + --merge-key + --as-parquet) fails with (Can't parse input data: 'PAR1')
Markus Kemper created SQOOP-3182: Summary: Sqoop1 (import + --incremental + --merge-key + --as-parquet) fails with (Can't parse input data: 'PAR1') Key: SQOOP-3182 URL: https://issues.apache.org/jira/browse/SQOOP-3182 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Sqoop1 (import + --incremental + --merge-key + --as-parquet) fails with (Can't parse input data: 'PAR1'). See test case below. *Test Case* {noformat} # # STEP 01 - Create Table and Data # export MYCONN=jdbc:oracle:thin:@oracle.sqoop.com:1521/db11g; export MYUSER=sqoop export MYPSWD=sqoop sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10), c4 timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, sysdate, 'NEW ROW 1', sysdate)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: - | C1 | C2 | C3 | C4 | - | 1| 2017-05-06 06:59:02.0 | NEW ROW 1 | 2017-05-06 06:59:02 | - # # STEP 02 - Import Data into HDFS # hdfs dfs -rm -r /user/root/t1 sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-01-01 00:00:00.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose hdfs dfs -ls /user/root/t1/*.parquet parquet-tools cat --json 'hdfs://namenode/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet' Output: 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Transferred 2.627 KB in 23.6174 seconds (113.8988 bytes/sec) 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
17/05/06 07:01:34 INFO tool.ImportTool: --last-value 2017-05-06 07:01:09.0 ~ -rw-r--r-- 3 root root 1144 2017-05-06 07:01 /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet ~ {"C1":"1","C2":"2017-05-06 06:59:02.0","C3":"NEW ROW 1","C4":"2017-05-06 06:59:02"} # # STEP 03 - Insert New Row and Update Existing Row # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (2, sysdate, 'NEW ROW 2', sysdate)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "update t1 set c3 = 'UPDATE 1', c4 = sysdate where c1 = 1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 order by c1" Output: - | C1 | C2 | C3 | C4 | - | 1| 2017-05-06 06:59:02.0 | UPDATE 1 | 2017-05-06 07:04:40 | | 2| 2017-05-06 07:04:38.0 | NEW ROW 2 | 2017-05-06 07:04:38 | - # # STEP 04 - Import Data into HDFS and Merge changes # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose Output: 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 27.4934 seconds (99.1148 bytes/sec) 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Retrieved 2 records. 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@121fdcee 17/05/06 07:06:43 INFO tool.ImportTool: Final destination exists, will run merge job. 
17/05/06 07:06:43 DEBUG tool.ImportTool: Using temporary folder: 4bc6b65cd0194b81938f4660974ee392_T1 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Checking for existing class: T1 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Attempting to load jar through URL: jar:file:/tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar!/ 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Previous classloader is java.net.FactoryURLClassLoader@121fdcee 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Testing class in jar: T1 17/05/06 07:06:43 ERROR tool.ImportTool: Import failed:
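The error string in the summary is diagnostic by itself: `PAR1` is the 4-byte magic number at the start of every Parquet file, so "Can't parse input data: 'PAR1'" means the merge job read the existing Parquet output as delimited text. A hedged Python sketch (illustrative only, not Sqoop source) of detecting the format from the magic bytes before choosing a record reader:

```python
# Illustrative sketch (not Sqoop source): 'PAR1' is the Parquet magic
# number, so a file starting with it must not be fed to a text parser.
def is_parquet(first_bytes):
    """True if the buffer starts with the Parquet magic number."""
    return first_bytes[:4] == b'PAR1'

assert is_parquet(b'PAR1\x15\x00')             # a Parquet file header
assert not is_parquet(b'1,2017-05-06,NEW')     # a delimited text record
```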
[jira] [Created] (SQOOP-3181) Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Could not find class .)
Markus Kemper created SQOOP-3181: Summary: Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Could not find class .) Key: SQOOP-3181 URL: https://issues.apache.org/jira/browse/SQOOP-3181 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Could not find class .). See test case below *Test Case* {noformat} # # STEP 01 - Create Table and Data # export MYCONN=jdbc:oracle:thin:@oracle.sqoop.com:1521/db11g; export MYUSER=sqoop export MYPSWD=sqoop sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10), c4 timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, sysdate, 'NEW ROW 1', sysdate)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: - | C1 | C2 | C3 | C4 | - | 1| 2017-05-06 06:59:02.0 | NEW ROW 1 | 2017-05-06 06:59:02 | - # # STEP 02 - Import Data into HDFS # hdfs dfs -rm -r /user/root/t1 sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-01-01 00:00:00.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose hdfs dfs -ls /user/root/t1/*.parquet parquet-tools cat --json 'hdfs://namenode/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet' Output: 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Transferred 2.627 KB in 23.6174 seconds (113.8988 bytes/sec) 17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
17/05/06 07:01:34 INFO tool.ImportTool: --last-value 2017-05-06 07:01:09.0 ~ -rw-r--r-- 3 root root 1144 2017-05-06 07:01 /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet ~ {"C1":"1","C2":"2017-05-06 06:59:02.0","C3":"NEW ROW 1","C4":"2017-05-06 06:59:02"} # # STEP 03 - Insert New Row and Update Existing Row # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (2, sysdate, 'NEW ROW 2', sysdate)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "update t1 set c3 = 'UPDATE 1', c4 = sysdate where c1 = 1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 order by c1" Output: - | C1 | C2 | C3 | C4 | - | 1| 2017-05-06 06:59:02.0 | UPDATE 1 | 2017-05-06 07:04:40 | | 2| 2017-05-06 07:04:38.0 | NEW ROW 2 | 2017-05-06 07:04:38 | - # # STEP 04 - Import Data into HDFS and Merge changes # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose Output: 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 27.4934 seconds (99.1148 bytes/sec) 17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Retrieved 2 records. 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@121fdcee 17/05/06 07:06:43 INFO tool.ImportTool: Final destination exists, will run merge job. 
17/05/06 07:06:43 DEBUG tool.ImportTool: Using temporary folder: 4bc6b65cd0194b81938f4660974ee392_T1 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Checking for existing class: T1 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Attempting to load jar through URL: jar:file:/tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar!/ 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Previous classloader is java.net.FactoryURLClassLoader@121fdcee 17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Testing class in jar: T1 17/05/06 07:06:43 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not load jar /tmp/sqoop-root/compi
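The verbose log shows the merge step reloading the codegen jar (`codegen_T1.jar`) from the temporary compile directory and failing when it cannot be loaded. A hedged Python sketch (hypothetical paths and helper, not Sqoop source) of a pre-flight check that makes the missing-jar case explicit rather than failing inside the classloader:

```python
# Illustrative sketch (hypothetical helper, not Sqoop code): verify the
# generated codegen jar exists before the merge job tries to load it.
import os
import tempfile

def find_codegen_jar(compile_dir, table):
    """Return the path to codegen_<table>.jar, or None if it is absent."""
    jar = os.path.join(compile_dir, 'codegen_%s.jar' % table)
    return jar if os.path.isfile(jar) else None

tmp = tempfile.mkdtemp()                    # stands in for the sqoop compile dir
assert find_codegen_jar(tmp, 'T1') is None  # jar absent: merge would fail here
open(os.path.join(tmp, 'codegen_T1.jar'), 'w').close()
assert find_codegen_jar(tmp, 'T1') is not None
```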
[jira] [Updated] (SQOOP-3179) Add Sqoop1 (import + --incremental lastmodified + --check-column) support for NULLs
[ https://issues.apache.org/jira/browse/SQOOP-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-3179: - Description: When using Sqoop (import + --incremental lastmodified + --check-column) if the check column contains NULLs the rows are ignored. Please consider adding the ability to include check column NULL values {noformat} # # STEP 01 - Create RDBMS Table and Data # export MYCONN=jdbc:oracle:thin:@myoracle.mydomain.com:1521/db11g export MYUSER=sqoop export MYPSWD=sqoop sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(15), c4 timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, sysdate, 'data row 1', sysdate)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 order by c1" -- | C1 | C2 | C3 | C4 | -- | 1| 2017-05-04 15:17:33.0 | data row 1 | 2017-05-04 15:17:33 | -- # # STEP 02 - Create Sqoop Incremental Import Job # sqoop job --create inc_import_hdfs -- import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-01-01 00:00:00' --as-textfile --map-column-java C2=String,C4=String --fields-terminated-by '\001' --lines-terminated-by '\n' --num-mappers 1 --verbose --hive-drop-import-delims # # STEP 03 - Execute Job and Verify data in HDFS # hdfs dfs -rm -r /user/root/t1 sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER --password $MYPSWD hdfs dfs -cat /user/root/t1/part* Output: 17/05/04 15:25:20 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 24.6352 seconds (2.2326 bytes/sec) 17/05/04 15:25:20 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
~ 12017-05-04 15:17:33.0data row 12017-05-04 15:17:33 # # STEP 04 - Insert New Rows (one with NULL --check-column) # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (2, sysdate, 'data row 2', NULL)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (3, sysdate, 'data row 3', sysdate)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 order by c1" Output: -- | C1 | C2 | C3 | C4 | -- | 1| 2017-05-04 15:17:33.0 | data row 1 | 2017-05-04 15:17:33 | | 2| 2017-05-04 15:27:19.0 | data row 2 | (null) | | 3| 2017-05-04 15:27:22.0 | data row 3 | 2017-05-04 15:27:22 | -- # # STEP 05 - Execute Job and Verify data in HDFS (row with NULL --check-column is not imported) # sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER --password $MYPSWD hdfs dfs -cat /user/root/t1/part* Output: 17/05/04 15:28:22 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 21.6227 seconds (2.5436 bytes/sec) 17/05/04 15:28:22 INFO mapreduce.ImportJobBase: Retrieved 1 records. ~ 12017-05-04 15:17:33.0data row 12017-05-04 15:17:33 32017-05-04 15:27:22.0data row 32017-05-04 15:27:22 # # STEP 06 - Update Row with NULL --check-column, Execute Job and Verify data in HDFS # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "update t1 set c4 = sysdate where c1 = 2" sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER --password $MYPSWD hdfs dfs -cat /user/root/t1/part* Output: 17/05/04 15:32:47 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 27.2132 seconds (2.0211 bytes/sec) 17/05/04 15:32:47 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
~ 1 2017-05-04 15:17:33.0 data row 1 2017-05-04 15:17:33 2 2017-05-04 15:27:19.0 data row 2 2017-05-04 15:31:55 <=== 3 2017-05-04 15:27:22.0 data row 3 2017-05-04 15:27:22 {noformat} was: When using Sqoop (import + --incremental lastmodified + --check-column) if the check column contains NULLs the rows are ignored. Please consider adding the ability to include ch
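The skipped-row behavior in STEP 05 follows from SQL three-valued logic: lastmodified mode filters on `check_column > last_value`, and any comparison against NULL is not true, so rows with a NULL check column are silently dropped. The requested feature corresponds to an `OR check_column IS NULL` escape hatch, sketched here in Python (illustrative only, not Sqoop source):

```python
# Illustrative sketch (not Sqoop source): lastmodified filtering with an
# optional include-NULLs mode, mirroring "c4 > :last OR c4 IS NULL".
def pick_rows(rows, last_value, include_nulls=False):
    """rows: (id, check_column) pairs; check_column may be None (NULL)."""
    picked = []
    for rid, check in rows:
        if check is None:
            if include_nulls:
                picked.append(rid)     # NULL check column, explicitly opted in
        elif check > last_value:
            picked.append(rid)         # normal lastmodified comparison
    return picked

rows = [(1, '2017-05-04 15:17:33'), (2, None), (3, '2017-05-04 15:27:22')]
assert pick_rows(rows, '2017-01-01 00:00:00') == [1, 3]           # row 2 skipped
assert pick_rows(rows, '2017-01-01 00:00:00', True) == [1, 2, 3]  # NULLs included
```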
[jira] [Created] (SQOOP-3179) Add Sqoop1 (import + --incremental lastmodified + --check-column) support for NULLs
Markus Kemper created SQOOP-3179: Summary: Add Sqoop1 (import + --incremental lastmodified + --check-column) support for NULLs Key: SQOOP-3179 URL: https://issues.apache.org/jira/browse/SQOOP-3179 Project: Sqoop Issue Type: Improvement Reporter: Markus Kemper

When using Sqoop (import + --incremental lastmodified + --check-column), rows whose check column contains NULL are ignored. Please consider adding the ability to include check-column NULL values.

{noformat}
#
# STEP 01 - Create RDBMS Table and Data
#
export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/db11g
export MYUSER=sqoop
export MYPSWD=welcome1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(15), c4 timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, sysdate, 'data row 1', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 order by c1"
----------------------------------------------------------------------
| C1 | C2                    | C3         | C4                       |
----------------------------------------------------------------------
| 1  | 2017-05-04 15:17:33.0 | data row 1 | 2017-05-04 15:17:33      |
----------------------------------------------------------------------
#
# STEP 02 - Create Sqoop Incremental Import Job
#
sqoop job --create inc_import_hdfs -- import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-01-01 00:00:00' --as-textfile --map-column-java C2=String,C4=String --fields-terminated-by '\001' --lines-terminated-by '\n' --num-mappers 1 --verbose --hive-drop-import-delims
#
# STEP 03 - Execute Job and Verify data in HDFS
#
hdfs dfs -rm -r /user/root/t1
sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER --password $MYPSWD
hdfs dfs -cat /user/root/t1/part*
Output:
17/05/04 15:25:20 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 24.6352 seconds (2.2326 bytes/sec)
17/05/04 15:25:20 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33
#
# STEP 04 - Insert New Rows (one with NULL --check-column)
#
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (2, sysdate, 'data row 2', NULL)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (3, sysdate, 'data row 3', sysdate)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 order by c1"
Output:
----------------------------------------------------------------------
| C1 | C2                    | C3         | C4                       |
----------------------------------------------------------------------
| 1  | 2017-05-04 15:17:33.0 | data row 1 | 2017-05-04 15:17:33      |
| 2  | 2017-05-04 15:27:19.0 | data row 2 | (null)                   |
| 3  | 2017-05-04 15:27:22.0 | data row 3 | 2017-05-04 15:27:22      |
----------------------------------------------------------------------
#
# STEP 05 - Execute Job and Verify data in HDFS (row with NULL --check-column is not imported)
#
sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER --password $MYPSWD
hdfs dfs -cat /user/root/t1/part*
Output:
17/05/04 15:28:22 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 21.6227 seconds (2.5436 bytes/sec)
17/05/04 15:28:22 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33
32017-05-04 15:27:22.0data row 32017-05-04 15:27:22
#
# STEP 06 - Update Row with NULL --check-column, Execute Job and Verify data in HDFS
#
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "update t1 set c4 = sysdate where c1 = 2"
sqoop job --exec inc_import_hdfs -- --connect $MYCONN --username $MYUSER --password $MYPSWD
hdfs dfs -cat /user/root/t1/part*
Output:
17/05/04 15:32:47 INFO mapreduce.ImportJobBase: Transferred 55 bytes in 27.2132 seconds (2.0211 bytes/sec)
17/05/04 15:32:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
12017-05-04 15:17:33.0data row 12017-05-04 15:17:33
22017-05-04 15:27:19.0data row 22017-05-04 15:31:55 <===
32017-05-04 15:27:22.0data row 32017-05-04 15:27:22
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
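Until such an option exists, one possible workaround (my own suggestion, not part of the report, and untested against an incremental job) is to fold NULL check-column values into the incremental window on the database side, e.g. by wrapping the check column in COALESCE inside a view or free-form query so NULL rows always compare as "new". A minimal sketch that only assembles such a query string; the sentinel timestamp and table name are illustrative:

```shell
#!/bin/sh
# Sketch only: COALESCE maps a NULL C4 to a far-future sentinel so an
# incremental predicate on C4 would no longer drop those rows.
# The sentinel value and table T1 are hypothetical.
CHECK_COL_EXPR="COALESCE(C4, TO_DATE('9999-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS'))"
QUERY="SELECT C1, C2, C3, ${CHECK_COL_EXPR} AS C4 FROM T1 WHERE \$CONDITIONS"
# The assembled free-form query that would be handed to: sqoop import --query "..."
echo "$QUERY"
```

Defining the expression on the database side (as a view) keeps the Sqoop command unchanged, which is why the sketch only builds the SQL.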
[jira] [Created] (SQOOP-3175) Sqoop1 (import + --as-parquetfile) writes data to wrong Hive table if same table name exists in Hive default database
Markus Kemper created SQOOP-3175: Summary: Sqoop1 (import + --as-parquetfile) writes data to wrong Hive table if same table name exists in Hive default database Key: SQOOP-3175 URL: https://issues.apache.org/jira/browse/SQOOP-3175 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Sqoop1 (import + --as-parquetfile) writes data to wrong Hive table if same table name exists in Hive default database. The test case below demonstrates this issue. *Test Case* {noformat} # Issue: Data files written to the wrong table with Parquet # # STEP 01 - Create Table and Data # export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/sqoop export MYUSER=sqoop export MYPSWD=cloudera sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date, 'new row 1')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" - | c1 | c2 | c3 | - | 1 | 2017-04-22 | new row 1 | - # # STEP 02 - Create HDFS Structures # beeline -u jdbc:hive2:// -e "drop database db1 cascade; drop database db2 cascade; drop database db3 cascade;" sudo -u hdfs hdfs dfs -rm -r /data/tmp sudo -u hdfs hdfs dfs -rm -r /data/dbs sudo -u hdfs hdfs dfs -mkdir -p /data/tmp sudo -u hdfs hdfs dfs -chmod 777 /data/tmp sudo -u hdfs hdfs dfs -mkdir -p /data/dbs sudo -u hdfs hdfs dfs -chmod 777 /data/dbs # # STEP 03 - Create Hive Databases # beeline -u jdbc:hive2:// -e "create database db1 location '/data/dbs/db1'; create database db2 location '/data/dbs/db2';" beeline -u jdbc:hive2:// -e "show databases; describe database default; describe database db1; describe database db2;" beeline -u jdbc:hive2:// -e "use default; show tables; use db1; show tables; use db2; show tables;" hdfs dfs -ls -R /user/hive/warehouse /data 
+----------+------------------------+------------------------------------------+-------------+-------------+-------------+
| db_name  | comment                | location                                 | owner_name  | owner_type  | parameters  |
+----------+------------------------+------------------------------------------+-------------+-------------+-------------+
| default  | Default Hive database  | hdfs://nameservice1/user/hive/warehouse  | public      | ROLE        |             |
+----------+------------------------+------------------------------------------+-------------+-------------+-------------+
+----------+----------+-----------------------------------+-------------+-------------+-------------+
| db_name  | comment  | location                          | owner_name  | owner_type  | parameters  |
+----------+----------+-----------------------------------+-------------+-------------+-------------+
| db1      |          | hdfs://nameservice1/data/dbs/db1  | root        | USER        |             |
+----------+----------+-----------------------------------+-------------+-------------+-------------+
+----------+----------+-----------------------------------+-------------+-------------+-------------+
| db_name  | comment  | location                          | owner_name  | owner_type  | parameters  |
+----------+----------+-----------------------------------+-------------+-------------+-------------+
| db2      |          | hdfs://nameservice1/data/dbs/db2  | root        | USER        |             |
+----------+----------+-----------------------------------+-------------+-------------+-------------+
~
+-----------+
| tab_name  |
+-----------+
+-----------+
+-----------+
| tab_name  |
+-----------+
+-----------+
+-----------+
| tab_name  |
+-----------+
+-----------+
~
drwxrwxrwx - root supergroup 0 2017-04-22 16:22 /data/dbs/db1
drwxrwxrwx - root supergroup 0 2017-04-22 16:22 /data/dbs/db2
#
# STEP 04 - Import RDBMS Table, Create Hive Table in Database (default) and Load Data (using --as-textfile)
#
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /data/tmp/t1 --hive-import --hive-database default --hive-table t1 --as-textfile --num-mappers 1
hdfs dfs -ls -R /user/hive/
[jira] [Created] (SQOOP-3166) Sqoop1 (import + --query + aggregate function + --split-by -m >=2) fails with error (unknown column)
Markus Kemper created SQOOP-3166: Summary: Sqoop1 (import + --query + aggregate function + --split-by -m >=2) fails with error (unknown column) Key: SQOOP-3166 URL: https://issues.apache.org/jira/browse/SQOOP-3166 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper

This issue appears to be RDBMS generic.

*Test Case*
{noformat}
# Description:
#
# 1. Sqoop import fails with Unknown Column error with the following conditions
# 1.1. Using --query + sql aggregate function() + --split-by + --num-mappers >1 fails
# 2. The Sqoop documentation does not seem to clarify requirements for "select list" and "--split-by"
#
# Documentation
# http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.10.0/SqoopUserGuide.html#_controlling_parallelism
# 7.2.4. Controlling Parallelism
#
# STEP 01 - [ORACLE] Create Data
#
export MYCONN=jdbc:oracle:thin:@oracle1.cloudera.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=cloudera
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop view v1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create view v1 as select c1 as \"ID\", c2, c3 from t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(c1) from t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(id) from v1"
| COUNT(C1) |
| 1         |
~
| COUNT(ID) |
| 1         |
#
# STEP 02 - [ORACLE] Import from Table/View with --num-mappers 2 (reproduction)
#
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(c1) from t1 where \$CONDITIONS" --target-dir /user/root/t1 --delete-target-dir --num-mappers 2 --split-by C1 --verbose
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(id) from v1 where \$CONDITIONS" --target-dir /user/root/t1 --delete-target-dir --num-mappers 2 --split-by ID --verbose
Output:
17/04/03 09:09:11 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(C1), MAX(C1) FROM (select count(c1) from t1 where (1 = 1) ) t1
17/04/03 09:09:11 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1490976836761_0069
17/04/03 09:09:11 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: java.sql.SQLSyntaxErrorException: ORA-00904: "C1": invalid identifier
~
17/04/03 09:10:01 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM (select count(id) from v1 where (1 = 1) ) t1
17/04/03 09:10:01 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1490976836761_0070
17/04/03 09:10:01 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: java.sql.SQLSyntaxErrorException: ORA-00904: "ID": invalid identifier
#
# STEP 03 - [ORACLE] Import from Table/View with --num-mappers 1 (workaround)
#
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(c1) from t1 where \$CONDITIONS" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 --split-by C1 --verbose
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(id) from v1 where \$CONDITIONS" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 --split-by ID --verbose
Output:
17/04/03 09:07:11 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 21.5799 seconds (0.0927 bytes/sec)
17/04/03 09:07:11 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
17/04/03 09:08:13 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 20.4732 seconds (0.0977 bytes/sec)
17/04/03 09:08:13 INFO mapreduce.ImportJobBase: Retrieved 1 records.
#
# STEP 04 - [MYSQL] Create Data
#
export MYCONN=jdbc:mysql://mysql1.cloudera.com:3306/db_coe
export MYUSER=sqoop
export M
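The ORA-00904 errors above arise because Sqoop's bounding-values query wraps the user query as a derived table and then selects MIN/MAX of the --split-by column, which does not exist in the aggregate's select list. A workaround that keeps --num-mappers >1 (my own suggestion, not from the report) is to alias the aggregate inside a derived table so --split-by can name a real output column. A sketch that only assembles the query string; the alias CNT is illustrative:

```shell
#!/bin/sh
# Sketch (assumption, not from the report): alias the aggregate inside a
# derived table, so SELECT MIN(CNT), MAX(CNT) FROM (...) sees CNT as a
# genuine column of the inner select list.
QUERY='SELECT CNT FROM (SELECT COUNT(C1) AS CNT FROM T1) Q WHERE $CONDITIONS'
# Would be run as: sqoop import --query "$QUERY" --split-by CNT --num-mappers 2 ...
echo "$QUERY"
```

Single quotes keep `$CONDITIONS` literal so Sqoop, not the shell, substitutes the split predicate.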
[jira] [Updated] (SQOOP-3160) Sqoop (import + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError)
[ https://issues.apache.org/jira/browse/SQOOP-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-3160: - Description: Sqoop (import + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError), see test case below. *Test Case* {noformat} # # STEP 01 - Create Data # export MYCONN=jdbc:oracle:thin:@oracle1.cloudera.com:1521/db11g; export MYUSER=sqoop export MYPSWD=cloudera sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t\$1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t\$1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t\$1 values (1, current_date, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t\$1" Output: --- | C1 | C2 | C3 | --- | 1| 2017-03-21 13:31:18.0 | some data | --- # # STEP 02 - Import with '$' (--table T$1) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T$1 --target-dir /user/root/t$1 --delete-target-dir --num-mappers 1 Output: 17/03/21 07:22:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "T" t WHERE 1=0 17/03/21 07:22:30 ERROR manager.SqlManager: Error executing statement: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist # # STEP 03 - Import with '$' (--table T\$1) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T\$1 --target-dir /user/root/t$1 --delete-target-dir --num-mappers 1 Output: 17/03/22 07:30:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "T$1" t WHERE 1=0 17/03/22 07:30:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce 17/03/22 07:30:38 INFO mapreduce.Job: Task Id : attempt_1488818259270_0124_m_00_0, Status : FAILED Error: java.lang.RuntimeException: 
java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.sqoop.mapreduce.db.DBRecordReader.createValue(DBRecordReader.java:197) at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:230) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 14 more Caused by: java.lang.NoClassDefFoundError: T$1$1 at T$1.init0(T$1.java:43) at T$1.(T$1.java:63) ... 
19 more
#
# STEP 04 - Import with '$' (--table "T$1")
#
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table "T$1" --target-dir /user/root/t$1 --delete-target-dir --num-mappers 1
Output:
17/03/21 07:25:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "T" t WHERE 1=0
17/03/21 07:25:08 ERROR manager.SqlManager: Error executing statement: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
#
# STEP 05 - Import with '$' (--table "T\$1")
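The ORA-00942 cases above come down to shell handling of `$` before Sqoop ever runs: unquoted and double-quoted `T$1` let the shell expand `$1` (empty here), so Sqoop receives `--table T`. A small demonstration of the four quoting forms, runnable in any POSIX shell:

```shell
#!/bin/sh
set --  # clear positional parameters so $1 is unset, as in a shell invoked without arguments
# How the shell rewrites each form of the table name before Sqoop sees it:
echo T$1    # $1 expands (to nothing here)        -> T
echo "T$1"  # double quotes do NOT stop expansion -> T
echo T\$1   # backslash escapes the dollar        -> T$1
echo 'T$1'  # single quotes suppress expansion    -> T$1
```

This matches the log lines: steps 02 and 04 query table "T", while the escaped forms pass "T$1" through (and then fail later for the separate generated-class reason shown in the stack trace).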
[jira] [Created] (SQOOP-3160) Sqoop (import + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError)
Markus Kemper created SQOOP-3160: Summary: Sqoop (import + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError) Key: SQOOP-3160 URL: https://issues.apache.org/jira/browse/SQOOP-3160 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Sqoop (export + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError), see test case below. *Test Case* {noformat} # # STEP 01 - Create Data # export MYCONN=jdbc:oracle:thin:@oracle1.cloudera.com:1521/db11g; export MYUSER=sqoop export MYPSWD=cloudera sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: --- | C1 | C2 | C3 | --- | 1| 2017-03-21 13:30:02.0 | some data | --- sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t\$1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t\$1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t\$1 values (1, current_date, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t\$1" Output: --- | C1 | C2 | C3 | --- | 1| 2017-03-21 13:31:18.0 | some data | --- # # STEP 02 - Import and Export without '$' (baseline) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table "T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 hdfs dfs -cat /user/root/t1/part* Output: 17/03/21 06:39:31 INFO mapreduce.ImportJobBase: Transferred 34 bytes in 21.6129 seconds (1.5731 bytes/sec) 
17/03/21 06:39:31 INFO mapreduce.ImportJobBase: Retrieved 1 records. ~ 1,2017-03-21 13:30:02.0,some data sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --export-dir /user/root/t1 --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: 17/03/21 06:46:07 INFO mapreduce.ExportJobBase: Transferred 163 bytes in 21.4938 seconds (7.5836 bytes/sec) 17/03/21 06:46:07 INFO mapreduce.ExportJobBase: Exported 1 records. ~ --- | C1 | C2 | C3 | --- | 1| 2017-03-21 13:30:02.0 | some data | | 1| 2017-03-21 13:30:02.0 | some data | --- # # STEP 03 - Export with '$' (--table T$1) # sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T$1 --export-dir /user/root/t1 --num-mappers 1 Output: 17/03/21 06:49:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "T" t WHERE 1=0 17/03/21 06:49:28 ERROR manager.SqlManager: Error executing statement: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist # # STEP 04 - Export with '$' (--table T\$1) # sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T\$1 --export-dir /user/root/t1 --num-mappers 1 Output: 17/03/22 07:27:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "T$1" t WHERE 1=0 17/03/22 07:27:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce 17/03/22 07:27:27 INFO mapreduce.ExportJobBase: Beginning export of T$1 2017-03-22 07:27:54,215 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.sqoop.mapreduce.TextExportMapper.setup(TextExportMapper.java:72) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142) at org.apache.sqoop.mapreduce.Auto
[jira] [Created] (SQOOP-3159) Sqoop (export + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError)
Markus Kemper created SQOOP-3159: Summary: Sqoop (export + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError) Key: SQOOP-3159 URL: https://issues.apache.org/jira/browse/SQOOP-3159 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Sqoop (export + --table) with Oracle table_name having '$' fails with error (ORA-00942 or java.lang.NoClassDefFoundError), see test case below. *Test Case* {noformat} # # STEP 01 - Create Data # export MYCONN=jdbc:oracle:thin:@oracle1.cloudera.com:1521/db11g; export MYUSER=sqoop export MYPSWD=cloudera sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: --- | C1 | C2 | C3 | --- | 1| 2017-03-21 13:30:02.0 | some data | --- sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t\$1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t\$1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t\$1 values (1, current_date, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t\$1" Output: --- | C1 | C2 | C3 | --- | 1| 2017-03-21 13:31:18.0 | some data | --- # # STEP 02 - Import and Export without '$' (baseline) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table "T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 hdfs dfs -cat /user/root/t1/part* Output: 17/03/21 06:39:31 INFO mapreduce.ImportJobBase: Transferred 34 bytes in 21.6129 seconds (1.5731 bytes/sec) 
17/03/21 06:39:31 INFO mapreduce.ImportJobBase: Retrieved 1 records. ~ 1,2017-03-21 13:30:02.0,some data sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --export-dir /user/root/t1 --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: 17/03/21 06:46:07 INFO mapreduce.ExportJobBase: Transferred 163 bytes in 21.4938 seconds (7.5836 bytes/sec) 17/03/21 06:46:07 INFO mapreduce.ExportJobBase: Exported 1 records. ~ --- | C1 | C2 | C3 | --- | 1| 2017-03-21 13:30:02.0 | some data | | 1| 2017-03-21 13:30:02.0 | some data | --- # # STEP 03 - Export with '$' (--table T$1) # sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T$1 --export-dir /user/root/t1 --num-mappers 1 Output: 17/03/21 06:49:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "T" t WHERE 1=0 17/03/21 06:49:28 ERROR manager.SqlManager: Error executing statement: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist # # STEP 04 - Export with '$' (--table T\$1) # sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T\$1 --export-dir /user/root/t1 --num-mappers 1 Output: 17/03/22 07:27:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "T$1" t WHERE 1=0 17/03/22 07:27:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce 17/03/22 07:27:27 INFO mapreduce.ExportJobBase: Beginning export of T$1 2017-03-22 07:27:54,215 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.sqoop.mapreduce.TextExportMapper.setup(TextExportMapper.java:72) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142) at org.apache.sqoop.mapreduce.Auto
Re: New Sqoop Committer - Anna Szonyi
Awesome news, congrats Anna! Markus Kemper Customer Operations Engineer [image: www.cloudera.com] <http://www.cloudera.com> On Thu, Mar 16, 2017 at 2:50 PM, Attila Szabó <mau...@apache.org> wrote: > Nice job Anna! > > Very much well deserved!!! > > Congrats, > > Attila > > On Mar 16, 2017 7:47 PM, "Abraham Fine" <af...@apache.org> wrote: > >> On behalf of the Apache Sqoop PMC, I am pleased to welcome Anna Szonyi >> as a new committer to Apache Sqoop. Please join me in congratulating her >> for this accomplishment! >> >> Anna has made a number of contributions to the Sqoop test and build code >> in addition to many bug fixes. Anna has also contributed to our >> community by performing code reviews and helping users on our mailing >> list. You can see many of her contributions here: >> https://s.apache.org/PfGo >> >> We are excited to see what Anna contributes next! >> >> Thanks, >> Abraham Fine >>
[jira] [Commented] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)
[ https://issues.apache.org/jira/browse/SQOOP-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864368#comment-15864368 ] Markus Kemper commented on SQOOP-3134: -- Linking SQOOP-2783 > Add option to configure Avro schema output file name with (import + > --as-avrodatafile) > --- > > Key: SQOOP-3134 > URL: https://issues.apache.org/jira/browse/SQOOP-3134 > Project: Sqoop > Issue Type: Improvement > Reporter: Markus Kemper > > Please consider adding an option to configure the Avro schema output file > name that is created with Sqoop (import + --as-avrodatafile), example cases > below. > {noformat} > # > # STEP 01 - Create Data > # > export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe > export MYUSER=sqoop > export MYPSWD=cloudera > sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1 (c1 int, c2 date, c3 varchar(10))" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (1, current_date, 'some data')" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > - > | c1 | c2 | c3 | > - > | 1 | 2017-02-13 | some data | > - > # > # STEP 02 - Import + --table + --as-avrodatafile > # > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 > --as-avrodatafile > ls -l ./* > Output: > 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in > 20.6988 seconds (19.9529 bytes/sec) > 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
> > -rw-r--r-- 1 root root 492 Feb 13 12:14 ./t1.avsc <=== want option to > configure this file name > -rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java > # > # STEP 03 - Import + --query + --as-avrodatafile > # > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir > /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile > ls -l ./* > Output: > 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in > 25.2757 seconds (17.7245 bytes/sec) > 17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records. > ~ > -rw-r--r-- 1 root root 527 Feb 13 12:16 ./AutoGeneratedSchema.avsc <=== > want option to configure this file name > -rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (SQOOP-3134) Add option to configure Avro schema output file name with (import + --as-avrodatafile)
Markus Kemper created SQOOP-3134: Summary: Add option to configure Avro schema output file name with (import + --as-avrodatafile) Key: SQOOP-3134 URL: https://issues.apache.org/jira/browse/SQOOP-3134 Project: Sqoop Issue Type: Improvement Reporter: Markus Kemper Please consider adding an option to configure the Avro schema output file name that is created with Sqoop (import + --as-avrodatafile), example cases below. {noformat} # # STEP 01 - Create Data # export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe export MYUSER=sqoop export MYPSWD=cloudera sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" - | c1 | c2 | c3 | - | 1 | 2017-02-13 | some data | - # # STEP 02 - Import + --table + --as-avrodatafile # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile ls -l ./* Output: 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Transferred 413 bytes in 20.6988 seconds (19.9529 bytes/sec) 17/02/13 12:14:52 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
-rw-r--r-- 1 root root 492 Feb 13 12:14 ./t1.avsc <=== want option to configure this file name
-rw-r--r-- 1 root root 12462 Feb 13 12:14 ./t1.java
#
# STEP 03 - Import + --query + --as-avrodatafile
#
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 where \$CONDITIONS" --split-by c1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 --as-avrodatafile
ls -l ./*
Output:
17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Transferred 448 bytes in 25.2757 seconds (17.7245 bytes/sec)
17/02/13 12:16:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
~
-rw-r--r-- 1 root root 527 Feb 13 12:16 ./AutoGeneratedSchema.avsc <=== want option to configure this file name
-rw-r--r-- 1 root root 12590 Feb 13 12:16 ./QueryResult.java
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
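As an interim measure (my own suggestion, not something the issue proposes), the generated schema file can simply be renamed after the import, since the `.avsc` is written to the local working directory. A sketch using a stand-in file; the target name `t1.avsc` is illustrative:

```shell
#!/bin/sh
# Stand-in for the schema file that sqoop import --as-avrodatafile writes
# to the local working directory; the content is a hypothetical minimal schema.
echo '{"type": "record", "name": "QueryResult", "fields": []}' > AutoGeneratedSchema.avsc
# Rename to the file name we actually want (t1.avsc here is illustrative).
mv AutoGeneratedSchema.avsc t1.avsc
ls -l t1.avsc
```

This only changes the local copy; a real option would also need to cover the schema name embedded in the Avro data files, which the rename does not touch.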
[jira] [Created] (SQOOP-3133) Sqoop (export + --non-resilient) with SQLServer ignores Sqoop option (-Dsqoop.export.statements.per.transaction)
Markus Kemper created SQOOP-3133: Summary: Sqoop (export + --non-resilient) with SQLServer ignores Sqoop option (-Dsqoop.export.statements.per.transaction) Key: SQOOP-3133 URL: https://issues.apache.org/jira/browse/SQOOP-3133 Project: Sqoop Issue Type: Bug Components: docs Reporter: Markus Kemper

Please consider updating the doc to clarify this limitation. When using Sqoop (export + --non-resilient) with SQLServer, Sqoop ignores the option (-Dsqoop.export.statements.per.transaction), which means that each export task will execute within a single RDBMS transaction instead of committing on an interval as it proceeds. This could be considered desirable behavior for rolling back a failure during export; however, it may not be ideal for performance.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SQOOP-3130) Sqoop (export + --export-dir + Avro files) is not obeying --num-mappers requested
[ https://issues.apache.org/jira/browse/SQOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854178#comment-15854178 ] Markus Kemper commented on SQOOP-3130: --

From reviewing this issue further, another possible workaround is to set (mapreduce.input.fileinputformat.split.minsize) to help control the count of map tasks used when exporting Avro data files. This method is not likely to be 100% deterministic if the volume of data is larger than the boundaries of the "minsize" and the count of map tasks requested; however, it is more ideal than one map task per Avro data file. The example below demonstrates this workaround.

*Use (mapreduce.input.fileinputformat.split.minsize) without \--num-mappers*
{noformat}
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_text"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(*) from t1_text"
sqoop export -Dmapreduce.input.fileinputformat.split.minsize=134217728 --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_TEXT --export-dir /user/root/external/t1_text --input-fields-terminated-by ','
17/02/06 05:56:25 INFO input.FileInputFormat: Total input paths to process : 10
17/02/06 05:56:25 INFO mapreduce.JobSubmitter: number of splits:4 <=== uses default --num-mappers value (4)
17/02/06 05:57:11 INFO mapreduce.ExportJobBase: Transferred 3.519 MB in 47.9893 seconds (75.0887 KB/sec)
17/02/06 05:57:11 INFO mapreduce.ExportJobBase: Exported 10 records.
{noformat} *Use (mapreduce.input.fileinputformat.split.minsize) with \--num-mappers* {noformat} sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_text" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(*) from t1_text" sqoop export -Dmapreduce.input.fileinputformat.split.minsize=134217728 --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_TEXT --export-dir /user/root/external/t1_text --input-fields-terminated-by ',' --num-mappers 2 17/02/06 05:59:12 INFO input.FileInputFormat: Total input paths to process : 10 17/02/06 05:59:12 INFO mapreduce.JobSubmitter: number of splits:2 <=== uses requested --num-mappers value (2) 17/02/06 05:59:42 INFO mapreduce.ExportJobBase: Transferred 3.5189 MB in 32.3481 seconds (111.3925 KB/sec) 17/02/06 05:59:42 INFO mapreduce.ExportJobBase: Exported 10 records. {noformat} > Sqoop (export + --export-dir + Avro files) is not obeying --num-mappers > requested > - > > Key: SQOOP-3130 > URL: https://issues.apache.org/jira/browse/SQOOP-3130 > Project: Sqoop > Issue Type: Bug >Reporter: Markus Kemper > > When using Sqoop (export + --export-dir + Avro files) it is not obeying > --num-mappers requested, instead it is creating a map task per Avro data > file. > One known workaround for this issue is to use the Sqoop --hcatalog options. > Please see the test case below demonstrating the issue and workaround. 
> *Test Case*
> {noformat}
> #
> # STEP 01 - Create Data
> #
> for i in {1..10}; do d=`date +"%Y-%m-%d %H:%M:%S" --date="+$i days"`; echo "$i,$d,row data" >> ./data.csv; done
> ls -l ./*; wc data.csv
> hdfs dfs -mkdir -p /user/root/external/t1
> hdfs dfs -put ./data.csv /user/root/external/t1/data.csv
> hdfs dfs -ls -R /user/root/external/t1/
>
> Output:
> -rw-r--r-- 1 root root 3488895 Feb 1 11:20 ./data.csv
> ~
> 10 30 3488895 data.csv
> ~
> -rw-r--r-- 3 root root 3488895 2017-02-01 11:26 /user/root/external/t1/data.csv
>
> #
> # STEP 02 - Create RDBMS Table and Export Data
> #
> export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/db11g;
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_text"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_text (c1 int, c2 date, c3 varchar(10))"
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_TEXT --export-dir /user/root/external/t1 --input-fields-terminated-by ',' --num-mappers 1
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(*) from t1_text"
>
> Output:
> 17/02/01 11:33:31 INFO mapreduce.ExportJobBase: Transferred 3.3274 MB in 24.8037 seconds (137.3688 KB/sec)
> 17/02/01 11:33:31 INF
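The minsize workaround above hinges on how a combine-style input format packs many small input files into fewer splits, which is why 10 input paths can yield only 4 splits. The following is a minimal, illustrative simulation of that packing idea only; it is an assumption, not Hadoop's actual algorithm, which also factors in the requested mapper count, block locality, and a maximum split size.

```python
# Illustrative simulation (NOT Hadoop's exact algorithm): input files
# accumulate into the current split until its byte total reaches
# split.minsize, at which point a new split is started.

def pack_splits(file_sizes, min_split_size):
    """Greedily pack file sizes (bytes) into splits of >= min_split_size."""
    splits, current = [], 0
    for size in file_sizes:
        current += size
        if current >= min_split_size:
            splits.append(current)
            current = 0
    if current:
        splits.append(current)  # leftover files form a final, smaller split
    return splits

# Ten small part files, sized like the ~350 KB part-m-* files above:
sizes = [358894] + [370001] * 9
# A 128 MB minsize packs them all into a single split...
print(len(pack_splits(sizes, 134217728)))  # -> 1
# ...while a tiny minsize degenerates to one split per file.
print(len(pack_splits(sizes, 1)))          # -> 10
```

In the log above the job still produced 4 splits because the split computation also targets the default --num-mappers value (4); the sketch models only the minsize packing.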
[jira] [Created] (SQOOP-3130) Sqoop (export + --export-dir + Avro files) is not obeying --num-mappers requested
Markus Kemper created SQOOP-3130:
------------------------------------

         Summary: Sqoop (export + --export-dir + Avro files) is not obeying --num-mappers requested
             Key: SQOOP-3130
             URL: https://issues.apache.org/jira/browse/SQOOP-3130
         Project: Sqoop
      Issue Type: Bug
        Reporter: Markus Kemper

When using Sqoop (export + --export-dir + Avro files) it is not obeying the --num-mappers requested; instead it creates a map task per Avro data file. One known workaround for this issue is to use the Sqoop --hcatalog options. Please see the test case below demonstrating the issue and workaround.

*Test Case*

{noformat}
#
# STEP 01 - Create Data
#
for i in {1..10}; do d=`date +"%Y-%m-%d %H:%M:%S" --date="+$i days"`; echo "$i,$d,row data" >> ./data.csv; done
ls -l ./*; wc data.csv
hdfs dfs -mkdir -p /user/root/external/t1
hdfs dfs -put ./data.csv /user/root/external/t1/data.csv
hdfs dfs -ls -R /user/root/external/t1/

Output:
-rw-r--r-- 1 root root 3488895 Feb 1 11:20 ./data.csv
~
10 30 3488895 data.csv
~
-rw-r--r-- 3 root root 3488895 2017-02-01 11:26 /user/root/external/t1/data.csv

#
# STEP 02 - Create RDBMS Table and Export Data
#
export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/db11g;
export MYUSER=sqoop
export MYPSWD=cloudera
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_text"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_text (c1 int, c2 date, c3 varchar(10))"
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_TEXT --export-dir /user/root/external/t1 --input-fields-terminated-by ',' --num-mappers 1
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(*) from t1_text"

Output:
17/02/01 11:33:31 INFO mapreduce.ExportJobBase: Transferred 3.3274 MB in 24.8037 seconds (137.3688 KB/sec)
17/02/01 11:33:31 INFO mapreduce.ExportJobBase: Exported 10 records.
~~ | COUNT(*) | | 10 | # # STEP 03 - Import Data as Text Creating 10 HDFS Files # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_TEXT --target-dir /user/root/external/t1_text --delete-target-dir --num-mappers 10 --split-by C1 --as-textfile hdfs dfs -ls /user/root/external/t1_text/part* Output: 17/02/01 11:38:26 INFO mapreduce.ImportJobBase: Transferred 3.518 MB in 57.0517 seconds (63.1434 KB/sec) 17/02/01 11:38:26 INFO mapreduce.ImportJobBase: Retrieved 10 records. ~ -rw-r--r-- 3 root root 358894 2017-02-01 11:38 /user/root/external/t1_text/part-m-0 -rw-r--r-- 3 root root 37 2017-02-01 11:38 /user/root/external/t1_text/part-m-1 -rw-r--r-- 3 root root 37 2017-02-01 11:38 /user/root/external/t1_text/part-m-2 -rw-r--r-- 3 root root 37 2017-02-01 11:38 /user/root/external/t1_text/part-m-3 -rw-r--r-- 3 root root 37 2017-02-01 11:38 /user/root/external/t1_text/part-m-4 -rw-r--r-- 3 root root 37 2017-02-01 11:38 /user/root/external/t1_text/part-m-5 -rw-r--r-- 3 root root 37 2017-02-01 11:38 /user/root/external/t1_text/part-m-6 -rw-r--r-- 3 root root 37 2017-02-01 11:38 /user/root/external/t1_text/part-m-7 -rw-r--r-- 3 root root 37 2017-02-01 11:38 /user/root/external/t1_text/part-m-8 -rw-r--r-- 3 root root 370001 2017-02-01 11:38 /user/root/external/t1_text/part-m-9 # # STEP 04 - Export 10 Text Formatted Data Using 2 Splits # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_text" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(*) from t1_text" sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_TEXT --export-dir /user/root/external/t1_text --input-fields-terminated-by ',' --num-mappers 2 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select count(*) from t1_text" Output: | COUNT(*) | | 0| ~ 17/02/01 11:47:26 INFO input.FileInputFormat: Total input paths to process : 10 17/02/01 11:47:26 INFO 
mapreduce.JobSubmitter: number of splits:2 17/02/01 11:47:55 INFO mapreduce.ExportJobBase: Transferred 3.5189 MB in 31.7104 seconds (113.6324 KB/sec) 17/02/01 11:47:55 INFO mapreduce.ExportJobBase: Exported 10 records. ~ | COUNT(*) | | 10
[jira] [Created] (SQOOP-3129) Sqoop Metastore job command does not honor -D generic arguments
Markus Kemper created SQOOP-3129:
------------------------------------

         Summary: Sqoop Metastore job command does not honor -D generic arguments
             Key: SQOOP-3129
             URL: https://issues.apache.org/jira/browse/SQOOP-3129
         Project: Sqoop
      Issue Type: Bug
        Reporter: Markus Kemper

Not 100% sure that this is a defect, but it seems like one to me. The Sqoop job help suggests that generic args are supported; however, I am not able to get them to work. See the test case below.

*Sqoop Help*

{noformat}
$ sqoop job -help

Output:
17/01/17 08:13:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.3
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [] [TOOL-ARGS]]
{noformat}

*Test Case*

{noformat}
#
# STEP 01 - Setup Table and Data
#
export MYCONN=jdbc:mysql://mysql.cloudera.com:3306/db_coe
export MYUSER=sqoop
export MYPSWD=cloudera
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 varchar(10))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, 'some data')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"

| c1 | c2        |
| 1  | some data |

#
# STEP 02 - Import Data Baseline (default)
#
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
yarn application -status application_1481911879790_0093 | head -6

17/01/02 07:03:46 INFO client.RMProxy: Connecting to ResourceManager at host1.cloudera.com/NN.NN.NNN.NNN:8032
Application Report :
    Application-Id : application_1481911879790_0093
    Application-Name : t1.jar
    Application-Type : MAPREDUCE
    User : root
    Queue : root.users.root

#
# STEP 03 - Import Data (-Dmapreduce.job.queuename=MY_SQOOP_QUEUE -Dmapreduce.job.name=MY_SQOOP_JOB)
#
sqoop import -Dmapreduce.job.queuename=MY_SQOOP_QUEUE -Dmapreduce.job.name=MY_SQOOP_JOB --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
yarn application -status application_1481911879790_0094 | head -6

17/01/02 07:07:00 INFO client.RMProxy: Connecting to ResourceManager at host1.cloudera.com/NN.NN.NNN.NNN:8032
Application Report :
    Application-Id : application_1481911879790_0094
    Application-Name : MY_SQOOP_JOB
    Application-Type : MAPREDUCE
    User : root
    Queue : root.MY_SQOOP_QUEUE

#
# STEP 04 - Import Data using Sqoop1 Metastore Job (default)
#
sqoop job --create my_sqoop_meta_job -- import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1
sqoop job --list

Available jobs:
  my_sqoop_meta_job

sqoop job --exec my_sqoop_meta_job
yarn application -status application_1481911879790_0095 | head -6

17/01/02 07:16:13 INFO client.RMProxy: Connecting to ResourceManager at host1.cloudera.com/NN.NN.NNN.NNN:8032
Application Report :
    Application-Id : application_1481911879790_0095
    Application-Name : t1.jar
    Application-Type : MAPREDUCE
    User : root
    Queue : root.users.root

#
# STEP 06 - Review Sqoop Job Help
#
$ sqoop job -help

Output:
17/01/17 08:13:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.3
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [] [TOOL-ARGS]]

Job management arguments:
  --create          Create a new saved job
  --delete          Delete a saved job
  --exec            Run a saved job
  --help            Print usage instructions
  --list            List saved jobs
  --meta-connect    Specify JDBC connect string for the metastore
  --show            Show the parameters for a saved job
  --verbose         Print more information while working

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf specify an application configuration file
-D
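The generic-argument behavior being tested follows Hadoop's rule that -D properties must come first on the command line and are peeled off before tool-specific parsing. The sketch below is a simplified, hypothetical parser illustrating that contract; it is not Sqoop's or Hadoop's code (the real GenericOptionsParser also accepts the fused -Dkey=value form, among other options).

```python
# Hypothetical, simplified illustration of the generic-argument contract:
# "-D key=value" pairs are consumed from the front of argv into a property
# map; everything from the first non-generic token on is left for the
# tool (here, the job subcommand) to parse.

def split_generic_args(argv):
    """Return (properties dict, remaining tool-specific args)."""
    props, i = {}, 0
    while i + 1 < len(argv) and argv[i] == "-D":
        key, _, value = argv[i + 1].partition("=")
        props[key] = value
        i += 2
    return props, argv[i:]

argv = ["-D", "mapreduce.job.queuename=MY_SQOOP_QUEUE",
        "-D", "mapreduce.job.name=MY_SQOOP_JOB",
        "--exec", "my_sqoop_meta_job"]
props, tool_args = split_generic_args(argv)
print(props["mapreduce.job.name"])  # -> MY_SQOOP_JOB
print(tool_args)                    # -> ['--exec', 'my_sqoop_meta_job']
```

If SQOOP-3129 is a real defect, the likely failure mode is that the job tool drops this property map somewhere between parsing and job submission, rather than failing to parse it.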
[jira] [Commented] (SQOOP-3089) Sqoop import + --as-parquetfile + Oracle + BINARY_DOUBLE fails with insufficient message
[ https://issues.apache.org/jira/browse/SQOOP-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781033#comment-15781033 ]

Markus Kemper commented on SQOOP-3089:
--------------------------------------

Linking SQOOP-3095

> Sqoop import + --as-parquetfile + Oracle + BINARY_DOUBLE fails with insufficient message
> ----------------------------------------------------------------------------------------
>
>             Key: SQOOP-3089
>             URL: https://issues.apache.org/jira/browse/SQOOP-3089
>         Project: Sqoop
>      Issue Type: Bug
>        Reporter: Markus Kemper
>
> The ask in this JIRA is to enhance the debug message to be more informative about how to map the data and successfully perform the operation. The full test case is below.
>
> *Sqoop Debug (current)*
> 16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101
>
> *Sqoop Debug (requested)*
> 16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101, please try using --map-column-java =
>
> {noformat}
> #
> # STEP 01 - Setup Table and Data
> #
> export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_oracle"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_oracle (c1 int, c2 binary_double)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_oracle values (1, 1.1)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle"
>
> Output:
> ---------------
> | C1 | C2  |
> ---------------
> | 1  | 1.1 |
> ---------------
>
> #
> # STEP 02 - Import Data as Parquet (reproduction)
> #
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir --num-mappers 1 --as-parquetfile --verbose
>
> Output:
> 16/12/21 12:16:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM T1_ORACLE t WHERE 1=0
> 16/12/21 12:16:08 DEBUG manager.SqlManager: Found 
column C1 of type [2, 38, 0] > 16/12/21 12:16:08 DEBUG manager.SqlManager: Found column C2 of type [101, 0, > 0] > 16/12/21 12:16:08 DEBUG manager.OracleManager$ConnCache: Caching released > connection for > jdbc:oracle:thin:@host-10-17-101-252.coe.cloudera.com:1521/orcl12c/sqoop > 16/12/21 12:16:08 DEBUG util.ClassLoaderStack: Restoring classloader: > java.net.FactoryURLClassLoader@55465b1f > 16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL > type 101 > # > # STEP 03 - Import Data as Parquet using option (--map-column-java C2=Double) > # > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir > --num-mappers 1 --as-parquetfile --verbose --map-column-java C2=Double > Output: > 16/12/21 12:34:36 INFO mapreduce.ImportJobBase: Transferred 1.6406 KB in > 77.989 seconds (21.5415 bytes/sec) > 16/12/21 12:34:36 INFO mapreduce.ImportJobBase: Retrieved 1 records. > # > # STEP 04 - View Data using Parquet-tools > # > hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet > parquet-tools schema -d > hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet > parquet-tools cat --json > hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet > Output: > -rw-r--r-- 3 user1 user1612 2016-12-21 12:34 > /user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet > --- > message T1_ORACLE { > optional binary C1 (UTF8); > optional double C2; > } > creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber}) > extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop > import of > T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","double"],"default":null,"columnName":"C2","
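The "Cannot convert SQL type 101" failure above comes down to a lookup: Sqoop resolves each column's JDBC type code to a Java type, and Oracle reports BINARY_DOUBLE with the vendor-specific code 101, which has no entry in java.sql.Types. The sketch below is an illustrative table and function only, not Sqoop's actual code; note also that the real --map-column-java override is keyed by column name (e.g. C2=Double), not by type code as simplified here.

```python
# Hedged sketch of why the import fails: a fixed table maps standard JDBC
# type codes to Java types. Oracle's vendor-specific code 101
# (BINARY_DOUBLE) is absent, so the default lookup fails unless an
# explicit override is supplied. Table contents are illustrative only.

JDBC_TO_JAVA = {
    2: "java.math.BigDecimal",   # java.sql.Types.NUMERIC
    8: "Double",                 # java.sql.Types.DOUBLE
    12: "String",                # java.sql.Types.VARCHAR
    101: None,                   # Oracle BINARY_DOUBLE: no standard entry
}

def resolve(sql_type, overrides=None):
    """Resolve a JDBC type code to a Java type, honoring overrides."""
    java_type = (overrides or {}).get(sql_type) or JDBC_TO_JAVA.get(sql_type)
    if java_type is None:
        raise ValueError("Cannot convert SQL type %d" % sql_type)
    return java_type

print(resolve(12))                              # -> String
print(resolve(101, overrides={101: "Double"}))  # -> Double
```

This is also why the JIRA asks for a better message: the unresolved-lookup branch knows the failing type code, so it could suggest the --map-column-java escape hatch instead of failing opaquely.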
[jira] [Commented] (SQOOP-3094) Add (import + --as-avrodatafile) with Oracle BINARY_DOUBLE
[ https://issues.apache.org/jira/browse/SQOOP-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781030#comment-15781030 ]

Markus Kemper commented on SQOOP-3094:
--------------------------------------

Linking SQOOP-3095 and SQOOP-3089

> Add (import + --as-avrodatafile) with Oracle BINARY_DOUBLE
> ----------------------------------------------------------
>
>             Key: SQOOP-3094
>             URL: https://issues.apache.org/jira/browse/SQOOP-3094
>         Project: Sqoop
>      Issue Type: Improvement
>        Reporter: Markus Kemper
>
> Some users are not able to easily implement the Sqoop option (--map-column-java, Example 1 below).
> The ask here is to find a way to support the Oracle BINARY_DOUBLE type in Sqoop natively, without using --map-column-java (see Example 2 below).
>
> *Example 1*
> {noformat}
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 --as-avrodatafile --map-column-java C2=String
> hdfs dfs -ls /user/user1/t1/*.avro
> avro-tools tojson --pretty 'hdfs://namenode.cloudera.com/user/user1/t1/part-m-0.avro'
>
> Output:
> 16/12/27 10:28:03 INFO mapreduce.ImportJobBase: Transferred 320 bytes in 39.1563 seconds (8.1724 bytes/sec)
> 16/12/27 10:28:03 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ---
> -rw-r--r-- 3 user1 user1 320 2016-12-27 10:28 /user/user1/t1/part-m-0.avro
> ---
> {
>   "C1" : { "string" : "1" },
>   "C2" : { "string" : "1.1" }
> }
> {noformat}
>
> *Example 2*
> {noformat}
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 --as-parquetfile
>
> Output:
> 16/12/27 10:05:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM T1 t WHERE 1=0
> 16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
> 16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 0]
> 16/12/27 10:05:43 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@55465b1f
> 16/12/27 10:05:43 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101
>
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 2 --as-parquetfile --direct
>
> Output:
> 16/12/27 10:08:50 DEBUG oracle.OraOopUtilities: The Oracle table context has been derived from:
>   oracleConnectionUserName = sqoop
>   tableStr = T1
> as:
>   owner : SQOOP
>   table : T1
> 16/12/27 10:08:50 INFO oracle.OraOopManagerFactory:
> **************************************************
> *** Using Data Connector for Oracle and Hadoop ***
> **************************************************
> 16/12/27 10:08:53 INFO manager.SqlManager: Executing SQL statement: SELECT C1,C2 FROM T1 WHERE 1=0
> 16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C1
> 16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C2
> 16/12/27 10:08:53 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@7e087bf5
> 16/12/27 10:08:53 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (SQOOP-3095) Add (import + --as-parquetfile) with Oracle BINARY_DOUBLE
Markus Kemper created SQOOP-3095:
------------------------------------

         Summary: Add (import + --as-parquetfile) with Oracle BINARY_DOUBLE
             Key: SQOOP-3095
             URL: https://issues.apache.org/jira/browse/SQOOP-3095
         Project: Sqoop
      Issue Type: Improvement
        Reporter: Markus Kemper

Some users are not able to easily implement the Sqoop option (--map-column-java, Example 1 below). The ask here is to find a way to support the Oracle BINARY_DOUBLE type in Sqoop natively, without using --map-column-java (see Example 2 below).

*Example 1*
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 --as-parquetfile --map-column-java C2=String
hdfs dfs -ls /user/user1/t1/*.parquet
parquet-tools cat --json 'hdfs://namenode.cloudera.com/user/user1/t1/9c4742a1-68e2-4fd7-8264-6e84110ac409.parquet'

Output:
16/12/27 10:34:05 INFO mapreduce.ImportJobBase: Transferred 1.5537 KB in 42.1369 seconds (37.7579 bytes/sec)
16/12/27 10:34:05 INFO mapreduce.ImportJobBase: Retrieved 1 records.
---
-rw-r--r-- 3 user1 user1 565 2016-12-27 10:34 /user/user1/t1/9c4742a1-68e2-4fd7-8264-6e84110ac409.parquet
---
{"C1":"1","C2":"1.1"}
{noformat}

*Example 2*
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 --as-parquetfile

Output:
16/12/27 10:05:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM T1 t WHERE 1=0
16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 0]
16/12/27 10:05:43 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@55465b1f
16/12/27 10:05:43 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 2 --as-parquetfile --direct

Output:
16/12/27 10:08:50 DEBUG oracle.OraOopUtilities: The Oracle table context has been derived from:
  oracleConnectionUserName = sqoop
  tableStr = T1
as:
  owner : SQOOP
  table : T1
16/12/27 10:08:50 INFO oracle.OraOopManagerFactory:
**************************************************
*** Using Data Connector for Oracle and Hadoop ***
**************************************************
16/12/27 10:08:53 INFO manager.SqlManager: Executing SQL statement: SELECT C1,C2 FROM T1 WHERE 1=0
16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C1
16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C2
16/12/27 10:08:53 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@7e087bf5
16/12/27 10:08:53 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101
{noformat}
[jira] [Created] (SQOOP-3094) Add (import + --as-avrodatafile) with Oracle BINARY_DOUBLE
Markus Kemper created SQOOP-3094:
------------------------------------

         Summary: Add (import + --as-avrodatafile) with Oracle BINARY_DOUBLE
             Key: SQOOP-3094
             URL: https://issues.apache.org/jira/browse/SQOOP-3094
         Project: Sqoop
      Issue Type: Improvement
        Reporter: Markus Kemper

Some users are not able to easily implement the Sqoop option (--map-column-java, Example 1 below). The ask here is to find a way to support the Oracle BINARY_DOUBLE type in Sqoop natively, without using --map-column-java (see Example 2 below).

*Example 1*
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 --as-avrodatafile --map-column-java C2=String
hdfs dfs -ls /user/user1/t1/*.avro
avro-tools tojson --pretty 'hdfs://namenode.cloudera.com/user/user1/t1/part-m-0.avro'

Output:
16/12/27 10:28:03 INFO mapreduce.ImportJobBase: Transferred 320 bytes in 39.1563 seconds (8.1724 bytes/sec)
16/12/27 10:28:03 INFO mapreduce.ImportJobBase: Retrieved 1 records.
---
-rw-r--r-- 3 user1 user1 320 2016-12-27 10:28 /user/user1/t1/part-m-0.avro
---
{
  "C1" : { "string" : "1" },
  "C2" : { "string" : "1.1" }
}
{noformat}

*Example 2*
{noformat}
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 1 --as-parquetfile

Output:
16/12/27 10:05:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM T1 t WHERE 1=0
16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
16/12/27 10:05:43 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 0]
16/12/27 10:05:43 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@55465b1f
16/12/27 10:05:43 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/user1/t1 --delete-target-dir --verbose --num-mappers 2 --as-parquetfile --direct

Output:
16/12/27 10:08:50 DEBUG oracle.OraOopUtilities: The Oracle table context has been derived from:
  oracleConnectionUserName = sqoop
  tableStr = T1
as:
  owner : SQOOP
  table : T1
16/12/27 10:08:50 INFO oracle.OraOopManagerFactory:
**************************************************
*** Using Data Connector for Oracle and Hadoop ***
**************************************************
16/12/27 10:08:53 INFO manager.SqlManager: Executing SQL statement: SELECT C1,C2 FROM T1 WHERE 1=0
16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C1
16/12/27 10:08:53 DEBUG manager.SqlManager: Found column C2
16/12/27 10:08:53 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@7e087bf5
16/12/27 10:08:53 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101
{noformat}
[jira] [Created] (SQOOP-3089) Sqoop import + --as-parquetfile + Oracle + BINARY_DOUBLE fails with insufficient message
Markus Kemper created SQOOP-3089:
------------------------------------

         Summary: Sqoop import + --as-parquetfile + Oracle + BINARY_DOUBLE fails with insufficient message
             Key: SQOOP-3089
             URL: https://issues.apache.org/jira/browse/SQOOP-3089
         Project: Sqoop
      Issue Type: Bug
        Reporter: Markus Kemper

The ask in this JIRA is to enhance the debug message to be more informative about how to map the data and successfully perform the operation. The full test case is below.

*Sqoop Debug (current)*
16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101

*Sqoop Debug (requested)*
16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101, please try using --map-column-java =

{noformat}
#
# STEP 01 - Setup Table and Data
#
export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
export MYUSER=sqoop
export MYPSWD=cloudera
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_oracle"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_oracle (c1 int, c2 binary_double)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_oracle values (1, 1.1)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle"

Output:
---------------
| C1 | C2  |
---------------
| 1  | 1.1 |
---------------

#
# STEP 02 - Import Data as Parquet (reproduction)
#
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir --num-mappers 1 --as-parquetfile --verbose

Output:
16/12/21 12:16:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM T1_ORACLE t WHERE 1=0
16/12/21 12:16:08 DEBUG manager.SqlManager: Found column C1 of type [2, 38, 0]
16/12/21 12:16:08 DEBUG manager.SqlManager: Found column C2 of type [101, 0, 0]
16/12/21 12:16:08 DEBUG manager.OracleManager$ConnCache: Caching released connection for 
jdbc:oracle:thin:@host-10-17-101-252.coe.cloudera.com:1521/orcl12c/sqoop 16/12/21 12:16:08 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@55465b1f 16/12/21 12:16:08 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 101 # # STEP 03 - Import Data as Parquet using option (--map-column-java C2=Double) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir --num-mappers 1 --as-parquetfile --verbose --map-column-java C2=Double Output: 16/12/21 12:34:36 INFO mapreduce.ImportJobBase: Transferred 1.6406 KB in 77.989 seconds (21.5415 bytes/sec) 16/12/21 12:34:36 INFO mapreduce.ImportJobBase: Retrieved 1 records. # # STEP 04 - View Data using Parquet-tools # hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet parquet-tools schema -d hdfs://host-10-17-103-169.coe.cloudera.com/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet parquet-tools cat --json hdfs://host-10-17-103-169.coe.cloudera.com/user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet Output: -rw-r--r-- 3 user1 user1612 2016-12-21 12:34 /user/user1/t1_oracle_parquet/e35dab0d-899f-46e7-88b4-cf1c4d24cae9.parquet --- message T1_ORACLE { optional binary C1 (UTF8); optional double C2; } creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber}) extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop import of T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","double"],"default":null,"columnName":"C2","sqlType":"101"}],"tableName":"T1_ORACLE"} --- {"C1":"1","C2":1.1} # # STEP 05 - Import Data as Parquet using option (--map-column-java C2=String) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir --num-mappers 1 
--as-parquetfile --verbose --map-column-java C2=String Output: 16/12/21 12:24:36 INFO mapreduce.ImportJobBase: Transferred 1.6221 KB in 48.9326 seconds (33.9447 bytes/sec) 16/12/21 12:24:36 INFO mapreduce.ImportJobBase: Retrieved
[jira] [Created] (SQOOP-3088) Sqoop export with Parquet data failure does not contain the MapTask error
Markus Kemper created SQOOP-3088: Summary: Sqoop export with Parquet data failure does not contain the MapTask error Key: SQOOP-3088 URL: https://issues.apache.org/jira/browse/SQOOP-3088 Project: Sqoop Issue Type: Bug Components: tools Reporter: Markus Kemper *Test Case* {noformat} # # STEP 01 - Setup Table and Data # export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c; export MYUSER=sqoop export MYPSWD=cloudera sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_oracle" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_oracle (c1 int, c2 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_oracle values (1, 'data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle" Output: - | C1 | C2 | - | 1| data | - # # STEP 02 - Import Data as Parquet # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir --num-mappers 1 --as-parquetfile Output: 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Transferred 1.624 KB in 50.1693 seconds (33.1478 bytes/sec) 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
# # STEP 03 - Verify Parquet Data # hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet parquet-tools schema -d hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet Output: -rw-r--r-- 3 user1 user1597 2016-12-21 07:11 /user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet --- message T1_ORACLE { optional binary C1 (UTF8); optional binary C2 (UTF8); } creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber}) extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop import of T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],"tableName":"T1_ORACLE"} file schema: T1_ORACLE C1: OPTIONAL BINARY O:UTF8 R:0 D:1 C2: OPTIONAL BINARY O:UTF8 R:0 D:1 row group 1: RC:1 TS:85 C1: BINARY SNAPPY DO:0 FPO:4 SZ:40/38/0.95 VC:1 ENC:PLAIN,RLE,BIT_PACKED C2: BINARY SNAPPY DO:0 FPO:44 SZ:49/47/0.96 VC:1 ENC:PLAIN,RLE,BIT_PACKED # # STEP 04 - Export Parquet Data # sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --export-dir /user/user1/t1_oracle_parquet --num-mappers 1 --verbose Output: [sqoop debug] 16/12/21 07:15:06 INFO mapreduce.Job: map 0% reduce 0% 16/12/21 07:15:40 INFO mapreduce.Job: map 100% reduce 0% 16/12/21 07:15:40 INFO mapreduce.Job: Job job_1481911879790_0026 failed with state FAILED due to: Task failed task_1481911879790_0026_m_00 Job failed as tasks failed. 
failedMaps:1 failedReduces:0 16/12/21 07:15:40 INFO mapreduce.Job: Counters: 8 Job Counters Failed map tasks=1 Launched map tasks=1 Data-local map tasks=1 Total time spent by all maps in occupied slots (ms)=32125 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=32125 Total vcore-seconds taken by all map tasks=32125 Total megabyte-seconds taken by all map tasks=32896000 16/12/21 07:15:40 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.8304 seconds (0 bytes/sec) 16/12/21 07:15:40 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Exported 0 records. 16/12/21 07:15:40 DEBUG util.ClassLoaderSt
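As this issue's summary notes, the client-side output above stops at the generic job failure and never shows the map task's own exception; the root cause only appears in the aggregated task logs (retrievable with `yarn logs -applicationId <id>`). Below is a hypothetical helper sketch for surfacing the first error line from such a log. The sample text and the mapper name in it are illustrative, not real captured output; the Parquet decoding error string is the one reported in SQOOP-3086.

```python
# Hypothetical helper: scan aggregated task-attempt logs for the first
# ERROR/exception line, since the job-level output above never shows it.
# SAMPLE_TASK_LOG is illustrative text, not real captured output.

SAMPLE_TASK_LOG = """\
2016-12-21 07:15:38 INFO [main] org.apache.hadoop.mapred.MapTask: task setup
2016-12-21 07:15:39 ERROR [main] ExportMapper: parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file ...
"""

def first_error(log_text):
    """Return the first line that looks like an error, or None."""
    for line in log_text.splitlines():
        if " ERROR " in line or "Exception" in line:
            return line
    return None

print(first_error(SAMPLE_TASK_LOG))
```

Propagating a line like this into the export tool's final error message is essentially what this JIRA asks Sqoop to do.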
[jira] [Updated] (SQOOP-3086) Sqoop export with Parquet data fails with error (parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file)
[ https://issues.apache.org/jira/browse/SQOOP-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-3086: - Description: *Test Case* {noformat} # # STEP 01 - Setup Table and Data # export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c; export MYUSER=sqoop export MYPSWD=cloudera sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_oracle" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_oracle (c1 int, c2 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_oracle values (1, 'data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle" Output: - | C1 | C2 | - | 1| data | - # # STEP 02 - Import Data as Parquet # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir --num-mappers 1 --as-parquetfile Output: 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Transferred 1.624 KB in 50.1693 seconds (33.1478 bytes/sec) 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
# # STEP 03 - Verify Parquet Data # hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet parquet-tools schema -d hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet Output: -rw-r--r-- 3 user1 user1597 2016-12-21 07:11 /user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet --- message T1_ORACLE { optional binary C1 (UTF8); optional binary C2 (UTF8); } creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber}) extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop import of T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],"tableName":"T1_ORACLE"} file schema: T1_ORACLE C1: OPTIONAL BINARY O:UTF8 R:0 D:1 C2: OPTIONAL BINARY O:UTF8 R:0 D:1 row group 1: RC:1 TS:85 C1: BINARY SNAPPY DO:0 FPO:4 SZ:40/38/0.95 VC:1 ENC:PLAIN,RLE,BIT_PACKED C2: BINARY SNAPPY DO:0 FPO:44 SZ:49/47/0.96 VC:1 ENC:PLAIN,RLE,BIT_PACKED # # STEP 04 - Export Parquet Data # sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --export-dir /user/user1/t1_oracle_parquet --num-mappers 1 --verbose Output: [sqoop debug] 16/12/21 07:15:06 INFO mapreduce.Job: map 0% reduce 0% 16/12/21 07:15:40 INFO mapreduce.Job: map 100% reduce 0% 16/12/21 07:15:40 INFO mapreduce.Job: Job job_1481911879790_0026 failed with state FAILED due to: Task failed task_1481911879790_0026_m_00 Job failed as tasks failed. 
failedMaps:1 failedReduces:0 16/12/21 07:15:40 INFO mapreduce.Job: Counters: 8 Job Counters Failed map tasks=1 Launched map tasks=1 Data-local map tasks=1 Total time spent by all maps in occupied slots (ms)=32125 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=32125 Total vcore-seconds taken by all map tasks=32125 Total megabyte-seconds taken by all map tasks=32896000 16/12/21 07:15:40 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.8304 seconds (0 bytes/sec) 16/12/21 07:15:40 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Exported 0 records. 16/12/21 07:15:40 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@577cfae6 16/12/21 07:15:40 ERROR tool.ExportTool: Error during export: Export job failed! [yarn debug] 2016-12-21 07:15:38,911 DEBUG [
[jira] [Created] (SQOOP-3086) Sqoop export with Parquet data fails with error (parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file)
Markus Kemper created SQOOP-3086: Summary: Sqoop export with Parquet data fails with error (parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file) Key: SQOOP-3086 URL: https://issues.apache.org/jira/browse/SQOOP-3086 Project: Sqoop Issue Type: Bug Components: tools Reporter: Markus Kemper *Test Case* {noformat} # # STEP 01 - Setup Table and Data # export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c; export MYUSER=sqoop export MYPSWD=cloudera sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_oracle" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_oracle (c1 int, c2 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_oracle values (1, 'data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle" Output: - | C1 | C2 | - | 1| data | - # # STEP 02 - Import Data as Parquet # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir --num-mappers 1 --as-parquetfile Output: 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Transferred 1.624 KB in 50.1693 seconds (33.1478 bytes/sec) 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
# # STEP 03 - Verify Parquet Data # hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet parquet-tools schema -d hdfs://host-10-17-103-169.coe.cloudera.com/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet Output: -rw-r--r-- 3 user1 user1597 2016-12-21 07:11 /user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet --- message T1_ORACLE { optional binary C1 (UTF8); optional binary C2 (UTF8); } creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber}) extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop import of T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],"tableName":"T1_ORACLE"} file schema: T1_ORACLE C1: OPTIONAL BINARY O:UTF8 R:0 D:1 C2: OPTIONAL BINARY O:UTF8 R:0 D:1 row group 1: RC:1 TS:85 C1: BINARY SNAPPY DO:0 FPO:4 SZ:40/38/0.95 VC:1 ENC:PLAIN,RLE,BIT_PACKED C2: BINARY SNAPPY DO:0 FPO:44 SZ:49/47/0.96 VC:1 ENC:PLAIN,RLE,BIT_PACKED # # STEP 04 - Export Parquet Data # sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --export-dir /user/user1/t1_oracle_parquet --num-mappers 1 --verbose Output: [sqoop debug] 16/12/21 07:15:06 INFO mapreduce.Job: map 0% reduce 0% 16/12/21 07:15:40 INFO mapreduce.Job: map 100% reduce 0% 16/12/21 07:15:40 INFO mapreduce.Job: Job job_1481911879790_0026 failed with state FAILED due to: Task failed task_1481911879790_0026_m_00 Job failed as tasks failed. 
failedMaps:1 failedReduces:0 16/12/21 07:15:40 INFO mapreduce.Job: Counters: 8 Job Counters Failed map tasks=1 Launched map tasks=1 Data-local map tasks=1 Total time spent by all maps in occupied slots (ms)=32125 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=32125 Total vcore-seconds taken by all map tasks=32125 Total megabyte-seconds taken by all map tasks=32896000 16/12/21 07:15:40 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.8304 seconds (0 bytes/sec) 16/12/21 07:15:40 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 16/12/21 07:15:40 INFO mapreduce.ExportJobBase
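One detail worth calling out from the STEP 03 output above: the embedded parquet.avro.schema types both Oracle columns, including the numeric C1, as nullable strings. A minimal stdlib-only sketch of checking that metadata for numeric JDBC columns written as strings (the helper and the NUMERIC_SQL_TYPES set are illustrative, not Sqoop code):

```python
import json

# parquet.avro.schema exactly as dumped by parquet-tools in STEP 03 above
avro_schema = json.loads(
    '{"type":"record","name":"T1_ORACLE","doc":"Sqoop import of T1_ORACLE",'
    '"fields":['
    '{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},'
    '{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],'
    '"tableName":"T1_ORACLE"}'
)

# java.sql.Types codes for numeric columns: NUMERIC, DECIMAL, TINYINT,
# SMALLINT, INTEGER, BIGINT, FLOAT, REAL, DOUBLE
NUMERIC_SQL_TYPES = {"2", "3", "-6", "5", "4", "-5", "6", "7", "8"}

def string_typed_numerics(schema):
    """Names of numeric source columns that were written as Avro strings."""
    drifted = []
    for field in schema["fields"]:
        types = field["type"] if isinstance(field["type"], list) else [field["type"]]
        if field.get("sqlType") in NUMERIC_SQL_TYPES and "string" in types:
            drifted.append(field["name"])
    return drifted

print(string_typed_numerics(avro_schema))  # ['C1'] -- the NUMBER column became a string
```

A check like this on the import side would make the later export failure much less surprising than the ParquetDecodingException above.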
[jira] [Commented] (SQOOP-3078) Sqoop (export + --hcatalog) with Oracle BINARY_DOUBLE fails with NPE
[ https://issues.apache.org/jira/browse/SQOOP-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767115#comment-15767115 ] Markus Kemper commented on SQOOP-3078: -- Additional Test Case * This issue seems to be present with Parquet as well * Please review the test case below to make sure that it is a valid reproduction Test Case (parquet) {noformat} sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/root/t1_oracle_parquet --delete-target-dir --num-mappers 1 --as-parquetfile --map-column-java C2=Double,C3=Double --hive-import --hive-database default --hive-table t1_oracle_parquet beeline -u jdbc:hive2:// -e "use default; select * from t1_oracle_parquet;" Output: 16/12/20 13:49:06 INFO mapreduce.ImportJobBase: Transferred 2.584 KB in 52.0475 seconds (50.8382 bytes/sec) 16/12/20 13:49:06 INFO mapreduce.ImportJobBase: Retrieved 1 records. --- +---+---+---+---+--+ | t1_oracle_parquet.c1 | t1_oracle_parquet.c2 | t1_oracle_parquet.c3 | t1_oracle_parquet.c4 | +---+---+---+---+--+ | one.one | 1.1 | 1.1 | one.one | +---+---+---+---+--+ sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_oracle" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle" sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --hcatalog-database default --hcatalog-table t1_oracle --num-mappers 1 --map-column-java C2=Double,C3=Double Output: - | C1 | C2 | C3 | C4 | - - 16/12/20 13:51:17 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.io.Text.encode(Text.java:450) {noformat} > Sqoop (export + --hcatalog) with Oracle BINARY_DOUBLE fails with NPE > > > Key: SQOOP-3078 > URL: https://issues.apache.org/jira/browse/SQOOP-3078 > Project: Sqoop > Issue Type: Bug > Components: hive-integration >Reporter: Markus Kemper > > According 
to the documentation BINARY_DOUBLE should be supported thus this > seems like a possible defect. > {noformat} > # > # DOC - Type Binary Double is supported > # > http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.8.0/SqoopUserGuide.html#_supported_data_types > 25.8.2.4. Supported Data Types > The following Oracle data types are supported by the Data Connector for > Oracle and Hadoop: > BINARY_DOUBLE > # > # STEP 01 - Setup Oracle Table and Data > # > export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c; > export MYUSER=sqoop > export MYPSWD=sqoop > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1_oracle" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1_oracle (c1 varchar(10), c2 binary_double, c3 binary_double, > c4 varchar(10))" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "delete from t1_oracle" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1_oracle values ('one.one', 1.1, 1.1, 'one.one')" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1_oracle" > Output: > - > | C1 | C2 | C3 | C4 | > - > | one.one| 1.1 | 1.1 | one.one| > - > # > # STEP 02 - Verify Import (--target-dir) and Export (--export-dir) > # > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T1_ORACLE --target-dir /user/root/t1_oracle --delete-target-dir --num-mappers > 1 --fields-terminated-by ',' > hdfs dfs -cat /user/root/t1_oracle/part* > sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T1_ORACLE --export-dir /user/root/t1_oracle --num-mappers 1 > --input-fields-termin
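The NPE reported for this issue bottoms out in org.apache.hadoop.io.Text.encode, which cannot encode a null string. A toy Python analogue of that failure mode (hypothetical code, only to illustrate; the "null" marker is an assumption for the sketch, not a verified Sqoop default):

```python
def encode_field_naive(value):
    # The failing path: assume the field is always set, like Text.encode
    # being handed the raw value. A null field blows up here.
    return value.encode("utf-8")

def encode_field(value, null_marker="null"):
    # A defensive variant: map a missing value to an explicit marker first.
    if value is None:
        return null_marker.encode("utf-8")
    return str(value).encode("utf-8")

print(encode_field(1.1))   # b'1.1'
print(encode_field(None))  # b'null'
# encode_field_naive(None) raises AttributeError -- the NPE analogue
```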
[jira] [Commented] (SQOOP-2943) Make sqoop able to import to Parquet file format in case of HDFS encryption zones are turned on
[ https://issues.apache.org/jira/browse/SQOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742649#comment-15742649 ] Markus Kemper commented on SQOOP-2943: -- Linking SQOOP-2880 > Make sqoop able to import to Parquet file format in case of HDFS encryption > zones are turned on > --- > > Key: SQOOP-2943 > URL: https://issues.apache.org/jira/browse/SQOOP-2943 > Project: Sqoop > Issue Type: Improvement >Affects Versions: 1.4.7 >Reporter: Attila Szabo > Fix For: 1.4.7 > > > If HDFS encryption zones are turned on and the user tries to import into > Parquet format with a target location in a different encryption zone than > the /tmp/ location (a typical case is an encrypted Hive warehouse > directory), then even though the mapper jobs execute successfully and the > partial results are stored correctly on the temp storage, the > MergeOutputMapper class of the Kite SDK dies with an HDFS-related exception > ("can't be moved into an encryption zone"). > The problem does not appear with clear-text output formats. > Please make Sqoop able to handle this scenario as well! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SQOOP-3078) Sqoop (export + --hcatalog) with Oracle BINARY_DOUBLE fails with NPE
Markus Kemper created SQOOP-3078: Summary: Sqoop (export + --hcatalog) with Oracle BINARY_DOUBLE fails with NPE Key: SQOOP-3078 URL: https://issues.apache.org/jira/browse/SQOOP-3078 Project: Sqoop Issue Type: Bug Components: hive-integration Reporter: Markus Kemper According to the documentation BINARY_DOUBLE should be supported thus this seems like a possible defect. {noformat} # # DOC - Type Binary Double is supported # http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.8.0/SqoopUserGuide.html#_supported_data_types 25.8.2.4. Supported Data Types The following Oracle data types are supported by the Data Connector for Oracle and Hadoop: BINARY_DOUBLE # # STEP 01 - Setup Oracle Table and Data # export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c; export MYUSER=sqoop export MYPSWD=sqoop sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_oracle" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_oracle (c1 varchar(10), c2 binary_double, c3 binary_double, c4 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_oracle" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_oracle values ('one.one', 1.1, 1.1, 'one.one')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle" Output: - | C1 | C2 | C3 | C4 | - | one.one| 1.1 | 1.1 | one.one| - # # STEP 02 - Verify Import (--target-dir) and Export (--export-dir) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/root/t1_oracle --delete-target-dir --num-mappers 1 --fields-terminated-by ',' hdfs dfs -cat /user/root/t1_oracle/part* sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --export-dir /user/root/t1_oracle --num-mappers 1 --input-fields-terminated-by ',' sqoop eval --connect $MYCONN --username 
$MYUSER --password $MYPSWD --query "select * from t1_oracle" Output: one.one,1.1,1.1,one.one --- - | C1 | C2 | C3 | C4 | - | one.one| 1.1 | 1.1 | one.one| | one.one| 1.1 | 1.1 | one.one| - # # STEP 03 - Verify Import (--hive-import) and Export (--export-dir) # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_oracle" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_oracle values ('one.one', 1.1, 1.1, 'one.one')" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --target-dir /user/root/t1_oracle --delete-target-dir --num-mappers 1 --hive-import --hive-database default --hive-table t1_oracle --fields-terminated-by ',' beeline -u jdbc:hive2:// -e "use default; select * from t1_oracle;" hdfs dfs -cat /user/hive/warehouse/t1_oracle/part* sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_ORACLE --export-dir /user/hive/warehouse/t1_oracle --num-mappers 1 --input-fields-terminated-by ',' sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle" Output: +---+---+---+---+--+ | t1_oracle.c1 | t1_oracle.c2 | t1_oracle.c3 | t1_oracle.c4 | +---+---+---+---+--+ | one.one | 1.1 | 1.1 | one.one | +---+---+---+---+--+ --- one.one,1.1,1.1,one.one --- - | C1 | C2 | C3 | C4 | - | one.one| 1.1 | 1.1 | one.one| | one.one| 1.1 | 1.1 | one.one| - # # STEP 04 - Verify Import (--hive-import) and Export (--hcatalog) # This is the defect reproduction # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_oracle" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_oracle" sqoop export
[jira] [Created] (SQOOP-3077) Add support for (import + --hcatalog + --as-avrodatafile) with RDBMS type TIMESTAMP
Markus Kemper created SQOOP-3077: Summary: Add support for (import + --hcatalog + --as-avrodatafile) with RDBMS type TIMESTAMP Key: SQOOP-3077 URL: https://issues.apache.org/jira/browse/SQOOP-3077 Project: Sqoop Issue Type: Improvement Reporter: Markus Kemper Please consider adding support for TIMESTAMP with --hcatalog import; the Avro specification suggests that Logical Types support timestamps. Avro Doc: https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types {noformat} # # STEP 01 - Setup Table and Data # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_dates" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_dates (c1_int integer, c2_date date, c3_timestamp timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select dbms_metadata.get_ddl('TABLE', 'T1_DATES', 'SQOOP') from dual" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_dates values (1, current_date, current_timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_dates" Output: | DBMS_METADATA.GET_DDL('TABLE','T1_DATES','SQOOP') | | CREATE TABLE "SQOOP"."T1_DATES" ("C1_INT" NUMBER(*,0), "C2_DATE" DATE, "C3_TIMESTAMP" TIMESTAMP (6) ) SEGMENT CREATION DEFERRED PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "SQOOP" | --- | C1_INT | C2_DATE | C3_TIMESTAMP | | 1| 2016-12-10 15:48:23.0 | 2016-12-10 15:48:23.707327 | # # STEP 02 - Import with Text Format # beeline -u jdbc:hive2:// -e "use default; drop table t1_dates_text;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_DATES --hcatalog-database default --hcatalog-table t1_dates_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' --num-mappers 1 --map-column-hive c2_date=date,c3_timestamp=timestamp beeline -u jdbc:hive2:// -e "use default; describe t1_dates_text; select 
* from t1_dates_text;" +-+--+ | createtab_stmt | +-+--+ | CREATE TABLE `t1_dates_text`( | | `c1_int` decimal(38,0), | | `c2_date` date, | | `c3_timestamp` timestamp) | | ROW FORMAT SERDE| | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.mapred.TextInputFormat'| | OUTPUTFORMAT| | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | LOCATION| | 'hdfs://nameservice1/user/hive/warehouse/t1_dates_text' | | TBLPROPERTIES ( | | 'transient_lastDdlTime'='1481386391') | +-+--+ -- +---++-+--+ | t1_dates_text.c1_int | t1_dates_text.c2_date | t1_dates_text.c3_timestamp | +---++-+--+ | 1 | 2016-12-10 | 2016-12-10 15:48:23.707327 | +---++-+--+ # # STEP 03 - Import with Avro Format (default) # beeline -u jdbc:hive2:// -e "use default; drop table t1_dates_text;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_DATES --hcatalog-database default --hcatalog-table t1_dates_avro --create-hcatalog-table --hcatalog-storage-stanza 'stored as avro' --num-mappers 1 beeline -u jdbc:hive2:// -e "use default; show create table t1_dates_avro; select * from t1_dates_avro;" +--+--+ | createtab_stmt | +
[jira] [Created] (SQOOP-3076) Modify default support with (import + --hcatalog + --as-textfile) and (DATE,TIMESTAMP)
Markus Kemper created SQOOP-3076: Summary: Modify default support with (import + --hcatalog + --as-textfile) and (DATE,TIMESTAMP) Key: SQOOP-3076 URL: https://issues.apache.org/jira/browse/SQOOP-3076 Project: Sqoop Issue Type: Improvement Reporter: Markus Kemper Please consider modifying the default behavior when RDBMS types are supported in Hive as seen in the test case below for (DATE and TIMESTAMP) Test Case {noformat} sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_dates" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_dates (c1_int integer, c2_date date, c3_timestamp timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select dbms_metadata.get_ddl('TABLE', 'T1_DATES', 'SQOOP') from dual" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_dates values (1, current_date, current_timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_dates" Output: | DBMS_METADATA.GET_DDL('TABLE','T1_DATES','SQOOP') | | CREATE TABLE "SQOOP"."T1_DATES" ("C1_INT" NUMBER(*,0), "C2_DATE" DATE, "C3_TIMESTAMP" TIMESTAMP (6) ) SEGMENT CREATION DEFERRED PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "SQOOP" | --- | C1_INT | C2_DATE | C3_TIMESTAMP | | 1| 2016-12-10 15:48:23.0 | 2016-12-10 15:48:23.707327 | # # STEP 02 - Demonstrate Current Default Behavior # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_DATES --hcatalog-database default --hcatalog-table t1_dates_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' --num-mappers 1 beeline -u jdbc:hive2:// -e "use default; describe t1_dates_text; select * from t1_dates_text;" +---++--+--+ | col_name| data_type| comment | +---++--+--+ | c1_int| decimal(38,0) | | | c2_date | string | | | c3_timestamp | string | | +---++--+--+ --- +---++-+--+ | 
t1_dates_text.c1_int | t1_dates_text.c2_date | t1_dates_text.c3_timestamp | +---++-+--+ | 1 | 2016-12-10 15:48:23.0 | 2016-12-10 15:48:23.707327 | +---++-+--+ # # STEP 03 - Demonstrate Suggested Default Behavior # beeline -u jdbc:hive2:// -e "use default; drop table t1_dates_text;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1_DATES --hcatalog-database default --hcatalog-table t1_dates_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' --num-mappers 1 --map-column-hive c2_date=date,c3_timestamp=timestamp beeline -u jdbc:hive2:// -e "use default; describe t1_dates_text; select * from t1_dates_text;" +---++--+--+ | col_name| data_type| comment | +---++--+--+ | c1_int| decimal(38,0) | | | c2_date | date | | | c3_timestamp | timestamp | | +---++--+--+ -- +---++-+--+ | t1_dates_text.c1_int | t1_dates_text.c2_date | t1_dates_text.c3_timestamp | +---++-+--+ | 1 | 2016-12-10 | 2016-12-10 15:48:23.707327 | +---++-+--+ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
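The only difference between STEP 02 and STEP 03 above is the hand-written --map-column-hive override. A short sketch of how that override could be derived from the RDBMS column types, which is roughly what the suggested default behavior amounts to (hypothetical helper, not part of Sqoop):

```python
def hive_type_overrides(columns):
    """Build a --map-column-hive value carrying DATE/TIMESTAMP through to Hive."""
    passthrough = {"DATE": "date", "TIMESTAMP": "timestamp"}
    pairs = [
        f"{name.lower()}={passthrough[rdbms_type]}"
        for name, rdbms_type in columns
        if rdbms_type in passthrough
    ]
    return ",".join(pairs)

# Column metadata as in the T1_DATES table above
cols = [("C1_INT", "NUMBER"), ("C2_DATE", "DATE"), ("C3_TIMESTAMP", "TIMESTAMP")]
print(hive_type_overrides(cols))  # c2_date=date,c3_timestamp=timestamp
```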
[jira] [Commented] (SQOOP-3068) Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java)
[ https://issues.apache.org/jira/browse/SQOOP-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721860#comment-15721860 ] Markus Kemper commented on SQOOP-3068: -- Created as a result of investigations in SQOOP-3047 > Enhance error (tool.ImportTool: Encountered IOException running import job: > java.io.IOException: Expected schema) to suggest workaround > (--map-column-java) > --- > > Key: SQOOP-3068 > URL: https://issues.apache.org/jira/browse/SQOOP-3068 > Project: Sqoop > Issue Type: Improvement > Components: hive-integration >Affects Versions: 1.4.6 >Reporter: Markus Kemper > > Please consider enhancing the error to include more detail and suggest > workaround (--map-columns-java). > Sqoop (import + --hive-import + --as-parquetfile) can fail due to a mismatch > with the json schema that Hive produces vs. the json schema that Sqoop > generates. The test case below demonstrates how to reproduce the issue as > well as workaround it. > SETUP (create parquet table with Sqoop import and Beeline CTAS) > {noformat} > STEP 01 - Create MySQL Tables > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1 (c_int int, c_date date, c_timestamp timestamp)" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "describe t1" > - > | Field| Type | Null | Key | Default > | Extra| > - > | c_int| int(11) | YES | | (null) >| | > | c_date | date | YES | | (null) >| | > | c_timestamp | timestamp| NO | | CURRENT_TIMESTAMP >| on update CURRENT_TIMESTAMP | > - > STEP 02 : Insert and Select Row > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (1, current_date(), current_timestamp())" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > -- > | c_int | c_date | c_timestamp | > -- > | 1 | 2016-10-26 | 
2016-10-26 14:30:33.0 | > -- > STEP 03 : Create Hive Tables > beeline -u jdbc:hive2:// -e "use default; drop table t1" > sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username > $MYUSER --password $MYPSWD --table t1 --hcatalog-database default > --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza > 'stored as parquet' --num-mappers 1 > beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet > as select * from t1_text;show create table t1;" > ++--+ > | createtab_stmt | > ++--+ > | CREATE TABLE `t1`( | > | `c_int` int, | > | `c_date` string, | > | `c_timestamp` string)| > | ROW FORMAT SERDE | > | 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' | > | STORED AS INPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' | > | OUTPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' | > | LOCATION | > | 'hdfs://:8020/user/hive/warehouse/t1' | > | TBLPROPERTIES (| > | 'COLUMN_STATS_ACCURATE'='true', | > | 'numFiles'='1', | > | 'numRows'='2', | > | 'rawDataSize'='6',
[jira] [Created] (SQOOP-3068) Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java)
Markus Kemper created SQOOP-3068: Summary: Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java) Key: SQOOP-3068 URL: https://issues.apache.org/jira/browse/SQOOP-3068 Project: Sqoop Issue Type: Improvement Components: hive-integration Affects Versions: 1.4.6 Reporter: Markus Kemper Please consider enhancing the error to include more detail and suggest the workaround (--map-column-java). Sqoop (import + --hive-import + --as-parquetfile) can fail due to a mismatch between the JSON schema that Hive produces and the JSON schema that Sqoop generates. The test case below demonstrates how to reproduce the issue as well as how to work around it. SETUP (create parquet table with Sqoop import and Beeline CTAS) {noformat} STEP 01 - Create MySQL Tables sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1" - | Field| Type | Null | Key | Default | Extra| - | c_int| int(11) | YES | | (null) | | | c_date | date | YES | | (null) | | | c_timestamp | timestamp| NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | - STEP 02 : Insert and Select Row sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" -- | c_int | c_date | c_timestamp | -- | 1 | 2016-10-26 | 2016-10-26 14:30:33.0 | -- STEP 03 : Create Hive Tables beeline -u jdbc:hive2:// -e "use default; drop table t1" sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table 
--hcatalog-storage-stanza 'stored as parquet' --num-mappers 1 beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet as select * from t1_text;show create table t1;" ++--+ | createtab_stmt | ++--+ | CREATE TABLE `t1`( | | `c_int` int, | | `c_date` string, | | `c_timestamp` string)| | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' | | LOCATION | | 'hdfs://:8020/user/hive/warehouse/t1' | | TBLPROPERTIES (| | 'COLUMN_STATS_ACCURATE'='true', | | 'numFiles'='1', | | 'numRows'='2', | | 'rawDataSize'='6', | | 'totalSize'='605', | | 'transient_lastDdlTime'='1478298298')| ++--+ {noformat} REPRODUCE ISSUE (import --hive-import append and overwrite) {noformat} STEP 01: Attempt --hive-import --append sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --append 16/11/02 20:54:23 WARN mapreduce.DataDrivenImportJob: Target Hive table 't1' exists! Sqoop will append data into the existing Hive table. Consider using --hive-overwrite, if you do NOT intend to do appen
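The "Expected schema" IOException in this report reduces to a field-by-field disagreement between the schema Sqoop generates for the import and the schema the pre-existing Hive Parquet table carries, which --map-column-java lets the user reconcile by hand. A toy comparison, with hypothetical type names purely for illustration:

```python
# Types the existing Hive table carries (from the CTAS output above) vs.
# types another writer might generate -- the right-hand names are invented
# for this sketch, not taken from Sqoop's actual schema generation.
hive_table_types = {"c_int": "int", "c_date": "string", "c_timestamp": "string"}
generated_types  = {"c_int": "int", "c_date": "date",   "c_timestamp": "timestamp"}

def schema_mismatches(expected, actual):
    """Fields whose types differ between the two schemas."""
    return {
        name: (expected[name], actual.get(name))
        for name in expected
        if expected[name] != actual.get(name)
    }

print(schema_mismatches(hive_table_types, generated_types))
# {'c_date': ('string', 'date'), 'c_timestamp': ('string', 'timestamp')}
```

Reporting the per-field differences like this, instead of a bare "Expected schema" message, is exactly the enhancement the issue asks for.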
[jira] [Created] (SQOOP-3058) Sqoop import with Netezza --direct fails properly but also produces NPE
Markus Kemper created SQOOP-3058: Summary: Sqoop import with Netezza --direct fails properly but also produces NPE Key: SQOOP-3058 URL: https://issues.apache.org/jira/browse/SQOOP-3058 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper The [error] is expected however the [npe] seems like a defect, see [test case] below [error] ERROR: relation does not exist SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1 [npe] 16/11/18 09:19:44 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException [test case] {noformat} # # STEP 01 - Setup Netezza Table and Data # sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "DROP TABLE SQOOP_SME1.T1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "CREATE TABLE SQOOP_SME1.T1 (C1 INTEGER)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "INSERT INTO SQOOP_SME1.T1 VALUES (1)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "SELECT C1 FROM SQOOP_SME1.T1" # # STEP 02 - Test Import and Export (baseline) # sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table "T1" --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "DELETE FROM SQOOP_SME1.T1" sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table "T1" --export-dir /user/root/t1 --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "SELECT C1 FROM SQOOP_SME1.T1" --- | C1 | --- | 1 | --- sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "DELETE FROM SQOOP_SME1.T1" sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table "T1" --export-dir /user/root/t1 --num-mappers 1 --direct sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "SELECT C1 FROM SQOOP_SME1.T1" --- | C1 | --- | 1 | --- # # STEP 03 - Test Import and Export (with SCHEMA in --table 
option AND --direct) # /* Notes: This failure seems correct however the NPE after the failure seems like a defect */ sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "DELETE FROM SQOOP_SME1.T1" sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table "SQOOP_SME1.T1" --export-dir /user/root/t1 --num-mappers 1 --direct 16/11/18 09:19:44 ERROR manager.SqlManager: Error executing statement: org.netezza.error.NzSQLException: ERROR: relation does not exist SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1 org.netezza.error.NzSQLException: ERROR: relation does not exist SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1 at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:280) at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:76) at org.netezza.sql.NzConnection.execute(NzConnection.java:2869) at org.netezza.sql.NzPreparedStatament._execute(NzPreparedStatament.java:1126) at org.netezza.sql.NzPreparedStatament.prepare(NzPreparedStatament.java:1143) at org.netezza.sql.NzPreparedStatament.(NzPreparedStatament.java:89) at org.netezza.sql.NzConnection.prepareStatement(NzConnection.java:1589) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786) at org.apache.sqoop.manager.SqlManager.getColumnNamesForRawQuery(SqlManager.java:151) at org.apache.sqoop.manager.SqlManager.getColumnNames(SqlManager.java:116) at org.apache.sqoop.mapreduce.netezza.NetezzaExternalTableExportJob.configureOutputFormat(NetezzaExternalTableExportJob.java:128) at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433) at org.apache.sqoop.manager.DirectNetezzaManager.exportTable(DirectNetezzaManager.java:209) at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81) at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100) at org.apache.sqoop.Sqoop.run(Sqoop.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at 
org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) at org.apache.sqoop.Sqoop.main(Sqoop.java:236) 16/11/18 09:19:44 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPo
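For what it's worth, STEP 02 above already shows the shape of a workaround: with a bare table name the direct manager appears to qualify the relation with the login schema itself, so the double-qualified SQOOP_SME_DB.SQOOP_SME1.SQOOP_SME1.T1 never arises. A command-string sketch only (nothing is executed; $MYCONN etc. stay as placeholders):

```shell
# Sketch only: assemble the working --direct export from STEP 02, with
# --table left unqualified so the manager adds the schema exactly once.
TABLE="T1"   # not "SQOOP_SME1.T1"
CMD="sqoop export --connect \$MYCONN --username \$MYUSER --password \$MYPSWD"
CMD="$CMD --table $TABLE --export-dir /user/root/t1 --num-mappers 1 --direct"
echo "$CMD"
```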
[jira] [Assigned] (SQOOP-2905) Improve documentation for Oracle connector
[ https://issues.apache.org/jira/browse/SQOOP-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper reassigned SQOOP-2905: Assignee: Markus Kemper > Improve documentation for Oracle connector > -- > > Key: SQOOP-2905 > URL: https://issues.apache.org/jira/browse/SQOOP-2905 > Project: Sqoop > Issue Type: Task > Components: connectors/oracle >Affects Versions: 1.4.6 >Reporter: Miklos Szurap > Assignee: Markus Kemper > > In the Sqoop user guide [1] there is a section on how to handle > "java.net.SocketException: Connection reset" errors. > This needs to be improved because the Sqoop client itself can suffer from > this error, not only the mappers. > The solution is to specify the system property for the sqoop client too, so > the sqoop command will look like this (for example, for a sqoop import): > sqoop import -Djava.security.egd="file:///dev/../dev/urandom" -D > mapreduce.map.java.opts="-Djava.security.egd=file:///dev/../dev/urandom" > > [1] > https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oracle_connection_reset_errors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
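The fix described in the issue can be sketched as one invocation: the same egd property goes to the client JVM (as a leading -D) and to the map tasks (via mapreduce.map.java.opts). Only a command string is assembled below; the JDBC URL and table are illustrative placeholders:

```shell
# Build the documented double -D invocation; nothing is executed here.
EGD='-Djava.security.egd=file:///dev/../dev/urandom'
CMD="sqoop import $EGD -D mapreduce.map.java.opts=\"$EGD\""
CMD="$CMD --connect jdbc:oracle:thin:@//dbhost:1521/orcl --table T1 --num-mappers 1"
echo "$CMD"
```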
[jira] [Updated] (SQOOP-3036) Update Doc (import + --hbase-bulkload) should fail fast when not using (--as-textfile), see SQOOP-3034
[ https://issues.apache.org/jira/browse/SQOOP-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-3036: - Summary: Update Doc (import + --hbase-bulkload) should fail fast when not using (--as-textfile), see SQOOP-3034 (was: Documentation update needed: importing into HBase ignores any format option other than as-textfile) > Update Doc (import + --hbase-bulkload) should fail fast when not using > (--as-textfile), see SQOOP-3034 > -- > > Key: SQOOP-3036 > URL: https://issues.apache.org/jira/browse/SQOOP-3036 > Project: Sqoop > Issue Type: Task > Components: docs >Reporter: Boglarka Egyed > Assignee: Markus Kemper >Priority: Minor > > Sqoop import into HBase ignores any format option other than as-textfile > in the current implementation. This should be noted in the > documentation too.
[jira] [Assigned] (SQOOP-3036) Documentation update needed: importing into HBase ignores any format option other than as-textfile
[ https://issues.apache.org/jira/browse/SQOOP-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper reassigned SQOOP-3036: Assignee: Markus Kemper (was: Boglarka Egyed) > Documentation update needed: importing into HBase ignores any format option > other than as-textfile > -- > > Key: SQOOP-3036 > URL: https://issues.apache.org/jira/browse/SQOOP-3036 > Project: Sqoop > Issue Type: Task > Components: docs >Reporter: Boglarka Egyed > Assignee: Markus Kemper >Priority: Minor > > Sqoop import into HBase ignores any format option other than as-textfile > in the current implementation. This should be noted in the > documentation too.
[jira] [Commented] (SQOOP-3047) Add support for (import + --hive-import + --as-parquet) when Parquet table already exists
[ https://issues.apache.org/jira/browse/SQOOP-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637951#comment-15637951 ] Markus Kemper commented on SQOOP-3047: -- Linking SQOOP-3046 > Add support for (import + --hive-import + --as-parquet) when Parquet table > already exists > - > > Key: SQOOP-3047 > URL: https://issues.apache.org/jira/browse/SQOOP-3047 > Project: Sqoop > Issue Type: Improvement > Components: hive-integration > Reporter: Markus Kemper > > When a parquet table already exists (import + --hive-import + > --as-parquetfile) will fail due to a mismatch with the json schema that Hive > produces vs. the json schema that Sqoop generates. The test case below > demonstrates how to reproduce the issue as well as workaround it. > SETUP (create parquet table with Sqoop import and Beeline CTAS) > {noformat} > STEP 01 - Create MySQL Tables > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1 (c_int int, c_date date, c_timestamp timestamp)" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "describe t1" > - > | Field| Type | Null | Key | Default > | Extra| > - > | c_int| int(11) | YES | | (null) >| | > | c_date | date | YES | | (null) >| | > | c_timestamp | timestamp| NO | | CURRENT_TIMESTAMP >| on update CURRENT_TIMESTAMP | > - > STEP 02 : Insert and Select Row > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (1, current_date(), current_timestamp())" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > -- > | c_int | c_date | c_timestamp | > -- > | 1 | 2016-10-26 | 2016-10-26 14:30:33.0 | > -- > STEP 03 : Create Hive Tables > beeline -u jdbc:hive2:// -e "use default; drop table t1" > sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username > $MYUSER 
--password $MYPSWD --table t1 --hcatalog-database default > --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza > 'stored as parquet' --num-mappers 1 > beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet > as select * from t1_text;show create table t1;" > ++--+ > | createtab_stmt | > ++--+ > | CREATE TABLE `t1`( | > | `c_int` int, | > | `c_date` string, | > | `c_timestamp` string)| > | ROW FORMAT SERDE | > | 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' | > | STORED AS INPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' | > | OUTPUTFORMAT | > | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' | > | LOCATION | > | 'hdfs://:8020/user/hive/warehouse/t1' | > | TBLPROPERTIES (| > | 'COLUMN_STATS_ACCURATE'='true', | > | 'numFiles'='1', | > | 'numRows'='2', | > | 'rawDataSize'='6', | > | 'totalSize'='605', | > | 'transient_lastDdlTime'='1478298298')| > ++--+ > {noformat} > REPRODUCE ISSUE (import --hive-import append and overwrite)
[jira] [Created] (SQOOP-3047) Add support for (import + --hive-import + --as-parquet) when Parquet table already exists
Markus Kemper created SQOOP-3047: Summary: Add support for (import + --hive-import + --as-parquet) when Parquet table already exists Key: SQOOP-3047 URL: https://issues.apache.org/jira/browse/SQOOP-3047 Project: Sqoop Issue Type: Improvement Components: hive-integration Reporter: Markus Kemper When a Parquet table already exists, (import + --hive-import + --as-parquetfile) will fail due to a mismatch between the JSON schema that Hive produces and the JSON schema that Sqoop generates. The test case below demonstrates how to reproduce the issue as well as work around it. SETUP (create parquet table with Sqoop import and Beeline CTAS) {noformat} STEP 01 - Create MySQL Tables sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1" - | Field| Type | Null | Key | Default | Extra| - | c_int| int(11) | YES | | (null) | | | c_date | date | YES | | (null) | | | c_timestamp | timestamp| NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | - STEP 02 : Insert and Select Row sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" -- | c_int | c_date | c_timestamp | -- | 1 | 2016-10-26 | 2016-10-26 14:30:33.0 | -- STEP 03 : Create Hive Tables beeline -u jdbc:hive2:// -e "use default; drop table t1" sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1 beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet as select * from 
t1_text;show create table t1;" ++--+ | createtab_stmt | ++--+ | CREATE TABLE `t1`( | | `c_int` int, | | `c_date` string, | | `c_timestamp` string)| | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' | | LOCATION | | 'hdfs://:8020/user/hive/warehouse/t1' | | TBLPROPERTIES (| | 'COLUMN_STATS_ACCURATE'='true', | | 'numFiles'='1', | | 'numRows'='2', | | 'rawDataSize'='6', | | 'totalSize'='605', | | 'transient_lastDdlTime'='1478298298')| ++--+ {noformat} REPRODUCE ISSUE (import --hive-import append and overwrite) {noformat} STEP 01: Attempt --hive-import --append sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --append 16/11/02 20:54:23 WARN mapreduce.DataDrivenImportJob: Target Hive table 't1' exists! Sqoop will append data into the existing Hive table. Consider using --hive-overwrite, if you do NOT intend to do appending. 16/11/02 20:54:24 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema: {"type":"record"
[jira] [Commented] (SQOOP-3046) Add support for (import + --hcatalog* + --as-parquetfile)
[ https://issues.apache.org/jira/browse/SQOOP-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637836#comment-15637836 ] Markus Kemper commented on SQOOP-3046: -- Linking SQOOP-3010 in the event that this capability is not technically possible. > Add support for (import + --hcatalog* + --as-parquetfile) > -- > > Key: SQOOP-3046 > URL: https://issues.apache.org/jira/browse/SQOOP-3046 > Project: Sqoop > Issue Type: Improvement > Components: hive-integration > Reporter: Markus Kemper > > This is a request to identify a way to support Sqoop import with --hcatalog > options when writing Parquet data files. The test case below demonstrates > the issue. > CODE SNIP > {noformat} > ../MapredParquetOutputFormat.java > 69 @Override > 70 public RecordWriter<Void, ParquetHiveRecord> getRecordWriter( > 71 final FileSystem ignored, > 72 final JobConf job, > 73 final String name, > 74 final Progressable progress > 75 ) throws IOException { > 76throw new RuntimeException("Should never be used"); > 77 } > {noformat} > TEST CASE: > {noformat} > STEP 01 - Create MySQL Tables > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1 (c_int int, c_date date, c_timestamp timestamp)" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "describe t1" > - > | Field| Type | Null | Key | Default > | Extra| > - > | c_int| int(11) | YES | | (null) >| | > | c_date | date | YES | | (null) >| | > | c_timestamp | timestamp| NO | | CURRENT_TIMESTAMP >| on update CURRENT_TIMESTAMP | > - > STEP 02 : Insert and Select Row > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (1, current_date(), current_timestamp())" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > -- > | c_int | c_date | c_timestamp | > -- > | 1 | 
2016-10-26 | 2016-10-26 14:30:33.0 | > -- > beeline -u jdbc:hive2:// -e "use default; drop table t1" > sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username > $MYUSER --password $MYPSWD --table t1 --hcatalog-database default > --hcatalog-table t1 --create-hcatalog-table --hcatalog-storage-stanza 'stored > as parquet' --num-mappers 1 > [sqoop console debug] > 16/11/02 20:25:15 INFO mapreduce.Job: Task Id : > attempt_1478089149450_0046_m_00_0, Status : FAILED > Error: java.lang.RuntimeException: Should never be used > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76) > at > org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102) > at > org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.(MapTask.java:647) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > [yarn maptask debug] > 2016-11-02 20:25:15,565 INFO [main] org.apache.hadoop.mapred.MapTask: > Processing split: 1=1 AND 1=1 > 2016-11-02 20:25:15,583 DEBUG [main] > org.apache
[jira] [Created] (SQOOP-3046) Add support for (import + --hcatalog* + --as-parquetfile)
Markus Kemper created SQOOP-3046: Summary: Add support for (import + --hcatalog* + --as-parquetfile) Key: SQOOP-3046 URL: https://issues.apache.org/jira/browse/SQOOP-3046 Project: Sqoop Issue Type: Improvement Components: hive-integration Reporter: Markus Kemper This is a request to identify a way to support Sqoop import with --hcatalog options when writing Parquet data files. The test case below demonstrates the issue. CODE SNIP {noformat} ../MapredParquetOutputFormat.java 69 @Override 70 public RecordWriter<Void, ParquetHiveRecord> getRecordWriter( 71 final FileSystem ignored, 72 final JobConf job, 73 final String name, 74 final Progressable progress 75 ) throws IOException { 76throw new RuntimeException("Should never be used"); 77 } {noformat} TEST CASE: {noformat} STEP 01 - Create MySQL Tables sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1" - | Field| Type | Null | Key | Default | Extra| - | c_int| int(11) | YES | | (null) | | | c_date | date | YES | | (null) | | | c_timestamp | timestamp| NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | - STEP 02 : Insert and Select Row sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" -- | c_int | c_date | c_timestamp | -- | 1 | 2016-10-26 | 2016-10-26 14:30:33.0 | -- beeline -u jdbc:hive2:// -e "use default; drop table t1" sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1 --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1 
[sqoop console debug] 16/11/02 20:25:15 INFO mapreduce.Job: Task Id : attempt_1478089149450_0046_m_00_0, Status : FAILED Error: java.lang.RuntimeException: Should never be used at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76) at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102) at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.(MapTask.java:647) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) [yarn maptask debug] 2016-11-02 20:25:15,565 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: 1=1 AND 1=1 2016-11-02 20:25:15,583 DEBUG [main] org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat: Creating db record reader for db product: MYSQL 2016-11-02 20:25:15,613 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1 2016-11-02 20:25:15,614 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class 2016-11-02 20:25:15,620 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class 2016-11-02 20:25:15,633 WARN [main] org.apache.hadoop.mapred.YarnChild:
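Since the HCatalog write path dies inside MapredParquetOutputFormat, the practical alternative in this Sqoop line appears to be the --hive-import Parquet path (the same command shape SQOOP-3047 uses); whether that fits a given HCatalog workflow is a separate question. Command string only, with placeholders throughout:

```shell
# Sketch of the non-HCatalog Parquet route; nothing is executed here.
CMD="sqoop import --connect \$MYCONN --username \$MYUSER --password \$MYPSWD"
CMD="$CMD --table t1 --hive-import --hive-database default --hive-table t1"
CMD="$CMD --as-parquetfile --num-mappers 1"
echo "$CMD"
```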
[jira] [Updated] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE
[ https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-3030: - Description: It appears that if the case of the column names in the Oracle RDBMS is not matched in the Sqoop --columns option with (export + --hcatalog + --direct), the Sqoop map task will fail with an NPE, whereas the non-direct path fails correctly. My test case is below; please review and let me know if you have any questions. Additionally, it would be nice if we could detect this column-case mismatch at compile time rather than submit the job only to have it fail in YARN. {noformat} TEST CASE: STEP 01 : Create Oracle Source/Target Table sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" - | C1 | C2 | - | 1| some data | - STEP 02 : Import Oracle Table using Sqoop --hcatalog options beeline -u jdbc:hive2:// -e "use default; drop table t1_text;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' beeline -u jdbc:hive2:// -e "use default; select * from t1_text;" Output: 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2] 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info map : c1 : [Type : 2,Precision : 38,Scale : 0] c2 : [Type : 12,Precision : 10,Scale : 0] 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.t1_text for import 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: 
create table `default`.`t1_text` ( `c1` decimal(38), `c2` varchar(10)) stored as textfile 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 34.4222 seconds (0.3486 bytes/sec) 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records. --- +-+-+--+ | t1_text.c1 | t1_text.c2 | +-+-+--+ | 1 | some data | +-+-+--+ STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "c1,c2" Output: (failure = correct) 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2] 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info map : c1 : [Type : 2,Precision : 38,Scale : 0] c2 : [Type : 12,Precision : 10,Scale : 0] 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore. 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [c1, c2] 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning key fields = [] 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema fields = [c1, c2] 16/10/18 09:14:22 INFO mapreduce.ExportJobBase: Exported 0 records. 16/10/18 09:14:22 ERROR tool.ExportTool: Error during export: Export job failed! 
--- 2016-10-18 09:14:19,418 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {transient_lastDdlTime=1476807097, name=default.t1_text, serialization.null.format=\N, columns=c1,c2, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, serialization.format=1, columns.types=decimal(38,0),varchar(10)} 2016-10-18 09:14:19,660 INFO [Thread-12] org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false 2016-10-18 09:14:19,952 ERROR [Thread-11] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update thread: java.sql.SQLSyntaxErrorException: ORA-00904: "c2": invalid identifier STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options and Oracle --direct sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "c1,c2" --direct
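Until a compile-time check exists, the obvious sidestep is to pass --columns in the case Oracle actually stores: unquoted identifiers are folded to upper case in the data dictionary, so C1,C2 rather than c1,c2. That should cure the ORA-00904; whether it also avoids the direct-path NPE is an assumption here. Command string only:

```shell
# STEP 03's export with --columns matching Oracle's stored (upper) case.
COLS="C1,C2"
CMD="sqoop export --connect \$MYCONN --username \$MYUSER --password \$MYPSWD"
CMD="$CMD --table T2 --num-mappers 2 --hcatalog-database default"
CMD="$CMD --hcatalog-table t1_text --columns $COLS --direct"
echo "$CMD"
```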
[jira] [Commented] (SQOOP-3038) Sqoop export using --hcatalog with RDBMS reserved word column name results in "null" value
[ https://issues.apache.org/jira/browse/SQOOP-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608476#comment-15608476 ] Markus Kemper commented on SQOOP-3038: -- Workaround: 1. Use (export --export-dir) instead of (export --hcatalog) 2. Use non-reserved word as column name > Sqoop export using --hcatalog with RDBMS reserved word column name results in > "null" value > -- > > Key: SQOOP-3038 > URL: https://issues.apache.org/jira/browse/SQOOP-3038 > Project: Sqoop > Issue Type: Bug > Components: hive-integration >Reporter: Markus Kemper > > --- > | c1 | value | <=== RDBMS reserved word "value" > --- > | 1 | (null) | <=== null? > --- > TEST CASE: > STEP 01 - Create MySQL Tables > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1 (c1 int, c2 varchar(5))" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (1, 'one')" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "show create table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1_value" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1_value (c1 int, \`value\` varchar(5))" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1_value select * from t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "show create table t1_value" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1_value" > Output: > --- > | Table| Create Table | > --- > | t1 | CREATE TABLE `t1` ( > `c1` int(11) DEFAULT NULL, > `c2` varchar(5) DEFAULT NULL > ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | > --- > --- 
> | c1 | c2| > --- > | 1 | one | > --- > --- > | Table| Create Table | > --- > | t1_value | CREATE TABLE `t1_value` ( > `c1` int(11) DEFAULT NULL, > `value` varchar(5) DEFAULT NULL > ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | > --- > --- > | c1 | value | > --- > | 1 | one | > --- > STEP 02 - Verify Import/Export Using --target-dir and --export-dir > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "delete from t1" > sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table > t1 --export-dir /user/root/t1 --num-mappers 1 > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > t1_value --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "delete from t1_value" > sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table > t1_value --export-dir /user/root/t1 --num-mappers 1 > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1_value" > Output: > --- > | c1 | c2| > --- > | 1 | one | > --- > --- > | c1 | value | > --- > | 1 | one | > --- > STEP 03 - Verify import with --hive-import and export with --hcatalog > beeline -u jdbc:hive2:// -e "use default; drop tabl
[jira] [Created] (SQOOP-3038) Sqoop export using --hcatalog with RDBMS reserved word column name results in "null" value
Markus Kemper created SQOOP-3038: Summary: Sqoop export using --hcatalog with RDBMS reserved word column name results in "null" value Key: SQOOP-3038 URL: https://issues.apache.org/jira/browse/SQOOP-3038 Project: Sqoop Issue Type: Bug Components: hive-integration Reporter: Markus Kemper --- | c1 | value | <=== RDBMS reserved word "value" --- | 1 | (null) | <=== null? --- TEST CASE: STEP 01 - Create MySQL Tables sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 varchar(5))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, 'one')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "show create table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1_value" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1_value (c1 int, \`value\` varchar(5))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1_value select * from t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "show create table t1_value" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_value" Output: --- | Table| Create Table | --- | t1 | CREATE TABLE `t1` ( `c1` int(11) DEFAULT NULL, `c2` varchar(5) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | --- --- | c1 | c2| --- | 1 | one | --- --- | Table| Create Table | --- | t1_value | CREATE TABLE `t1_value` ( `c1` int(11) DEFAULT NULL, `value` varchar(5) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | --- --- | c1 | value | --- | 1 | one | --- STEP 02 - Verify Import/Export Using --target-dir and --export-dir sqoop import --connect $MYCONN --username 
$MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1" sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --export-dir /user/root/t1 --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_value --target-dir /user/root/t1 --delete-target-dir --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1_value" sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1_value --export-dir /user/root/t1 --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1_value" Output: --- | c1 | c2| --- | 1 | one | --- --- | c1 | value | --- | 1 | one | --- STEP 03 - Verify import with --hive-import and export with --hcatalog beeline -u jdbc:hive2:// -e "use default; drop table t1;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/root/t1 --delete-target-dir --hive-import --hive-database default --hive-table t1 --num-mappers 1 beeline -u jdbc:hive2:// -e "use default; select * from t1;" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "delete from t1" sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1 --num-mappers 1 sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: +++--+ | t1.c1 | t1.c2 | +++--+ | 1 | one| +++--+ --- | c1 | c2|
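Workaround 1 from the SQOOP-3038 comment, as a command string: export straight from the table's files with --export-dir instead of the --hcatalog path that nulls out the reserved-word column. The warehouse path is an illustrative assumption:

```shell
# Bypass --hcatalog; read the HDFS files directly (path is a placeholder).
CMD="sqoop export --connect \$MYCONN --username \$MYUSER --password \$MYPSWD"
CMD="$CMD --table t1_value --export-dir /user/hive/warehouse/t1_value --num-mappers 1"
echo "$CMD"
```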
[jira] [Comment Edited] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs
[ https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589895#comment-15589895 ] Markus Kemper edited comment on SQOOP-3022 at 10/19/16 9:25 PM: Additional Clarification: With OraOop and CDH <5.3.0 you need to manually install OraOop, meaning: If you do not create the Driver Manager for OraOop in /var/lib/sqoop/managers.d you need to specify (-Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory). Also to enable/disable OraOop using (-Doraoop.disabled=true|false). With OraOop and CDH >=5.3.0 the OraOop classes are included in the CDH Sqoop jar, meaning: All references (e.g. managers.d, oraoop.jar) of a previous manual install should be removed and the only required option to enable OraOop is to use the --direct option. We do however recommend also deploying the Sqoop1 Client Gateway on any host where the Sqoop1 CLI command is executed. Any global OraOop configurations can also be set in the Gateway's sqoop-site.xml safety valve. In the end we resolved [~Tagar]'s issue by doing the following 1. Confirmed Sqoop1 Client Gateway was deployed 2. Confirmed OraOop managers.d file was not present 3. Confirmed that oraoop.jar was not present 4. Removed options: 4.1. -Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory 4.2. -Doraoop.disabled=false 5. Added options: 5.1. --direct Hope this helps. [~markuskem...@me.com] was (Author: markuskem...@me.com): Additional Clarification: With OraOop and CDH <5.3.0 you need to manually install OraOop, meaning: If you do not create the Driver Manager for OraOop in /var/lib/sqoop/managers.d you need to specify (-Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory). Also to enable/disable OraOop using (-Doraoop.disabled=true|false). With OraOop and CDH >=5.3.0 the OraOop classes are included in the CDH Sqoop jar, meaning: All references (e.g. 
managers.d, oraoop.jar) of a previous manual install should be removed and the only required option to enable OraOop is to use the --direct option. We do however recommend also deploying the Sqoop1 Client Gateway on any host where the Sqoop1 CLI command is executed. Any global OraOop configurations can also be set in the Gateway's sqoop-site.xml safety valve. Hope this helps. [~markuskem...@me.com] > sqoop export for Oracle generates tremendous amounts of redo logs > - > > Key: SQOOP-3022 > URL: https://issues.apache.org/jira/browse/SQOOP-3022 > Project: Sqoop > Issue Type: Bug > Components: codegen, connectors, connectors/oracle >Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6 >Reporter: Ruslan Dautkhanov > Labels: export, oracle > > Sqoop export for Oracle generates tremendous amounts of redo logs (comparable > to export size or more). > We have put target tables in nologging mode, but Oracle will still generate > redo logs unless the +APPEND Oracle insert hint is used. > See https://oracle-base.com/articles/misc/append-hint for examples. > Please add an option for sqoop to generate insert statements in Oracle with > the APPEND hint. Our databases are swamped with redo/archived logs whenever > we sqoop data to them. This is easily avoidable. And from a business > perspective, sqooping to staging tables in nologging mode is totally fine. > Thank you.
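The two activation paths described in the comment, reduced to command strings (the JDBC URL is a placeholder; per the comment, the old-style factory flag belongs only on CDH < 5.3.0 installs without a managers.d entry):

```shell
# CDH >= 5.3.0: OraOop ships in the Sqoop jar; --direct alone selects it.
NEW="sqoop export --connect jdbc:oracle:thin:@//dbhost:1521/orcl"
NEW="$NEW --table T1 --export-dir /user/root/t1 --direct"
# CDH < 5.3.0, no managers.d entry: name the connection factory explicitly.
OLD="sqoop export -Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory"
OLD="$OLD --connect jdbc:oracle:thin:@//dbhost:1521/orcl --table T1 --export-dir /user/root/t1"
echo "$NEW"
echo "$OLD"
```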
[jira] [Commented] (SQOOP-3022) sqoop export for Oracle generates tremendous amounts of redo logs
[ https://issues.apache.org/jira/browse/SQOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589895#comment-15589895 ] Markus Kemper commented on SQOOP-3022: -- Additional Clarification: With OraOop and CDH <5.3.0 you need to manually install OraOop, meaning: If you do not create the Driver Manager for OraOop in /var/lib/sqoop/managers.d you need to specify (-Dsqoop.connection.factories=org.apache.sqoop.manager.oracle.OraOopManagerFactory). OraOop can also be enabled/disabled using (-Doraoop.disabled=true|false). With OraOop and CDH >=5.3.0 the OraOop classes are included in the CDH Sqoop jar, meaning: All references (e.g. managers.d, oraoop.jar) to a previous manual install should be removed, and the only required option to enable OraOop is the --direct option. We do however recommend also deploying the Sqoop1 Client Gateway on any host where the Sqoop1 CLI command is executed. Any global OraOop configurations can also be set in the Gateway's sqoop-site.xml safety valve. Hope this helps. [~markuskem...@me.com] > sqoop export for Oracle generates tremendous amounts of redo logs > - > > Key: SQOOP-3022 > URL: https://issues.apache.org/jira/browse/SQOOP-3022 > Project: Sqoop > Issue Type: Bug > Components: codegen, connectors, connectors/oracle >Affects Versions: 1.4.3, 1.4.4, 1.4.5, 1.4.6 >Reporter: Ruslan Dautkhanov > Labels: export, oracle > > Sqoop export for Oracle generates tremendous amounts of redo logs (comparable > to export size or more). > We have put target tables in nologging mode, but Oracle will still generate > redo logs unless the +APPEND Oracle insert hint is used. > See https://oracle-base.com/articles/misc/append-hint for examples. > Please add an option for sqoop to generate insert statements in Oracle with > the APPEND hint. Our databases are swamped with redo/archived logs whenever > we sqoop data to them. This is easily avoidable. 
And from a business > perspective, sqooping to staging tables in nologging mode is totally fine. > Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
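As a sketch of what the reporter is asking for: in plain SQL terms, a direct-path insert into a NOLOGGING table is what keeps redo generation minimal. The OraOop property below is taken from the Data Connector for Oracle and Hadoop documentation but should be treated as an assumption and verified against your Sqoop version; the rest follows the thread's placeholder convention.

```shell
# The SQL shape that avoids the redo, per the linked article:
#   ALTER TABLE t1 NOLOGGING;
#   INSERT /*+ APPEND */ INTO t1 SELECT ... ;
#
# With the OraOop --direct path, this (assumed) property asks the connector
# to use the APPEND_VALUES hint on its generated inserts:
sqoop export \
  -Doraoop.oracle.append.values.hint.usage=ON \
  --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --table T1 --export-dir /user/sqoop/t1 --direct
```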
[jira] [Commented] (SQOOP-3029) Add an option for uppercase/lowercase column name mapping between HCatalog and RDBMS column name list
[ https://issues.apache.org/jira/browse/SQOOP-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586798#comment-15586798 ] Markus Kemper commented on SQOOP-3029: -- Hey [~maugli], I am not sure I follow you completely here. Are you suggesting matching lower, upper, or case-sensitive? Thanks, [~markuskem...@me.com] > Add an option for uppercase/lowercase column name mapping between HCatalog > and RDBMS column name list > - > > Key: SQOOP-3029 > URL: https://issues.apache.org/jira/browse/SQOOP-3029 > Project: Sqoop > Issue Type: Improvement >Reporter: Attila Szabo > Fix For: 1.4.7 > > > The current implementation of HCatalog-related export works out of the box > with RDBMS systems (e.g. Oracle) if the column names match case-sensitively, > although in several cases the column names are received as an UPPER_CASE list > from the RDBMS, and thus the export fails within the ClassWriter. > Although the users have the possibility to define the names of the columns > explicitly with the --columns option, in the case of 100+ columns it's not > practical. > It would be great to have an option which could do the matching in lower and > UPPER case as well. This implementation should start from the > SqoopHCatUtilities#configureHCat method, taking care to ensure the HCatalog API > is able to handle the differently cased column names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
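A minimal sketch (not Sqoop code) of the matching this ticket asks for: compare the RDBMS column list and the HCatalog column list case-insensitively instead of byte-for-byte. The column names here are illustrative.

```shell
rdbms_cols="C1,C2"   # as received from Oracle, folded to upper case
hcat_cols="c1,c2"    # as stored by HCatalog, lower case

# Normalize a comma-separated column list to lower case before comparing.
to_lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

if [ "$(to_lower "$rdbms_cols")" = "$(to_lower "$hcat_cols")" ]; then
  echo "columns match"
else
  echo "columns differ"
fi
```

With the sample lists above this prints "columns match", which is the behavior the ticket wants in place of the current case-sensitive comparison.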
[jira] [Issue Comment Deleted] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE
[ https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-3030: - Comment: was deleted (was: You are welcome [~Tagar]. [~maugli] I have not tried to reproduce the issue outside of --hcatalog yet, primarily because the use case required export of Parquet files, which unfortunately --export-dir does not support due to a KiteSDK limitation. Thanks, [~markuskem...@me.com] ) > Export to Oracle using (--direct + --hcatalog + --columns) with columns > having the wrong case sensitivity fails with NPE > > > Key: SQOOP-3030 > URL: https://issues.apache.org/jira/browse/SQOOP-3030 > Project: Sqoop > Issue Type: Bug > Components: connectors/oracle, hive-integration > Reporter: Markus Kemper > > It appears that if the Oracle RDBMS column case is not used > with the Sqoop --columns option with (export + --hcatalog + --direct), the > Sqoop map task will fail with an NPE, where the non-direct path fails correctly. > My test case is below, please review and let me know if you have any > questions. > Additionally it would be nice if we could detect this column case mismatch > at compile time and not submit the job only to have it fail in YARN. 
> TEST CASE: > STEP 01 : Create Oracle Source/Target Table > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1 (c1 int, c2 varchar(10))" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (1, 'some data')" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > - > | C1 | C2 | > - > | 1| some data | > - > STEP 02 : Import Oracle Table using Sqoop --hcatalog options > beeline -u jdbc:hive2:// -e "use default; drop table t1_text;" > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text > --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' > beeline -u jdbc:hive2:// -e "use default; select * from t1_text;" > Output: > > 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names > projected : [c1, c2] > 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info > map : > c1 : [Type : 2,Precision : 38,Scale : 0] > c2 : [Type : 12,Precision : 10,Scale : 0] > 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table > default.t1_text for import > 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table > statement: > create table `default`.`t1_text` ( > `c1` decimal(38), > `c2` varchar(10)) > stored as textfile > > 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in > 34.4222 seconds (0.3486 bytes/sec) > 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
> --- > +-+-+--+ > | t1_text.c1 | t1_text.c2 | > +-+-+--+ > | 1 | some data | > +-+-+--+ > STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns > options > sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text > --columns "c1,c2" > Output: (failure = correct) > > 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names > projected : [c1, c2] > 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info > map : > c1 : [Type : 2,Precision : 38,Scale : 0] > c2 : [Type : 12,Precision : 10,Scale : 0] > > 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore. > 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema > fields = [c1, c2] > 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is > deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir > 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning > key fields = [] > 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema > field
[jira] [Commented] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE
[ https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586786#comment-15586786 ] Markus Kemper commented on SQOOP-3030: -- You are welcome [~Tagar]. [~maugli] I have not tried to reproduce the issue outside of --hcatalog yet, primarily because the use case required export of Parquet files, which unfortunately --export-dir does not support due to a KiteSDK limitation. Thanks, [~markuskem...@me.com] > Export to Oracle using (--direct + --hcatalog + --columns) with columns > having the wrong case sensitivity fails with NPE > > > Key: SQOOP-3030 > URL: https://issues.apache.org/jira/browse/SQOOP-3030 > Project: Sqoop > Issue Type: Bug > Components: connectors/oracle, hive-integration > Reporter: Markus Kemper > > It appears that if the Oracle RDBMS column case is not used > with the Sqoop --columns option with (export + --hcatalog + --direct), the > Sqoop map task will fail with an NPE, where the non-direct path fails correctly. > My test case is below, please review and let me know if you have any > questions. > Additionally it would be nice if we could detect this column case mismatch > at compile time and not submit the job only to have it fail in YARN. 
> TEST CASE: > STEP 01 : Create Oracle Source/Target Table > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "drop table t1" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "create table t1 (c1 int, c2 varchar(10))" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "insert into t1 values (1, 'some data')" > sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query > "select * from t1" > - > | C1 | C2 | > - > | 1| some data | > - > STEP 02 : Import Oracle Table using Sqoop --hcatalog options > beeline -u jdbc:hive2:// -e "use default; drop table t1_text;" > sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text > --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' > beeline -u jdbc:hive2:// -e "use default; select * from t1_text;" > Output: > > 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names > projected : [c1, c2] > 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info > map : > c1 : [Type : 2,Precision : 38,Scale : 0] > c2 : [Type : 12,Precision : 10,Scale : 0] > 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table > default.t1_text for import > 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table > statement: > create table `default`.`t1_text` ( > `c1` decimal(38), > `c2` varchar(10)) > stored as textfile > > 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in > 34.4222 seconds (0.3486 bytes/sec) > 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
> --- > +-+-+--+ > | t1_text.c1 | t1_text.c2 | > +-+-+--+ > | 1 | some data | > +-+-+--+ > STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns > options > sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text > --columns "c1,c2" > Output: (failure = correct) > > 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names > projected : [c1, c2] > 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info > map : > c1 : [Type : 2,Precision : 38,Scale : 0] > c2 : [Type : 12,Precision : 10,Scale : 0] > > 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore. > 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema > fields = [c1, c2] > 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is > deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir > 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning > key fields = [] > 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema > field
[jira] [Created] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE
Markus Kemper created SQOOP-3030: Summary: Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE Key: SQOOP-3030 URL: https://issues.apache.org/jira/browse/SQOOP-3030 Project: Sqoop Issue Type: Bug Components: connectors/oracle, hive-integration Reporter: Markus Kemper It appears that if the Oracle RDBMS column case is not used with the Sqoop --columns option with (export + --hcatalog + --direct), the Sqoop map task will fail with an NPE, where the non-direct path fails correctly. My test case is below, please review and let me know if you have any questions. Additionally it would be nice if we could detect this column case mismatch at compile time and not submit the job only to have it fail in YARN. TEST CASE: STEP 01 : Create Oracle Source/Target Table sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 varchar(10))" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, 'some data')" sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" - | C1 | C2 | - | 1| some data | - STEP 02 : Import Oracle Table using Sqoop --hcatalog options beeline -u jdbc:hive2:// -e "use default; drop table t1_text;" sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile' beeline -u jdbc:hive2:// -e "use default; select * from t1_text;" Output: 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2] 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info map : c1 : [Type : 2,Precision : 38,Scale : 0] c2 : [Type : 12,Precision : 10,Scale : 0] 16/10/18 09:11:31 INFO
hcat.SqoopHCatUtilities: Creating HCatalog table default.t1_text for import 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: create table `default`.`t1_text` ( `c1` decimal(38), `c2` varchar(10)) stored as textfile 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 34.4222 seconds (0.3486 bytes/sec) 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records. --- +-+-+--+ | t1_text.c1 | t1_text.c2 | +-+-+--+ | 1 | some data | +-+-+--+ STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "c1,c2" Output: (failure = correct) 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2] 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info map : c1 : [Type : 2,Precision : 38,Scale : 0] c2 : [Type : 12,Precision : 10,Scale : 0] 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore. 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [c1, c2] 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning key fields = [] 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema fields = [c1, c2] 16/10/18 09:14:22 INFO mapreduce.ExportJobBase: Exported 0 records. 16/10/18 09:14:22 ERROR tool.ExportTool: Error during export: Export job failed! 
--- 2016-10-18 09:14:19,418 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {transient_lastDdlTime=1476807097, name=default.t1_text, serialization.null.format=\N, columns=c1,c2, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, serialization.format=1, columns.types=decimal(38,0),varchar(10)} 2016-10-18 09:14:19,660 INFO [Thread-12] org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false 2016-10-18 09:14:19,952 ERROR [Thread-11] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update thread: java.sql.SQLSyntaxErrorException: ORA-00904: "c2": invalid identifier STEP 03 : Attempt Export to Oracle using Sqo
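The ORA-00904 in the log above is consistent with Oracle's identifier folding: unquoted identifiers fold to upper case, so a quoted lower-case "c2" does not resolve to column C2. Until the mismatch is detected before job submission, a workaround sketch is to pass --columns in the RDBMS's own case (placeholders as elsewhere in the thread; whether the --direct path then succeeds depends on the NPE this ticket tracks):

```shell
# Workaround sketch: name the columns exactly as Oracle stores them (upper
# case here), so the generated statement resolves C1/C2 rather than "c1"/"c2".
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --table T2 --num-mappers 2 \
  --hcatalog-database default --hcatalog-table t1_text \
  --columns "C1,C2" --direct
```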
[jira] [Commented] (SQOOP-3025) Oracle Direct export from HCatalog parquet table gives NPE in kerberized cluster
[ https://issues.apache.org/jira/browse/SQOOP-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582749#comment-15582749 ] Markus Kemper commented on SQOOP-3025: -- Hey Ruslan, Are you able to provide the following: 1. On the host executing the Sqoop CLI command what are your security settings for [hive conf] and [hadoop conf] 2. Our [sqoop test] appears to be working without error [hive conf] $ cat /etc/hive/conf/*.xml | egrep -B 1 -A 1 -i "security.authentication|hadoop.security.authorization<" hadoop.security.authentication kerberos hadoop.security.authorization true [hadoop conf] $ cat /etc/hadoop/conf/*.xml | egrep -B 1 -A 1 -i "security.authentication|hadoop.security.authorization<" hadoop.security.authentication kerberos hadoop.security.authorization true [sqoop test] $ sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: --- | C1 | C2 | --- | 1| one | --- $ sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/xxx/t1_text --delete-target-dir --num-mappers 1 --as-textfile --hive-import --hive-database default --hive-table t1_text --verbose $ hive -e "use default; create table t1_parquet stored as parquet as select * from t1_text;" $ sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_parquet --direct Output: 16/10/17 09:01:16 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.4-SNAPSHOT 16/10/17 09:01:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 
16/10/17 09:01:16 INFO manager.SqlManager: Using default fetchSize of 1000 16/10/17 09:01:18 INFO oracle.OraOopOracleQueries: Current schema is: SQOOP 16/10/17 09:01:18 INFO oracle.OraOopManagerFactory: ** *** Using Data Connector for Oracle and Hadoop *** ** 16/10/17 09:01:24 INFO client.RMProxy: Connecting to ResourceManager at xxx.cloudera.com/xxx.xx.xx.xxx:8032 16/10/17 09:01:25 INFO hdfs.DFSClient: Created token for admin: HDFS_DELEGATION_TOKEN owner=x...@xxx.cloudera.com, renewer=yarn, realUser=, issueDate=1476720085263, maxDate=1477324885263, sequenceNumber=8, masterKeyId=6 on ha-hdfs:ns1 16/10/17 09:01:25 INFO security.TokenCache: Got dt for hdfs://ns1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns1, Ident: (token for admin: HDFS_DELEGATION_TOKEN owner=x...@xxx.cloudera.com, renewer=yarn, realUser=, issueDate=1476720085263, maxDate=1477324885263, sequenceNumber=8, masterKeyId=6) 16/10/17 09:02:04 INFO mapreduce.Job: map 0% reduce 0% 16/10/17 09:02:22 INFO mapreduce.Job: map 100% reduce 0% 16/10/17 09:02:22 INFO mapreduce.Job: Job job_1476708184468_0005 completed successfully 16/10/17 09:02:22 INFO mapreduce.ExportJobBase: Transferred 8.1641 KB in 57.6968 seconds (144.8954 bytes/sec) 16/10/17 09:02:22 INFO mapreduce.ExportJobBase: Exported 1 records. 
$ sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1" Output: --- | C1 | C2 | --- | 1| one | | 1| one | --- Thanks, Markus > Oracle Direct export from HCatalog parquet table gives NPE in kerberized > cluster > > > Key: SQOOP-3025 > URL: https://issues.apache.org/jira/browse/SQOOP-3025 > Project: Sqoop > Issue Type: Bug > Components: build, codegen, connectors/oracle, hive-integration, > metastore >Affects Versions: 1.4.6 > Environment: Hadoop 2.6; Sqoop 1.4.6; Oracle 12c >Reporter: Ruslan Dautkhanov > Attachments: scoop_output.log, > sqoop_OracleDirect_mapper_nullPointer_exception.txt > > > Getting following exception in one of the mappers stack when try to export to > Oracle from Hive Parquet > {noformat} > 2016-10-14 23:14:01,027 ERROR [main] > org.apache.sqoop.manager.oracle.OraOopOutputFormatInsert: The following error > occurred during configurePreparedStatement() > java.lang.NullPointerException > at > org.apache.sqoop.manager.oracle.OraOopOutputFormatBase$OraOopDBRec
[jira] [Comment Edited] (SQOOP-1906) Export support for mixed update/insert against db2
[ https://issues.apache.org/jira/browse/SQOOP-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446704#comment-15446704 ] Markus Kemper edited comment on SQOOP-1906 at 8/29/16 7:12 PM: --- Hello [~xieshiju] and [~jarcec], Is the blocking issue on this Jira the creation of a review board entry for the patch? The IBM URL below suggests that IBM supports this capability, however I have not found any commit or code for IBM in Db2Manager.java that has the "upsertTable()" method. Any assistance in understanding whether this capability is supported with IBM would be appreciated. https://www.ibm.com/support/knowledgecenter/SSPT3X_4.1.0/com.ibm.swg.im.infosphere.biginsights.product.doc/doc/bi_sqoop.html IBM-specific backported patches and changes Export support for mixed update/insert against DB2. Thanks, [~markuskem...@me.com] was (Author: markuskem...@me.com): Hello [~xieshiju] and [~jarcec], Is the blocking issue on this Jira the creation of a review board entry for the patch? Thanks, [~markuskem...@me.com] > Export support for mixed update/insert against db2 > > > Key: SQOOP-1906 > URL: https://issues.apache.org/jira/browse/SQOOP-1906 > Project: Sqoop > Issue Type: Improvement > Components: connectors >Affects Versions: 1.4.5 > Environment: RedHat6.4+Sqoop1.4.5+Hadoop2.4.1+Hive0.13.0+DB2 10.5 >Reporter: xieshiju >Assignee: xieshiju > Labels: features > Fix For: 1.4.7 > > Attachments: SQOOP-1906.patch > > Original Estimate: 504h > Remaining Estimate: 504h > > The DB2 connector needs to implement the option --update-mode allowinsert, > which can be used when loading into a table > that is not empty. > In allowinsert mode export will try to update the row, but if the row does > not exist it will insert it. Without this option only rows that already exist > in the table will be updated and any new rows are ignored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SQOOP-1906) Export support for mixed update/insert against db2
[ https://issues.apache.org/jira/browse/SQOOP-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446704#comment-15446704 ] Markus Kemper commented on SQOOP-1906: -- Hello [~xieshiju] and [~jarcec], Is the blocking issue on this Jira the creation of a review board entry for the patch? Thanks, [~markuskem...@me.com] > Export support for mixed update/insert against db2 > > > Key: SQOOP-1906 > URL: https://issues.apache.org/jira/browse/SQOOP-1906 > Project: Sqoop > Issue Type: Improvement > Components: connectors >Affects Versions: 1.4.5 > Environment: RedHat6.4+Sqoop1.4.5+Hadoop2.4.1+Hive0.13.0+DB2 10.5 >Reporter: xieshiju >Assignee: xieshiju > Labels: features > Fix For: 1.4.7 > > Attachments: SQOOP-1906.patch > > Original Estimate: 504h > Remaining Estimate: 504h > > The DB2 connector needs to implement the option --update-mode allowinsert, > which can be used when loading into a table > that is not empty. > In allowinsert mode export will try to update the row, but if the row does > not exist it will insert it. Without this option only rows that already exist > in the table will be updated and any new rows are ignored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
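For reference, the generic Sqoop form of the behavior being requested for DB2 is sketched below. These options work today for connection managers that implement upsert support (e.g. the Oracle and MySQL managers); whether the DB2 connector honors them depends on the patch discussed in this ticket. Placeholders follow the thread's convention.

```shell
# Upsert-style export: rows whose --update-key value already exists are
# updated; with allowinsert, rows that do not exist are inserted instead
# of being silently skipped.
sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --table T1 --export-dir /user/sqoop/t1 \
  --update-key C1 --update-mode allowinsert
```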
[jira] [Commented] (SQOOP-2748) Remove parentheses from WHERE clause
[ https://issues.apache.org/jira/browse/SQOOP-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436758#comment-15436758 ] Markus Kemper commented on SQOOP-2748: -- Hello Warren, I will defer to others on how to best address this for the IBM IMS use case, however to the best of my knowledge the SQL is valid. If the IMS JDBC driver does not allow this syntax, perhaps an IBM IMS-specific option via a --direct path would be a good approach. As a workaround, does using >1 map task allow your job to run? The snips below suggest that Sqoop may generate different SQL with >1 map tasks, possibly avoiding the condition that is impacting you. sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/user1/t1 --delete-target-dir --as-textfile --split-by c1 --num-mappers 1 --verbose 2016-08-24 05:12:08,403 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: 1=1 AND 1=1 2016-08-24 05:12:08,520 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: 1=1 AND 1=1 2016-08-24 05:12:08,539 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT `c1`, `c2` FROM `t1` AS `t1` WHERE ( 1=1 ) AND ( 1=1 ) sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/user1/t1 --delete-target-dir --as-textfile --split-by c1 --num-mappers 2 --verbose 2016-08-24 05:03:12,669 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: `c1` >= 1 AND `c1` < 5001 2016-08-24 05:03:12,756 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: `c1` >= 1 AND `c1` < 5001 2016-08-24 05:03:12,774 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT `c1`, `c2` FROM `t1` AS `t1` WHERE ( `c1` >= 1 ) AND ( `c1` < 5001 ) From your output, I am curious whether you know why the IBM IMS JDBC Driver generates "SELECT ... FROM MYIMSTABLE AS MYIMSTABLE WHERE ..." instead of "SELECT ... 
FROM MYIMSTABLE WHERE ..."? > Remove parentheses from WHERE clause > > > Key: SQOOP-2748 > URL: https://issues.apache.org/jira/browse/SQOOP-2748 > Project: Sqoop > Issue Type: Wish > Components: codegen >Affects Versions: 1.4.6 > Environment: Redhat 6 > Cloudera 5.5.0 >Reporter: Warren Chisholm >Priority: Minor > Labels: import, ims, oneequalone, parentheses, sqoop > > When performing a sqoop import with option "-m 1" the following where clause > is added by sqoop: > WHERE ( 1=1 ) AND ( 1=1 ) > This where clause is redundant and creates an incompatibility with the IBM > IMS Universal JDBC driver. > The IMS JDBC Driver does not allow the use of parentheses in the where > clause. The below quote was taken from 2nd dot point here: > https://www-01.ibm.com/support/knowledgecenter/SSEPH2_13.1.0/com.ibm.ims13.doc.apg/ims_odbjdbcusagewhere.htm. > "Do not use parentheses. Qualification statements are evaluated from left to > right. The order of evaluation for operators is the IMS evaluation order for > segment search arguments." > I have modified the below commands and output to anonymize the details. > Sample sqoop command: > {code} > sqoop import --connect > 'jdbc:ims://myserver:/MYDATABASE:currentSchema=MYSCHEMA;' -username > -password --null-string '\\N' --null-non-string '\\N' > --hive-import --create-hive-table --table MYIMSTABLE --hive-table MYHIVETABLE > --hive-drop-import-delims --driver com.ibm.ims.jdbc.IMSDriver > --delete-target-dir -m 1 > {code} > Extract of log produced: > {code} > ... 
> 15/12/16 17:20:05 INFO mapreduce.Job: map 0% reduce 0% > 15/12/16 17:20:16 INFO mapreduce.Job: Task Id : > attempt_1449625205240_0003_m_00_0, Status : FAILED > Error: java.io.IOException: SQLException in nextKeyValue > at > org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556) > at > org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > at > org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.map
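As a rough illustration of the 1-mapper versus 2-mapper split behavior shown in the log snips above, the boundary logic can be sketched as follows. This is a hypothetical Python sketch of the general idea, not Sqoop's actual DataDrivenDBInputFormat code:

```python
# Hypothetical sketch of how split predicates like those in the logs arise:
# with one mapper the trivial "1=1 AND 1=1" bounds are used, while with
# N > 1 mappers the [min, max] range of the --split-by column is divided
# into N half-open intervals, one per mapper.
def split_predicates(col, lo, hi, num_mappers):
    """Return one (lower_bound, upper_bound) predicate pair per mapper."""
    if num_mappers == 1:
        return [("1=1", "1=1")]
    step = (hi - lo + 1) // num_mappers
    splits = []
    for i in range(num_mappers):
        start = lo + i * step
        end = hi + 1 if i == num_mappers - 1 else start + step
        splits.append((f"`{col}` >= {start}", f"`{col}` < {end}"))
    return splits

# Mirrors the two log snippets, assuming c1 spans [1, 10000]:
print(split_predicates("c1", 1, 10000, 1))  # [('1=1', '1=1')]
print(split_predicates("c1", 1, 10000, 2))  # first split: `c1` >= 1, `c1` < 5001
```

Each pair is then wrapped in parentheses and ANDed into the generated SELECT, which is what the IMS driver rejects in the single-mapper case.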
Re: Review Request 50794: SQOOP-2884
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50794/ --- (Updated Aug. 10, 2016, 12:21 a.m.) Review request for Sqoop, Kathleen Ting and Attila Szabo. Changes --- New patch to remove whitespace in /src/doc/man/import-args.txt per comments Repository: sqoop-trunk Description --- Added man and html related changes for SQOOP-2884 (--temporary-rootdir) Diffs (updated) - src/docs/man/import-args.txt 49855ce src/docs/user/import.txt fbb32f5 Diff: https://reviews.apache.org/r/50794/diff/ Testing --- 1. Updated the following files: 1.1. /src/doc/man/import-args.txt < add command option 1.2. /src/doc/user/import.txt < add command details 2. Reviewed changes with browser and man. Thanks, Markus Kemper
Re: Review Request 50794: SQOOP-2884
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50794/ --- (Updated Aug. 10, 2016, 12:01 a.m.) Review request for Sqoop, Kathleen Ting and Attila Szabo. Changes --- Updated /src/doc/man/import-args.txt to remove whitespace per comments Repository: sqoop-trunk Description --- Added man and html related changes for SQOOP-2884 (--temporary-rootdir) Diffs - src/docs/man/import-args.txt 49855ce src/docs/user/import.txt fbb32f5 Diff: https://reviews.apache.org/r/50794/diff/ Testing --- 1. Updated the following files: 1.1. /src/doc/man/import-args.txt < add command option 1.2. /src/doc/user/import.txt < add command details 2. Reviewed changes with browser and man. File Attachments (updated) Updated Patch per comments https://reviews.apache.org/media/uploaded/files/2016/08/10/118c9164-ced0-4d75-9c84-8b021b9b770e__SQOOP-2884-1.patch Thanks, Markus Kemper
Re: Review Request 50794: SQOOP-2884
> On Aug. 4, 2016, 2:28 p.m., Erzsebet Szilagyi wrote: > > src/docs/man/import-args.txt, line 53 > > <https://reviews.apache.org/r/50794/diff/1/?file=1462854#file1462854line53> > > > > In the future, please make sure you trim excess whitespaces (they show > > up with red in the Diff View of the Review Board). Updated /src/doc/man/import-args.txt and removed white space Please see new patch SQOOP-2884-1.patch - Markus --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50794/#review144763 ------- On Aug. 4, 2016, 1:50 p.m., Markus Kemper wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50794/ > --- > > (Updated Aug. 4, 2016, 1:50 p.m.) > > > Review request for Sqoop, Kathleen Ting and Attila Szabo. > > > Repository: sqoop-trunk > > > Description > --- > > Added man and html related changes for SQOOP-2884 (--temporary-rootdir) > > > Diffs > - > > src/docs/man/import-args.txt 49855ce > src/docs/user/import.txt fbb32f5 > > Diff: https://reviews.apache.org/r/50794/diff/ > > > Testing > --- > > 1. Updated the following files: > 1.1. /src/doc/man/import-args.txt < add command option > 1.2. /src/doc/user/import.txt < add command details > 2. Reviewed changes with browser and man. > > > Thanks, > > Markus Kemper > >
[jira] [Comment Edited] (SQOOP-2884) Document --temporary-rootdir
[ https://issues.apache.org/jira/browse/SQOOP-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414462#comment-15414462 ] Markus Kemper edited comment on SQOOP-2884 at 8/9/16 11:51 PM: --- Updated patch to remove white space per comments in: https://reviews.apache.org/r/50794/ Changed File(s): /src/doc/man/import-args.txt Verified changes using: 1. mc 2. browser was (Author: markuskem...@me.com): Updated patch to remove white space per comments in: https://reviews.apache.org/r/50794/ Verified changes using: 1. mc 2. browser > Document --temporary-rootdir > > > Key: SQOOP-2884 > URL: https://issues.apache.org/jira/browse/SQOOP-2884 > Project: Sqoop > Issue Type: Bug >Affects Versions: 1.4.7 >Reporter: Attila Szabo > Assignee: Markus Kemper > Attachments: SQOOP-2884-1.patch, SQOOP-2884.patch > > > In SQOOP-2880 we've implemented this new option, but it needs an updated > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-2884) Document --temporary-rootdir
[ https://issues.apache.org/jira/browse/SQOOP-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-2884: - Attachment: SQOOP-2884-1.patch Updated patch to remove white space per comments in: https://reviews.apache.org/r/50794/ Verified changes using: 1. mc 2. browser > Document --temporary-rootdir > > > Key: SQOOP-2884 > URL: https://issues.apache.org/jira/browse/SQOOP-2884 > Project: Sqoop > Issue Type: Bug >Affects Versions: 1.4.7 >Reporter: Attila Szabo > Assignee: Markus Kemper > Attachments: SQOOP-2884-1.patch, SQOOP-2884.patch > > > In SQOOP-2880 we've implemented this new option, but it needs an updated > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: New Sqoop Committer - Attila Szabo
Awesome news Attila, congrats! Markus Kemper Customer Operations Engineer [image: www.cloudera.com] <http://www.cloudera.com> On Tue, Aug 9, 2016 at 2:46 PM, Boglarka Egyed <b...@cloudera.com> wrote: > Congratulations, Attila! You rock! :) > > On Tue, Aug 9, 2016 at 8:40 PM, Kathleen Ting <kathl...@apache.org> wrote: > >> On behalf of the Apache Sqoop PMC, I am excited to welcome Attila >> Szabo as a new committer on Apache Sqoop. Please join me in >> congratulating him! >> >> Attila has added several new features, most notably improving Sqoop >> export performance with wide tables, improving usability by adding a >> parameter to handle the temporary directory during Sqoop imports, and >> improving quality by fixing unit tests. You can see all of his >> contributions at [1]. >> >> We appreciate all of Attila's hard work thus far, and look forward to >> his continued contributions. >> >> Best, >> Kate >> (on behalf of the Apache Sqoop PMC) >> >> >> Links: >> 1: https://s.apache.org/xgYm >> > >
Review Request 50794: SQOOP-2884
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50794/ --- Review request for Sqoop, Kathleen Ting and Attila Szabo. Repository: sqoop-trunk Description --- Added man and html related changes for SQOOP-2884 (--temporary-rootdir) Diffs - src/docs/man/import-args.txt 49855ce src/docs/user/import.txt fbb32f5 Diff: https://reviews.apache.org/r/50794/diff/ Testing --- 1. Updated the following files: 1.1. /src/doc/man/import-args.txt < add command option 1.2. /src/doc/user/import.txt < add command details 2. Reviewed changes with browser and man. Thanks, Markus Kemper
[jira] [Updated] (SQOOP-2884) Document --temporary-rootdir
[ https://issues.apache.org/jira/browse/SQOOP-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-2884: - Attachment: SQOOP-2884.patch 1. Updated the following files: 1.1. /src/doc/man/import-args.txt < add command option 1.2. /src/doc/user/import.txt < add command details 2. Reviewed changes with browser and man. > Document --temporary-rootdir > > > Key: SQOOP-2884 > URL: https://issues.apache.org/jira/browse/SQOOP-2884 > Project: Sqoop > Issue Type: Bug >Affects Versions: 1.4.7 >Reporter: Attila Szabo >Assignee: Markus Kemper > Attachments: SQOOP-2884.patch > > > In SQOOP-2880 we've implemented this new option, but it needs an updated > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (SQOOP-2874) Highlight Sqoop import with --as-parquetfile use cases (Dataset name is not alphanumeric (plus '_'))
[ https://issues.apache.org/jira/browse/SQOOP-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper reassigned SQOOP-2874: Assignee: Markus Kemper > Highlight Sqoop import with --as-parquetfile use cases (Dataset name > is not alphanumeric (plus '_')) > --- > > Key: SQOOP-2874 > URL: https://issues.apache.org/jira/browse/SQOOP-2874 > Project: Sqoop > Issue Type: Improvement > Components: docs > Reporter: Markus Kemper > Assignee: Markus Kemper > Attachments: Jira_SQOOP-2874_TestCases.txt > > > Hello Sqoop Community, > Would it be possible to request some documentation enhancements? > The ask here is to proactively help raise awareness and improve user > experience with a few specific use cases [1] where some Sqoop commands have > restricted character options when using import with --as-parquetfile. > My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet > files. From the Kite documentation [3] we see that to ensure compatibility > (with Hive, etc.), Kite imposes some restrictions on Names and Namespaces > which bubble up in Sqoop. > The following Sqoop use cases when using import with --as-parquetfile result > in the error [2] below. Full test cases for each scenario are attached. If > it is an option to enhance the Sqoop documentation for these use cases I am > happy to provide proposed changes, let me know. > [1] Use Cases: > 1. sqoop import --as-parquetfile + --target-dir > //. > 1.1. The '.' is not allowed > 2. sqoop import --as-parquetfile + --table . + (no > --target-dir) > 2.1. The '.' is not allowed, this is essentially the same as (1) > 3. sqoop import --as-parquetfile + --hive-import --table > . > 3.1. The proper usage is to use --hive-database with --hive-table however > with --as-textfile --hive-table works with . 
> [2] Kite Error: > 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: > org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not > alphanumeric (plus '_') > org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not > alphanumeric (plus '_') > at > org.kitesdk.data.ValidationException.check(ValidationException.java:55) > at > org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105) > at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68) > at > org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209) > at > org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137) > at org.kitesdk.data.Datasets.create(Datasets.java:239) > at org.kitesdk.data.Datasets.create(Datasets.java:307) > at > org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141) > at > org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119) > at > org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130) > at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260) > at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673) > at > org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444) > at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497) > at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605) > at org.apache.sqoop.Sqoop.run(Sqoop.java:143) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) > at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) > at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) > at org.apache.sqoop.Sqoop.main(Sqoop.java:236) > [3] Kite Documentation: > http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html > Names and Namespaces > URIs also define a name and namespace for your dataset. 
Kite uses these > values when the underlying system has the same concept (for example, Hive). > The name and namespace are typically the last two values in a URI. For > example, if you create a dataset using the URI > dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the > fact_tables Hive database. If you create a dataset using the URI > dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset > named ratings in the fact_tables namespace. To ensure compatibility with > Hive and other underlying systems, names and namespaces in URIs must be made > of alphanumeric or underscore (_) characters and cannot start with a number. > Thanks, Markus -- This message was sent by Atlassian JIRA (v6.3.4#6332)
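The Kite naming rule quoted in [3] (alphanumeric or underscore, must not start with a number) can be illustrated with a small sketch. The regex here is our own reading of the documented rule, not Kite's actual validation code:

```python
import re

# Hypothetical sketch of the Kite naming rule: dataset names and namespaces
# must consist of alphanumeric or underscore characters and must not start
# with a digit. A dotted name such as DATABASE.TABLE fails, which is exactly
# the ValidationException shown in [2].
NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_dataset_name(name):
    """Return True if the name satisfies the documented Kite rule."""
    return bool(NAME_RE.match(name))

print(is_valid_dataset_name("fact_tables"))     # True
print(is_valid_dataset_name("DATABASE.TABLE"))  # False: '.' is not allowed
print(is_valid_dataset_name("1ratings"))        # False: starts with a digit
```

This is why use cases (1) through (3) above, which all smuggle a '.' into the dataset name, hit the same error.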
[jira] [Assigned] (SQOOP-2884) Document --temporary-rootdir
[ https://issues.apache.org/jira/browse/SQOOP-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper reassigned SQOOP-2884: Assignee: Markus Kemper (was: Attila Szabo) > Document --temporary-rootdir > > > Key: SQOOP-2884 > URL: https://issues.apache.org/jira/browse/SQOOP-2884 > Project: Sqoop > Issue Type: Bug >Affects Versions: 1.4.7 >Reporter: Attila Szabo > Assignee: Markus Kemper > > In SQOOP-2880 we've implemented this new option, but it needs an updated > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (SQOOP-2988) Update Sqoop1 Docs (warning about parallel jobs and "local" Sqoop1 Metastore)
[ https://issues.apache.org/jira/browse/SQOOP-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper reassigned SQOOP-2988: Assignee: Markus Kemper (was: Boglarka Egyed) > Update Sqoop1 Docs (warning about parallel jobs and "local" Sqoop1 Metastore) > - > > Key: SQOOP-2988 > URL: https://issues.apache.org/jira/browse/SQOOP-2988 > Project: Sqoop > Issue Type: Sub-task >Reporter: Boglarka Egyed >Assignee: Markus Kemper > Labels: doc, incremental, job, metastore, parallel > > An update of the Sqoop1 Documentation is needed. A warning should be added > that, in case of parallel incremental workloads, using the "local" > metastore can result in unexpected behavior, such as INSERT/UPDATE-related > information (e.g. incremental.last.value) not being updated properly. A > recommendation should also be added to use a shared metastore instead. More > details can be found in the parent issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SQOOP-2971) OraOop does not close connections properly
[ https://issues.apache.org/jira/browse/SQOOP-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348391#comment-15348391 ] Markus Kemper commented on SQOOP-2971: -- Thank you [~vasas]. In the field we have seen this issue result in the following Oracle exception, thus limiting the degree of parallelism possible when SESSIONS are capped. 16/04/12 13:26:22 INFO mapreduce.Job: Task Id : attempt_1459758835311_12642_m_07_0, Status : FAILED Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLException: ORA-02391: exceeded simultaneous SESSIONS_PER_USER limit at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.RuntimeException: java.sql.SQLException: ORA-02391: exceeded simultaneous SESSIONS_PER_USER limit > OraOop does not close connections properly > -- > > Key: SQOOP-2971 > URL: https://issues.apache.org/jira/browse/SQOOP-2971 > Project: Sqoop > Issue Type: Bug >Reporter: Attila Szabo >Assignee: Szabolcs Vasas > Fix For: 1.4.6, 1.4.7 > > Attachments: SQOOP-2971.patch > > > When importing with OraOop, two connections are created for every mapper task, > and those connections are not closed until the end of the import process; > thus Sqoop uses twice the necessary amount and leaks these expensive > resources. 
This can cause problems when the number of connections is > limited on the DB server side but the user would like to benefit from > having more mappers (e.g. right now with 10 available connections the user cannot > use 10 mappers, but only 4). > Please provide a fix for this issue, either by sharing the connection or > closing the connections when they are no longer needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
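The two fix directions named in the issue (share the connection, or close it as soon as it is no longer needed) can be sketched as follows. All names here are illustrative stand-ins, not OraOop's real classes:

```python
# Hypothetical sketch of the suggested fix: close each mapper's connection as
# soon as its work is done, so N mappers hold at most N database sessions at
# a time instead of leaking 2N. TrackedConnection is a stand-in for an
# expensive Oracle JDBC session.
class TrackedConnection:
    open_count = 0  # sessions currently held against the DB server

    def __init__(self):
        TrackedConnection.open_count += 1

    def close(self):
        TrackedConnection.open_count -= 1

def run_mapper(rows):
    conn = TrackedConnection()          # one session per mapper task
    try:
        return [r * 2 for r in rows]    # stand-in for the actual import work
    finally:
        conn.close()                    # released even if the import fails

for rows in ([1], [2], [3]):
    run_mapper(rows)
print(TrackedConnection.open_count)     # 0: no leaked sessions
```

With the leak, `open_count` would still be positive after the job, which is what drives the ORA-02391 SESSIONS_PER_USER failures above.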
[jira] [Updated] (SQOOP-2874) Highlight Sqoop import with --as-parquetfile use cases (Dataset name is not alphanumeric (plus '_'))
[ https://issues.apache.org/jira/browse/SQOOP-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-2874: - Attachment: Jira_SQOOP-2874_TestCases.txt Attaching test cases. > Highlight Sqoop import with --as-parquetfile use cases (Dataset name > is not alphanumeric (plus '_')) > --- > > Key: SQOOP-2874 > URL: https://issues.apache.org/jira/browse/SQOOP-2874 > Project: Sqoop > Issue Type: Improvement > Components: docs > Reporter: Markus Kemper > Attachments: Jira_SQOOP-2874_TestCases.txt > > > Hello Sqoop Community, > Would it be possible to request some documentation enhancements? > The ask here is to proactively help raise awareness and improve user > experience with a few specific use cases [1] where some Sqoop commands have > restricted character options when using import with --as-parquetfile. > My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet > files. From the Kite documentation [3] we see that to ensure compatibility > (with Hive, etc.), Kite imposes some restrictions on Names and Namespaces > which bubble up in Sqoop. > The following Sqoop use cases when using import with --as-parquetfile result > in the error [2] below. Full test cases for each scenario are attached. If > it is an option to enhance the Sqoop documentation for these use cases I am > happy to provide proposed changes, let me know. > [1] Use Cases: > 1. sqoop import --as-parquetfile + --target-dir > //. > 1.1. The '.' is not allowed > 2. sqoop import --as-parquetfile + --table . + (no > --target-dir) > 2.1. The '.' is not allowed, this is essentially the same as (1) > 3. sqoop import --as-parquetfile + --hive-import --table > . > 3.1. The proper usage is to use --hive-database with --hive-table however > with --as-textfile --hive-table works with . 
> [2] Kite Error: > 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: > org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not > alphanumeric (plus '_') > org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not > alphanumeric (plus '_') > at > org.kitesdk.data.ValidationException.check(ValidationException.java:55) > at > org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105) > at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68) > at > org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209) > at > org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137) > at org.kitesdk.data.Datasets.create(Datasets.java:239) > at org.kitesdk.data.Datasets.create(Datasets.java:307) > at > org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141) > at > org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119) > at > org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130) > at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260) > at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673) > at > org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444) > at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497) > at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605) > at org.apache.sqoop.Sqoop.run(Sqoop.java:143) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) > at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) > at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) > at org.apache.sqoop.Sqoop.main(Sqoop.java:236) > [3] Kite Documentation: > http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html > Names and Namespaces > URIs also define a name and namespace for your dataset. 
Kite uses these > values when the underlying system has the same concept (for example, Hive). > The name and namespace are typically the last two values in a URI. For > example, if you create a dataset using the URI > dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the > fact_tables Hive database. If you create a dataset using the URI > dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset > named ratings in the fact_tables namespace. To ensure compatibility with > Hive and other underlying systems, names and namespaces in URIs must be made > of alphanumeric or underscore (_) characters and cannot start with a number. > Thanks, Markus -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SQOOP-2874) Highlight Sqoop import with --as-parquetfile use cases (Dataset name is not alphanumeric (plus '_'))
Markus Kemper created SQOOP-2874: Summary: Highlight Sqoop import with --as-parquetfile use cases (Dataset name is not alphanumeric (plus '_')) Key: SQOOP-2874 URL: https://issues.apache.org/jira/browse/SQOOP-2874 Project: Sqoop Issue Type: Improvement Components: docs Reporter: Markus Kemper Hello Sqoop Community, Would it be possible to request some documentation enhancements? The ask here is to proactively help raise awareness and improve user experience with a few specific use cases [1] where some Sqoop commands have restricted character options when using import with --as-parquetfile. My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet files. From the Kite documentation [3] we see that to ensure compatibility (with Hive, etc.), Kite imposes some restrictions on Names and Namespaces which bubble up in Sqoop. The following Sqoop use cases when using import with --as-parquetfile result in the error [2] below. Full test cases for each scenario are attached. If it is an option to enhance the Sqoop documentation for these use cases I am happy to provide proposed changes, let me know. [1] Use Cases: 1. sqoop import --as-parquetfile + --target-dir //. 1.1. The '.' is not allowed 2. sqoop import --as-parquetfile + --table . + (no --target-dir) 2.1. The '.' is not allowed, this is essentially the same as (1) 3. sqoop import --as-parquetfile + --hive-import --table . 3.1. The proper usage is to use --hive-database with --hive-table however with --as-textfile --hive-table works with . 
[2] Kite Error: 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not alphanumeric (plus '_') org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not alphanumeric (plus '_') at org.kitesdk.data.ValidationException.check(ValidationException.java:55) at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105) at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68) at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209) at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137) at org.kitesdk.data.Datasets.create(Datasets.java:239) at org.kitesdk.data.Datasets.create(Datasets.java:307) at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141) at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119) at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130) at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260) at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673) at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605) at org.apache.sqoop.Sqoop.run(Sqoop.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) at org.apache.sqoop.Sqoop.main(Sqoop.java:236) [3] Kite Documentation: http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html Names and Namespaces URIs also define a name and namespace for your dataset. Kite uses these values when the underlying system has the same concept (for example, Hive). 
The name and namespace are typically the last two values in a URI. For example, if you create a dataset using the URI dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the fact_tables Hive database. If you create a dataset using the URI dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset named ratings in the fact_tables namespace. To ensure compatibility with Hive and other underlying systems, names and namespaces in URIs must be made of alphanumeric or underscore (_) characters and cannot start with a number. Thanks, Markus -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-2858) Sqoop export with Avro data using (--update-key and --update-mode allowinsert) fails
[ https://issues.apache.org/jira/browse/SQOOP-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Kemper updated SQOOP-2858: - Attachment: SQOOP-2858_TestCase.txt Attaching test cases. See tests 10, 13, 16 and 17 for failure scenarios. > Sqoop export with Avro data using (--update-key and --update-mode > allowinsert) fails > -- > > Key: SQOOP-2858 > URL: https://issues.apache.org/jira/browse/SQOOP-2858 > Project: Sqoop > Issue Type: Bug > Reporter: Markus Kemper > Attachments: SQOOP-2858_TestCase.txt > > > Summary: > 1. sqoop export (--export-dir ) fails with error [1] > 2. sqoop export (--export-dir --update-key --update-mode > allowinsert) fails with error [2] > 3. sqoop export (--hcatalog-database --hcatalog-table > --update-key --update-mode allowinsert) fails with error [3] > See attachment for full test cases. > [1] > sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T2_EXPORT --export-dir /user/root/t1_avro --num-mappers 1 > 16/02/24 13:29:51 INFO mapreduce.Job: Task Id : > attempt_1456318803987_0015_m_00_0, Status : FAILED > Error: java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.math.BigDecimal > at T2_EXPORT.setField(T2_EXPORT.java:288) > [2] > sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T2_EXPORT --export-dir /user/root/t1_avro --num-mappers 1 --map-column-java > C1_INT=Integer --update-key C1_INT --update-mode allowinsert > 16/02/24 13:58:29 INFO mapreduce.Job: Task Id : > attempt_1456318803987_0022_m_00_0, Status : FAILED > Error: java.io.IOException: Can't export data, please check failed map task > logs > at > org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) > at > org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at > org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) > at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.RuntimeException: Can't parse input data: > 'Objavro.schema�{"type":"record"' > [3] > sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table > T2_EXPORT --hcatalog-database db1 --hcatalog-table t1_avro --num-mappers 1 > --update-key C1_INT --update-mode allowinsert > 16/02/24 13:35:06 ERROR sqoop.Sqoop: Got exception running Sqoop: > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.sqoop.mapreduce.ExportJobBase.getFileType(ExportJobBase.java:127) > at > org.apache.sqoop.mapreduce.ExportJobBase.isSequenceFiles(ExportJobBase.java:118) > at > org.apache.sqoop.mapreduce.ExportJobBase.inputIsSequenceFiles(ExportJobBase.java:492) > at > org.apache.sqoop.mapreduce.JdbcUpdateExportJob.getMapperClass(JdbcUpdateExportJob.java:69) > at > org.apache.sqoop.mapreduce.ExportJobBase.configureMapper(ExportJobBase.java:268) > at > org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:426) > at > org.apache.sqoop.manager.OracleManager.upsertTable(OracleManager.java:467) > at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:74) > at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100) > at org.apache.sqoop.Sqoop.run(Sqoop.java:143) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) > at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) > at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) > at org.apache.sqoop.Sqoop.main(Sqoop.java:236) -- This message was 
sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SQOOP-2858) Sqoop export with Avro data using (--update-key and --update-mode allowinsert) fails
Markus Kemper created SQOOP-2858: Summary: Sqoop export with Avro data using (--update-key and --update-mode allowinsert) fails Key: SQOOP-2858 URL: https://issues.apache.org/jira/browse/SQOOP-2858 Project: Sqoop Issue Type: Bug Reporter: Markus Kemper Summary: 1. sqoop export (--export-dir ) fails with error [1] 2. sqoop export (--export-dir --update-key --update-mode allowinsert) fails with error [2] 3. sqoop export (--hcatalog-database --hcatalog-table --update-key --update-mode allowinsert) fails with error [3] See attachment for full test cases. [1] sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2_EXPORT --export-dir /user/root/t1_avro --num-mappers 1 16/02/24 13:29:51 INFO mapreduce.Job: Task Id : attempt_1456318803987_0015_m_00_0, Status : FAILED Error: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.math.BigDecimal at T2_EXPORT.setField(T2_EXPORT.java:288) [2] sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2_EXPORT --export-dir /user/root/t1_avro --num-mappers 1 --map-column-java C1_INT=Integer --update-key C1_INT --update-mode allowinsert 16/02/24 13:58:29 INFO mapreduce.Job: Task Id : attempt_1456318803987_0022_m_00_0, Status : FAILED Error: java.io.IOException: Can't export data, please check failed map task logs at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.RuntimeException: Can't parse input data: 'Objavro.schema�{"type":"record"' [3] sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2_EXPORT --hcatalog-database db1 --hcatalog-table t1_avro --num-mappers 1 --update-key C1_INT --update-mode allowinsert 16/02/24 13:35:06 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException java.lang.NullPointerException at org.apache.sqoop.mapreduce.ExportJobBase.getFileType(ExportJobBase.java:127) at org.apache.sqoop.mapreduce.ExportJobBase.isSequenceFiles(ExportJobBase.java:118) at org.apache.sqoop.mapreduce.ExportJobBase.inputIsSequenceFiles(ExportJobBase.java:492) at org.apache.sqoop.mapreduce.JdbcUpdateExportJob.getMapperClass(JdbcUpdateExportJob.java:69) at org.apache.sqoop.mapreduce.ExportJobBase.configureMapper(ExportJobBase.java:268) at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:426) at org.apache.sqoop.manager.OracleManager.upsertTable(OracleManager.java:467) at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:74) at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100) at org.apache.sqoop.Sqoop.run(Sqoop.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) at org.apache.sqoop.Sqoop.main(Sqoop.java:236) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
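[Editor's note] The ClassCastException in [1] is the generic JVM error raised when a field's runtime type does not match the type a blind cast expects. A minimal sketch (hypothetical class and field names, not Sqoop's actual generated code) of the kind of cast Sqoop's generated setField performs, and why an Avro int payload makes it fail against a DECIMAL-mapped column:

```java
import java.math.BigDecimal;

// Sketch only: a generated record class typically casts the incoming Object
// to the JDBC-mapped Java type. If the Avro data holds the column as an int
// while the target column maps to BigDecimal, the cast throws at runtime.
public class SetFieldSketch {
    private BigDecimal c1Int;

    // Analogous in spirit to the generated T2_EXPORT.setField.
    public void setField(String name, Object value) {
        if ("C1_INT".equals(name)) {
            this.c1Int = (BigDecimal) value; // throws if value is an Integer
        }
    }

    public static void main(String[] args) {
        SetFieldSketch rec = new SetFieldSketch();
        try {
            rec.setField("C1_INT", Integer.valueOf(1)); // int from Avro data
            System.out.println("no error");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in error [1]");
        }
    }
}
```

Error [2] is a different symptom of the same situation: the 'Objavro.schema...' fragment is the start of an Avro object container file (the "Obj" magic followed by the embedded avro.schema metadata key) being read by TextExportMapper as if it were delimited text.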
[jira] [Commented] (SQOOP-2847) Sqoop --incremental + missing parent --target-dir reports success with no data
[ https://issues.apache.org/jira/browse/SQOOP-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154773#comment-15154773 ]

Markus Kemper commented on SQOOP-2847:
--------------------------------------

Thank you [~jarcec].

> Sqoop --incremental + missing parent --target-dir reports success with no data
> ------------------------------------------------------------------------------
>
> Key: SQOOP-2847
> URL: https://issues.apache.org/jira/browse/SQOOP-2847
> Project: Sqoop
> Issue Type: Bug
> Reporter: Markus Kemper
> Assignee: Jarek Jarcec Cecho
> Fix For: 1.4.7
>
> Attachments: SQOOP-2847.patch, SQOOP-2847_TestCase.txt
>
> The test case attached demonstrates inconsistent behavior between import
> (incremental and non-incremental) when the path for --target-dir contains
> parent directories that do not exist.
> 1. Incremental reports import success however the data is not found in HDFS
> 2. Non-incremental reports success, creates the missing directories and
> writes the data to HDFS
[jira] [Updated] (SQOOP-2847) Sqoop --incremental + missing parent --target-dir reports success with no data
[ https://issues.apache.org/jira/browse/SQOOP-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Kemper updated SQOOP-2847:
---------------------------------

    Attachment: SQOOP-2847_TestCase.txt

Test case

> Sqoop --incremental + missing parent --target-dir reports success with no data
> ------------------------------------------------------------------------------
>
> Key: SQOOP-2847
> URL: https://issues.apache.org/jira/browse/SQOOP-2847
> Project: Sqoop
> Issue Type: Bug
> Reporter: Markus Kemper
>
> Attachments: SQOOP-2847_TestCase.txt
>
> The test case attached demonstrates inconsistent behavior between import
> (incremental and non-incremental) when the path for --target-dir contains
> parent directories that do not exist.
> 1. Incremental reports import success however the data is not found in HDFS
> 2. Non-incremental reports success, creates the missing directories and
> writes the data to HDFS
[jira] [Created] (SQOOP-2847) Sqoop --incremental + missing parent --target-dir reports success with no data
Markus Kemper created SQOOP-2847:
---------------------------------

Summary: Sqoop --incremental + missing parent --target-dir reports success with no data
Key: SQOOP-2847
URL: https://issues.apache.org/jira/browse/SQOOP-2847
Project: Sqoop
Issue Type: Bug
Reporter: Markus Kemper

The test case attached demonstrates inconsistent behavior between import (incremental and non-incremental) when the path for --target-dir contains parent directories that do not exist.

1. Incremental reports import success, however the data is not found in HDFS
2. Non-incremental reports success, creates the missing directories, and writes the data to HDFS
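[Editor's note] The behavioral difference between the two cases comes down to whether the writer creates missing parent directories before writing. A local-filesystem analogy in plain Java (illustration only, not Sqoop or HDFS code) showing both behaviors:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Illustration: a write under a missing parent path fails unless the parents
// are created first -- the difference the test case shows between incremental
// (no data appears) and non-incremental (directories created, data written).
public class ParentDirSketch {
    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("sketch").toFile();
        File target = new File(base, "missing/parent/part-m-00000");

        // Case 1: write without creating parents -> fails, nothing is written.
        boolean wroteWithoutParents;
        try {
            wroteWithoutParents = target.createNewFile();
        } catch (IOException e) {
            wroteWithoutParents = false; // parent path does not exist
        }
        System.out.println("without mkdirs: " + wroteWithoutParents);

        // Case 2: create the missing parents first, then write -> succeeds.
        target.getParentFile().mkdirs();
        boolean wroteWithParents = target.createNewFile();
        System.out.println("with mkdirs: " + wroteWithParents);
    }
}
```

The bug is that the incremental path reports success even though it behaves like case 1.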