[jira] [Updated] (SQOOP-1094) Add Avro support to merge tool
[ https://issues.apache.org/jira/browse/SQOOP-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated SQOOP-1094: -- Attachment: SQOOP-1094.1.patch The previous patch was not generated correctly. Re-attaching a new one. Add Avro support to merge tool -- Key: SQOOP-1094 URL: https://issues.apache.org/jira/browse/SQOOP-1094 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.6 Reporter: Jarek Jarcec Cecho Assignee: Yibing Shi Attachments: SQOOP-1094.1.patch, SQOOP-1094.patch The current [merge tool|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/mapreduce/MergeJob.java#L117] does not seem to support the Avro format, even though the [documentation|http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_literal_sqoop_merge_literal] states that it does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Incorrect link for 1.4.6 documentation at http://sqoop.apache.org
Hi, Sqoop team. On the front page at sqoop.apache.org, the documentation link for the 1.4.6 release is incorrect. Right now it points to the documentation for 1.4.5. I believe it should be fixed to avoid confusion. BTW, congratulations, Sqoop team, on your recent releases, 1.4.6 and 1.99.6 :-) Thanks, Youngwoo
Re: Incorrect link for 1.4.6 documentation at http://sqoop.apache.org
Hey man, Fixed. Thanks! -Abe On Mon, May 18, 2015 at 12:57 AM, 김영우 warwit...@gmail.com wrote: Hi, Sqoop team. On the front page at sqoop.apache.org, the documentation link for the 1.4.6 release is incorrect. Right now it points to the documentation for 1.4.5. I believe it should be fixed to avoid confusion. BTW, congratulations, Sqoop team, on your recent releases, 1.4.6 and 1.99.6 :-) Thanks, Youngwoo
[jira] [Commented] (SQOOP-2161) Incremental append on to Hive Parquet tables doesn't work
[ https://issues.apache.org/jira/browse/SQOOP-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549278#comment-14549278 ] Mark Grover commented on SQOOP-2161: Hi Qian, Thanks for working on this. Truth be told, I don't have the bandwidth in the next few weeks to test this out. I'd appreciate it if someone else could pick up my slack. Otherwise, I am happy to take your word that it's resolved :-) Thanks! Incremental append on to Hive Parquet tables doesn't work - Key: SQOOP-2161 URL: https://issues.apache.org/jira/browse/SQOOP-2161 Project: Sqoop Issue Type: Bug Components: hive-integration Affects Versions: 1.4.5 Reporter: Mark Grover Assignee: Qian Xu Attachments: append_pre_created_verbose.log, new_pre_created_verbose.log I have some code that does incremental appends to a Hive parquet table. I am able to get the first-time automatic table creation in Hive to work, but subsequent appends return errors. Also, ideally, I would like to create the Hive table explicitly myself and only do appends to it. More concretely, the code I am using is at https://gist.github.com/markgrover/86f54663ece0943bc8ed I am also attaching two verbose error logs. Each of them contains, at the top, the command that was run. new_pre_created_verbose.log contains the error which occurs if I try to import data into an empty Hive parquet table that has been created in Hive. I ran the create table statement in the above gist and it matches the source schema one-to-one. To get past the above error, I don't run the Hive create table command explicitly. Sqoop then successfully creates the table and adds data to it. However, on the next run, when I want to append more data, I get another error, which is detailed in append_pre_created_verbose.log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SQOOP-1369) Avro export ignores --columns option
[ https://issues.apache.org/jira/browse/SQOOP-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549219#comment-14549219 ] Sqoop QA bot commented on SQOOP-1369: - Testing file [SQOOP-1369.patch|https://issues.apache.org/jira/secure/attachment/12730924/SQOOP-1369.patch] against branch sqoop2 took 0:00:09.215420. {color:red}Overall:{color} -1 due to an error {color:red}ERROR:{color} failed to apply patch (exit code 128): {code}fatal: corrupt patch at line 85 {code} {color:green}SUCCESS:{color} Clean was successful Console output is available [here|https://builds.apache.org/job/PreCommit-SQOOP-Build/1358/console]. This message is automatically generated. Avro export ignores --columns option Key: SQOOP-1369 URL: https://issues.apache.org/jira/browse/SQOOP-1369 Project: Sqoop Issue Type: Bug Reporter: Lars Francke Fix For: 1.99.7 Attachments: SQOOP-1369.patch In JdbcExportJob AVRO_COLUMN_TYPES_MAP is being set with the full schema of the output table. This causes the AvroExportMapper to fail with unknown fields if --columns was used to restrict the columns to export (it then tries to set a value on the generated class which doesn't exist). There are multiple ways I can see to solve this. * Filter the columnTypes passed on to the Mapper in JdbcExportJob.configureInputFormat * Pass the --columns value on to the AvroExportMapper and let it ignore things that are not in there * Let AvroExportMapper not fail when it can't set a field. I might be able to provide a patch and I'd go with the simplest (the first one probably) if there are no objections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
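To illustrate the first option above (filtering the column types passed on to the mapper in JdbcExportJob.configureInputFormat), here is a minimal self-contained sketch. The class and method names are hypothetical, not Sqoop's actual code; it only shows the filtering idea under that assumption:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnFilterSketch {
    // Keep only the entries of the column-types map whose keys appear in
    // the user-supplied --columns list (hypothetical helper, not Sqoop API).
    public static Map<String, Integer> filterColumnTypes(
            Map<String, Integer> columnTypes, String[] requestedColumns) {
        Map<String, Integer> filtered = new LinkedHashMap<>();
        for (String col : requestedColumns) {
            Integer type = columnTypes.get(col);
            if (type != null) {
                filtered.put(col, type);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Integer> all = new LinkedHashMap<>();
        all.put("id", java.sql.Types.INTEGER);
        all.put("name", java.sql.Types.VARCHAR);
        all.put("created", java.sql.Types.TIMESTAMP);
        // Simulate --columns id,name: only those two entries survive, so the
        // mapper never sees fields that the generated class does not carry.
        Map<String, Integer> filtered =
            filterColumnTypes(all, new String[] {"id", "name"});
        System.out.println(filtered.keySet()); // prints [id, name]
    }
}
```

With a map restricted this way, AVRO_COLUMN_TYPES_MAP would no longer mention columns excluded by --columns, which is the failure mode described in the issue.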
[jira] [Updated] (SQOOP-1094) Add Avro support to merge tool
[ https://issues.apache.org/jira/browse/SQOOP-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated SQOOP-1094: -- Attachment: SQOOP-1094.2.patch Add Avro support to merge tool -- Key: SQOOP-1094 URL: https://issues.apache.org/jira/browse/SQOOP-1094 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.6 Reporter: Jarek Jarcec Cecho Assignee: Yibing Shi Attachments: SQOOP-1094.1.patch, SQOOP-1094.2.patch, SQOOP-1094.patch The current [merge tool|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/mapreduce/MergeJob.java#L117] does not seem to support the Avro format, even though the [documentation|http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_literal_sqoop_merge_literal] states that it does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 34390: SQOOP-1094 Add avro support to merge tool
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34390/ --- Review request for Sqoop and Jarek Cecho. Repository: sqoop-trunk Description --- SQOOP-1094 Add avro support to merge tool Diffs - src/java/org/apache/sqoop/mapreduce/AvroJob.java bb4755c880d4b70f812caf5e812135400602ee36 src/java/org/apache/sqoop/mapreduce/MergeAvroReducer.java PRE-CREATION src/java/org/apache/sqoop/mapreduce/MergeAvrodMapper.java PRE-CREATION src/java/org/apache/sqoop/mapreduce/MergeJob.java 4e2a916911e7f47838366edf46b5ba5073502453 src/java/org/apache/sqoop/mapreduce/MergeReducer.java cafff8ab0609cd0580ff85e49082a59ce68e7141 src/java/org/apache/sqoop/mapreduce/MergeReducerBase.java PRE-CREATION Diff: https://reviews.apache.org/r/34390/diff/ Testing --- Thanks, Yibing Shi
[jira] [Updated] (SQOOP-1369) Avro export ignores --columns option
[ https://issues.apache.org/jira/browse/SQOOP-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abraham Elmahrek updated SQOOP-1369: Fix Version/s: 1.99.7 Avro export ignores --columns option Key: SQOOP-1369 URL: https://issues.apache.org/jira/browse/SQOOP-1369 Project: Sqoop Issue Type: Bug Reporter: Lars Francke Fix For: 1.99.7 Attachments: SQOOP-1369.patch In JdbcExportJob AVRO_COLUMN_TYPES_MAP is being set with the full schema of the output table. This causes the AvroExportMapper to fail with unknown fields if --columns was used to restrict the columns to export (it then tries to set a value on the generated class which doesn't exist). There are multiple ways I can see to solve this. * Filter the columnTypes passed on to the Mapper in JdbcExportJob.configureInputFormat * Pass the --columns value on to the AvroExportMapper and let it ignore things that are not in there * Let AvroExportMapper not fail when it can't set a field. I might be able to provide a patch and I'd go with the simplest (the first one probably) if there are no objections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SQOOP-1094) Add Avro support to merge tool
[ https://issues.apache.org/jira/browse/SQOOP-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549649#comment-14549649 ] Yibing Shi commented on SQOOP-1094: --- I have created a Review Board request for my changes: https://reviews.apache.org/r/34390/ Add Avro support to merge tool -- Key: SQOOP-1094 URL: https://issues.apache.org/jira/browse/SQOOP-1094 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.6 Reporter: Jarek Jarcec Cecho Assignee: Yibing Shi Attachments: SQOOP-1094.1.patch, SQOOP-1094.2.patch, SQOOP-1094.patch The current [merge tool|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/mapreduce/MergeJob.java#L117] does not seem to support the Avro format, even though the [documentation|http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_literal_sqoop_merge_literal] states that it does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (SQOOP-2104) Add the independent Jetty Sqoop miniCluster in the test framework
[ https://issues.apache.org/jira/browse/SQOOP-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guoquan reassigned SQOOP-2104: -- Assignee: guoquan Add the independent Jetty Sqoop miniCluster in the test framework - Key: SQOOP-2104 URL: https://issues.apache.org/jira/browse/SQOOP-2104 Project: Sqoop Issue Type: Improvement Components: test Reporter: guoquan Assignee: guoquan Attachments: SQOOP-2104.001.patch Currently, when the Tomcat Sqoop miniCluster starts, it first needs to download tomcat.tar.gz. Once the Tomcat tar package has been downloaded successfully, it loads the Sqoop war package. That is the problem. For example, when Sentry wants to run integration tests against Sqoop using the Tomcat Sqoop miniCluster, it must download the Sqoop war package in its own project. That is not the right approach. Instead, when Sentry wants to run integration tests with Sqoop, all it should have to do is add a Maven dependency to its pom; it can then use an independent Sqoop miniCluster for testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
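The intended usage described above would be a plain Maven test dependency. The coordinates below are purely illustrative — no such artifact is published at this point; this only sketches what the pom entry might look like:

```xml
<!-- Hypothetical coordinates for an independent Sqoop miniCluster artifact -->
<dependency>
  <groupId>org.apache.sqoop</groupId>
  <artifactId>sqoop-minicluster</artifactId>
  <version>...</version>
  <scope>test</scope>
</dependency>
```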
[jira] [Updated] (SQOOP-2328) Sqoop import does not recognize Primary Key of an IBM DB2 table
[ https://issues.apache.org/jira/browse/SQOOP-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atul Gupta updated SQOOP-2328: -- Assignee: Shashank Sqoop import does not recognize Primary Key of an IBM DB2 table -- Key: SQOOP-2328 URL: https://issues.apache.org/jira/browse/SQOOP-2328 Project: Sqoop Issue Type: Bug Affects Versions: 1.4.5 Environment: IBM DB2 V9.7 Reporter: Atul Gupta Assignee: Shashank Fix For: 1.4.7 Currently, a Sqoop import does not recognize the primary key (PK) of an IBM DB2 table. When a Sqoop query runs against a DB2 table, it cannot recognize the table's primary key, which is implicitly used as the --split-by column. To run the Sqoop query, it is mandatory to specify --split-by colname explicitly. The query given below: {code} sqoop import -Dmapred.job.queue.name=edwdev --connect jdbc:db2://dle-db2edw01.dl.karmalab.net:50001/EXPDEV01 --username='' -P --table edwdev.mytable --hive-import --hive-table platdev.mytable_dynapart --hive-external-table --target-dir /user/lbansal/hive_data_dynapart/testdir --hive-dynamic-partition --hive-partition-key DATE --hive-partition-key-format aa- --incremental lastmodified --check-column DATE --last-value '2014-10-22' --append gives the following error message: 14/11/21 02:17:32 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM edwdev.mytable AS t WHERE 1=0 14/11/21 02:17:33 INFO tool.ImportTool: Incremental import based on column DATE 14/11/21 02:17:33 INFO tool.ImportTool: Lower bound value: '2014-10-22' 14/11/21 02:17:33 INFO tool.ImportTool: Upper bound value: '2014-11-21 02:17:33.680951' 14/11/21 02:17:34 INFO hive.metastore: Trying to connect to metastore with URI thrift://cheledwhdd004.karmalab.net:9083 14/11/21 02:17:34 INFO hive.metastore: Connected to metastore. 
14/11/21 02:17:34 WARN tool.HiveUtil: platdev.mytable_dynapart table not found 14/11/21 02:17:34 WARN tool.HiveUtil: platdev.mytable_dynapart table not found 14/11/21 02:17:34 ERROR tool.ImportTool: Error during import: No primary key could be found for table edwdev.mytable. Please specify one with --split-by or perform a sequential import with '-m 1'. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-2334) Sqoop Volume Per Mapper
[ https://issues.apache.org/jira/browse/SQOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atul Gupta updated SQOOP-2334: -- Assignee: Rakesh Sharma Sqoop Volume Per Mapper --- Key: SQOOP-2334 URL: https://issues.apache.org/jira/browse/SQOOP-2334 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.5 Reporter: Atul Gupta Assignee: Rakesh Sharma Fix For: 1.4.7 There is no way for the user to define an upper limit on the volume of data each mapper handles. Currently, Sqoop calculates the per-mapper workload from the -m switch and the --split-by column, but this does not give the user control to specify an upper limit on the volume handled by each mapper. Adding such functionality to Sqoop would help load larger data sets when continuous key data is not available and there is a huge gap between the maximum and minimum key values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
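As a rough illustration of the requested behavior, the mapper count could be derived from a per-mapper volume cap instead of a fixed -m value. This is a hypothetical sketch of the arithmetic, not existing Sqoop code; the method name and parameters are invented:

```java
public class MapperVolumeSketch {
    // Hypothetical: derive the number of mappers from a per-mapper byte cap,
    // rounding up so that no single mapper exceeds the limit.
    public static int mappersForVolume(long totalBytes, long maxBytesPerMapper) {
        if (maxBytesPerMapper <= 0) {
            throw new IllegalArgumentException("per-mapper cap must be positive");
        }
        // Ceiling division without floating point.
        return (int) ((totalBytes + maxBytesPerMapper - 1) / maxBytesPerMapper);
    }

    public static void main(String[] args) {
        // 10 GiB of source data with a 1 GiB cap per mapper -> 10 mappers.
        long gib = 1024L * 1024 * 1024;
        System.out.println(mappersForVolume(10 * gib, gib)); // prints 10
    }
}
```

Deriving -m this way would sidestep the skew problem described above: a sparse key range with large gaps no longer forces a handful of mappers to carry most of the data.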
[jira] [Updated] (SQOOP-2335) Support for Hive External Table in Sqoop - HCatalog
[ https://issues.apache.org/jira/browse/SQOOP-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atul Gupta updated SQOOP-2335: -- Assignee: Rakesh Sharma Support for Hive External Table in Sqoop - HCatalog --- Key: SQOOP-2335 URL: https://issues.apache.org/jira/browse/SQOOP-2335 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.5 Reporter: Atul Gupta Assignee: Rakesh Sharma Fix For: 1.4.7 Currently the Apache Sqoop tool supports only Hive managed tables when using HCatalog. It would be nice to have support for creating a Hive external table in Sqoop HCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-2328) Sqoop import does not recognize Primary Key of an IBM DB2 table
[ https://issues.apache.org/jira/browse/SQOOP-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atul Gupta updated SQOOP-2328: -- Assignee: Atul Gupta (was: Shashank) Sqoop import does not recognize Primary Key of an IBM DB2 table -- Key: SQOOP-2328 URL: https://issues.apache.org/jira/browse/SQOOP-2328 Project: Sqoop Issue Type: Bug Affects Versions: 1.4.5 Environment: IBM DB2 V9.7 Reporter: Atul Gupta Assignee: Atul Gupta Fix For: 1.4.7 Currently, a Sqoop import does not recognize the primary key (PK) of an IBM DB2 table. When a Sqoop query runs against a DB2 table, it cannot recognize the table's primary key, which is implicitly used as the --split-by column. To run the Sqoop query, it is mandatory to specify --split-by colname explicitly. The query given below: {code} sqoop import -Dmapred.job.queue.name=edwdev --connect jdbc:db2://dle-db2edw01.dl.karmalab.net:50001/EXPDEV01 --username='' -P --table edwdev.mytable --hive-import --hive-table platdev.mytable_dynapart --hive-external-table --target-dir /user/lbansal/hive_data_dynapart/testdir --hive-dynamic-partition --hive-partition-key DATE --hive-partition-key-format aa- --incremental lastmodified --check-column DATE --last-value '2014-10-22' --append gives the following error message: 14/11/21 02:17:32 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM edwdev.mytable AS t WHERE 1=0 14/11/21 02:17:33 INFO tool.ImportTool: Incremental import based on column DATE 14/11/21 02:17:33 INFO tool.ImportTool: Lower bound value: '2014-10-22' 14/11/21 02:17:33 INFO tool.ImportTool: Upper bound value: '2014-11-21 02:17:33.680951' 14/11/21 02:17:34 INFO hive.metastore: Trying to connect to metastore with URI thrift://cheledwhdd004.karmalab.net:9083 14/11/21 02:17:34 INFO hive.metastore: Connected to metastore. 
14/11/21 02:17:34 WARN tool.HiveUtil: platdev.mytable_dynapart table not found 14/11/21 02:17:34 WARN tool.HiveUtil: platdev.mytable_dynapart table not found 14/11/21 02:17:34 ERROR tool.ImportTool: Error during import: No primary key could be found for table edwdev.mytable. Please specify one with --split-by or perform a sequential import with '-m 1'. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-1094) Add Avro support to merge tool
[ https://issues.apache.org/jira/browse/SQOOP-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated SQOOP-1094: -- Attachment: SQOOP-1094.patch Submitting a patch for this problem. Add Avro support to merge tool -- Key: SQOOP-1094 URL: https://issues.apache.org/jira/browse/SQOOP-1094 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.6 Reporter: Jarek Jarcec Cecho Attachments: SQOOP-1094.patch The current [merge tool|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/mapreduce/MergeJob.java#L117] does not seem to support the Avro format, even though the [documentation|http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_literal_sqoop_merge_literal] states that it does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-2332) Dynamic Partition in Sqoop HCatalog - if Hive table does not exist, add support for Partition Date Format
[ https://issues.apache.org/jira/browse/SQOOP-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atul Gupta updated SQOOP-2332: -- Assignee: Shashank Dynamic Partition in Sqoop HCatalog - if Hive table does not exist, add support for Partition Date Format -- Key: SQOOP-2332 URL: https://issues.apache.org/jira/browse/SQOOP-2332 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.5 Reporter: Atul Gupta Assignee: Shashank Fix For: 1.4.7 Apache Sqoop 1.4.5 with HCatalog supports dynamic partitions when the Hive partitioned table already exists. It would be nice to have dynamic partition support in Apache Sqoop HCatalog even if the Hive table does not exist during the data import; as a suggestion, we could add an option like --hive-dynamic-partition. Also, there is no support for date formats on a partition column of type date/time/timestamp. As a suggestion, it would be great to add an option like --hive-partition-key-format to support partition date formats based on the standard Java SimpleDateFormat patterns, such as yyyy-MM-dd, yyyy-MM, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
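For reference, the patterns mentioned are standard java.text.SimpleDateFormat patterns; formatting a partition value with such a pattern looks like the sketch below. The --hive-partition-key-format option itself is only proposed above, not implemented; this only demonstrates the pattern syntax it would accept:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;

public class PartitionFormatSketch {
    // Render a partition-key date with a SimpleDateFormat pattern,
    // as the proposed --hive-partition-key-format option might do.
    public static String format(String pattern, Date d) {
        return new SimpleDateFormat(pattern).format(d);
    }

    public static void main(String[] args) {
        // A fixed date so the output is deterministic: Oct 22, 2014
        // (GregorianCalendar months are zero-based, so 9 = October).
        Date d = new GregorianCalendar(2014, 9, 22).getTime();
        System.out.println(format("yyyy-MM-dd", d)); // prints 2014-10-22
        System.out.println(format("yyyy-MM", d));    // prints 2014-10
    }
}
```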
[jira] [Updated] (SQOOP-2331) Snappy Compression Support in Sqoop-HCatalog
[ https://issues.apache.org/jira/browse/SQOOP-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atul Gupta updated SQOOP-2331: -- Assignee: Shashank Snappy Compression Support in Sqoop-HCatalog Key: SQOOP-2331 URL: https://issues.apache.org/jira/browse/SQOOP-2331 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.5 Reporter: Atul Gupta Assignee: Shashank Fix For: 1.4.7 Apache Sqoop 1.4.5 currently does not compress in gzip format with the --compress option when it is used with the --hcatalog-table option. It also does not support the --compression-codec snappy option together with --hcatalog-table. It would be nice to add both options in future Sqoop releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SQOOP-2333) Sqoop to support Custom options for User Defined Plugins (Tools)
[ https://issues.apache.org/jira/browse/SQOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atul Gupta updated SQOOP-2333: -- Assignee: Rakesh Sharma Sqoop to support Custom options for User Defined Plugins (Tools) -- Key: SQOOP-2333 URL: https://issues.apache.org/jira/browse/SQOOP-2333 Project: Sqoop Issue Type: New Feature Affects Versions: 1.4.5 Reporter: Atul Gupta Assignee: Rakesh Sharma Fix For: 1.4.7 Sqoop currently does not provide any mechanism to define custom switches for user-defined tools. It would be nice to enhance Sqoop to support custom options for user-defined tools. -- This message was sent by Atlassian JIRA (v6.3.4#6332)