[jira] [Commented] (HIVE-7110) TestHCatPartitionPublish test failure: No FileSystem for scheme: pfile
[ https://issues.apache.org/jira/browse/HIVE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019784#comment-14019784 ] David Chen commented on HIVE-7110: -- Interesting. However, something is causing this test to fail when building on OS X 10.9.3, and I have reproduced the failure on two different machines, which I think does indicate that something strange is going on in the build script and should be fixed. I will see if this reproduces on my Ubuntu 12.04 VM and RHEL 6.4 dev box. If it does not, then we can de-prioritize/postpone this issue. TestHCatPartitionPublish test failure: No FileSystem for scheme: pfile - Key: HIVE-7110 URL: https://issues.apache.org/jira/browse/HIVE-7110 Project: Hive Issue Type: Bug Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7110.1.patch, HIVE-7110.2.patch, HIVE-7110.3.patch, HIVE-7110.4.patch I got the following TestHCatPartitionPublish test failure when running all unit tests against Hadoop 1. This also appears when testing against Hadoop 2. {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.06 sec FAILURE! - in org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish testPartitionPublish(org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish) Time elapsed: 1.361 sec ERROR! org.apache.hive.hcatalog.common.HCatException: org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information.
Cause : java.io.IOException: No FileSystem for scheme: pfile at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1443) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:212) at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70) at org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.runMRCreateFail(TestHCatPartitionPublish.java:191) at org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:155) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
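For context on the failure mode: Hadoop picks a FileSystem implementation by looking up the URI scheme in its configuration (keys of the form fs.&lt;scheme&gt;.impl), and the exception above means nothing is registered for the test-only pfile scheme. A minimal, illustrative sketch of that lookup, using a plain map rather than Hadoop's actual Configuration machinery (the class and method names below are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for Hadoop's scheme -> FileSystem-class lookup
// (real Hadoop reads fs.<scheme>.impl from its Configuration).
class FsRegistry {
    private final Map<String, String> impls = new HashMap<>();

    // Mirrors setting fs.<scheme>.impl in the configuration.
    void register(String scheme, String implClassName) {
        impls.put(scheme, implClassName);
    }

    // Fails the same way FileSystem.createFileSystem does when a scheme
    // such as "pfile" has no registered implementation.
    String resolve(String scheme) {
        String impl = impls.get(scheme);
        if (impl == null) {
            throw new IllegalStateException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }
}
```

Under this model, the fix direction is presumably to make sure the pfile implementation class is on the test classpath and registered in the configuration the test builds.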
[jira] [Resolved] (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
[ https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bennie Schut resolved HIVE-1019. Resolution: Won't Fix Hiveserver2 doesn't suffer from this. java.io.FileNotFoundException: HIVE_PLAN (No such file or directory) Key: HIVE-1019 URL: https://issues.apache.org/jira/browse/HIVE-1019 Project: Hive Issue Type: Bug Components: Server Infrastructure Affects Versions: 0.6.0 Reporter: Bennie Schut Assignee: Bennie Schut Priority: Minor Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt I keep getting errors like this: java.io.FileNotFoundException: HIVE_PLAN (No such file or directory) and : java.io.IOException: cannot find dir = hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in partToPartitionInfo! when running multiple threads with roughly similar queries. I have a patch for this which works for me. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dr. Wendell Urth updated HIVE-7175: --- Attachment: HIVE-7175.patch I've added a patch that provides this ability akin to Sqoop's mechanism (minus the encrypted/obfuscated file loader options, as those could be better handled by Larry's proposal). This would be useful in the immediate future, until what Larry proposes is completed upstream and can be compatibly added to Hive. Please review. Provide password file option to beeline --- Key: HIVE-7175 URL: https://issues.apache.org/jira/browse/HIVE-7175 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.13.0 Reporter: Robert Justice Labels: features, security Attachments: HIVE-7175.patch For people connecting to Hive Server 2 with LDAP authentication enabled, in order to batch run commands, we currently have to provide the password openly in the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option similar to other CLI commands in hadoop (e.g. sqoop) to be more secure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dr. Wendell Urth updated HIVE-7175: --- Release Note: Added an --password-file (or, -w) option to BeeLine CLI, to read a password from a permission-protected file instead of supplying it in plaintext form as part of the command (-p). Status: Patch Available (was: Open) Provide password file option to beeline --- Key: HIVE-7175 URL: https://issues.apache.org/jira/browse/HIVE-7175 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.13.0 Reporter: Robert Justice Labels: features, security Attachments: HIVE-7175.patch For people connecting to Hive Server 2 with LDAP authentication enabled, in order to batch run commands, we currently have to provide the password openly in the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option similar to other CLI commands in hadoop (e.g. sqoop) to be more secure. -- This message was sent by Atlassian JIRA (v6.2#6252)
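The release note's password-file behavior can be sketched as reading the first line of a permission-protected file, so a trailing newline never ends up in the password. This helper is hypothetical, not BeeLine's actual implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PasswordFile {
    // Reads the password as the file's first line; readAllLines strips
    // the line terminator, so "secret\n" on disk yields "secret".
    static String read(Path file) throws IOException {
        return Files.readAllLines(file).get(0);
    }
}
```

The file itself would be created with restrictive permissions (e.g. chmod 600) so only the invoking user can read it, which is the point of the option.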
[jira] [Created] (HIVE-7186) Unable to perform join on table
Alex Nastetsky created HIVE-7186: Summary: Unable to perform join on table Key: HIVE-7186 URL: https://issues.apache.org/jira/browse/HIVE-7186 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Environment: Hortonworks Data Platform 2.0 Reporter: Alex Nastetsky Occasionally, a table will start exhibiting behavior that will prevent it from being used in a JOIN. When doing a map join, it will just stall at "Starting to launch local task to process map join". When doing a regular join, it will make progress but then error out with an IndexOutOfBoundsException: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) ... 9 more Caused by: java.lang.IndexOutOfBoundsException at java.nio.Buffer.checkIndex(Buffer.java:532) at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334) ... 15 more Doing simple selects against this table works fine and does not show any apparent problems with the data. Assume that the table in question is called tableA and was created by queryA. Doing either of the following has helped resolve the issue in the past.
1) create table tableB as select * from tableA; then just use tableB in the JOIN instead.
2) Regenerate tableA using queryA, then use tableA in the JOIN again. It usually works the second time.
When doing a describe formatted on the tables, the totalSize will be different between the original tableA and tableB, and sometimes (but not always) between the original tableA and the regenerated tableA. The numRows will be the same across all versions of the tables. This problem cannot be reproduced consistently, but the issue always happens when we try to use an affected table in a JOIN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7136: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Sumit! Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system --- Key: HIVE-7136 URL: https://issues.apache.org/jira/browse/HIVE-7136 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.13.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7136.01.patch, HIVE-7136.patch Current hive cli assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping the default behavior intact to be reading from default filesystem (local) in case scheme is not provided in the url for the source file. -- This message was sent by Atlassian JIRA (v6.2#6252)
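The behavior described above (fall back to the default local filesystem when the script URL carries no scheme) boils down to inspecting the URI; `schemeOf` below is a hypothetical helper for illustration, not Hive's actual code:

```java
import java.net.URI;

class ScriptSource {
    // Returns the filesystem scheme for a hive script path, defaulting
    // to the local filesystem ("file") when the path has no scheme,
    // matching the default behavior the patch description keeps intact.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return (scheme == null) ? "file" : scheme;
    }
}
```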
[jira] [Updated] (HIVE-7135) Fix test fail of TestTezTask.testSubmit
[ https://issues.apache.org/jira/browse/HIVE-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7135: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Fix test fail of TestTezTask.testSubmit --- Key: HIVE-7135 URL: https://issues.apache.org/jira/browse/HIVE-7135 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7135.1.patch, HIVE-7135.2.patch.txt HIVE-7043 broke a tez test case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7176) FileInputStream is not closed in Commands#properties()
[ https://issues.apache.org/jira/browse/HIVE-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7176: --- Resolution: Fixed Fix Version/s: 0.14.0 Assignee: Navis Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! FileInputStream is not closed in Commands#properties() -- Key: HIVE-7176 URL: https://issues.apache.org/jira/browse/HIVE-7176 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Navis Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7176.1.patch.txt NO PRECOMMIT TESTS In beeline.Commands, around line 834: {code} props.load(new FileInputStream(parts[i])); {code} The FileInputStream is not closed upon return from the method. -- This message was sent by Atlassian JIRA (v6.2#6252)
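The standard fix for a leaked stream like this is try-with-resources; a minimal sketch of the pattern (the surrounding Commands#properties logic is elided):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

class PropsLoader {
    // try-with-resources guarantees the FileInputStream is closed even
    // when props.load(...) throws, unlike the original one-liner.
    static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```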
Re: Review Request 22170: analyze table T compute statistics for columns; will now compute stats for all columns.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22170/ --- (Updated June 6, 2014, 4:30 p.m.) Review request for hive and Prasanth_J. Changes --- Fixed last failing test. Bugs: HIVE-7168 https://issues.apache.org/jira/browse/HIVE-7168 Repository: hive-git Description --- analyze table T compute statistics for columns; will now compute stats for all columns. Diffs (updated) - metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1245d80 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 5b77e6f ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 6d958fd ql/src/test/queries/clientpositive/columnstats_partlvl.q 9dfe8ff ql/src/test/queries/clientpositive/columnstats_tbllvl.q 170fbc5 ql/src/test/results/clientpositive/columnstats_partlvl.q.out d91be8d ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 3d3d0e2 ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 03b536f Diff: https://reviews.apache.org/r/22170/diff/ Testing --- Added new tests. Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Attachment: HIVE-7168.2.patch Fixed last failing test. Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Status: Open (was: Patch Available) Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Status: Patch Available (was: Open) Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7040) TCP KeepAlive for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020023#comment-14020023 ] Vaibhav Gumashta commented on HIVE-7040: Thanks for the patch [~nicothieb]! There is another jira: HIVE-6679, which looks at doing this for binary mode (with and without SSL). Is it possible to handle the SSL case as well in this jira? TCP KeepAlive for HiveServer2 - Key: HIVE-7040 URL: https://issues.apache.org/jira/browse/HIVE-7040 Project: Hive Issue Type: Improvement Components: HiveServer2, Server Infrastructure Reporter: Nicolas Thiébaud Attachments: HIVE-7040.patch, HIVE-7040.patch.2 Implement TCP KeepAlive for HiveServer2 to avoid half-open connections. A setting could be added:
{code}
<property>
  <name>hive.server2.tcp.keepalive</name>
  <value>true</value>
  <description>Whether to enable TCP keepalive for Hive Server 2</description>
</property>
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7143) Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, fval/lval)
[ https://issues.apache.org/jira/browse/HIVE-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020035#comment-14020035 ] Ashutosh Chauhan commented on HIVE-7143: +1 Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, fval/lval) - Key: HIVE-7143 URL: https://issues.apache.org/jira/browse/HIVE-7143 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7143.1.patch, HIVE-7143.3.patch Provided streaming implementations for the above functions. Min/Max is based on the algorithm by Daniel Lemire: http://www.archipel.uqam.ca/309/1/webmaximinalgo.pdf -- This message was sent by Atlassian JIRA (v6.2#6252)
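The Lemire paper referenced above computes a sliding-window min/max in amortized O(1) per row by keeping a monotonic deque of candidate indices. A generic, self-contained sketch of the max case over a fixed window of size w (not Hive's actual UDAF wiring):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SlidingMax {
    // Returns the max of each length-w window of a. The deque holds
    // indices whose values are decreasing, so the front is always the
    // current window's max; each index enters and leaves at most once.
    static int[] maxes(int[] a, int w) {
        int[] out = new int[a.length - w + 1];
        Deque<Integer> dq = new ArrayDeque<>();
        for (int i = 0; i < a.length; i++) {
            // Drop candidates dominated by the incoming value.
            while (!dq.isEmpty() && a[dq.peekLast()] <= a[i]) dq.pollLast();
            dq.addLast(i);
            // Drop the front if it slid out of the window.
            if (dq.peekFirst() <= i - w) dq.pollFirst();
            if (i >= w - 1) out[i - w + 1] = a[dq.peekFirst()];
        }
        return out;
    }
}
```

Min is symmetric (flip the comparison), which is presumably why the jira treats min/max together.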
[jira] [Updated] (HIVE-7186) Unable to perform join on table
[ https://issues.apache.org/jira/browse/HIVE-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Nastetsky updated HIVE-7186: - Environment: Hortonworks Data Platform 2.0.6.0 (was: Hortonworks Data Platform 2.0) Unable to perform join on table --- Key: HIVE-7186 URL: https://issues.apache.org/jira/browse/HIVE-7186 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Environment: Hortonworks Data Platform 2.0.6.0 Reporter: Alex Nastetsky Occasionally, a table will start exhibiting behavior that will prevent it from being used in a JOIN. When doing a map join, it will just stall at "Starting to launch local task to process map join". When doing a regular join, it will make progress but then error out with an IndexOutOfBoundsException: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) ... 9 more Caused by: java.lang.IndexOutOfBoundsException at java.nio.Buffer.checkIndex(Buffer.java:532) at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334) ... 15 more Doing simple selects against this table works fine and does not show any apparent problems with the data. Assume that the table in question is called tableA and was created by queryA. Doing either of the following has helped resolve the issue in the past.
1) create table tableB as select * from tableA; then just use tableB in the JOIN instead.
2) Regenerate tableA using queryA, then use tableA in the JOIN again. It usually works the second time.
When doing a describe formatted on the tables, the totalSize will be different between the original tableA and tableB, and sometimes (but not always) between the original tableA and the regenerated tableA. The numRows will be the same across all versions of the tables. This problem cannot be reproduced consistently, but the issue always happens when we try to use an affected table in a JOIN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7040) TCP KeepAlive for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020049#comment-14020049 ] Vaibhav Gumashta commented on HIVE-7040: Actually HIVE-6679 appears to focus just on timeouts, so please ignore that jira. TCP KeepAlive for HiveServer2 - Key: HIVE-7040 URL: https://issues.apache.org/jira/browse/HIVE-7040 Project: Hive Issue Type: Improvement Components: HiveServer2, Server Infrastructure Reporter: Nicolas Thiébaud Attachments: HIVE-7040.patch, HIVE-7040.patch.2 Implement TCP KeepAlive for HiveServer2 to avoid half-open connections. A setting could be added:
{code}
<property>
  <name>hive.server2.tcp.keepalive</name>
  <value>true</value>
  <description>Whether to enable TCP keepalive for Hive Server 2</description>
</property>
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
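At the socket level, the proposed flag maps onto the SO_KEEPALIVE option. A self-contained sketch of applying a hive.server2.tcp.keepalive-style setting to an accepted socket (illustrative only, not the actual HiveServer2 patch):

```java
import java.io.IOException;
import java.net.Socket;

class KeepAliveDemo {
    // Enables OS-level TCP keepalive probes on the connection, so the
    // server eventually notices half-open (silently dropped) peers.
    static Socket configure(Socket s, boolean keepAlive) throws IOException {
        s.setKeepAlive(keepAlive);
        return s;
    }
}
```

Probe interval and count are kernel settings (e.g. net.ipv4.tcp_keepalive_time on Linux), which is why the Hive-side option is just a boolean.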
[jira] [Updated] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition
[ https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7117: -- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Partitions not inheriting table permissions after alter rename partition Key: HIVE-7117 URL: https://issues.apache.org/jira/browse/HIVE-7117 Project: Hive Issue Type: Bug Components: Security Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Fix For: 0.14.0 Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, HIVE-7117.patch On altering/renaming a partition it must inherit permission of the parent directory, if the flag hive.warehouse.subdir.inherit.perms is set. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition
[ https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020145#comment-14020145 ] Xuefu Zhang commented on HIVE-7117: --- Patch committed to trunk. Thanks to Ashish for the contribution. Partitions not inheriting table permissions after alter rename partition Key: HIVE-7117 URL: https://issues.apache.org/jira/browse/HIVE-7117 Project: Hive Issue Type: Bug Components: Security Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Fix For: 0.14.0 Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, HIVE-7117.patch On altering/renaming a partition it must inherit permission of the parent directory, if the flag hive.warehouse.subdir.inherit.perms is set. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7187) Reconcile jetty versions in hive
Vaibhav Gumashta created HIVE-7187: -- Summary: Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Hive root pom has 3 parameters for specifying jetty dependency versions:
{code}
<jetty.version>6.1.26</jetty.version>
<jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
<jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
{code}
The first is used by HWI, the second by WebHCat, and the third by HiveServer2 (in http mode). We should probably use the same jetty version for all hive components. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7187) Reconcile jetty versions in hive
[ https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020170#comment-14020170 ] Eugene Koifman commented on HIVE-7187: -- Also, the current release of Jetty is 9.x. Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Hive root pom has 3 parameters for specifying jetty dependency versions:
{code}
<jetty.version>6.1.26</jetty.version>
<jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
<jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
{code}
The first is used by HWI, the second by WebHCat, and the third by HiveServer2 (in http mode). We should probably use the same jetty version for all hive components. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7063: Attachment: HIVE-7063.1.patch preliminary patch: this adds code to WdwTabFn to react to a rank limit. Optimize for the Top N within a Group use case -- Key: HIVE-7063 URL: https://issues.apache.org/jira/browse/HIVE-7063 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7063.1.patch It is common to rank within a Group/Partition and then only return the Top N entries within each Group. With Streaming mode for Windowing, we should push the post filter on the rank into the Windowing processing as a Limit expression. -- This message was sent by Atlassian JIRA (v6.2#6252)
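The optimization described amounts to cutting off rows once rank() exceeds N inside each streamed partition, instead of ranking everything and filtering afterwards. A toy sketch over pre-sorted (group, value) rows; this illustrates the rank-limit idea only, not the actual WindowingTableFunction change:

```java
import java.util.ArrayList;
import java.util.List;

class TopNPerGroup {
    // rows arrive pre-sorted by group key then by the ordering column,
    // as the windowing operator receives them; once rank > n within a
    // group, remaining rows of that group are skipped (the "Limit"
    // pushed into windowing) rather than emitted and filtered later.
    static List<int[]> topN(List<int[]> rows, int n) {
        List<int[]> out = new ArrayList<>();
        int rank = 0;
        Integer currentGroup = null;
        for (int[] row : rows) {                  // row = {groupKey, value}
            if (currentGroup == null || row[0] != currentGroup) {
                currentGroup = row[0];            // new partition: reset rank
                rank = 0;
            }
            rank++;
            if (rank <= n) out.add(row);
        }
        return out;
    }
}
```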
Re: TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS fails on hive-13 for hadoop-2
This is passing in the builds, and also for me. Looks like some environment issue. Are you running in eclipse or maven? Thanks Szehon On Thu, Jun 5, 2014 at 5:51 PM, pankit thapar thapar.pan...@gmail.com wrote: Hi, I am trying to build hive on my local desktop. I am facing an issue with test case : TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS The issue is only with hadoop-2 and not with hadoop-1 Has anyone been able to run this test case? Trace : org.apache.hadoop.ipc.RemoteException: File /path/to/schema/schema.avsc could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1406) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2596) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:563) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:407) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:592) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) at org.apache.hadoop.ipc.Client.call(Client.java:1406) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:211) at com.sun.proxy.$Proxy14.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy14.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:348) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1275) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1123) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:527) Thanks, Pankit
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020290#comment-14020290 ] Hive QA commented on HIVE-7175: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648646/HIVE-7175.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 5511 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/399/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/399/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-399/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated.
ATTACHMENT ID: 12648646 Provide password file option to beeline --- Key: HIVE-7175 URL: https://issues.apache.org/jira/browse/HIVE-7175 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.13.0 Reporter: Robert Justice Labels: features, security Attachments: HIVE-7175.patch For people connecting to Hive Server 2 with LDAP authentication enabled, in order to batch run commands, we currently have to provide the password openly in the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option similar to other CLI commands in hadoop (e.g. sqoop) to be more secure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-538) make hive_jdbc.jar self-containing
[ https://issues.apache.org/jira/browse/HIVE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-538: -- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Nick! make hive_jdbc.jar self-containing -- Key: HIVE-538 URL: https://issues.apache.org/jira/browse/HIVE-538 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.3.0, 0.4.0, 0.6.0, 0.13.0 Reporter: Raghotham Murthy Assignee: Nick White Fix For: 0.14.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-538.D2553.2.patch, HIVE-538.patch Currently, most jars in hive/build/dist/lib and the hadoop-*-core.jar are required in the classpath to run jdbc applications on hive. We need to do at least the following to get rid of most unnecessary dependencies: 1. get rid of dynamic serde and use a standard serialization format, maybe tab-separated, JSON, or Avro 2. don't use hadoop configuration parameters 3. repackage thrift and fb303 classes into hive_jdbc.jar -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Attachment: hike-vector-sum-bug.tgz

sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: hike-vector-sum-bug.tgz

1. The tgz file containing the setup is attached.
2. Run the following query:
{code}
select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0;
{code}
It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off.
{code}
hive> source insert.sql;
OK
Time taken: 0.359 seconds
OK
Time taken: 0.015 seconds
OK
Time taken: 0.069 seconds
OK
Time taken: 0.176 seconds
Loading data to table hike_error.ttr_day0
Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0]
OK
Time taken: 0.33 seconds
hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0;
Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-06-06 13:47:02,043 null map = 0%, reduce = 100%
Ended Job = job_local773704964_0001
Execution completed successfully
MapredLocal task succeeded
OK
131
Time taken: 5.325 seconds, Fetched: 1 row(s)
hive> set hive.vectorized.execution.enabled=true;
hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0;
Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-06-06 13:47:18,604 null map = 0%, reduce = 100%
Ended Job = job_local701415676_0001
Execution completed successfully
MapredLocal task succeeded
OK
0
Time taken: 5.52 seconds, Fetched: 1 row(s)
hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: ttr_day0
            Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: is_returning (type: boolean), is_free (type: boolean)
              outputColumnNames: is_returning, is_free
              Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0))
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
{code}
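For reference, the semantics both execution paths must agree on can be modeled outside Hive. This is a minimal standalone sketch (hypothetical class and method names, not Hive code): `sum(if(p, 1, 0))` is just a count of rows where the predicate holds, whether evaluated row-at-a-time or by first materializing the IF output column over a batch, as the vectorized path does.

```java
// Hypothetical standalone model of sum(if(p, 1, 0)) -- counts rows where p holds.
// Both Hive execution modes must reduce to this same fold.
public class SumIfModel {
    // Row-at-a-time evaluation: add 1 when the predicate holds, else 0.
    static long sumIf(boolean[] isReturning, boolean[] isFree) {
        long acc = 0;
        for (int i = 0; i < isReturning.length; i++) {
            acc += (isReturning[i] && !isFree[i]) ? 1 : 0;
        }
        return acc;
    }

    // "Vectorized" evaluation over the same batch: materialize the IF output
    // column first, then sum that column. Must equal the row-mode result.
    static long sumIfVectorized(boolean[] isReturning, boolean[] isFree) {
        int n = isReturning.length;
        long[] ifColumn = new long[n];
        for (int i = 0; i < n; i++) {
            ifColumn[i] = (isReturning[i] && !isFree[i]) ? 1L : 0L;
        }
        long acc = 0;
        for (long v : ifColumn) acc += v;
        return acc;
    }

    public static void main(String[] args) {
        boolean[] ret  = {true, true, false, true};
        boolean[] free = {false, true, false, false};
        // Rows 0 and 3 are returning and not free.
        if (sumIf(ret, free) != 2) throw new AssertionError();
        if (sumIf(ret, free) != sumIfVectorized(ret, free)) throw new AssertionError();
    }
}
```

The bug report above is exactly a divergence between these two folds: 131 in row mode, 0 once `hive.vectorized.execution.enabled=true`.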
[jira] [Created] (HIVE-7188) sum(if()) returns wrong results with vectorization
Hari Sankar Sivarama Subramaniyan created HIVE-7188: --- Summary: sum(if()) returns wrong results with vectorization Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: hike-vector-sum-bug.tgz
[jira] [Created] (HIVE-7189) Hive does not store column names in ORC
Chris Drome created HIVE-7189: - Summary: Hive does not store column names in ORC Key: HIVE-7189 URL: https://issues.apache.org/jira/browse/HIVE-7189 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.0, 0.12.0 Reporter: Chris Drome We uncovered the following discrepancy between writing ORC files through Pig and Hive: the ORC file header contains the names of the columns. When stored through Pig (ORCStorage or HCatStorer), the column names are stored fine. But when stored through Hive, they are stored as _col0, _col1, ..., _col99, and Hive uses the partition schema to map the column names. Reading the same file through Pig is then problematic, as the user has to map the columns manually. -- This message was sent by Atlassian JIRA (v6.2#6252)
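The remapping a non-Hive reader is forced to do can be sketched as follows. This is an illustration only (the class, method, and the positional-name pattern applied here are assumptions, not ORC or HCatalog API): positional `_colN` names from the file footer are replaced, by position, with the real names from the table schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the remapping a reader must do when an ORC file
// written by Hive carries positional names (_col0, _col1, ...) instead of the
// real column names from the table schema.
public class OrcColumnRemap {
    private static final Pattern POSITIONAL = Pattern.compile("_col\\d+");

    // Replace positional names with schema names, matched by position.
    static List<String> remap(List<String> fileColumns, List<String> schemaColumns) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < fileColumns.size(); i++) {
            String name = fileColumns.get(i);
            if (POSITIONAL.matcher(name).matches() && i < schemaColumns.size()) {
                out.add(schemaColumns.get(i)); // fall back to the table schema
            } else {
                out.add(name); // already a real name (e.g. written via Pig)
            }
        }
        return out;
    }
}
```

For example, a file written by Hive with columns `[_col0, _col1]` against a schema `[id, name]` would be read back as `[id, name]`, while a Pig-written file keeps its stored names untouched.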
[jira] [Created] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
Ivan Mitic created HIVE-7190: Summary: WebHCat launcher task failure can cause two concurrent user jobs to run Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Ivan Mitic Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs are 1-map (single-task) jobs which kick off the actual user job and monitor it until it finishes. Given that the launcher is a task like any other MR task, it has a retry policy in case it fails (due to a task crash, tasktracker/nodemanager crash, machine-level outage, etc.). Further, when the launcher task is retried, it will again launch the same user job, *however* the previous attempt's user job is already running. What this means is that we can have two identical user jobs running in parallel. In the case of MRv2, there will be an MRAppMaster and the launcher task, both of which are subject to failure. If either of the two fails, another instance of the user job will be launched in parallel. The above situation is already a bug. Going further to RM HA, what the RM does on failover/restart is kill all containers and restart all applications. This means that if a customer had 10 jobs on the cluster (that is, 10 launcher jobs and 10 user jobs), on RM failover all 20 jobs will be restarted, and the launcher jobs will queue the user jobs again. There are two issues with this design: 1. There are *possible* chances of corruption of job outputs (it would be useful to analyze this scenario more and confirm this statement). 2. Cluster resources are spent on jobs redundantly. To address the issue at least on YARN (Hadoop 2.0) clusters, WebHCat should do the same thing Oozie does in this scenario: tag all its child jobs with an id, and kill those jobs on task restart before they are kicked off again.
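The tag-and-kill recovery proposed above can be sketched in a few lines. This is a hypothetical model, not the WebHCat/shim API: the `Job` record and the cluster-query/kill calls stand in for the real RM interactions, and the time filter plays the role the submit-timestamp window plays in the eventual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tag-and-kill recovery: child jobs are tagged with the
// launcher's job id; on launcher-task restart, earlier children carrying that
// tag are killed before the user job is resubmitted, so only one copy runs.
public class LauncherRecovery {
    static class Job {
        final String id; final String tag; final long submitTime; boolean killed;
        Job(String id, String tag, long submitTime) {
            this.id = id; this.tag = tag; this.submitTime = submitTime;
        }
    }

    // Find child jobs carrying our tag, submitted within the search window,
    // and mark them killed. Returns the jobs that were killed.
    static List<Job> killTaggedChildren(List<Job> clusterJobs, String launcherId,
                                        long launchTime) {
        List<Job> killed = new ArrayList<>();
        for (Job j : clusterJobs) {
            // launchTime narrows the search window, as a submit-timestamp
            // property would in the real implementation.
            if (launcherId.equals(j.tag) && j.submitTime >= launchTime && !j.killed) {
                j.killed = true;
                killed.add(j);
            }
        }
        return killed;
    }
}
```

The design point is that the tag makes orphaned children discoverable after the launcher loses all in-memory state, which is exactly the RM failover/task retry case described above.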
[jira] [Commented] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020358#comment-14020358 ] Ivan Mitic commented on HIVE-7190: -- Will attach a patch in a bit; feel free to assign the Jira to me, as I don't have the rights to do so yet.
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7065: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution, Eugene! Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a hive query, for example). This should include hive.execution.engine. NO PRECOMMIT TESTS
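As an illustration of the fix's shape, a webhcat-site.xml fragment of the kind described might look like the following. The metastore host and the specific properties listed in the value are assumptions for illustration, not taken from the patch; `templeton.hive.properties` itself takes a comma-separated list of key=value pairs.

```xml
<property>
  <name>templeton.hive.properties</name>
  <!-- Comma-separated key=value pairs handed to the Hive client on the node
       running the launcher; hive.execution.engine is what selects mr vs. tez
       for Hive jobs submitted through WebHCat. Host name is a placeholder. -->
  <value>hive.metastore.uris=thrift://metastore-host:9083,hive.execution.engine=tez</value>
</property>
```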
[jira] [Created] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb
Sergey Shelukhin created HIVE-7191: -- Summary: optimized map join hash table has a bug when it reaches 2Gb Key: HIVE-7191 URL: https://issues.apache.org/jira/browse/HIVE-7191 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Via [~t3rmin4t0r]:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -204
	at java.util.ArrayList.elementData(ArrayList.java:371)
	at java.util.ArrayList.get(ArrayList.java:384)
	at org.apache.hadoop.hive.serde2.WriteBuffers.setReadPoint(WriteBuffers.java:95)
	at org.apache.hadoop.hive.serde2.WriteBuffers.hashCode(WriteBuffers.java:100)
	at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:203)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:266)
	at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:124)
	... 16 more
{noformat}
[jira] [Updated] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb
[ https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7191: --- Attachment: HIVE-7191.patch Some casts are in order optimized map join hash table has a bug when it reaches 2Gb --- Key: HIVE-7191 URL: https://issues.apache.org/jira/browse/HIVE-7191 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7191.patch
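"Some casts are in order" and the negative index (-204) in the trace point at a classic 2 GB failure mode: a byte offset truncated to 32 bits wraps negative once it exceeds Integer.MAX_VALUE. The sketch below is not the actual WriteBuffers code, just the arithmetic: dividing after an int cast goes negative past 2 GB, while doing the arithmetic in long and casting only the small final result stays correct.

```java
// Hypothetical sketch of the 2 GB failure mode: a byte offset cast through an
// int wraps negative once it exceeds Integer.MAX_VALUE, yielding a negative
// buffer index like the -204 in the stack trace above.
public class OffsetOverflow {
    static final int BUFFER_SIZE = 1 << 20; // 1 MB per buffer segment (illustrative)

    // Buggy: truncates the offset to 32 bits before dividing, so any offset
    // at or past 2 GB produces a negative segment index.
    static int bufferIndexBuggy(long offset) {
        return ((int) offset) / BUFFER_SIZE;
    }

    // Fixed: divide in long, cast only the (small) final result.
    static int bufferIndexFixed(long offset) {
        return (int) (offset / BUFFER_SIZE);
    }

    public static void main(String[] args) {
        long offset = 2L * 1024 * 1024 * 1024 + 123; // just past 2 GB
        if (bufferIndexBuggy(offset) >= 0) throw new AssertionError("expected negative index");
        if (bufferIndexFixed(offset) != 2048) throw new AssertionError();
    }
}
```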
[jira] [Updated] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb
[ https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7191: --- Status: Patch Available (was: Open) +1 optimized map join hash table has a bug when it reaches 2Gb --- Key: HIVE-7191 URL: https://issues.apache.org/jira/browse/HIVE-7191 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7191.patch
[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated HIVE-7190: - Attachment: HIVE-7190.patch Attaching the initial patch. The approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, the launcher queries the RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new {{templeton.job.launch.time}} property is introduced that captures the launcher job submit timestamp and is later used to reduce the search window when the RM is queried. I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. To validate the patch, you will need to add the webhcat shim jars to templeton.libjars, as the webhcat launcher now also has a dependency on hadoop shims. I have noticed that in the case of the SqoopDelegator, webhcat currently does not set the MR delegation token when the optionsFile flag is used. This also creates the problem in this scenario; it looks like something that should be handled via a separate Jira. WebHCat launcher task failure can cause two concurrent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Ivan Mitic Attachments: HIVE-7190.patch
[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7190: - Affects Version/s: 0.13.0 WebHCat launcher task failure can cause two concurrent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Ivan Mitic Attachments: HIVE-7190.patch
[jira] [Created] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
Roshan Naik created HIVE-7192: - Summary: Hive Streaming - Some required settings are not mentioned in the documentation Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Specifically: - hive.support.concurrency on metastore - hive.vectorized.execution.enabled for query client -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7167) Hive Metastore fails to start with SQLServerException
[ https://issues.apache.org/jira/browse/HIVE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020409#comment-14020409 ] Sergey Shelukhin commented on HIVE-7167: 1) Can you post the SQLServerException you are getting? 2) Why these 3 methods of all methods? 3) It seems like a hacky way to solve the problem. It can still fail again, right? Hive Metastore fails to start with SQLServerException - Key: HIVE-7167 URL: https://issues.apache.org/jira/browse/HIVE-7167 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Labels: patch, test Fix For: 0.13.0 Attachments: HIVE-7167.1.patch In the case that hiveserver2 uses an embedded metastore and hiveserver uses a remote metastore, this exception comes up when hiveserver2 and hiveserver are started simultaneously. The metastore service status is running, but when I launch the hive cli, I get the following metastore connection error:
{noformat}
C:\apps\dist\hive-0.13.0.2.1.2.0-1660\bin>hive.cmd
14/05/09 17:40:03 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/C:/apps/dist/hive-0.13.0.2.1.2.0-1660/conf/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:347)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1413)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:341)
	... 7 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1411)
	... 12 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
	at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1411)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
	at
{noformat}
[jira] [Updated] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
[ https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7192: -- Attachment: HIVE-7192.patch uploading patch Hive Streaming - Some required settings are not mentioned in the documentation -- Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming Attachments: HIVE-7192.patch Specifically: - hive.support.concurrency on metastore - hive.vectorized.execution.enabled for query client -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
[ https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7192: -- Status: Patch Available (was: Open) Hive Streaming - Some required settings are not mentioned in the documentation -- Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming Attachments: HIVE-7192.patch
[jira] [Updated] (HIVE-7187) Reconcile jetty versions in hive
[ https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7187: --- Assignee: Ashutosh Chauhan Status: Patch Available (was: Open) Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Assignee: Ashutosh Chauhan Attachments: HIVE-7187.patch Hive's root pom has 3 parameters for specifying jetty dependency versions:
{code}
<jetty.version>6.1.26</jetty.version>
<jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
<jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version>
{code}
The 1st is used by HWI, the 2nd by WebHCat, and the 3rd by HiveServer2 (in http mode). We should probably use the same jetty version for all hive components.
[jira] [Updated] (HIVE-7187) Reconcile jetty versions in hive
[ https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7187: --- Attachment: HIVE-7187.patch Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Attachments: HIVE-7187.patch
Re: Review Request 22170: analyze table T compute statistics for columns; will now compute stats for all columns.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22170/#review44974 --- Ship it! Ship It! - Prasanth_J On June 6, 2014, 4:30 p.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22170/ --- (Updated June 6, 2014, 4:30 p.m.) Review request for hive and Prasanth_J. Bugs: HIVE-7168 https://issues.apache.org/jira/browse/HIVE-7168 Repository: hive-git Description --- analyze table T compute statistics for columns; will now compute stats for all columns. Diffs - metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1245d80 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 5b77e6f ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 6d958fd ql/src/test/queries/clientpositive/columnstats_partlvl.q 9dfe8ff ql/src/test/queries/clientpositive/columnstats_tbllvl.q 170fbc5 ql/src/test/results/clientpositive/columnstats_partlvl.q.out d91be8d ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 3d3d0e2 ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 03b536f Diff: https://reviews.apache.org/r/22170/diff/ Testing --- Added new tests. Thanks, Ashutosh Chauhan
Review Request 22328: Make hive use one jetty version.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22328/ --- Review request for hive, Eugene Koifman and Vaibhav Gumashta. Bugs: HIVE-7187 https://issues.apache.org/jira/browse/HIVE-7187 Repository: hive Description --- Make hive use one jetty version. Diffs - trunk/hcatalog/webhcat/svr/pom.xml 1600966 trunk/hwi/pom.xml 1600966 trunk/pom.xml 1600992 trunk/service/pom.xml 1600966 trunk/shims/0.20/pom.xml 1600966 trunk/shims/0.20S/pom.xml 1600966 trunk/shims/0.23/pom.xml 1600966 Diff: https://reviews.apache.org/r/22328/diff/ Testing --- Manually built and ran a few tests. Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020423#comment-14020423 ] Prasanth J commented on HIVE-7168: -- +1 Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- Review request for hive. Repository: hive-git Description --- The approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, the launcher queries the RM for the list of jobs that have the tag and kills them. After that, it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and is later used to reduce the search window when the RM is queried. To validate the patch, you will need to add the webhcat shim jars to templeton.libjars, as the webhcat launcher now also has a dependency on hadoop shims. I have noticed that in the case of the SqoopDelegator, webhcat currently does not set the MR delegation token when the optionsFile flag is used. This also creates a problem in this scenario. This looks like something that should be handled via a separate Jira.
Diffs - hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 Diff: https://reviews.apache.org/r/22329/diff/ Testing --- I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. Thanks, Ivan Mitic
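The recovery flow described above — tag child jobs with the launcher MR job id, then on launcher restart query the RM within the templeton.job.launch.time window and kill the matches before resubmitting — can be sketched with plain data structures. This is an illustrative model, not the YARN client API; the job-record fields are assumptions:

```python
def kill_orphaned_children(apps, launcher_tag, launch_time_ms, kill):
    """Kill child jobs left over from a previous launcher attempt.

    apps            -- stand-in for the RM's application list; each entry is a
                       dict with "id", "tags" (set of strings), "submit_time"
    launcher_tag    -- the launcher MR job id used to tag all child jobs
    launch_time_ms  -- launcher submit timestamp; narrows the search window
    kill            -- callback invoked with each orphaned application id
    """
    killed = []
    for app in apps:
        # Only jobs carrying our tag, submitted no earlier than the launcher
        # itself, can be children of a previous attempt.
        if launcher_tag in app["tags"] and app["submit_time"] >= launch_time_ms:
            kill(app["id"])
            killed.append(app["id"])
    return killed
```

In the real patch the tag lookup and kill go through the Hadoop shims (WebHCatJTShim23), since only YARN-based clusters expose tag-filtered application queries.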
[jira] [Commented] (HIVE-7190) WebHCat launcher task failure can cause two concurent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020435#comment-14020435 ] Ivan Mitic commented on HIVE-7190: -- Review board: https://reviews.apache.org/r/22329/ WebHCat launcher task failure can cause two concurent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Ivan Mitic Attachments: HIVE-7190.2.patch, HIVE-7190.patch Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs are 1-map jobs (single-task jobs) which kick off the actual user job and monitor it until it finishes. Given that the launcher is a task like any other MR task, it has a retry policy in case it fails (due to a task crash, tasktracker/nodemanager crash, machine-level outage, etc.). Further, when the launcher task is retried, it will again launch the same user job; *however*, the previous attempt's user job is already running. What this means is that we can have two identical user jobs running in parallel. In the case of MRv2, there will be an MRAppMaster and the launcher task, both of which are subject to failure. If either of the two fails, another instance of the user job will be launched again in parallel. The above situation is already a bug. Going further to RM HA, what the RM does on failover/restart is kill all containers and restart all applications. This means that if a customer had 10 jobs on the cluster (that is, 10 launcher jobs and 10 user jobs), then on RM failover all 20 jobs will be restarted and the launcher jobs will queue the user jobs again. There are two issues with this design: 1. There are *possible* chances for corruption of job outputs (it would be useful to analyze this scenario more and confirm this statement). 2. 
Cluster resources are spent on jobs redundantly. To address the issue, at least on Yarn (Hadoop 2.0) clusters, webhcat should do the same thing Oozie does in this scenario, and that is to tag all its child jobs with an id and kill those jobs on task restart before they are kicked off again. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7190) WebHCat launcher task failure can cause two concurent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated HIVE-7190: - Attachment: HIVE-7190.2.patch Rebasing the patch against the latest hive trunk. WebHCat launcher task failure can cause two concurent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Ivan Mitic Attachments: HIVE-7190.2.patch, HIVE-7190.patch Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs are 1-map jobs (single-task jobs) which kick off the actual user job and monitor it until it finishes. Given that the launcher is a task like any other MR task, it has a retry policy in case it fails (due to a task crash, tasktracker/nodemanager crash, machine-level outage, etc.). Further, when the launcher task is retried, it will again launch the same user job; *however*, the previous attempt's user job is already running. What this means is that we can have two identical user jobs running in parallel. In the case of MRv2, there will be an MRAppMaster and the launcher task, both of which are subject to failure. If either of the two fails, another instance of the user job will be launched again in parallel. The above situation is already a bug. Going further to RM HA, what the RM does on failover/restart is kill all containers and restart all applications. This means that if a customer had 10 jobs on the cluster (that is, 10 launcher jobs and 10 user jobs), then on RM failover all 20 jobs will be restarted and the launcher jobs will queue the user jobs again. There are two issues with this design: 1. There are *possible* chances for corruption of job outputs (it would be useful to analyze this scenario more and confirm this statement). 2. 
Cluster resources are spent on jobs redundantly. To address the issue, at least on Yarn (Hadoop 2.0) clusters, webhcat should do the same thing Oozie does in this scenario, and that is to tag all its child jobs with an id and kill those jobs on task restart before they are kicked off again. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: (was: Hive Streaming Ingest API for v4 patch.pdf) Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: Hive Streaming Ingest API for v4 patch.pdf updating 'Hive Streaming Ingest API for v4 patch.pdf' document with requirements Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7138) add row index dump capability to ORC file dump
[ https://issues.apache.org/jira/browse/HIVE-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020445#comment-14020445 ] Owen O'Malley commented on HIVE-7138: - +1, but I'd like to use --rowindex instead of -rowindex add row index dump capability to ORC file dump -- Key: HIVE-7138 URL: https://issues.apache.org/jira/browse/HIVE-7138 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7138.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020474#comment-14020474 ] Hive QA commented on HIVE-7168: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648654/HIVE-7168.2.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 5585 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/400/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/400/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-400/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12648654 Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7193) Hive should support additional LDAP authentication parameters
Mala Chikka Kempanna created HIVE-7193: -- Summary: Hive should support additional LDAP authentication parameters Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Currently hive has only the following authentication parameters for LDAP authentication for hiveserver2: <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> We need to include other LDAP properties as part of hive-LDAP authentication, like the ones below: a group search base - dc=domain,dc=com; a group search filter - member={0}; a user search base - dc=domain,dc=com; a user search filter - sAMAccountName={0}; a list of valid user groups - group1,group2,group3 -- This message was sent by Atlassian JIRA (v6.2#6252)
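Restated as a hive-site.xml fragment — the property names below are illustrative placeholders for the proposed settings, not final Hive configuration keys:

```xml
<!-- Hypothetical names for the additional LDAP settings proposed in HIVE-7193 -->
<property>
  <name>hive.server2.authentication.ldap.groupSearchBase</name>
  <value>dc=domain,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupSearchFilter</name>
  <value>member={0}</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.userSearchBase</name>
  <value>dc=domain,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.userSearchFilter</name>
  <value>sAMAccountName={0}</value>
</property>
```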
Re: Review Request 22328: Make hive use one jetty version.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22328/#review44984 --- Ship it! Ship It! - Vaibhav Gumashta On June 6, 2014, 9:54 p.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22328/ --- (Updated June 6, 2014, 9:54 p.m.) Review request for hive, Eugene Koifman and Vaibhav Gumashta. Bugs: HIVE-7187 https://issues.apache.org/jira/browse/HIVE-7187 Repository: hive Description --- Make hive use one jetty version. Diffs - trunk/hcatalog/webhcat/svr/pom.xml 1600966 trunk/hwi/pom.xml 1600966 trunk/pom.xml 1600992 trunk/service/pom.xml 1600966 trunk/shims/0.20/pom.xml 1600966 trunk/shims/0.20S/pom.xml 1600966 trunk/shims/0.23/pom.xml 1600966 Diff: https://reviews.apache.org/r/22328/diff/ Testing --- Manually built and ran few tests. Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-7187) Reconcile jetty versions in hive
[ https://issues.apache.org/jira/browse/HIVE-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020531#comment-14020531 ] Vaibhav Gumashta commented on HIVE-7187: +1 (pending tests). [~ekoifman] How about we handle the upgrade to the new jetty version in a new jira? Reconcile jetty versions in hive Key: HIVE-7187 URL: https://issues.apache.org/jira/browse/HIVE-7187 Project: Hive Issue Type: Bug Components: HiveServer2, Web UI, WebHCat Reporter: Vaibhav Gumashta Assignee: Ashutosh Chauhan Attachments: HIVE-7187.patch Hive's root pom has 3 parameters for specifying jetty dependency versions: {code} <jetty.version>6.1.26</jetty.version> <jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version> <jetty.hive-service.version>7.6.0.v20120127</jetty.hive-service.version> {code} The 1st is used by HWI, the 2nd by WebHCat, and the 3rd by HiveServer2 (in http mode). We should probably use the same jetty version for all hive components. -- This message was sent by Atlassian JIRA (v6.2#6252)
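One way to reconcile the three properties is to collapse them into a single pom property consumed by all components; the version picked below is illustrative, not necessarily what the patch settles on:

```xml
<!-- Sketch: one jetty version shared by HWI, WebHCat and HiveServer2 -->
<jetty.version>7.6.0.v20120127</jetty.version>
```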
[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6394: Attachment: HIVE-6394.5.patch Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.5.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6394: Attachment: (was: HIVE-6394.5.patch) Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6394: Attachment: HIVE-6394.6.patch Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020564#comment-14020564 ] Szehon Ho commented on HIVE-6394: - Attaching another patch. Was using a parquet-example class, now explicitly adding that logic in the serde layer. Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.5.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22174: HIVE-6394 Implement Timestmap in ParquetSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22174/ --- (Updated June 7, 2014, 12:06 a.m.) Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang. Changes --- One more change, adding the 'NanoTime' class in Hive, as it was an example class in parquet. Let's go with using un-annotated INT96 for parquet, that's what other consuming applications have been doing. When the annotation does come, we'll move to that. Bugs: HIVE-6394 https://issues.apache.org/jira/browse/HIVE-6394 Repository: hive-git Description --- This uses the Jodd library to convert java.sql.Timestamp type used by Hive into the {julian-day:nanos} format expected by parquet, and vice-versa. Diffs (updated) - data/files/parquet_types.txt 0be390b pom.xml 4bb8880 ql/pom.xml 13c477a ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 4da0d30 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 29f7e11 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 57161d8 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java fb2f5a8 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 3490061 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java PRE-CREATION ql/src/test/queries/clientpositive/parquet_types.q 5d6333c ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1 Diff: https://reviews.apache.org/r/22174/diff/ Testing --- Unit tests the new libraries, and also added timestamp data in the parquet_types q-test. Thanks, Szehon Ho
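The {julian-day:nanos} conversion described above — done in Hive's Java NanoTimeUtils for Parquet's INT96 timestamps — can be modeled compactly. This is an illustrative Python sketch assuming tz-aware UTC datetimes, not Hive's actual code; 2440588 is the Julian Day Number conventionally paired with the 1970-01-01 epoch in the Parquet INT96 layout:

```python
from datetime import datetime, timezone

JULIAN_EPOCH_DAY = 2440588       # Julian Day Number of 1970-01-01 (UTC)
SECONDS_PER_DAY = 86400

def to_nanotime(ts):
    """tz-aware UTC datetime -> (julian_day, nanos_of_day)."""
    epoch_seconds = int(ts.replace(microsecond=0).timestamp())
    days, secs_in_day = divmod(epoch_seconds, SECONDS_PER_DAY)
    nanos_of_day = secs_in_day * 10**9 + ts.microsecond * 1000
    return JULIAN_EPOCH_DAY + days, nanos_of_day

def from_nanotime(julian_day, nanos_of_day):
    """(julian_day, nanos_of_day) -> tz-aware UTC datetime (microsecond precision)."""
    secs, rem_nanos = divmod(nanos_of_day, 10**9)
    epoch_seconds = (julian_day - JULIAN_EPOCH_DAY) * SECONDS_PER_DAY + secs
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).replace(
        microsecond=rem_nanos // 1000)
```

The Java implementation works from java.sql.Timestamp and keeps full nanosecond precision; this sketch rounds to microseconds, which is all Python's datetime carries.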
[jira] [Updated] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-7094: - Component/s: HCatalog Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
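The refactoring plan above is the template-method pattern: the abstract base owns the common write() path and defers the writer choice to subclasses. A minimal sketch of the shape (method and field names here are hypothetical, not the actual HCatalog API):

```python
from abc import ABC, abstractmethod

class FileRecordWriterContainer(ABC):
    """Common write() path; subclasses decide how the local writer is chosen."""

    def write(self, record):
        # Template method: the dynamic/static split lives in writer_for().
        self.writer_for(record).write(record)

    @abstractmethod
    def writer_for(self, record):
        ...

class StaticFileRecordWriterContainer(FileRecordWriterContainer):
    """Static partitioning: one writer, fixed up front."""
    def __init__(self, writer):
        self._writer = writer

    def writer_for(self, record):
        return self._writer

class DynamicFileRecordWriterContainer(FileRecordWriterContainer):
    """Dynamic partitioning: lazily create one writer per partition value."""
    def __init__(self, writer_factory):
        self._factory = writer_factory
        self._writers = {}

    def writer_for(self, record):
        part = record["partition"]
        if part not in self._writers:
            self._writers[part] = self._factory(part)
        return self._writers[part]
```

This removes every `if (dynamicPartitioning)` branch from the shared code path, which is exactly the goal stated in the ticket.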
Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/#review44992 --- 1. I think webhcat-default.xml should be modified to include the jars that are now required in templeton.libjars to minimize out-of-the-box config for end users. 2. Is there any test (e2e) that can be added for this? (with reasonable amount of effort) 3. When you tested that Pig/Hive jobs get properly tagged, you mean you tested that MR jobs that are generated by Pig/Hive are tagged, correct? hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java https://reviews.apache.org/r/22329/#comment79625 I think it would be useful to add a more detailed description of these props. Something like what is in the JIRA ticket. I would have added the ticket number to the comment, but Hive prohibits that. hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java https://reviews.apache.org/r/22329/#comment79632 Which user will this use? Is it the user running WebHCat or the value of 'doAs' parameter? shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java https://reviews.apache.org/r/22329/#comment79613 Is LOG.info() the right log level? Seems like it will pollute the log file. shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java https://reviews.apache.org/r/22329/#comment79615 Is LOG.info() the right level? shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java https://reviews.apache.org/r/22329/#comment79631 log level - Eugene Koifman On June 6, 2014, 10:02 p.m., Ivan Mitic wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- (Updated June 6, 2014, 10:02 p.m.) Review request for hive. Repository: hive-git Description --- Approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. 
On launcher task restart, launcher queries RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and later used to reduce the search window when RM is queried. To validate the patch, you will need to add webhcat shim jars to templeton.libjars as now webhcat launcher also has a dependency on hadoop shims. I have noticed that in case of the SqoopDelegator webhcat currently does not set the MR delegation token when optionsFile flag is used. This also creates the problem in this scenario. This looks like something that should be handled via a separate Jira. Diffs - hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 Diff: https://reviews.apache.org/r/22329/diff/ Testing --- I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. Thanks, Ivan Mitic
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Fix Version/s: 0.14.0 Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Attachment: HIVE-6473.6.patch Rebased onto trunk again. Removed enabling of hbase_bulk.m; it mostly passes but is flakey for me. Will address it in a follow-on ticket. Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit
[ https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020636#comment-14020636 ] Eugene Koifman commented on HIVE-7155: -- +1 WebHCat controller job exceeds container memory limit - Key: HIVE-7155 URL: https://issues.apache.org/jira/browse/HIVE-7155 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: HIVE-7155.1.patch, HIVE-7155.patch Submitting a Hive query on a large table via WebHCat results in failure because the WebHCat controller job is killed by Yarn when it exceeds the memory limit (set by mapreduce.map.memory.mb, which defaults to 1GB): {code} INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and LogTimestamp <= '2014-03-01 01:00:00'; {code} We could increase mapreduce.map.memory.mb to solve this problem, but that changes the setting system-wide. We need to provide a WebHCat configuration to override mapreduce.map.memory.mb when submitting the controller job. -- This message was sent by Atlassian JIRA (v6.2#6252)
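The proposed fix — a WebHCat-level memory override for the controller job — amounts to a config lookup with fallback. A sketch; the property name templeton.mapper.memory.mb is an assumption for illustration, not necessarily the key the patch introduces:

```python
def controller_map_memory_mb(conf):
    """Resolve map-task memory for the WebHCat controller job.

    Prefer the WebHCat-specific override (hypothetical key) when present;
    otherwise fall back to the cluster-wide MapReduce default of 1024 MB.
    """
    return conf.get("templeton.mapper.memory.mb") or \
           conf.get("mapreduce.map.memory.mb", "1024")
```

The point of the extra key is scoping: only the small controller/launcher map task gets the lower (or different) limit, while user jobs keep the cluster-wide mapreduce.map.memory.mb.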
[jira] [Updated] (HIVE-2365) SQL support for bulk load into HBase
[ https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-2365: --- Attachment: HIVE-2365.3.patch Rebased onto HIVE-6473 patch v6. SQL support for bulk load into HBase Key: HIVE-2365 URL: https://issues.apache.org/jira/browse/HIVE-2365 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: John Sichi Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-2365.2.patch.txt, HIVE-2365.3.patch, HIVE-2365.3.patch, HIVE-2365.WIP.00.patch, HIVE-2365.WIP.01.patch, HIVE-2365.WIP.01.patch Support SQL as simple as this for bulk load from Hive into HBase. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurent user jobs to run
On June 7, 2014, 1:05 a.m., Eugene Koifman wrote: 1. I think webhcat-default.xml should be modified to include the jars that are now required in templeton.libjars to minimize out-of-the-box config for end users. 2. Is there any test (e2e) that can be added for this? (with reasonable amount of effort) 3. When you tested that Pig/Hive jobs get properly tagged, you mean you tested that MR jobs that are generated by Pig/Hive are tagged, correct? 4. Actually, instead of doing 1, could WebHCat dynamically figure out which hadoop version it's talking to and add only the necessary shim jar, rather than shipping all of them? It reduces the amount of config needed. It would also be better if we can only ship the minimal set of jars. - Eugene --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/#review44992 --- On June 6, 2014, 10:02 p.m., Ivan Mitic wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- (Updated June 6, 2014, 10:02 p.m.) Review request for hive. Repository: hive-git Description --- Approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, launcher queries RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and later used to reduce the search window when RM is queried. To validate the patch, you will need to add webhcat shim jars to templeton.libjars as now webhcat launcher also has a dependency on hadoop shims. I have noticed that in case of the SqoopDelegator webhcat currently does not set the MR delegation token when optionsFile flag is used. This also creates the problem in this scenario. 
This looks like something that should be handled via a separate JIRA.

Diffs
---
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6
hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62
shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1
shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2
shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918

Diff: https://reviews.apache.org/r/22329/diff/

Testing
---
I have validated that MR, Pig, and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure.

Thanks,
Ivan Mitic
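The tag-and-kill recovery described in the patch can be sketched as follows. This is a self-contained simulation with hypothetical types (JobRecord and both method names are made up; the real patch tags child jobs through the MR job configuration and queries the ResourceManager): on restart, the launcher selects jobs carrying its tag that were submitted no earlier than templeton.job.launch.time, and kills them before resubmitting the child job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal simulation of WebHCat's orphan-child cleanup on launcher restart
// (hypothetical types; not the actual WebHCat or YARN client API).
public class TagKillSketch {
    // Stand-in for an application record as the RM would report it.
    public static class JobRecord {
        public final String id;
        public final Set<String> tags;
        public final long submitTimeMs;
        public boolean killed = false;
        public JobRecord(String id, Set<String> tags, long submitTimeMs) {
            this.id = id;
            this.tags = tags;
            this.submitTimeMs = submitTimeMs;
        }
    }

    // Select child jobs tagged with the launcher's id and submitted at or
    // after the launcher itself; templeton.job.launch.time narrows the window
    // so unrelated older jobs are never considered.
    public static List<JobRecord> findOrphans(List<JobRecord> all,
                                              String launcherTag,
                                              long launchTimeMs) {
        List<JobRecord> orphans = new ArrayList<>();
        for (JobRecord j : all) {
            if (j.tags.contains(launcherTag) && j.submitTimeMs >= launchTimeMs) {
                orphans.add(j);
            }
        }
        return orphans;
    }

    // Kill every orphan before the restarted launcher resubmits the child
    // job, so two copies of the same user job never run concurrently.
    public static int killOrphans(List<JobRecord> orphans) {
        int killed = 0;
        for (JobRecord j : orphans) {
            if (!j.killed) {
                j.killed = true;
                killed++;
            }
        }
        return killed;
    }
}
```

The time filter is the reason for the new templeton.job.launch.time property: without it, every tagged job the RM has ever seen would have to be scanned.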
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020642#comment-14020642 ]

Dr. Wendell Urth commented on HIVE-7175:
---
Hi [~hiveqa], none of the failed tests appear related to the small, additive change to BeeLine made here. These tests appear to be failing on trunk generally and are not caused by this patch. Let me know if I am wrong.

Provide password file option to beeline
---
Key: HIVE-7175
URL: https://issues.apache.org/jira/browse/HIVE-7175
Project: Hive
Issue Type: Improvement
Components: CLI, Clients
Affects Versions: 0.13.0
Reporter: Robert Justice
Labels: features, security
Attachments: HIVE-7175.patch

For people connecting to HiveServer2 with LDAP authentication enabled, batch-running commands currently requires providing the password openly on the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option, similar to other Hadoop-ecosystem CLI commands (e.g. Sqoop), to be more secure.

-- This message was sent by Atlassian JIRA (v6.2#6252)
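The core of such a password-file option is just reading the secret from a file and stripping the trailing newline that most editors append. A minimal sketch under that assumption (the helper class and method names are hypothetical, not the actual BeeLine patch):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating a password-file option: read the password
// from a file instead of taking it openly on the command line.
public class PasswordFileSketch {
    // Strip one trailing newline (and optional carriage return) that editors
    // typically append; interior whitespace is preserved as-is.
    public static String stripTrailingNewline(String raw) {
        String s = raw;
        if (s.endsWith("\n")) s = s.substring(0, s.length() - 1);
        if (s.endsWith("\r")) s = s.substring(0, s.length() - 1);
        return s;
    }

    public static String readPassword(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return stripTrailingNewline(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

For this to be an actual security improvement, the file itself needs restrictive permissions (e.g. chmod 400/600), which is how Sqoop's analogous option is documented to be used.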
[jira] [Commented] (HIVE-7191) optimized map join hash table has a bug when it reaches 2Gb
[ https://issues.apache.org/jira/browse/HIVE-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020660#comment-14020660 ]

Hive QA commented on HIVE-7191:
---
{color:red}Overall{color}: -1, at least one test failed

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648727/HIVE-7191.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5510 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/401/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/401/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-401/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12648727

optimized map join hash table has a bug when it reaches 2Gb
---
Key: HIVE-7191
URL: https://issues.apache.org/jira/browse/HIVE-7191
Project: Hive
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-7191.patch

Via [~t3rmin4t0r]:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -204
    at java.util.ArrayList.elementData(ArrayList.java:371)
    at java.util.ArrayList.get(ArrayList.java:384)
    at org.apache.hadoop.hive.serde2.WriteBuffers.setReadPoint(WriteBuffers.java:95)
    at org.apache.hadoop.hive.serde2.WriteBuffers.hashCode(WriteBuffers.java:100)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:203)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:124)
    ... 16 more
{noformat}
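The negative array index in the trace above is the classic signature of 32-bit overflow: once the hash table's write buffers grow past 2GB, a byte offset forced through an int wraps negative, and any buffer index derived from it goes negative too. A minimal illustration of the failure mode and the usual fix, with made-up constants (this is not the actual WriteBuffers code):

```java
// Illustrates the 2GB overflow class of bug: a byte offset past
// Integer.MAX_VALUE wraps negative when narrowed to int, so the derived
// buffer index is negative as well. (Made-up constants; not the actual
// org.apache.hadoop.hive.serde2.WriteBuffers implementation.)
public class OverflowSketch {
    static final int WB_SIZE = 1 << 20; // assume 1MB per write buffer

    // Buggy variant: narrows the offset to int before computing the index.
    public static int bufferIndexBuggy(long offset) {
        int narrowed = (int) offset; // wraps negative for offsets >= 2^31
        return narrowed / WB_SIZE;
    }

    // Fixed variant: keeps the arithmetic in long and narrows only the
    // final (small) index.
    public static int bufferIndexFixed(long offset) {
        return (int) (offset / WB_SIZE);
    }
}
```

With these assumptions, an offset of 2GB + 5MB should land in buffer 2053; the buggy variant instead produces a negative index, which is exactly what ArrayList.get then rejects with ArrayIndexOutOfBoundsException.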
[jira] [Commented] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
[ https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020695#comment-14020695 ]

Hive QA commented on HIVE-7192:
---
{color:red}Overall{color}: -1, at least one test failed

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648732/HIVE-7192.patch

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 5585 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_part
org.apache.hadoop.hive.metastore.TestMetastoreVersion.testDefaults
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/402/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/402/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-402/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648732

Hive Streaming - Some required settings are not mentioned in the documentation
---
Key: HIVE-7192
URL: https://issues.apache.org/jira/browse/HIVE-7192
Project: Hive
Issue Type: Bug
Components: HCatalog
Affects Versions: 0.13.0
Reporter: Roshan Naik
Assignee: Roshan Naik
Labels: Streaming
Attachments: HIVE-7192.patch

Specifically:
- hive.support.concurrency on the metastore
- hive.vectorized.execution.enabled for the query client
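The two settings called out above would be expressed in hive-site.xml roughly as follows. This is a sketch of the shape only: the values (true for concurrency on the metastore side, false for vectorization on the query client) are my reading of what the issue implies for Hive 0.13 streaming, not values stated in it, and a full streaming setup involves further transaction-related settings the issue does not list.

```xml
<!-- Metastore side: concurrency support is required for the lock/transaction
     machinery that Hive Streaming ingestion relies on (assumed value). -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>

<!-- Query client side: vectorized reads of streaming (ACID) tables were not
     supported at the time, so this is disabled when querying (assumed value). -->
<property>
  <name>hive.vectorized.execution.enabled</name>
  <value>false</value>
</property>
```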
[jira] [Commented] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition
[ https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020720#comment-14020720 ]

Ashish Kumar Singh commented on HIVE-7117:
---
Thanks [~szehon], [~xuefuz] and [~swarnim] for reviewing.

Partitions not inheriting table permissions after alter rename partition
---
Key: HIVE-7117
URL: https://issues.apache.org/jira/browse/HIVE-7117
Project: Hive
Issue Type: Bug
Components: Security
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
Fix For: 0.14.0
Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, HIVE-7117.patch

On altering/renaming a partition, the partition should inherit the permissions of the parent directory if the flag hive.warehouse.subdir.inherit.perms is set.
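The scenario being fixed can be reproduced with a rename like the following (the table and partition names are hypothetical, chosen only to illustrate the operation the issue describes):

```sql
-- Assumes hive.warehouse.subdir.inherit.perms=true in the Hive configuration.
-- Before HIVE-7117, the renamed partition's new directory did not pick up
-- the permissions of its parent (the table) directory.
ALTER TABLE sales PARTITION (ds='2014-06-01')
  RENAME TO PARTITION (ds='2014-06-02');
```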