[jira] [Commented] (HIVE-9766) Add JavaConstantXXXObjectInspector
[ https://issues.apache.org/jira/browse/HIVE-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381654#comment-14381654 ] Jason Dere commented on HIVE-9766: -- +1 if tests pass Add JavaConstantXXXObjectInspector -- Key: HIVE-9766 URL: https://issues.apache.org/jira/browse/HIVE-9766 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-9766.1.patch, HIVE-9766.2.patch, HIVE-9766.3.patch Need JavaConstantXXXObjectInspector when implementing PIG-3294. There are two approaches: 1. Add those classes in Pig. However, most constructors of the base class JavaXXXObjectInspector have default scope and would need to be changed to protected. 2. Add those classes in Hive. Approach 2 should be better since those classes might be useful to Hive as well. Attaching a patch to provide them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
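The constant-inspector idea behind HIVE-9766 can be sketched without the real Hive API (all class and method names below are illustrative stand-ins, not Hive's actual types): a constant ObjectInspector carries the fixed value itself alongside the type description, so callers can fold constants at plan time.

```java
// Simplified illustration only -- the interfaces and class names are invented here,
// not copied from Hive. A plain inspector describes how to read a value of its type;
// a constant inspector additionally exposes the fixed value it was built with.
interface SimpleObjectInspector {
    String getTypeName();
}

interface SimpleConstantObjectInspector extends SimpleObjectInspector {
    Object getConstantValue();
}

class JavaIntInspector implements SimpleObjectInspector {
    public String getTypeName() { return "int"; }
}

// Analogue of a "JavaConstantIntObjectInspector": same type info, plus the value.
class JavaConstantIntInspector extends JavaIntInspector
        implements SimpleConstantObjectInspector {
    private final Integer value;
    JavaConstantIntInspector(Integer value) { this.value = value; }
    public Object getConstantValue() { return value; }
}

public class ConstantOIDemo {
    public static void main(String[] args) {
        SimpleConstantObjectInspector oi = new JavaConstantIntInspector(42);
        System.out.println(oi.getTypeName() + " = " + oi.getConstantValue()); // int = 42
    }
}
```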
[jira] [Commented] (HIVE-10095) format_number udf throws NPE
[ https://issues.apache.org/jira/browse/HIVE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381669#comment-14381669 ] Jason Dere commented on HIVE-10095: --- +1 if the tests pass format_number udf throws NPE Key: HIVE-10095 URL: https://issues.apache.org/jira/browse/HIVE-10095 Project: Hive Issue Type: Bug Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10095.1.patch For example {code} select format_number(cast(null as int), 0); FAILED: NullPointerException null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
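The NPE above comes from formatting a NULL input. As a hedged sketch (plain Java, not Hive's actual GenericUDFFormatNumber code), the usual fix is an explicit null guard so the UDF returns NULL instead of dereferencing a null value:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Illustrative null-safe format_number analogue. The method name and pattern
// construction are assumptions for the sketch, not the Hive implementation.
public class FormatNumberDemo {
    static String formatNumber(Integer value, int decimals) {
        if (value == null) {
            return null; // SQL semantics: NULL in, NULL out -- never throw
        }
        StringBuilder pattern = new StringBuilder("#,##0");
        if (decimals > 0) {
            pattern.append('.');
            for (int i = 0; i < decimals; i++) pattern.append('0');
        }
        // Fix the locale so grouping/decimal separators are deterministic.
        DecimalFormat fmt = new DecimalFormat(pattern.toString(),
                new DecimalFormatSymbols(Locale.US));
        return fmt.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567, 0)); // 1,234,567
        System.out.println(formatNumber(null, 0));    // null, not an NPE
    }
}
```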
[jira] [Commented] (HIVE-9767) Fixes in Hive UDF to be usable in Pig
[ https://issues.apache.org/jira/browse/HIVE-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381621#comment-14381621 ] Hive QA commented on HIVE-9767: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707385/HIVE-9767.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8346 tests executed *Failed tests:* {noformat} TestCustomAuthentication - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3163/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3163/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3163/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12707385 - PreCommit-HIVE-TRUNK-Build Fixes in Hive UDF to be usable in Pig - Key: HIVE-9767 URL: https://issues.apache.org/jira/browse/HIVE-9767 Project: Hive Issue Type: Bug Components: UDF Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-9767.1.patch, HIVE-9767.2.patch, HIVE-9767.3.patch There are issues in UDFs that never get exposed because the execution path is never tested: # Assume the ObjectInspector to be WritableObjectInspector, not the ObjectInspector passed to the UDF # Assume the input parameter to be Writable, not respecting the ObjectInspector passed to the UDF # Assume ConstantObjectInspector to be WritableConstantXXXObjectInspector # The InputObjectInspector does not match the OutputObjectInspector of the previous stage in UDAF # The execution path involving convertIfNecessary has never been tested Attaching a patch to fix those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10091: - Attachment: HIVE-10091.2.patch Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch, HIVE-10091.2.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
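The planning idea in HIVE-10091 -- turning a partition filter condition into HBase scan calls -- can be sketched with a sorted map standing in for the HBase table (the key layout and predicate below are illustrative assumptions, not the HbaseStore code):

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hedged sketch: when partitions are stored under keys that sort by partition value,
// a range predicate on the partition column becomes one [startRow, stopRow) scan
// instead of a full table scan.
public class PartitionScanDemo {
    // Stand-in for the HBase table: sorted map from partition key to metadata.
    static final NavigableMap<String, String> STORE = new TreeMap<>();
    static {
        STORE.put("ds=2015-02-15", "p1");
        STORE.put("ds=2015-03-10", "p2");
        STORE.put("ds=2015-03-20", "p3");
        STORE.put("ds=2015-04-05", "p4");
    }

    // Translate "ds >= lo AND ds < hi" into a single range scan over the key space.
    static SortedMap<String, String> scan(String startRow, String stopRow) {
        return STORE.subMap(startRow, stopRow); // inclusive start, exclusive stop
    }

    public static void main(String[] args) {
        System.out.println(scan("ds=2015-03-01", "ds=2015-04-01").keySet());
        // [ds=2015-03-10, ds=2015-03-20]
    }
}
```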
[jira] [Commented] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381640#comment-14381640 ] Thejas M Nair commented on HIVE-10091: -- Addressing review comments in 2.patch Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch, HIVE-10091.2.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10072) Add vectorization support for Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381702#comment-14381702 ] Hive QA commented on HIVE-10072: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707384/HIVE-10072.06.patch {color:green}SUCCESS:{color} +1 8347 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3164/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3164/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3164/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12707384 - PreCommit-HIVE-TRUNK-Build Add vectorization support for Hybrid Grace Hash Join Key: HIVE-10072 URL: https://issues.apache.org/jira/browse/HIVE-10072 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Fix For: 1.2.0 Attachments: HIVE-10072.01.patch, HIVE-10072.02.patch, HIVE-10072.03.patch, HIVE-10072.04.patch, HIVE-10072.05.patch, HIVE-10072.06.patch This task is to enable vectorization support for Hybrid Grace Hash Join feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6963) Beeline logs are printing on the console
[ https://issues.apache.org/jira/browse/HIVE-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381700#comment-14381700 ] Bing Li commented on HIVE-6963: --- Hi, Chinna Have you uploaded the latest patch? I tried the patch attached in this Jira, and found: 1. In order to launch bin/beeline, I need to add the following jars to HADOOP_CLASSPATH in bin/ext/beeline.sh hive/lib/hive-shims-0.23.jar hive/lib/hive-shims-common-secure.jar hive/lib/hive-shims-common.jar 2. The log file doesn't contain as much info as the one for HiveCLI; it only has the following lines: [biadmin@bdvs1100 biadmin]$ cat hive.log 2015-02-13 06:53:50,145 INFO jdbc.Utils (Utils.java:parseURL(285)) - Supplied authorities: bdvs1100.svl.ibm.com:1 2015-02-13 06:53:50,149 INFO jdbc.Utils (Utils.java:parseURL(372)) - Resolved authority: bdvs1100.svl.ibm.com:1 2015-02-13 06:53:50,184 INFO jdbc.HiveConnection (HiveConnection.java:openTransport(191)) - Will try to open client transport with JDBC Uri: jdbc:hive2://9.123.2.21:1 Are these known issues, or do they work as designed? Thank you. - Bing Beeline logs are printing on the console Key: HIVE-6963 URL: https://issues.apache.org/jira/browse/HIVE-6963 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-6963.patch Beeline logs are not redirected to the log file. If the log is redirected to a log file, only required information will print on the console. This way it is easier to read the output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9766) Add JavaConstantXXXObjectInspector
[ https://issues.apache.org/jira/browse/HIVE-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381795#comment-14381795 ] Hive QA commented on HIVE-9766: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707391/HIVE-9766.3.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8347 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3165/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3165/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3165/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12707391 - PreCommit-HIVE-TRUNK-Build Add JavaConstantXXXObjectInspector -- Key: HIVE-9766 URL: https://issues.apache.org/jira/browse/HIVE-9766 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-9766.1.patch, HIVE-9766.2.patch, HIVE-9766.3.patch Need JavaConstantXXXObjectInspector when implementing PIG-3294. There are two approaches: 1. Add those classes in Pig. 
However, most constructors of the base class JavaXXXObjectInspector have default scope and would need to be changed to protected. 2. Add those classes in Hive. Approach 2 should be better since those classes might be useful to Hive as well. Attaching a patch to provide them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10112) LLAP: query 17 tasks fail due to mapjoin issue
[ https://issues.apache.org/jira/browse/HIVE-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383213#comment-14383213 ] Sergey Shelukhin commented on HIVE-10112: - I wonder if recent patch on trunk broke it... although I don't see problems without LLAP LLAP: query 17 tasks fail due to mapjoin issue -- Key: HIVE-10112 URL: https://issues.apache.org/jira/browse/HIVE-10112 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin {noformat} 2015-03-26 18:16:38,833 [TezTaskRunner_attempt_1424502260528_1696_1_07_00_0(container_1_1696_01_000220_sershe_20150326181607_188ab263-0a13-4528-b778-c803f378640d:1_Map 1_0_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: java.lang.AssertionError: Length is negative: -54 at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:308) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at 
org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.AssertionError: Length is negative: -54 at org.apache.hadoop.hive.serde2.WriteBuffers$ByteSegmentRef.init(WriteBuffers.java:339) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.getValueRefs(BytesBytesMultiHashMap.java:270) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.setFromOutput(MapJoinBytesTableContainer.java:429) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromVector(MapJoinBytesTableContainer.java:349) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.setMapJoinKey(VectorMapJoinOperator.java:222) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:310) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:252) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:163) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) {noformat} Tasks do appear to pass on retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383219#comment-14383219 ] Xuefu Zhang commented on HIVE-10073: Okay. Makes sense. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch, HIVE-10073.2-spark.patch, HIVE-10073.3-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9859) Create bitwise left/right shift UDFs
[ https://issues.apache.org/jira/browse/HIVE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9859: - Labels: TODOC1.2 (was: ) Create bitwise left/right shift UDFs Key: HIVE-9859 URL: https://issues.apache.org/jira/browse/HIVE-9859 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9859.1.patch, HIVE-9859.2.patch, HIVE-9859.3.patch, HIVE-9859.5.patch Signature: a << b, a >> b, a >>> b For example: {code} select 1 << 4, 8 >> 2, 8 >>> 2; OK 16 2 2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
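The three UDFs correspond to Java's shift operators, which is where the example values in the ticket (16, 2, 2) come from. A minimal demonstration, including the negative-input case where signed and unsigned right shift diverge:

```java
// shiftleft / shiftright / shiftrightunsigned map onto <<, >>, and >>> respectively.
public class ShiftDemo {
    public static void main(String[] args) {
        System.out.println(1 << 4);    // 16  (multiply by 2^4)
        System.out.println(8 >> 2);    // 2   (divide by 2^2, sign-extending)
        System.out.println(8 >>> 2);   // 2   (same as >> for non-negative inputs)
        // The two right shifts differ only for negative inputs:
        System.out.println(-8 >> 2);   // -2          (sign bit is copied in)
        System.out.println(-8 >>> 2);  // 1073741822  (zeros are shifted in)
    }
}
```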
[jira] [Resolved] (HIVE-10110) LLAP: port updates from HIVE-9555 to llap branch in preparation for trunk merge
[ https://issues.apache.org/jira/browse/HIVE-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10110. - Resolution: Fixed Fix Version/s: llap LLAP: port updates from HIVE-9555 to llap branch in preparation for trunk merge --- Key: HIVE-10110 URL: https://issues.apache.org/jira/browse/HIVE-10110 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: llap Some stuff was updated based on CR feedback -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9859) Create bitwise left/right shift UDFs
[ https://issues.apache.org/jira/browse/HIVE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383321#comment-14383321 ] Lefty Leverenz commented on HIVE-9859: -- Doc note: shiftleft(), shiftright(), and shiftrightunsigned() should be documented in the Built-in Functions section of Operators and UDFs, with version information and a link to this issue. * [Hive Operators and UDFs -- Built-in Functions | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inFunctions] Create bitwise left/right shift UDFs Key: HIVE-9859 URL: https://issues.apache.org/jira/browse/HIVE-9859 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9859.1.patch, HIVE-9859.2.patch, HIVE-9859.3.patch, HIVE-9859.5.patch Signature: a << b, a >> b, a >>> b For example: {code} select 1 << 4, 8 >> 2, 8 >>> 2; OK 16 2 2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383238#comment-14383238 ] Chengxiang Li commented on HIVE-10073: -- +1 Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch, HIVE-10073.2-spark.patch, HIVE-10073.3-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10091: - Attachment: HIVE-10091.4.patch 4.patch - fix classcast exception when non string first partitioning column is used Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch, HIVE-10091.2.patch, HIVE-10091.3.patch, HIVE-10091.4.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382162#comment-14382162 ] Xuefu Zhang commented on HIVE-10073: Hi [~jxiang] and [~chengxiang li], before we patch this on Hive side, I think it's better to find the root cause. If the problem is due to Spark, we can bring up the problem to that community. So far, I'm not convinced that the problem is on hive side. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8817) Create unit test where we insert into an encrypted table and then read from it with pig
[ https://issues.apache.org/jira/browse/HIVE-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382112#comment-14382112 ] Sergio Peña commented on HIVE-8817: --- Looks good. +1 Create unit test where we insert into an encrypted table and then read from it with pig --- Key: HIVE-8817 URL: https://issues.apache.org/jira/browse/HIVE-8817 Project: Hive Issue Type: Sub-task Affects Versions: encryption-branch Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: encryption-branch Attachments: HIVE-8817.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10078) Optionally allow logging of records processed in fixed intervals
[ https://issues.apache.org/jira/browse/HIVE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382252#comment-14382252 ] Gunther Hagleitner commented on HIVE-10078: --- Test failure is unrelated. Optionally allow logging of records processed in fixed intervals Key: HIVE-10078 URL: https://issues.apache.org/jira/browse/HIVE-10078 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10078.1.patch, HIVE-10078.2.patch Tasks today log progress (records in/records out) on an exponential scale (1, 10, 100, ...). Sometimes it's helpful to be able to switch to fixed interval. That can help debugging certain issues that look like a hang, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
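The two reporting policies described in the ticket can be sketched as simple predicates (method names are illustrative, not the Hive operator code): the current exponential scale logs at 1, 10, 100, ..., while the proposed option logs every N records, which makes a slow-but-alive task easier to tell apart from a hang.

```java
// Hedged sketch of the two "should we log this record count?" policies.
public class LogIntervalDemo {
    // Exponential scale: log only when the count is an exact power of ten.
    static boolean logExponential(long count) {
        if (count < 1) return false;
        while (count % 10 == 0) count /= 10;
        return count == 1;
    }

    // Fixed interval: log every `interval` records.
    static boolean logFixed(long count, long interval) {
        return count > 0 && count % interval == 0;
    }

    public static void main(String[] args) {
        for (long c = 1; c <= 1000; c++) {
            if (logExponential(c)) System.out.print(c + " "); // 1 10 100 1000
        }
        System.out.println();
        System.out.println(logFixed(5000, 1000)); // true: 5000 is a multiple of 1000
    }
}
```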
[jira] [Updated] (HIVE-10085) Lateral view on top of a view throws RuntimeException
[ https://issues.apache.org/jira/browse/HIVE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10085: Attachment: HIVE-10085.patch Fixed some unit test baselines. The failures from the other 2 unit tests seem unrelated. Lateral view on top of a view throws RuntimeException - Key: HIVE-10085 URL: https://issues.apache.org/jira/browse/HIVE-10085 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10085.patch Run the following SQL to create the table and view, then execute the select statement. It will throw the runtime exception: {noformat} FAILED: RuntimeException org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: map or list is expected at function SIZE, but int is found {noformat} {noformat} CREATE TABLE t1( symptom STRING, pattern ARRAY<INT>, occurrence INT, index INT); CREATE OR REPLACE VIEW v1 AS SELECT TRIM(pd.symptom) AS symptom, pd.index, pd.pattern, pd.occurrence, pd.occurrence as cnt from t1 pd; SELECT pattern_data.symptom, pattern_data.index, pattern_data.occurrence, pattern_data.cnt, size(pattern_data.pattern) as pattern_length, pattern.pattern_id FROM v1 pattern_data LATERAL VIEW explode(pattern) pattern AS pattern_id; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
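Independent of the bug itself, what LATERAL VIEW explode(pattern) computes can be illustrated as a flat-map: each input row yields one output row per array element, with the element exposed as a new column (pattern_id in the query above). A minimal sketch with invented sample data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hedged illustration of LATERAL VIEW explode() semantics, not Hive's operator code.
public class LateralViewDemo {
    // Flat-map one input row (symptom, pattern ARRAY<INT>) into one row per element.
    static List<String> explode(String symptom, List<Integer> pattern) {
        List<String> output = new ArrayList<>();
        for (Integer patternId : pattern) {          // explode() over the array
            output.add(symptom + "," + patternId);   // original columns + pattern_id
        }
        return output;
    }

    public static void main(String[] args) {
        // Sample row: symptom "s1" with pattern [3, 1, 4].
        System.out.println(explode("s1", Arrays.asList(3, 1, 4)));
        // [s1,3, s1,1, s1,4]
    }
}
```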
[jira] [Commented] (HIVE-10038) Add Calcite's ProjectMergeRule.
[ https://issues.apache.org/jira/browse/HIVE-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382255#comment-14382255 ] Hive QA commented on HIVE-10038: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707399/HIVE-10038.4.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 8347 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_id1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_leadlag org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3168/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3168/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3168/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12707399 - PreCommit-HIVE-TRUNK-Build Add Calcite's ProjectMergeRule. 
--- Key: HIVE-10038 URL: https://issues.apache.org/jira/browse/HIVE-10038 Project: Hive Issue Type: New Feature Components: CBO, Logical Optimizer Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10038.2.patch, HIVE-10038.3.patch, HIVE-10038.4.patch, HIVE-10038.patch Helps to improve latency by shortening the operator pipeline. Folds adjacent projections into one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382253#comment-14382253 ] Jimmy Xiang commented on HIVE-10073: [~xuefuz], I think it's an issue on Hive side. In SparkRecordHandler, we use the job conf passed in from Hive. So it should be Hive's responsibility to make sure it has all the needed information. [~chengxiang li], though I called checkOutputSpecs for both MapWork and ReduceWork, I agree with you that it is better to call it in SparkPlanGenerator::generate(BaseWork work). Let me upload a new patch. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
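The fix being discussed above can be sketched generically (this is an illustration, not the actual SparkPlanGenerator code, and the config key below is only an assumption about what TableOutputFormat checks): validate the output spec once, while generating the plan for each work unit, so a missing property fails fast with a clear error instead of surfacing later inside a task as "Must specify table name".

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of a fail-fast output-spec check at plan-generation time.
public class OutputSpecCheckDemo {
    static void checkOutputSpecs(Map<String, String> jobConf) {
        // Key name mirrors HBase's output-table property but is illustrative here.
        if (jobConf.get("hbase.mapred.outputtable") == null) {
            throw new IllegalArgumentException("Must specify table name");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.mapred.outputtable", "t1");
        checkOutputSpecs(conf); // passes: table name is present

        try {
            checkOutputSpecs(new HashMap<>()); // missing table name
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Must specify table name
        }
    }
}
```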
[jira] [Updated] (HIVE-9766) Add JavaConstantXXXObjectInspector
[ https://issues.apache.org/jira/browse/HIVE-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-9766: - Attachment: HIVE-9766.4.patch Don't believe the test failures are related. Rerunning the tests. Add JavaConstantXXXObjectInspector -- Key: HIVE-9766 URL: https://issues.apache.org/jira/browse/HIVE-9766 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-9766.1.patch, HIVE-9766.2.patch, HIVE-9766.3.patch, HIVE-9766.4.patch Need JavaConstantXXXObjectInspector when implementing PIG-3294. There are two approaches: 1. Add those classes in Pig. However, most constructors of the base class JavaXXXObjectInspector have default scope and would need to be changed to protected. 2. Add those classes in Hive. Approach 2 should be better since those classes might be useful to Hive as well. Attaching a patch to provide them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10100) Warning yarn jar instead of hadoop jar in hadoop 2.7.0
[ https://issues.apache.org/jira/browse/HIVE-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10100: -- Priority: Blocker (was: Major) Warning yarn jar instead of hadoop jar in hadoop 2.7.0 -- Key: HIVE-10100 URL: https://issues.apache.org/jira/browse/HIVE-10100 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Priority: Blocker HADOOP-11257 adds a warning to stdout {noformat} WARNING: Use yarn jar to launch YARN applications. {noformat} which, if untreated, will cause issues for tools that programmatically parse stdout for query results (i.e. CLI, silent mode, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9780) Add another level of explain for RDBMS audience
[ https://issues.apache.org/jira/browse/HIVE-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9780: -- Attachment: HIVE-9780.04.patch Add another level of explain for RDBMS audience --- Key: HIVE-9780 URL: https://issues.apache.org/jira/browse/HIVE-9780 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Minor Attachments: HIVE-9780.01.patch, HIVE-9780.02.patch, HIVE-9780.03.patch, HIVE-9780.04.patch Current Hive Explain (default) is targeted at MR Audience. We need a new level of explain plan to be targeted at RDBMS audience. The explain requires these: 1) The focus needs to be on what part of the query is being executed rather than internals of the engines 2) There needs to be a clearly readable tree of operations 3) Examples - Table scan should mention the table being scanned, the Sarg, the size of table and expected cardinality after the Sarg'ed read. The join should mention the table being joined with and the join condition. The aggregate should mention the columns in the group-by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9937) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join
[ https://issues.apache.org/jira/browse/HIVE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9937: --- Attachment: HIVE-9937.07.patch LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join -- Key: HIVE-9937 URL: https://issues.apache.org/jira/browse/HIVE-9937 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-9937.01.patch, HIVE-9937.02.patch, HIVE-9937.03.patch, HIVE-9937.04.patch, HIVE-9937.05.patch, HIVE-9937.06.patch, HIVE-9937.07.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (HIVE-10100) Warning yarn jar instead of hadoop jar in hadoop 2.7.0
[ https://issues.apache.org/jira/browse/HIVE-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner moved HADOOP-11756 to HIVE-10100: Key: HIVE-10100 (was: HADOOP-11756) Project: Hive (was: Hadoop Common) Warning yarn jar instead of hadoop jar in hadoop 2.7.0 -- Key: HIVE-10100 URL: https://issues.apache.org/jira/browse/HIVE-10100 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner HADOOP-11257 adds a warning to stdout {noformat} WARNING: Use yarn jar to launch YARN applications. {noformat} which, if untreated, will cause issues for tools that programmatically parse stdout for query results (i.e. CLI, silent mode, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10062) HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
[ https://issues.apache.org/jira/browse/HIVE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382588#comment-14382588 ] Pengcheng Xiong commented on HIVE-10062: The two failed test cases are unrelated and they passed on my laptop. HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data - Key: HIVE-10062 URL: https://issues.apache.org/jira/browse/HIVE-10062 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-10062.01.patch In q.test environment with src table, execute the following query: {code} CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE; CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE; FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1 UNION all select s2.key as key, s2.value as value from src s2) unionsrc INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value; select * from DEST1; select * from DEST2; {code} DEST1 and DEST2 should both have 310 rows. However, DEST2 only has 1 row: tst1 500 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-10093: --- Assignee: Aihua Xu Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Aihua Xu Priority: Minor When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10062) HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
[ https://issues.apache.org/jira/browse/HIVE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382458#comment-14382458 ] Hive QA commented on HIVE-10062: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707419/HIVE-10062.01.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8349 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_skewtable {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3169/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3169/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3169/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12707419 - PreCommit-HIVE-TRUNK-Build HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data - Key: HIVE-10062 URL: https://issues.apache.org/jira/browse/HIVE-10062 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-10062.01.patch In q.test environment with src table, execute the following query: {code} CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE; CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE; FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1 UNION all select s2.key as key, s2.value as value from src s2) unionsrc INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value; select * from DEST1; select * from DEST2; {code} DEST1 and DEST2 should both have 310 rows. However, DEST2 only has 1 row: tst1 500 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10053) Override new init API from ReadSupport instead of the deprecated one
[ https://issues.apache.org/jira/browse/HIVE-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382118#comment-14382118 ] Sergio Peña commented on HIVE-10053: +1 Override new init API from ReadSupport instead of the deprecated one --- Key: HIVE-10053 URL: https://issues.apache.org/jira/browse/HIVE-10053 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10053.1.patch, HIVE-10053.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382149#comment-14382149 ] Alan Gates commented on HIVE-10091: --- When I run this against a real hbase instance I get: {code} Caused by: MetaException(message:java.lang.NullPointerException) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5141) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rethrowException(HiveMetaStore.java:4369) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4352) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByExpr(HiveMetaStoreClient.java:1079) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.HiveMetaStoreClientTimingProxy.invoke(HiveMetaStoreClientTimingProxy.java:102) at com.sun.proxy.$Proxy14.listPartitionsByExpr(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByExpr(Hive.java:2129) at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:371) ... 
48 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.metastore.hbase.HBaseFilterPlanUtil.getFilterPlan(HBaseFilterPlanUtil.java:486) at org.apache.hadoop.hive.metastore.hbase.HBaseStore.getPartitionsByExprInternal(HBaseStore.java:487) at org.apache.hadoop.hive.metastore.hbase.HBaseStore.getPartitionsByExpr(HBaseStore.java:474) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4347) {code} Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch, HIVE-10091.2.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382401#comment-14382401 ] Szehon Ho commented on HIVE-10093: -- FYI [~aihuaxu] Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Priority: Minor When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10093: - Description: When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. was: When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HS2, causing confusion. This could potentially be skipped if MemoryTokenStore is used. Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Priority: Minor When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10097) CBO (Calcite Return Path): Upgrade to new Calcite snapshot [CBO Branch]
[ https://issues.apache.org/jira/browse/HIVE-10097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran resolved HIVE-10097. --- Resolution: Fixed CBO (Calcite Return Path): Upgrade to new Calcite snapshot [CBO Branch] --- Key: HIVE-10097 URL: https://issues.apache.org/jira/browse/HIVE-10097 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: cbo-branch Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: cbo-branch Attachments: HIVE-10097.cbo.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382413#comment-14382413 ] Thejas M Nair commented on HIVE-10091: -- I am able to reproduce this using a simpler query, by just having a condition on a non-partitioning column in the WHERE clause: {code} create table t1( i int) partitioned by (dt string); select * from t1 where i 0 and dt '1'; FAILED: SemanticException MetaException(message:java.lang.NullPointerException) {code} In the logs {code} 2015-03-26 11:31:43,641 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(709)) - Pushdown Predicates of TS For Alias : t1 2015-03-26 11:31:43,641 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(712)) - (i 0) 2015-03-26 11:31:43,641 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(712)) - (dt '1') 2015-03-26 11:31:43,642 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner 2015-03-26 11:31:43,643 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(743)) - 0: get_partitions_by_expr : db=default tbl=t1 2015-03-26 11:31:43,643 INFO [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(356)) - ugi=thejas ip=unknown-ip-addr cmd=get_partitions_by_expr : db=default tbl=t1 2015-03-26 11:31:43,643 INFO [main]: metastore.PartFilterExprUtil (PartFilterExprUtil.java:makeExpressionTree(99)) - Unable to make the expression tree from expression string [(null and (dt '1'))]Error parsing partition filter; lexer error: null; exception NoViableAltException(11@[]) {code} The right long-term fix is to be able to correctly generate the expression tree for this query as well, and use it to fetch just the right partitions. I think this sort of query is common, and optimizing it would be useful for tables with a large number of partitions. 
What ObjectStore does when the above parsing fails is to get all partition names for the table from the RDBMS, evaluate the expr on the partition names to get a pruned set of partition names, and then again get the partitions with those names from the RDBMS. Looking into using a similar approach in this case. Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch, HIVE-10091.2.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
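The name-based pruning fallback described in that comment (fetch all partition names, evaluate the filter client-side, then fetch only the surviving partitions) can be sketched roughly as follows. This is an illustration only: the store class and its method names are hypothetical stand-ins, not the real RawStore API.

```python
# Hypothetical sketch of the ObjectStore-style fallback: prune by partition
# name on the client when the filter expression cannot be pushed down.
# FakeStore and its methods are invented for illustration; they do not
# match the actual Hive metastore RawStore interface.

class FakeStore:
    def __init__(self, partitions):
        self.partitions = partitions  # partition name -> partition object

    def get_partition_names(self, table):
        # Cheap call: returns only the names, e.g. ["dt=1", "dt=2"]
        return list(self.partitions)

    def get_partitions_by_names(self, table, names):
        # Second round trip: fetch full partition objects for the survivors
        return [self.partitions[n] for n in names]


def prune_partitions(store, table, expr_matches):
    """Evaluate the filter on partition names, then fetch only survivors."""
    all_names = store.get_partition_names(table)
    surviving = [n for n in all_names if expr_matches(n)]  # client-side eval
    return store.get_partitions_by_names(table, surviving)
```

The trade-off this sketch shows: two round trips and an O(all partitions) name scan, but no need to compile the filter into a backend scan plan, which is why it works even when expression parsing fails.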
[jira] [Updated] (HIVE-10099) Enable constant folding for Decimal
[ https://issues.apache.org/jira/browse/HIVE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10099: Attachment: HIVE-10099.patch Enable constant folding for Decimal --- Key: HIVE-10099 URL: https://issues.apache.org/jira/browse/HIVE-10099 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10099.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382380#comment-14382380 ] Mohit Sabharwal commented on HIVE-9518: --- [~apivovarov] Left a comment on RB. Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch This is used to track work to build Oracle-like months_between. Here's the semantics: MONTHS_BETWEEN returns the number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The time part should be ignored. The result should be rounded to 8 decimal places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
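The semantics spelled out above can be sketched in a few lines of Python. This is an illustration of the described Oracle behaviour only, not the Hive UDF's actual Java implementation; the function name and structure are mine.

```python
import calendar
from datetime import datetime


def months_between(date1: datetime, date2: datetime) -> float:
    """Sketch of Oracle-style MONTHS_BETWEEN: positive when date1 is later."""
    base = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    last1 = date1.day == calendar.monthrange(date1.year, date1.month)[1]
    last2 = date2.day == calendar.monthrange(date2.year, date2.month)[1]
    # Same day of month, or both last days of their months -> integer result
    if date1.day == date2.day or (last1 and last2):
        return float(base)
    # Otherwise the fractional part assumes a 31-day month and includes
    # the time-of-day components of both dates.
    sec1 = date1.day * 86400 + date1.hour * 3600 + date1.minute * 60 + date1.second
    sec2 = date2.day * 86400 + date2.hour * 3600 + date2.minute * 60 + date2.second
    return round(base + (sec1 - sec2) / (31.0 * 86400), 8)
```

For example, under this sketch `months_between(datetime(2015, 3, 31), datetime(2015, 2, 28))` is exactly 1.0 (both last days of their months), while dates that differ in day-of-month pick up a fraction with denominator 31 days.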
[jira] [Commented] (HIVE-10085) Lateral view on top of a view throws RuntimeException
[ https://issues.apache.org/jira/browse/HIVE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383029#comment-14383029 ] Hive QA commented on HIVE-10085: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707520/HIVE-10085.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8677 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3172/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3172/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3172/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12707520 - PreCommit-HIVE-TRUNK-Build Lateral view on top of a view throws RuntimeException - Key: HIVE-10085 URL: https://issues.apache.org/jira/browse/HIVE-10085 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10085.patch Run the following SQL statements to create the table and view, then execute the SELECT statement. 
It will throw the runtime exception: {noformat} FAILED: RuntimeException org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: map or list is expected at function SIZE, but int is found {noformat} {noformat} CREATE TABLE t1( symptom STRING, pattern ARRAY<INT>, occurrence INT, index INT); CREATE OR REPLACE VIEW v1 AS SELECT TRIM(pd.symptom) AS symptom, pd.index, pd.pattern, pd.occurrence, pd.occurrence as cnt from t1 pd; SELECT pattern_data.symptom, pattern_data.index, pattern_data.occurrence, pattern_data.cnt, size(pattern_data.pattern) as pattern_length, pattern.pattern_id FROM v1 pattern_data LATERAL VIEW explode(pattern) pattern AS pattern_id; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9766) Add JavaConstantXXXObjectInspector
[ https://issues.apache.org/jira/browse/HIVE-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383032#comment-14383032 ] Hive QA commented on HIVE-9766: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707543/HIVE-9766.4.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3173/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3173/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3173/ Messages: {noformat} This message was trimmed, see log for full details [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-common --- [INFO] Compiling 18 source files to /data/hive-ptest/working/apache-svn-trunk-source/common/target/test-classes [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java: /data/hive-ptest/working/apache-svn-trunk-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java uses or overrides a deprecated API. [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java: Recompile with -Xlint:deprecation for details. [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-common --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-common --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/common/target/hive-common-1.2.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-common --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-common --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/common/target/hive-common-1.2.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-common/1.2.0-SNAPSHOT/hive-common-1.2.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/common/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-common/1.2.0-SNAPSHOT/hive-common-1.2.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Serde 1.2.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-serde --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/serde (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-serde --- [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-serde --- [INFO] Source directory: /data/hive-ptest/working/apache-svn-trunk-source/serde/src/gen/protobuf/gen-java added. [INFO] Source directory: /data/hive-ptest/working/apache-svn-trunk-source/serde/src/gen/thrift/gen-javabean added. [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-serde --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-serde --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/serde/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-serde --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-serde --- [INFO] Compiling 399 source files to /data/hive-ptest/working/apache-svn-trunk-source/serde/target/classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaConstantShortObjectInspector.java:[57,1] class, interface, or enum expected [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaConstantShortObjectInspector.java:[59,1] class, interface, or enum expected [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaConstantShortObjectInspector.java:[60,1] class, interface, or enum expected [ERROR]
[jira] [Updated] (HIVE-10066) Hive on Tez job submission through WebHCat doesn't ship Tez artifacts
[ https://issues.apache.org/jira/browse/HIVE-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10066: -- Attachment: HIVE-10066.2.patch patch 2 addresses comments from [~thejas] Hive on Tez job submission through WebHCat doesn't ship Tez artifacts - Key: HIVE-10066 URL: https://issues.apache.org/jira/browse/HIVE-10066 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10066.2.patch, HIVE-10066.patch From [~hitesh]: Tez is a client-side only component ( no daemons, etc ) and therefore it is meant to be installed on the gateway box ( or where its client libraries are needed by any other services’ daemons). It does not have any cluster dependencies both in terms of libraries/jars as well as configs. When it runs on a worker node, everything was pre-packaged and made available to the worker node via the distributed cache via the client code. Hence, its client-side configs are also only needed on the same (client) node as where it is installed. The only other install step needed is to have the tez tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which points to the HDFS path. We need a way to pass client jars and tez-site.xml to the LaunchMapper. We should create a general purpose mechanism here which can supply additional artifacts per job type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9752) Documentation for HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383092#comment-14383092 ] Lefty Leverenz commented on HIVE-9752: -- [~thejas] added preliminary documents to the design docs In Progress section: * [HBase Metastore Development Guide | https://cwiki.apache.org/confluence/display/Hive/HBaseMetastoreDevelopmentGuide] * [Hbase execution plans for RawStore partition filter condition | https://cwiki.apache.org/confluence/display/Hive/Hbase+execution+plans+for+RawStore+partition+filter+condition] Documentation for HBase metastore - Key: HIVE-9752 URL: https://issues.apache.org/jira/browse/HIVE-9752 Project: Hive Issue Type: Sub-task Components: Documentation Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates All of the documentation we will need to write for the HBase metastore -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9752) Documentation for HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383092#comment-14383092 ] Lefty Leverenz edited comment on HIVE-9752 at 3/27/15 12:52 AM: [~thejas] added preliminary documents to the design docs In Progress section: * [HBase Metastore Development Guide | https://cwiki.apache.org/confluence/display/Hive/HBaseMetastoreDevelopmentGuide] * [HBaseMetastoreApproach.pdf | https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf] * [Hbase execution plans for RawStore partition filter condition | https://cwiki.apache.org/confluence/display/Hive/Hbase+execution+plans+for+RawStore+partition+filter+condition] was (Author: le...@hortonworks.com): [~thejas] added preliminary documents to the design docs In Progress section: * [HBase Metastore Development Guide | https://cwiki.apache.org/confluence/display/Hive/HBaseMetastoreDevelopmentGuide] * [Hbase execution plans for RawStore partition filter condition | https://cwiki.apache.org/confluence/display/Hive/Hbase+execution+plans+for+RawStore+partition+filter+condition] Documentation for HBase metastore - Key: HIVE-9752 URL: https://issues.apache.org/jira/browse/HIVE-9752 Project: Hive Issue Type: Sub-task Components: Documentation Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates All of the documentation we will need to write for the HBase metastore -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9582) HCatalog should use IMetaStoreClient interface
[ https://issues.apache.org/jira/browse/HIVE-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381910#comment-14381910 ] Hive QA commented on HIVE-9582: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707334/HIVE-9582.5.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8347 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3166/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3166/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3166/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12707334 - PreCommit-HIVE-TRUNK-Build HCatalog should use IMetaStoreClient interface -- Key: HIVE-9582 URL: https://issues.apache.org/jira/browse/HIVE-9582 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore Affects Versions: 0.14.0, 0.13.1 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Labels: hcatalog, metastore, rolling_upgrade Attachments: HIVE-9582.1.patch, HIVE-9582.2.patch, HIVE-9582.3.patch, HIVE-9582.4.patch, HIVE-9582.5.patch, HIVE-9583.1.patch Hive uses IMetaStoreClient and it makes using RetryingMetaStoreClient easy. Hence during a failure, the client retries and possibly succeeds. 
But HCatalog has long been using HiveMetaStoreClient directly and hence failures are costly, especially if they are during the commit stage of a job. It's also not possible to do a rolling upgrade of the MetaStore Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9518: -- Attachment: HIVE-9518.7.patch patch #7. UDF considers the difference in time components date1 and date2 now Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch, HIVE-9518.7.patch This is used to track work to build Oracle like months_between. Here's semantics: MONTHS_BETWEEN returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The time part should be ignored. The result should be rounded to 8 decimal places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9518: -- Description: This is used to track work to build Oracle like months_between. Here's semantics: MONTHS_BETWEEN returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result should be rounded to 8 decimal places. was: This is used to track work to build Oracle like months_between. Here's semantics: MONTHS_BETWEEN returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The time part should be ignored. The result should be rounded to 8 decimal places. 
Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch, HIVE-9518.7.patch This is used to track work to build Oracle like months_between. Here's semantics: MONTHS_BETWEEN returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result should be rounded to 8 decimal places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
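The semantics quoted in the description can be modeled in a few lines. The following is an illustrative Python sketch of the Oracle-style rules (integer result for same-day or both-last-day pairs, otherwise a 31-day-month fraction that includes the time difference, rounded to 8 places); it is not Hive's actual Java UDF:

```python
from calendar import monthrange
from datetime import datetime

def months_between(d1: datetime, d2: datetime) -> float:
    """Oracle-style MONTHS_BETWEEN per the rules in the issue description."""
    months = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    last1 = d1.day == monthrange(d1.year, d1.month)[1]  # last day of d1's month?
    last2 = d2.day == monthrange(d2.year, d2.month)[1]
    if d1.day == d2.day or (last1 and last2):
        return float(months)  # always an integer in these cases
    # fractional part based on a 31-day month, including the time components
    sec1 = d1.hour * 3600 + d1.minute * 60 + d1.second
    sec2 = d2.hour * 3600 + d2.minute * 60 + d2.second
    frac = (d1.day - d2.day + (sec1 - sec2) / 86400.0) / 31.0
    return round(months + frac, 8)
```

For example, `months_between(datetime(1995, 2, 2), datetime(1995, 1, 1))` gives 1.03225806 (one month plus 1/31), matching Oracle's documented behavior for this function.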
[jira] [Updated] (HIVE-10066) Hive on Tez job submission through WebHCat doesn't ship Tez artifacts
[ https://issues.apache.org/jira/browse/HIVE-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10066: -- Attachment: HIVE-10066.patch added a new property to webhcat config to specify additional artifacts to include with Hive job submission. This can be used, in particular, to ship Tez client to the node actually executing the command. [~thejas], could you review please? Hive on Tez job submission through WebHCat doesn't ship Tez artifacts - Key: HIVE-10066 URL: https://issues.apache.org/jira/browse/HIVE-10066 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10066.patch From [~hitesh]: Tez is a client-side only component ( no daemons, etc ) and therefore it is meant to be installed on the gateway box ( or where its client libraries are needed by any other services’ daemons). It does not have any cluster dependencies both in terms of libraries/jars as well as configs. When it runs on a worker node, everything was pre-packaged and made available to the worker node via the distributed cache via the client code. Hence, its client-side configs are also only needed on the same (client) node as where it is installed. The only other install step needed is to have the tez tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which points to the HDFS path. We need a way to pass client jars and tez-site.xml to the LaunchMapper. We should create a general purpose mechanism here which can supply additional artifacts per job type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10066) Hive on Tez job submission through WebHCat doesn't ship Tez artifacts
[ https://issues.apache.org/jira/browse/HIVE-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10066: -- Description: From [~hitesh]: Tez is a client-side only component ( no daemons, etc ) and therefore it is meant to be installed on the gateway box ( or where its client libraries are needed by any other services’ daemons). It does not have any cluster dependencies both in terms of libraries/jars as well as configs. When it runs on a worker node, everything was pre-packaged and made available to the worker node via the distributed cache via the client code. Hence, its client-side configs are also only needed on the same (client) node as where it is installed. The only other install step needed is to have the tez tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which points to the HDFS path. We need a way to pass client jars and tez-site.xml to the LaunchMapper. We should create a general purpose mechanism here which can supply additional artifacts per job type. was: From [~hitesh]: Tez is a client-side only component ( no daemons, etc ) and therefore it is meant to be installed on the gateway box ( or where its client libraries are needed by any other services’ daemons). It does not have any cluster dependencies both in terms of libraries/jars as well as configs. When it runs on a worker node, everything was pre-packaged and made available to the worker node via the distributed cache via the client code. Hence, its client-side configs are also only needed on the same (client) node as where it is installed. The only other install step needed is to have the tez tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which points to the HDFS path. We need a way to pass client jars and tez-site.xml to the LaunchMapper. 
We should create a general purpose mechanism here which would also include sending hive-site.xml to LaunchMapper so that there is no duplication between hive-site.xml and templeton.hive.properties in webhcat-site.xml. Hive on Tez job submission through WebHCat doesn't ship Tez artifacts - Key: HIVE-10066 URL: https://issues.apache.org/jira/browse/HIVE-10066 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman From [~hitesh]: Tez is a client-side only component ( no daemons, etc ) and therefore it is meant to be installed on the gateway box ( or where its client libraries are needed by any other services’ daemons). It does not have any cluster dependencies both in terms of libraries/jars as well as configs. When it runs on a worker node, everything was pre-packaged and made available to the worker node via the distributed cache via the client code. Hence, its client-side configs are also only needed on the same (client) node as where it is installed. The only other install step needed is to have the tez tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which points to the HDFS path. We need a way to pass client jars and tez-site.xml to the LaunchMapper. We should create a general purpose mechanism here which can supply additional artifacts per job type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10101) LLAP: enable yourkit profiling of tasks
[ https://issues.apache.org/jira/browse/HIVE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382561#comment-14382561 ] Sergey Shelukhin commented on HIVE-10101: - TezProcessor has most of runtime changes, YourkitDumper is some xml file parser, used for prototype stuff LLAP: enable yourkit profiling of tasks --- Key: HIVE-10101 URL: https://issues.apache.org/jira/browse/HIVE-10101 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10101.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10101) LLAP: enable yourkit profiling of tasks
[ https://issues.apache.org/jira/browse/HIVE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382598#comment-14382598 ] Sergey Shelukhin commented on HIVE-10101: - For reference, the YK binaries in question use the 3-clause BSD licence: {noformat} $ cat ./Resources/license-redist.txt The following files can be redistributed under the license below: yjpagent.dll libyjpagent.so libyjpagent.jnilib yjp-controller-api-redist.jar --- Copyright (c) 2003-2015, YourKit All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of YourKit nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY YOURKIT AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL YOURKIT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
{noformat} LLAP: enable yourkit profiling of tasks --- Key: HIVE-10101 URL: https://issues.apache.org/jira/browse/HIVE-10101 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10101.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10098) HS2 local task for map join fails in KMS encrypted cluster
[ https://issues.apache.org/jira/browse/HIVE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-10098: Attachment: HIVE-10098.1.patch HS2 local task for map join fails in KMS encrypted cluster -- Key: HIVE-10098 URL: https://issues.apache.org/jira/browse/HIVE-10098 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10098.1.patch Env: KMS was enabled after cluster was kerberos secured. Problem: PROBLEM: Any Hive query via beeline that performs a MapJoin fails with a java.lang.reflect.UndeclaredThrowableException from KMSClientProvider.addDelegationTokens. {code} 2015-03-18 08:49:17,948 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1022)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 2015-03-18 08:49:19,048 WARN [main]: security.UserGroupInformation (UserGroupInformation.java:doAs(1645)) - PriviledgedActionException as:hive (auth:KERBEROS) cause:org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) 2015-03-18 08:49:19,050 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(314)) - Hive Runtime Error: Map local work failed java.io.IOException: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:634) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:363) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:337) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:303) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:735) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:826) at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86) at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:413) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:559) ... 9 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655) at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:808) ... 
18 more Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10062) HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
[ https://issues.apache.org/jira/browse/HIVE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382599#comment-14382599 ] Pengcheng Xiong commented on HIVE-10062: The RB is ready. [~hagleitn] and [~vikram.dixit], could you please take a look? Thanks. HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data - Key: HIVE-10062 URL: https://issues.apache.org/jira/browse/HIVE-10062 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-10062.01.patch In q.test environment with src table, execute the following query: {code} CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE; CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE; FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1 UNION all select s2.key as key, s2.value as value from src s2) unionsrc INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value; select * from DEST1; select * from DEST2; {code} DEST1 and DEST2 should both have 310 rows. However, DEST2 only has 1 row: tst1 500 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382771#comment-14382771 ] Aihua Xu commented on HIVE-10093: - RB: https://reviews.apache.org/r/32551/ Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10093.patch When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382638#comment-14382638 ] Hive QA commented on HIVE-9518: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707453/HIVE-9518.6.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8680 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hive.spark.client.TestSparkClient.testJobSubmission {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3170/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3170/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3170/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12707453 - PreCommit-HIVE-TRUNK-Build Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch This is used to track work to build Oracle like months_between. Here's semantics: MONTHS_BETWEEN returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. 
If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The time part should be ignored. The result should be rounded to 8 decimal places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9727) GroupingID translation from Calcite
[ https://issues.apache.org/jira/browse/HIVE-9727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9727: --- Component/s: Query Planning GroupingID translation from Calcite --- Key: HIVE-9727 URL: https://issues.apache.org/jira/browse/HIVE-9727 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-9727.01.patch, HIVE-9727.02.patch, HIVE-9727.03.patch, HIVE-9727.04.patch, HIVE-9727.patch The translation from Calcite back to Hive might produce wrong results while interacting with other Calcite optimization rules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382727#comment-14382727 ] Aihua Xu commented on HIVE-10093: - [~szehon] Can you take a look at the change? Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10093.patch When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10027) Use descriptions from Avro schema files in column comments
[ https://issues.apache.org/jira/browse/HIVE-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10027: --- Attachment: HIVE-10027.1.patch Uploaded a new patch based on review. Thanks [~szehon] and [~xuefuz] for review. Use descriptions from Avro schema files in column comments -- Key: HIVE-10027 URL: https://issues.apache.org/jira/browse/HIVE-10027 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.13.1 Reporter: Jeremy Beard Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10027.1.patch, HIVE-10027.patch Avro schema files can include field descriptions using the doc tag. It would be helpful if the Hive metastore would use these descriptions as the comments for a field when the table is backed by such a schema file, instead of the default from deserializer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10066) Hive on Tez job submission through WebHCat doesn't ship Tez artifacts
[ https://issues.apache.org/jira/browse/HIVE-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10066: -- Component/s: WebHCat Tez Hive on Tez job submission through WebHCat doesn't ship Tez artifacts - Key: HIVE-10066 URL: https://issues.apache.org/jira/browse/HIVE-10066 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman From [~hitesh]: Tez is a client-side only component ( no daemons, etc ) and therefore it is meant to be installed on the gateway box ( or where its client libraries are needed by any other services’ daemons). It does not have any cluster dependencies both in terms of libraries/jars as well as configs. When it runs on a worker node, everything was pre-packaged and made available to the worker node via the distributed cache via the client code. Hence, its client-side configs are also only needed on the same (client) node as where it is installed. The only other install step needed is to have the tez tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which points to the HDFS path. We need a way to pass client jars and tez-site.xml to the LaunchMapper. We should create a general purpose mechanism here which can supply additional artifacts per job type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10091: - Attachment: HIVE-10091.3.patch I work around that NPE in 3.patch by doing a full scan of all partitions in the hbase table; i.e., it degrades to the situation without the patch if there is a condition on any non-partitioning column in the where clause. I have created a follow-up jira, HIVE-10102, to do further pruning as in ObjectStore. I need to think some more about the approach to follow in hbase metastore. Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch, HIVE-10091.2.patch, HIVE-10091.3.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
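To illustrate the kind of pruning being discussed: if partition keys are encoded into the HBase row key, an equality condition on the leading partition column can be turned into a bounded scan instead of a full-table scan. The key layout below (`db:table:val1:...`) and the function name are assumptions for illustration only, not the branch's actual encoding:

```python
def partition_scan_range(db: str, table: str, leading_values: tuple = ()) -> tuple:
    """Derive an HBase scan [start, stop) range from equality conditions on
    the leading partition columns. With no conditions, this degrades to a
    scan of all partitions of the table (the 3.patch fallback behavior)."""
    start = ":".join([db, table, *leading_values]) + ":"
    # standard prefix-scan trick: stop key = prefix with its last byte incremented
    stop = start[:-1] + chr(ord(start[-1]) + 1)
    return start, stop
```

Conditions on non-leading or non-partition columns cannot narrow the range this way, which is why further pruning (as in ObjectStore) is left to the follow-up work.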
[jira] [Updated] (HIVE-9688) Support SAMPLE operator in hive
[ https://issues.apache.org/jira/browse/HIVE-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9688: Labels: hive java (was: gsoc gsoc2015 hive java) Support SAMPLE operator in hive --- Key: HIVE-9688 URL: https://issues.apache.org/jira/browse/HIVE-9688 Project: Hive Issue Type: New Feature Reporter: Prasanth Jayachandran Labels: hive, java Hive needs SAMPLE operator to support parallel order by, skew joins and count + distinct optimizations. Random, Reservoir and Stratified sampling should cover most of the cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
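Of the three sampling strategies named in HIVE-9688, reservoir sampling is the one that fits a streaming operator best, since it draws a uniform sample in one pass without knowing the row count up front. A minimal Python sketch of Algorithm R (illustrative only, not a proposed Hive implementation):

```python
import random

def reservoir_sample(stream, k: int, seed=None) -> list:
    """Uniform random sample of k rows from a stream of unknown length,
    in a single pass and O(k) memory (Vitter's Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(stream):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            # row i replaces a reservoir slot with probability k / (i + 1)
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

Stratified sampling can then be built on top by keeping one such reservoir per stratum key.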
[jira] [Updated] (HIVE-10104) LLAP: Generate consistent splits and locations for the same split across jobs
[ https://issues.apache.org/jira/browse/HIVE-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-10104: -- Attachment: HIVE-10104.1.txt Patch to order the original splits by size and name. Location is based on a hash of the filename and start position. [~hagleitn] - could you please take a quick look for sanity. Will commit after I'm able to test it a bit on a cluster larger than 1 node. LLAP: Generate consistent splits and locations for the same split across jobs - Key: HIVE-10104 URL: https://issues.apache.org/jira/browse/HIVE-10104 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10104.1.txt Locations for splits are currently randomized. Also, the order of splits is random - depending on how threads end up generating the splits. Add an option to sort the splits, and generate repeatable locations - assuming all other factors are the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
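The approach described in the comment — order splits by size and name, derive each location from a hash of the file name and start offset — can be sketched as follows. This is an illustrative Python model of the idea, not the actual patch code; the dict fields and function name are assumptions:

```python
import hashlib

def order_and_locate(splits: list, hosts: list) -> list:
    """Sort splits deterministically (size desc, then path, then offset) and
    assign each a host from a hash of (path, start), so the same input
    yields the same splits and locations across jobs."""
    ordered = sorted(splits, key=lambda s: (-s["length"], s["path"], s["start"]))
    located = []
    for s in ordered:
        digest = hashlib.md5(f"{s['path']}:{s['start']}".encode()).hexdigest()
        located.append((s["path"], s["start"], hosts[int(digest, 16) % len(hosts)]))
    return located
```

Repeatable locations matter for LLAP because they keep the same data landing on the same daemon's cache, assuming all other factors (files, hosts) are unchanged.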
[jira] [Updated] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10086: --- Attachment: (was: HIVE-10086.2.patch) Hive throws error when accessing Parquet file schema using field name match --- Key: HIVE-10086 URL: https://issues.apache.org/jira/browse/HIVE-10086 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-10086.3.patch, HiveGroup.parquet When Hive table schema contains a portion of the schema of a Parquet file, then the access to the values should work if the field names match the schema. This does not work when a struct data type is in the schema, and the Hive schema contains just a portion of the struct elements. Hive throws an error instead. This is the example and how to reproduce: First, create a parquet table, and add some values on it: {code} CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET; INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1; {code} Note: {{srcpart}} could be any table. It is just used to leverage the INSERT statement. 
The above table example generates the following Parquet file schema: {code} message hive_schema { optional int32 id; optional binary name (UTF8); optional group address { optional int32 number; optional binary street (UTF8); optional binary zip (UTF8); } } {code} Afterwards, I create a table that contains just a portion of the schema, and load the Parquet file generated above; a query will then fail on that table: {code} CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET; LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1; hive> SELECT name FROM test1; OK Roger Time taken: 0.071 seconds, Fetched: 1 row(s) hive> SELECT address FROM test1; OK Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable Time taken: 0.085 seconds {code} I would expect that Parquet can access the matched names, but Hive throws an error instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
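The name-based matching the reporter expects can be sketched abstractly: keep only the file fields whose names appear in the table schema, recursing into structs so a partial struct selects just the requested members. This is an illustrative Python model (schemas as nested dicts), not Hive's or Parquet's actual schema-resolution code:

```python
def project_by_name(file_schema: dict, table_schema: dict) -> dict:
    """Project the file schema down to the columns named in the table
    schema, recursing into struct (group) types for partial structs."""
    projected = {}
    for name, table_type in table_schema.items():
        if name not in file_schema:
            continue  # column absent from the file: Hive reads it as NULL
        file_type = file_schema[name]
        if isinstance(table_type, dict) and isinstance(file_type, dict):
            projected[name] = project_by_name(file_type, table_type)  # struct subset
        else:
            projected[name] = file_type
    return projected
```

Applied to the schemas above, a table asking for `name` and `address.street` should resolve to just those two leaves; the reported bug is that the struct case errors out instead.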
[jira] [Updated] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10086: --- Attachment: HIVE-10086.3.patch Hive throws error when accessing Parquet file schema using field name match --- Key: HIVE-10086 URL: https://issues.apache.org/jira/browse/HIVE-10086 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-10086.3.patch, HiveGroup.parquet When the Hive table schema contains a portion of the schema of a Parquet file, access to the values should work as long as the field names match the schema. This does not work when a struct data type is in the schema and the Hive schema contains just a portion of the struct elements; Hive throws an error instead. This is the example and how to reproduce it: First, create a Parquet table and add some values to it: {code} CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET; INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1; {code} Note: {{srcpart}} could be any table. It is just used to leverage the INSERT statement. 
The above table example generates the following Parquet file schema: {code} message hive_schema { optional int32 id; optional binary name (UTF8); optional group address { optional int32 number; optional binary street (UTF8); optional binary zip (UTF8); } } {code} Afterwards, I create a table that contains just a portion of the schema and load the Parquet file generated above; a query will then fail on that table: {code} CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET; LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1; hive> SELECT name FROM test1; OK Roger Time taken: 0.071 seconds, Fetched: 1 row(s) hive> SELECT address FROM test1; OK Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable Time taken: 0.085 seconds {code} I would expect that Parquet can access the matched names, but Hive throws an error instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382712#comment-14382712 ] Hive QA commented on HIVE-10073: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707558/HIVE-10073.3-spark.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7644 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/807/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/807/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-807/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12707558 - PreCommit-HIVE-SPARK-Build Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch, HIVE-10073.2-spark.patch, HIVE-10073.3-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382730#comment-14382730 ] Szehon Ho commented on HIVE-10093: -- Can you create a rb for this? Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10093.patch When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10096) Investigate the random failure of TestCliDriver.testCliDriver_udaf_percentile_approx_23
[ https://issues.apache.org/jira/browse/HIVE-10096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-10096: --- Assignee: Aihua Xu Investigate the random failure of TestCliDriver.testCliDriver_udaf_percentile_approx_23 --- Key: HIVE-10096 URL: https://issues.apache.org/jira/browse/HIVE-10096 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor The unit test sometimes fails with the following diff: Running: diff -a /home/hiveptest/54.158.232.92-hiveptest-2/apache-svn-trunk-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/udaf_percentile_approx_23.q.out /home/hiveptest/54.158.232.92-hiveptest-2/apache-svn-trunk-source/itests/qtest/../../ql/src/test/results/clientpositive/udaf_percentile_approx_23.q.out 628c628 < 256.0 --- > 255.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
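One plausible source of this kind of flaky result (an assumption here, not a confirmed diagnosis of HIVE-10096): percentile_approx aggregates doubles, and floating-point addition is not associative, so a result can shift slightly depending on the order in which partial aggregates are combined across tasks. A minimal, dependency-free demonstration that summation order alone changes a double result:

```java
public class FpOrder {
    // Plain left-to-right summation, as a reducer combining partial results might do.
    static double sum(double[] xs) {
        double s = 0.0;
        for (double x : xs) s += x;
        return s;
    }

    public static void main(String[] args) {
        // Same multiset of values, two arrival orders.
        double[] orderA = {1e16, 1.0, 1.0, -1e16};
        double[] orderB = {1.0, 1.0, 1e16, -1e16};
        System.out.println(sum(orderA)); // 0.0 -- each +1.0 is absorbed into 1e16's rounding
        System.out.println(sum(orderB)); // 2.0 -- the small values are added before the big ones
    }
}
```

The magnitudes here are exaggerated to make the effect obvious; in an approximate-percentile histogram the drift is tiny (256.0 vs 255.5 scale), but the mechanism is the same.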
[jira] [Updated] (HIVE-10104) LLAP: Generate consistent splits and locations for the same split across jobs
[ https://issues.apache.org/jira/browse/HIVE-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-10104: -- Attachment: HIVE-10104.2.txt Updated patch with the sort removed from the scheduler. Tested on a multi-node cluster. Will commit after the next rebase of the LLAP branch. LLAP: Generate consistent splits and locations for the same split across jobs - Key: HIVE-10104 URL: https://issues.apache.org/jira/browse/HIVE-10104 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10104.1.txt, HIVE-10104.2.txt Locations for splits are currently randomized. Also, the order of splits is random - depending on how threads end up generating the splits. Add an option to sort the splits, and generate repeatable locations - assuming all other factors are the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9937) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join
[ https://issues.apache.org/jira/browse/HIVE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382857#comment-14382857 ] Gopal V commented on HIVE-9937: --- [~mmccline]: Ran a few scale tests last night and there seem to be no visible issues with the patch. A general comment about asserts: the regular runtime turns off asserts, so you should be using Preconditions.check operations, particularly outside the core loop (like the futures.size check). Need to re-verify the TODO in VectorAppMasterEventOperator - make sure nothing in super.process actually buffers the Object[] row, since the data is now modified in-place, while earlier a new array was generated for each row. This has no safety switch other than turning off vectorization, so I'd like to see if [~mmokhtar] can get a full TPC-DS run for this. With this epic patch, the slowest part of a group-by is now the full-sort, which gives me something else to fix :) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join -- Key: HIVE-9937 URL: https://issues.apache.org/jira/browse/HIVE-9937 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-9937.01.patch, HIVE-9937.02.patch, HIVE-9937.03.patch, HIVE-9937.04.patch, HIVE-9937.05.patch, HIVE-9937.06.patch, HIVE-9937.07.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
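Gopal's point about asserts is worth spelling out: Java {{assert}} statements are no-ops unless the JVM is started with {{-ea}}, whereas an explicit precondition check always fires. A minimal, dependency-free sketch (the local {{checkState}} helper just mirrors Guava's {{Preconditions.checkState}}, which is what the comment recommends; the message strings are made up for illustration):

```java
public class CheckDemo {
    // Always-on check, mirroring Guava's Preconditions.checkState(boolean, Object).
    static void checkState(boolean condition, String message) {
        if (!condition) throw new IllegalStateException(message);
    }

    public static void main(String[] args) {
        boolean assertsEnabled = false;
        try {
            assert false : "only reached when the JVM runs with -ea";
        } catch (AssertionError e) {
            assertsEnabled = true;
        }
        // On a production JVM (no -ea), the assert above is silently skipped.
        System.out.println("asserts enabled: " + assertsEnabled);

        try {
            checkState(1 + 1 == 3, "invariant violated"); // enforced on every JVM
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage()); // prints "caught: invariant violated"
        }
    }
}
```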
[jira] [Assigned] (HIVE-10069) CBO (Calcite Return Path): Ambiguity table name causes problem in field trimmer [CBO Branch]
[ https://issues.apache.org/jira/browse/HIVE-10069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran reassigned HIVE-10069: - Assignee: Laljo John Pullokkaran (was: Jesus Camacho Rodriguez) CBO (Calcite Return Path): Ambiguity table name causes problem in field trimmer [CBO Branch] Key: HIVE-10069 URL: https://issues.apache.org/jira/browse/HIVE-10069 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: cbo-branch Reporter: Jesus Camacho Rodriguez Assignee: Laljo John Pullokkaran Fix For: cbo-branch Attachments: HIVE-10069.cbo.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10066) Hive on Tez job submission through WebHCat doesn't ship Tez artifacts
[ https://issues.apache.org/jira/browse/HIVE-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382824#comment-14382824 ] Eugene Koifman commented on HIVE-10066: --- this is the way {{hadoop jar -files /foo/bar}} works. If {{bar}} is a directory, it will create {{bar/}} in the CWD of the task with the contents of {{bar/}}. If {{bar}} is a file, it will create {{./bar}}. Hive on Tez job submission through WebHCat doesn't ship Tez artifacts - Key: HIVE-10066 URL: https://issues.apache.org/jira/browse/HIVE-10066 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10066.patch From [~hitesh]: Tez is a client-side only component (no daemons, etc.) and therefore it is meant to be installed on the gateway box (or wherever its client libraries are needed by any other services’ daemons). It does not have any cluster dependencies, in terms of either libraries/jars or configs. When it runs on a worker node, everything is pre-packaged and made available to the worker node via the distributed cache by the client code. Hence, its client-side configs are also only needed on the same (client) node where it is installed. The only other install step needed is to upload the tez tarball to HDFS; the config has an entry “tez.lib.uris” which points to the HDFS path. We need a way to pass client jars and tez-site.xml to the LaunchMapper. We should create a general-purpose mechanism here which can supply additional artifacts per job type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10069) CBO (Calcite Return Path): Ambiguity table name causes problem in field trimmer [CBO Branch]
[ https://issues.apache.org/jira/browse/HIVE-10069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran resolved HIVE-10069. --- Resolution: Fixed CBO (Calcite Return Path): Ambiguity table name causes problem in field trimmer [CBO Branch] Key: HIVE-10069 URL: https://issues.apache.org/jira/browse/HIVE-10069 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: cbo-branch Reporter: Jesus Camacho Rodriguez Assignee: Laljo John Pullokkaran Fix For: cbo-branch Attachments: HIVE-10069.1.patch, HIVE-10069.cbo.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10069) CBO (Calcite Return Path): Ambiguity table name causes problem in field trimmer [CBO Branch]
[ https://issues.apache.org/jira/browse/HIVE-10069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10069: -- Attachment: HIVE-10069.1.patch CBO (Calcite Return Path): Ambiguity table name causes problem in field trimmer [CBO Branch] Key: HIVE-10069 URL: https://issues.apache.org/jira/browse/HIVE-10069 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: cbo-branch Reporter: Jesus Camacho Rodriguez Assignee: Laljo John Pullokkaran Fix For: cbo-branch Attachments: HIVE-10069.1.patch, HIVE-10069.cbo.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383152#comment-14383152 ] Chengxiang Li commented on HIVE-10073: -- [~xuefuz], the root cause should be just as Jimmy mentioned: some HBase table properties are set on the JobConf during checkOutputSpecs, and this method is not invoked in HoS. Actually, Spark checks output specs while the user builds the RDD graph with certain actions, like PairRDDFunctions::saveAsHadoopDataset and PairRDDFunctions::saveAsNewAPIHadoopDataset. In HoS we use foreach as the action and write data to Hadoop storage inside Hive, so it should be Hive's responsibility to check output specs. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch, HIVE-10073.2-spark.patch, HIVE-10073.3-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
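The point being made is that Spark's save actions validate output specs before any task writes, while the foreach-based write path in Hive-on-Spark skips that hook, so Hive must perform the validation itself. A toy, dependency-free sketch of the "validate once up front, then write" pattern (all type and method names here are invented for illustration; the real players are Hadoop's OutputFormat.checkOutputSpecs, HBase's TableOutputFormat, and Hive's FileSinkOperator):

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class OutputSpecDemo {
    /** Hypothetical stand-in for a sink whose config must be validated before writes. */
    interface Sink {
        void checkOutputSpecs() throws IOException;
        void write(String record);
    }

    static class TableSink implements Sink {
        private final String tableName;
        TableSink(String tableName) { this.tableName = tableName; }
        public void checkOutputSpecs() throws IOException {
            // Analogous to TableOutputFormat failing with "Must specify table name".
            if (tableName == null || tableName.isEmpty())
                throw new IOException("Must specify table name");
        }
        public void write(String record) { /* no-op for the sketch */ }
    }

    /** Validate once, before the per-record loop (the "foreach" action in HoS). */
    static void save(List<String> records, Sink sink) throws IOException {
        sink.checkOutputSpecs();           // fail fast, before any task writes
        for (String r : records) sink.write(r);
    }

    public static void main(String[] args) throws IOException {
        save(Arrays.asList("a", "b"), new TableSink("t1"));  // ok
        try {
            save(Arrays.asList("a"), new TableSink(""));     // misconfigured sink
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage()); // prints "caught: Must specify table name"
        }
    }
}
```

Checking once on the client (RSC) side, as suggested, avoids repeating the validation in every task.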
[jira] [Updated] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10093: Attachment: HIVE-10093.patch Address comments. Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10093.patch When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10093: Attachment: (was: HIVE-10093.patch) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10093.patch When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10114) Split strategies for ORC
[ https://issues.apache.org/jira/browse/HIVE-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10114: - Attachment: HIVE-10114.1.patch [~gopalv] fyi.. this is the first take of the patch.. Split strategies for ORC Key: HIVE-10114 URL: https://issues.apache.org/jira/browse/HIVE-10114 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10114.1.patch ORC split generation does not have clearly defined strategies for different scenarios (many small ORC files, few small ORC files, many large files, etc.). A few strategies already exist, like storing the file footer in the ORC split or making the entire file a single ORC split. This JIRA is to make split generation simpler, support different strategies for various use cases (BI, ETL, ACID, etc.), and lay the foundation for HIVE-7428. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10093) Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
[ https://issues.apache.org/jira/browse/HIVE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383199#comment-14383199 ] Szehon Ho commented on HIVE-10093: -- Thanks, +1 on latest patch pending test Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2 - Key: HIVE-10093 URL: https://issues.apache.org/jira/browse/HIVE-10093 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10093.patch When the HiveAuthFactory is constructed in HS2, it initializes a HMSHandler unnecessarily right before the call to: HadoopThriftAuthBridge.startDelegationTokenSecretManager(). If the DelegationTokenStore is configured to be a memoryTokenStore, this step is not needed. Side effect is creation of useless derby database file on HiveServer2 in secure clusters, causing confusion. This could potentially be skipped if MemoryTokenStore is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10112) LLAP: query 17 tasks fail due to mapjoin issue
[ https://issues.apache.org/jira/browse/HIVE-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383205#comment-14383205 ] Sergey Shelukhin commented on HIVE-10112: - Probably something related {noformat} 0150326184304_716d1a10-3cf8-46d7-99f7-7892d1655bad:6_Map 1_3_0)] ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unexpected exception: 10385093 java.lang.ArrayIndexOutOfBoundsException: 10385093 at org.apache.hadoop.hive.serde2.WriteBuffers.readVLong(WriteBuffers.java:58) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.isSameKey(BytesBytesMultiHashMap.java:454) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.findKeyRefToRead(BytesBytesMultiHashMap.java:380) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.getValueRefs(BytesBytesMultiHashMap.java:258) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.setFromOutput(MapJoinBytesTableContainer.java:429) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromVector(MapJoinBytesTableContainer.java:349) {noformat} LLAP: query 17 tasks fail due to mapjoin issue -- Key: HIVE-10112 URL: https://issues.apache.org/jira/browse/HIVE-10112 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin {noformat} 2015-03-26 18:16:38,833 [TezTaskRunner_attempt_1424502260528_1696_1_07_00_0(container_1_1696_01_000220_sershe_20150326181607_188ab263-0a13-4528-b778-c803f378640d:1_Map 1_0_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: java.lang.AssertionError: Length is negative: -54 at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:308) at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.AssertionError: Length is negative: -54 at org.apache.hadoop.hive.serde2.WriteBuffers$ByteSegmentRef.init(WriteBuffers.java:339) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.getValueRefs(BytesBytesMultiHashMap.java:270) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.setFromOutput(MapJoinBytesTableContainer.java:429) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromVector(MapJoinBytesTableContainer.java:349) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.setMapJoinKey(VectorMapJoinOperator.java:222) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:310) at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:252) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:163) at
[jira] [Comment Edited] (HIVE-10112) LLAP: query 17 tasks fail due to mapjoin issue
[ https://issues.apache.org/jira/browse/HIVE-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383205#comment-14383205 ] Sergey Shelukhin edited comment on HIVE-10112 at 3/27/15 2:04 AM: -- Probably something related {noformat} java.lang.ArrayIndexOutOfBoundsException: 10385093 at org.apache.hadoop.hive.serde2.WriteBuffers.readVLong(WriteBuffers.java:58) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.isSameKey(BytesBytesMultiHashMap.java:454) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.findKeyRefToRead(BytesBytesMultiHashMap.java:380) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.getValueRefs(BytesBytesMultiHashMap.java:258) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.setFromOutput(MapJoinBytesTableContainer.java:429) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromVector(MapJoinBytesTableContainer.java:349) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.setMapJoinKey(VectorMapJoinOperator.java:222) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:310) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:252) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) {noformat} was (Author: sershe): Probably something related {noformat} 0150326184304_716d1a10-3cf8-46d7-99f7-7892d1655bad:6_Map 1_3_0)] ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unexpected exception: 10385093 java.lang.ArrayIndexOutOfBoundsException: 10385093 at org.apache.hadoop.hive.serde2.WriteBuffers.readVLong(WriteBuffers.java:58) at 
org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.isSameKey(BytesBytesMultiHashMap.java:454) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.findKeyRefToRead(BytesBytesMultiHashMap.java:380) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.getValueRefs(BytesBytesMultiHashMap.java:258) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.setFromOutput(MapJoinBytesTableContainer.java:429) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromVector(MapJoinBytesTableContainer.java:349) {noformat} LLAP: query 17 tasks fail due to mapjoin issue -- Key: HIVE-10112 URL: https://issues.apache.org/jira/browse/HIVE-10112 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin {noformat} 2015-03-26 18:16:38,833 [TezTaskRunner_attempt_1424502260528_1696_1_07_00_0(container_1_1696_01_000220_sershe_20150326181607_188ab263-0a13-4528-b778-c803f378640d:1_Map 1_0_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: java.lang.AssertionError: Length is negative: -54 at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:308) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at
[jira] [Commented] (HIVE-10099) Enable constant folding for Decimal
[ https://issues.apache.org/jira/browse/HIVE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383204#comment-14383204 ] Hive QA commented on HIVE-10099: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707554/HIVE-10099.patch {color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 8676 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_decimal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_expressions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_round_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_udf2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_expressions org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_udf org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_udf2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3174/testReport Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3174/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3174/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 17 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12707554 - PreCommit-HIVE-TRUNK-Build Enable constant folding for Decimal --- Key: HIVE-10099 URL: https://issues.apache.org/jira/browse/HIVE-10099 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10099.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10074) Ability to run HCat Client Unit tests in a system test setting
[ https://issues.apache.org/jira/browse/HIVE-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381467#comment-14381467 ] Hive QA commented on HIVE-10074: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707316/HIVE-10074.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8337 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3161/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3161/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3161/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12707316 - PreCommit-HIVE-TRUNK-Build Ability to run HCat Client Unit tests in a system test setting -- Key: HIVE-10074 URL: https://issues.apache.org/jira/browse/HIVE-10074 Project: Hive Issue Type: Bug Components: Tests Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Attachments: HIVE-10074.1.patch, HIVE-10074.patch Following testsuite {{hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java}} is a JUnit testsuite to test some basic HCat client API. During setup it brings up a Hive Metastore with embedded Derby. The testsuite however will be even more useful if it can be run against a running Hive Metastore (transparent to whatever backing DB its running against). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381424#comment-14381424 ] Chengxiang Li commented on HIVE-10073: -- Hi, [~jxiang], I see that you only call checkOutputSpecs for ReduceWork, but a map-only job may contain a FileSinkOperator as well, so we may need to call checkOutputSpecs for MapWork too. Besides, checkOutputSpecs is invoked in SparkRecordHandler::init, which is executed for every task; SparkPlanGenerator::generate(BaseWork work) may be a better place to do this. We could call checkOutputSpecs between cloning the jobconf and serializing it, so the check would run only once on the RSC side. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine with MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
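The suggestion above, validating output specs once per work unit at plan-generation time instead of in every task's record-handler init, can be sketched as follows. The names and signatures are hypothetical simplifications, not the actual SparkPlanGenerator or SparkRecordHandler code.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; names and signatures are hypothetical, not the
// real SparkPlanGenerator/SparkRecordHandler APIs.
public class OutputSpecCheck {
    private final Set<String> checkedWork = new HashSet<>();
    private int checkCount = 0;

    // Called once per BaseWork (MapWork or ReduceWork) while generating the
    // Spark plan, i.e. before the job conf is serialized and shipped to tasks.
    void generate(String workName, boolean hasFileSink) {
        // A map-only job can contain a FileSinkOperator too, so MapWork is
        // checked here as well, not just ReduceWork.
        if (hasFileSink && checkedWork.add(workName)) {
            checkOutputSpecs(workName);
        }
    }

    private void checkOutputSpecs(String workName) {
        checkCount++;  // stand-in for the real OutputFormat.checkOutputSpecs(...)
    }

    int checks() {
        return checkCount;
    }
}
```

The point of the placement is that the check runs once on the client/RSC side per work unit, rather than once per task attempt.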
[jira] [Updated] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10091: - Summary: Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes (was: Generate Hbase execution plan for partition filter conditions in HbaseStore api calls) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9518: -- Attachment: HIVE-9518.6.patch patch #6: replaced javaStringObjectInspector with writableStringObjectInspector in the JUnit test Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch This is used to track work to build an Oracle-like months_between. Here are the semantics: MONTHS_BETWEEN returns the number of months between dates date1 and date2. If date1 is later than date2, the result is positive. If date1 is earlier than date2, the result is negative. If date1 and date2 are either the same days of the month or both last days of months, the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in the time components of date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The time part should be ignored. The result should be rounded to 8 decimal places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
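The semantics above can be sketched directly. This is a simplified model, not the code from the HIVE-9518 patch; since the description says the time part should be ignored, plain dates suffice.

```java
import java.time.LocalDate;

// Simplified model of the Oracle-style semantics described above; an
// illustration, not the actual HIVE-9518 patch.
public class MonthsBetween {

    static double monthsBetween(LocalDate d1, LocalDate d2) {
        int monthDiff = (d1.getYear() - d2.getYear()) * 12
                      + (d1.getMonthValue() - d2.getMonthValue());
        boolean sameDayOfMonth = d1.getDayOfMonth() == d2.getDayOfMonth();
        boolean bothLastDays = d1.getDayOfMonth() == d1.lengthOfMonth()
                            && d2.getDayOfMonth() == d2.lengthOfMonth();
        if (sameDayOfMonth || bothLastDays) {
            return monthDiff;  // integral result in these two cases
        }
        // Fractional part is based on a 31-day month; round to 8 decimals.
        double raw = monthDiff + (d1.getDayOfMonth() - d2.getDayOfMonth()) / 31.0;
        return Math.round(raw * 1e8) / 1e8;
    }

    public static void main(String[] args) {
        // date1 later than date2 gives a positive result
        System.out.println(monthsBetween(LocalDate.of(1995, 2, 2),
                                         LocalDate.of(1995, 1, 1)));
    }
}
```

For example, months_between('1995-02-02', '1995-01-01') is 1 + 1/31, i.e. 1.03225806 after rounding.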
[jira] [Commented] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381517#comment-14381517 ] Thejas M Nair commented on HIVE-10091: -- I should have mentioned the remaining work along with the patch. Here it is - # Handle conditions that cannot be represented using the Scan api startRow or stopRow. This includes all conditions on partition columns other than the first partition column, and != and LIKE expressions on the first partition column. =, <, >, <=, >= on the first partition column are handled. These unsupported conditions need to be converted to a Filter in the Scan api call. Right now, these unsupported conditions are treated like a 'true' boolean value. # Handle conditions on the first partition column where the data type is not a string type. This currently works for cases where the string representation byte order for the type is the same as the real order for the type, i.e. it does not work for types such as integer. To support this we need to change the serialization of the keys so that the byte order of keys is the same as the data type order. For this, I propose changing the key serialization to BinarySortableSerde format. bq. If I read HBaseFilterPlanUtil correctly this can handle non-boolean expressions on initial keys right now, but not booleans When you say boolean expressions, do you mean AND/OR expressions? They are supported. 
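The split described above between key ranges and filters can be illustrated with a toy mapping from a single comparison on the first (string-typed) partition column to an HBase-style [startRow, stopRow) range. This is a sketch, not HBaseFilterPlanUtil itself, and as the second point notes it only works when the key's byte order matches the type's natural order.

```java
// Toy sketch, not HBaseFilterPlanUtil: map one comparison on the first
// (string-typed) partition column to an HBase-style [startRow, stopRow)
// range. null means "unbounded on that side".
public class ScanRangeSketch {

    static String[] range(String op, String value) {
        switch (op) {
            case "=":  return new String[] { value, value + "\0" };  // "\0" = next key
            case "<":  return new String[] { null, value };
            case "<=": return new String[] { null, value + "\0" };
            case ">":  return new String[] { value + "\0", null };
            case ">=": return new String[] { value, null };
            default:
                // != and LIKE cannot be expressed as one contiguous key range:
                // full scan, with the condition pushed into a Scan Filter
                // (currently treated as 'true' per the comment above).
                return new String[] { null, null };
        }
    }
}
```

The unbounded/unbounded result for != and LIKE is exactly the case where a Scan Filter is needed so the condition is not silently dropped.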
Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10078) Optionally allow logging of records processed in fixed intervals
[ https://issues.apache.org/jira/browse/HIVE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381536#comment-14381536 ] Hive QA commented on HIVE-10078: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707321/HIVE-10078.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8347 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3162/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3162/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3162/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12707321 - PreCommit-HIVE-TRUNK-Build Optionally allow logging of records processed in fixed intervals Key: HIVE-10078 URL: https://issues.apache.org/jira/browse/HIVE-10078 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10078.1.patch, HIVE-10078.2.patch Tasks today log progress (records in/records out) on an exponential scale (1, 10, 100, ...). Sometimes it's helpful to be able to switch to a fixed interval, which can help when debugging issues that look like a hang. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
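The two logging policies described in the issue can be sketched side by side. This is an illustration only, not the actual Hive operator code or the patch's configuration knob.

```java
// Illustrative sketch of the two progress-logging policies, not the actual
// Hive operator code: fixedInterval == 0 selects the current exponential
// behavior (log at 1, 10, 100, ...); a positive value logs every N records.
public class ProgressLogPolicy {
    private long nextThreshold = 1;
    private final long fixedInterval;

    ProgressLogPolicy(long fixedInterval) {
        this.fixedInterval = fixedInterval;
    }

    boolean shouldLog(long recordCount) {
        if (fixedInterval > 0) {
            return recordCount % fixedInterval == 0;
        }
        if (recordCount >= nextThreshold) {
            nextThreshold *= 10;  // 1, 10, 100, 1000, ...
            return true;
        }
        return false;
    }
}
```

With the exponential policy a task that hangs between, say, record 100,000 and 1,000,000 logs nothing for a long stretch; a fixed interval keeps emitting evidence of progress, which is the debugging value the issue describes.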
[jira] [Commented] (HIVE-10091) Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes
[ https://issues.apache.org/jira/browse/HIVE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382089#comment-14382089 ] Alan Gates commented on HIVE-10091: --- +1 for this patch. Let's create JIRAs for the additional functionality. Generate Hbase execution plan for partition filter conditions in HbaseStore api calls - initial changes --- Key: HIVE-10091 URL: https://issues.apache.org/jira/browse/HIVE-10091 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: hbase-metastore-branch Attachments: HIVE-10091.1.patch, HIVE-10091.2.patch RawStore functions that support partition filtering are the following - getPartitionsByExpr getPartitionsByFilter (takes filter string as argument, used from hcatalog) We need to generate a query execution plan in terms of Hbase scan api calls for a given filter condition. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10076) Update parquet-hadoop-bundle and parquet-column to the version of 1.6.0rc6
[ https://issues.apache.org/jira/browse/HIVE-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382110#comment-14382110 ] Sergio Peña commented on HIVE-10076: +1 Thanks [~Ferd] Update parquet-hadoop-bundle and parquet-column to the version of 1.6.0rc6 -- Key: HIVE-10076 URL: https://issues.apache.org/jira/browse/HIVE-10076 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10076.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10095) format_number udf throws NPE
[ https://issues.apache.org/jira/browse/HIVE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382057#comment-14382057 ] Hive QA commented on HIVE-10095: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12707393/HIVE-10095.1.patch {color:green}SUCCESS:{color} +1 8347 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3167/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3167/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3167/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12707393 - PreCommit-HIVE-TRUNK-Build format_number udf throws NPE Key: HIVE-10095 URL: https://issues.apache.org/jira/browse/HIVE-10095 Project: Hive Issue Type: Bug Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10095.1.patch For example {code} select format_number(cast(null as int), 0); FAILED: NullPointerException null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
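The fix presumably comes down to a null check before formatting. A minimal stand-alone sketch of the null-in, null-out behavior follows; it is not the actual GenericUDFFormatNumber code, and the pattern handling is simplified.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Stand-alone sketch of null-in, null-out formatting; not the actual
// GenericUDFFormatNumber implementation.
public class FormatNumberSketch {

    static String formatNumber(Double value, Integer decimals) {
        if (value == null || decimals == null) {
            return null;  // SQL semantics: a NULL argument yields NULL, not an NPE
        }
        StringBuilder pattern = new StringBuilder("#,##0");
        if (decimals > 0) {
            pattern.append('.');
            for (int i = 0; i < decimals; i++) {
                pattern.append('0');
            }
        }
        // Fixed symbols so the output does not depend on the default locale.
        return new DecimalFormat(pattern.toString(),
                DecimalFormatSymbols.getInstance(Locale.US)).format(value);
    }
}
```

With this check, the failing query from the description, format_number(cast(null as int), 0), would simply return NULL.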
[jira] [Updated] (HIVE-10097) CBO (Calcite Return Path): Upgrade to new Calcite snapshot [CBO Branch]
[ https://issues.apache.org/jira/browse/HIVE-10097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10097: --- Affects Version/s: cbo-branch CBO (Calcite Return Path): Upgrade to new Calcite snapshot [CBO Branch] --- Key: HIVE-10097 URL: https://issues.apache.org/jira/browse/HIVE-10097 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: cbo-branch Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: cbo-branch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match
[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10086: --- Attachment: (was: HIVE-10086.1.patch) Hive throws error when accessing Parquet file schema using field name match --- Key: HIVE-10086 URL: https://issues.apache.org/jira/browse/HIVE-10086 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-10086.2.patch, HiveGroup.parquet When a Hive table schema contains a portion of the schema of a Parquet file, access to the values should work as long as the field names match the schema. This does not work when a struct data type is in the schema and the Hive schema contains just a portion of the struct elements; Hive throws an error instead. Here is an example and how to reproduce it. First, create a Parquet table, and add some values to it: {code} CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET; INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1; {code} Note: {{srcpart}} could be any table. It is just used to leverage the INSERT statement. 
The above table example generates the following Parquet file schema: {code} message hive_schema { optional int32 id; optional binary name (UTF8); optional group address { optional int32 number; optional binary street (UTF8); optional binary zip (UTF8); } } {code} Afterwards, I create a table that contains just a portion of the schema and load the Parquet file generated above; a query on that table will fail: {code} CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET; LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1; hive> SELECT name FROM test1; OK Roger Time taken: 0.071 seconds, Fetched: 1 row(s) hive> SELECT address FROM test1; OK Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable Time taken: 0.085 seconds {code} I would expect Parquet to access the matched names, but Hive throws an error instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
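The mismatch behind the "Cannot inspect org.apache.hadoop.io.IntWritable" error can be shown with a toy model. The types here are simplified strings, not the real Parquet or Hive ObjectInspector code: resolving the requested struct field by position lands on the file's first field (number, an int), while resolving by name finds the right one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;

// Toy model of the mismatch with simplified types; not the real Parquet or
// Hive ObjectInspector code. The file's struct has three fields, but the
// table schema requests only 'street'.
public class StructFieldLookup {

    static final LinkedHashMap<String, String> FILE_STRUCT = new LinkedHashMap<>();
    static {
        FILE_STRUCT.put("number", "int32");
        FILE_STRUCT.put("street", "string");
        FILE_STRUCT.put("zip", "string");
    }

    // Position-based lookup: table field 0 ('street') is mapped onto file
    // field 0 ('number'), so the reader hands back an int for a string column.
    static String typeByPosition(int tableFieldIndex) {
        return new ArrayList<>(FILE_STRUCT.values()).get(tableFieldIndex);
    }

    // Name-based lookup: the requested field is found regardless of position.
    static String typeByName(String fieldName) {
        return FILE_STRUCT.get(fieldName);
    }
}
```

The position-based path is what produces an IntWritable where a string inspector is expected; matching by name, as the issue title requests, avoids it.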