[jira] [Commented] (HIVE-11669) OrcFileDump service should support directories
[ https://issues.apache.org/jira/browse/HIVE-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718115#comment-14718115 ] Hive QA commented on HIVE-11669: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752835/HIVE-11669.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5095/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5095/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5095/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752835 - PreCommit-HIVE-TRUNK-Build OrcFileDump service should support directories -- Key: HIVE-11669 URL: https://issues.apache.org/jira/browse/HIVE-11669 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11669.1.patch orcfiledump service does not support directories. If a directory is specified, the program should iterate through all the files in the directory and perform the file dump on each one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
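The directory iteration HIVE-11669 asks for can be sketched roughly as below. The real tool would presumably walk the path with Hadoop's `org.apache.hadoop.fs.FileSystem` API; to keep the sketch self-contained it uses `java.nio.file` instead, and `dumpFile` is a hypothetical stand-in for the per-file ORC dump, not actual Hive code.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class OrcDumpDirs {
    // Hypothetical per-file hook standing in for the real ORC file dump.
    static void dumpFile(Path file) {
        System.out.println("dumping " + file);
    }

    // If the argument is a directory, return the regular files inside it
    // (sorted for deterministic output); otherwise return the file itself.
    static List<Path> resolveInputs(Path p) throws IOException {
        if (Files.isDirectory(p)) {
            try (Stream<Path> s = Files.list(p)) {
                return s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            }
        }
        return Collections.singletonList(p);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            for (Path f : resolveInputs(Paths.get(arg))) {
                dumpFile(f);
            }
        }
    }
}
```

Sorting the listing keeps the dump order stable across runs, which matters when the output is diffed in tests.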
[jira] [Updated] (HIVE-11538) Add an option to skip init script while running tests
[ https://issues.apache.org/jira/browse/HIVE-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-11538: -- Labels: TODOC2.0 (was: ) Add an option to skip init script while running tests - Key: HIVE-11538 URL: https://issues.apache.org/jira/browse/HIVE-11538 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Labels: TODOC2.0 Fix For: 2.0.0 Attachments: HIVE-11538.2.patch, HIVE-11538.3.patch, HIVE-11538.patch {{q_test_init.sql}} has grown over time. Now, it takes a substantial amount of time. When debugging a particular query which doesn't need such initialization, this delay is an annoyance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10175) DynamicPartitionPruning lacks a fast-path exit for large IN() queries
[ https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10175: --- Fix Version/s: (was: 2.0.0) (was: 1.3.0) DynamicPartitionPruning lacks a fast-path exit for large IN() queries - Key: HIVE-10175 URL: https://issues.apache.org/jira/browse/HIVE-10175 Project: Hive Issue Type: Bug Components: Physical Optimizer, Tez Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-10175.1.patch, HIVE-10175.profile.html TezCompiler::runDynamicPartitionPruning() ppr.PartitionPruner() calls the graph walker even if all tables provided to the optimizer are unpartitioned (or temporary) tables. This makes it extremely slow as it will walk & inspect a large/complex FilterOperator later in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
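A fast-path exit of the kind described could look roughly like the check below: scan the candidate tables first and skip the expensive graph walk when no permanent partitioned table participates. All names here are illustrative stand-ins, not Hive's actual classes, and (as the later revert on this ticket shows) a real implementation must still account for folding out the synthetic join predicates.

```java
import java.util.*;

public class DppFastPath {
    // Minimal stand-in for table metadata; not Hive's actual Table class.
    static final class TableDesc {
        final boolean partitioned;
        final boolean temporary;
        TableDesc(boolean partitioned, boolean temporary) {
            this.partitioned = partitioned;
            this.temporary = temporary;
        }
    }

    // Dynamic partition pruning can only help when at least one permanent,
    // partitioned table participates, so an optimizer pass could return
    // early instead of walking the whole operator graph.
    static boolean worthWalking(Collection<TableDesc> tables) {
        for (TableDesc t : tables) {
            if (t.partitioned && !t.temporary) {
                return true;
            }
        }
        return false;
    }
}
```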
[jira] [Updated] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11652: --- Attachment: HIVE-11652.02.patch Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
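The cost being avoided above comes from `List.removeAll`, which rescans the list once per element it checks, roughly O(n·m) per call on large plans. A minimal sketch of the alternative, assuming a toy node type rather than Hive's `ql.lib.Node`, tracks dispatched nodes in a `HashSet` so membership is tested in O(1) as nodes are dequeued and no `removeAll` is ever needed:

```java
import java.util.*;

public class WalkerSketch {
    // Toy DAG node; Hive's walker operates on ql.lib.Node, this is a stand-in.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Instead of keeping dispatched nodes in a List and periodically calling
    // toWalk.removeAll(dispatched), track dispatched nodes in a HashSet and
    // skip already-seen nodes as they are polled off the work queue.
    static List<String> walk(Node root) {
        List<String> order = new ArrayList<>();
        Deque<Node> toWalk = new ArrayDeque<>();
        Set<Node> dispatched = new HashSet<>();
        toWalk.add(root);
        while (!toWalk.isEmpty()) {
            Node n = toWalk.poll();
            if (!dispatched.add(n)) {
                continue; // already dispatched; constant-time skip
            }
            order.add(n.name);
            toWalk.addAll(n.children);
        }
        return order;
    }
}
```

On a diamond-shaped graph the shared node is visited once and the second occurrence is discarded in constant time.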
[jira] [Reopened] (HIVE-10175) DynamicPartitionPruning lacks a fast-path exit for large IN() queries
[ https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reopened HIVE-10175: Reverting this patch as this needs to cycle-break & fold out the SyntheticJoin predicates even if the target table is unpartitioned. DynamicPartitionPruning lacks a fast-path exit for large IN() queries - Key: HIVE-10175 URL: https://issues.apache.org/jira/browse/HIVE-10175 Project: Hive Issue Type: Bug Components: Physical Optimizer, Tez Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10175.1.patch, HIVE-10175.profile.html TezCompiler::runDynamicPartitionPruning() ppr.PartitionPruner() calls the graph walker even if all tables provided to the optimizer are unpartitioned (or temporary) tables. This makes it extremely slow as it will walk & inspect a large/complex FilterOperator later in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10175) DynamicPartitionPruning lacks a fast-path exit for large IN() queries
[ https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718285#comment-14718285 ] Gopal V commented on HIVE-10175: The issue is only visible in TPC-H q21, which hit the following error in the nightly runs. {code} Caused by: java.lang.RuntimeException: Cannot find ExprNodeEvaluator for the exprNodeDesc = RS[4] at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:57) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:100) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:51) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:100) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:51) at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:59) ... 21 more {code} DynamicPartitionPruning lacks a fast-path exit for large IN() queries - Key: HIVE-10175 URL: https://issues.apache.org/jira/browse/HIVE-10175 Project: Hive Issue Type: Bug Components: Physical Optimizer, Tez Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10175.1.patch, HIVE-10175.profile.html TezCompiler::runDynamicPartitionPruning() ppr.PartitionPruner() calls the graph walker even if all tables provided to the optimizer are unpartitioned (or temporary) tables. This makes it extremely slow as it will walk & inspect a large/complex FilterOperator later in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10175) DynamicPartitionPruning lacks a fast-path exit for large IN() queries
[ https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V resolved HIVE-10175. Resolution: Fixed Fix Version/s: 2.0.0 1.3.0 Pushed to master and branch-1, thanks [~jcamachorodriguez]! DynamicPartitionPruning lacks a fast-path exit for large IN() queries - Key: HIVE-10175 URL: https://issues.apache.org/jira/browse/HIVE-10175 Project: Hive Issue Type: Bug Components: Physical Optimizer, Tez Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10175.1.patch, HIVE-10175.profile.html TezCompiler::runDynamicPartitionPruning() ppr.PartitionPruner() calls the graph walker even if all tables provided to the optimizer are unpartitioned (or temporary) tables. This makes it extremely slow as it will walk & inspect a large/complex FilterOperator later in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11629) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix the filter expressions for full outer join and right outer join
[ https://issues.apache.org/jira/browse/HIVE-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718368#comment-14718368 ] Hive QA commented on HIVE-11629: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752839/HIVE-11629.02.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9376 tests executed *Failed tests:* {noformat} TestJdbcWithLocalClusterSpark - did not produce a TEST-*.xml file org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5097/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5097/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5097/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752839 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix the filter expressions for full outer join and right outer join -- Key: HIVE-11629 URL: https://issues.apache.org/jira/browse/HIVE-11629 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11629.01.patch, HIVE-11629.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718181#comment-14718181 ] Hive QA commented on HIVE-11634: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752837/HIVE-11634.4.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9381 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup2 org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5096/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5096/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5096/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752837 - PreCommit-HIVE-TRUNK-Build Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) 
-- Key: HIVE-11634 URL: https://issues.apache.org/jira/browse/HIVE-11634 Project: Hive Issue Type: Bug Components: CBO Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, HIVE-11634.3.patch, HIVE-11634.4.patch Currently, we do not support partition pruning for the following scenario {code} create table pcr_t1 (key int, value string) partitioned by (ds string); insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key; insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key; insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key; explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2)); {code} If we run the above query, we see that all the partitions of table pcr_t1 are present in the filter predicate, whereas we can prune partition (ds='2000-04-10'). The optimization is to rewrite the above query into the following. {code} explain extended select ds from pcr_t1 where (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2)); {code} The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) is used by the partition pruner to prune the partitions which otherwise will not be pruned. This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11544) LazyInteger should avoid throwing NumberFormatException
[ https://issues.apache.org/jira/browse/HIVE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718576#comment-14718576 ] Hive QA commented on HIVE-11544: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752834/HIVE-11544.4.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5098/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5098/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5098/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12752834 - PreCommit-HIVE-TRUNK-Build LazyInteger should avoid throwing NumberFormatException --- Key: HIVE-11544 URL: https://issues.apache.org/jira/browse/HIVE-11544 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.14.0, 1.2.0, 1.3.0, 2.0.0 Reporter: William Slacum Assignee: Gopal V Priority: Minor Labels: Performance Attachments: HIVE-11544.1.patch, HIVE-11544.2.patch, HIVE-11544.3.patch, HIVE-11544.4.patch {{LazyInteger#parseInt}} will throw a {{NumberFormatException}} under these conditions: # bytes are null # radix is invalid # length is 0 # the string is '+' or '-' # {{LazyInteger#parse}} throws a {{NumberFormatException}} Most of the time, such as in {{LazyInteger#init}} and {{LazyByte#init}}, the exception is caught, swallowed, and {{isNull}} is set to {{true}}. This is generally a bad workflow, as exception creation is a performance bottleneck, and potentially repeating for many rows in a query can have a drastic performance consequence. It would be better if this method returned an {{OptionalInteger}}, which would provide similar functionality with a higher throughput rate. I've tested against 0.14.0, and saw that the logic is unchanged in 1.2.0, so I've marked those as affected. Any version in between would also suffer from this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
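A minimal sketch of the exception-free parse the reporter proposes, using the JDK's `OptionalInt` in place of the suggested `OptionalInteger`. This is illustrative code, not Hive's actual `LazyInteger`; it mirrors the listed failure conditions (null bytes, zero length, bare sign, non-digit byte) and skips overflow checking for brevity:

```java
import java.util.OptionalInt;

public class SafeParse {
    // Each failure case yields OptionalInt.empty() instead of a thrown
    // NumberFormatException, so callers pay no exception-creation cost
    // on malformed rows. Overflow handling is omitted for brevity.
    static OptionalInt parseInt(byte[] bytes, int start, int length) {
        if (bytes == null || length <= 0) {
            return OptionalInt.empty();
        }
        int i = start;
        boolean negative = false;
        if (bytes[i] == '+' || bytes[i] == '-') {
            negative = bytes[i] == '-';
            i++;
            if (length == 1) {
                return OptionalInt.empty(); // bare '+' or '-'
            }
        }
        int value = 0;
        for (; i < start + length; i++) {
            int d = bytes[i] - '0';
            if (d < 0 || d > 9) {
                return OptionalInt.empty(); // non-digit byte
            }
            value = value * 10 + d;
        }
        return OptionalInt.of(negative ? -value : value);
    }
}
```

The caller replaces the catch-and-set-`isNull` pattern with an `isPresent()` check, which keeps the hot loop free of exception allocation.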
[jira] [Commented] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled
[ https://issues.apache.org/jira/browse/HIVE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720007#comment-14720007 ] Ashutosh Chauhan commented on HIVE-10021: - +1 pending tests Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled -- Key: HIVE-10021 URL: https://issues.apache.org/jira/browse/HIVE-10021 Project: Hive Issue Type: Bug Components: HiveServer2, Indexing Affects Versions: 0.13.1, 2.0.0 Environment: CDH 5.3.2 Reporter: Richard Williams Assignee: Aihua Xu Attachments: HIVE-10021.2.patch, HIVE-10021.patch When HiveServer2 is configured to authorize submitted queries and statements through Sentry, any attempt to issue an alter index rebuild statement fails with a SemanticException caused by a NullPointerException. This occurs regardless of whether the index is a compact or bitmap index. The root cause of the problem appears to be the fact that the static createRootTask function in org.apache.hadoop.hive.ql.optimizer.IndexUtils creates a new org.apache.hadoop.hive.ql.Driver object to compile the index builder query, and this new Driver object, unlike the one used by HiveServer2 to compile the submitted statement, is used without having its userName field initialized with the submitting user's username. Adding null checks to the Sentry code is insufficient to solve this problem, because Sentry needs the userName to determine whether or not the submitting user should be able to execute the index rebuild statement. 
Example stack trace from the HiveServer2 logs: {noformat} FAILED: NullPointerException null java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) at org.apache.hadoop.security.Groups.getGroups(Groups.java:161) at org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46) at org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370) at org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440) at org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258) at org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149) at org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173) at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715) at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:238) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:393) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.thrift.server.TServlet.doPost(TServlet.java:83) at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:99) at
[jira] [Commented] (HIVE-11617) Explain plan for multiple lateral views is very slow
[ https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720001#comment-14720001 ] Ashutosh Chauhan commented on HIVE-11617: - FYI : [~jcamachorodriguez] , [~hsubramaniyan] Explain plan for multiple lateral views is very slow Key: HIVE-11617 URL: https://issues.apache.org/jira/browse/HIVE-11617 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11617.patch, HIVE-11617.patch The following explain job will be very slow or never finish if there are many lateral views involved. High CPU usage is also noticed. {noformat} CREATE TABLE `t1`(`pattern` array<int>); explain select * from t1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view
explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1; {noformat} From jstack, the job is busy with preorder tree traverse. {noformat} at java.util.regex.Matcher.getTextLength(Matcher.java:1234) at java.util.regex.Matcher.reset(Matcher.java:308) at java.util.regex.Matcher.<init>(Matcher.java:228) at java.util.regex.Pattern.matcher(Pattern.java:1088) at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at
org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) at
[jira] [Commented] (HIVE-11617) Explain plan for multiple lateral views is very slow
[ https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14719966#comment-14719966 ] Hive QA commented on HIVE-11617: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752844/HIVE-11617.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_multiinsert org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5099/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5099/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5099/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed 
{noformat} This message is automatically generated. ATTACHMENT ID: 12752844 - PreCommit-HIVE-TRUNK-Build Explain plan for multiple lateral views is very slow Key: HIVE-11617 URL: https://issues.apache.org/jira/browse/HIVE-11617 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11617.patch, HIVE-11617.patch The following explain job will be very slow or never finish if there are many lateral views involved. High CPU usage is also noticed. {noformat} CREATE TABLE `t1`(`pattern` array<int>); explain select * from t1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as
col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1 lateral view explode(pattern) tbl1 as col1; {noformat} From jstack, the job is busy with preorder tree traverse. {noformat} at java.util.regex.Matcher.getTextLength(Matcher.java:1234) at java.util.regex.Matcher.reset(Matcher.java:308) at java.util.regex.Matcher.<init>(Matcher.java:228) at java.util.regex.Pattern.matcher(Pattern.java:1088) at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72) at
[jira] [Updated] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled
[ https://issues.apache.org/jira/browse/HIVE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10021: Attachment: HIVE-10021.2.patch Rather than passing the userName around, use the one saved in the SessionState. Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled -- Key: HIVE-10021 URL: https://issues.apache.org/jira/browse/HIVE-10021 Project: Hive Issue Type: Bug Components: HiveServer2, Indexing Affects Versions: 0.13.1, 2.0.0 Environment: CDH 5.3.2 Reporter: Richard Williams Assignee: Aihua Xu Attachments: HIVE-10021.2.patch, HIVE-10021.patch When HiveServer2 is configured to authorize submitted queries and statements through Sentry, any attempt to issue an alter index rebuild statement fails with a SemanticException caused by a NullPointerException. This occurs regardless of whether the index is a compact or bitmap index. The root cause of the problem appears to be the fact that the static createRootTask function in org.apache.hadoop.hive.ql.optimizer.IndexUtils creates a new org.apache.hadoop.hive.ql.Driver object to compile the index builder query, and this new Driver object, unlike the one used by HiveServer2 to compile the submitted statement, is used without having its userName field initialized with the submitting user's username. Adding null checks to the Sentry code is insufficient to solve this problem, because Sentry needs the userName to determine whether or not the submitting user should be able to execute the index rebuild statement. 
Example stack trace from the HiveServer2 logs: {noformat} FAILED: NullPointerException null java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) at org.apache.hadoop.security.Groups.getGroups(Groups.java:161) at org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46) at org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370) at org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440) at org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258) at org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149) at org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173) at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715) at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:238) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:393) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.thrift.server.TServlet.doPost(TServlet.java:83) at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:99) at
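The attached fix takes the userName from the SessionState rather than threading it through every constructor. A minimal JDK-only sketch of that idea (class and method names here are hypothetical, not Hive's API): a thread-local session holder lets a secondary compiler — like the extra Driver that IndexUtils.createRootTask builds — recover the submitting user even though nobody passed it in explicitly.

```java
// Illustrative sketch only; SessionSketch is a hypothetical stand-in for
// Hive's SessionState, not part of Hive's real API. The point mirrors the
// HIVE-10021 fix: components created mid-compilation read the user from a
// thread-local session instead of a field callers may forget to initialize.
public final class SessionSketch {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    public static void start(String userName) { USER.set(userName); }

    // Any component running on this thread can recover the user without it
    // being passed around through constructors.
    public static String currentUser() {
        String u = USER.get();
        if (u == null) {
            // Without a session user, an authorizer like Sentry would hit the
            // NullPointerException shown in the stack trace above.
            throw new IllegalStateException("no session user set on this thread");
        }
        return u;
    }

    public static void end() { USER.remove(); }
}
```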
[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720972#comment-14720972 ] Hive QA commented on HIVE-11668: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12753065/HIVE-11668.01.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5109/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5109/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5109/ Messages: {noformat} This message was trimmed, see log for full details [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/common/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/common/target/tmp/conf [copy] Copying 10 files to /data/hive-ptest/working/apache-github-source-source/common/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-common --- [INFO] Compiling 21 source files to /data/hive-ptest/working/apache-github-source-source/common/target/test-classes [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java: /data/hive-ptest/working/apache-github-source-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java uses or overrides a deprecated API. [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java: Recompile with -Xlint:deprecation for details. [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-common --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-common --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/common/target/hive-common-2.0.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-common --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-common --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/common/target/hive-common-2.0.0-SNAPSHOT.jar to /home/hiveptest/.m2/repository/org/apache/hive/hive-common/2.0.0-SNAPSHOT/hive-common-2.0.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/common/pom.xml to /home/hiveptest/.m2/repository/org/apache/hive/hive-common/2.0.0-SNAPSHOT/hive-common-2.0.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Serde 2.0.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-serde --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/serde/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/serde (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-serde --- [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-serde --- [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/serde/src/gen/protobuf/gen-java added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/serde/src/gen/thrift/gen-javabean added. [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-serde --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-serde --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/serde/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-serde --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-serde --- [INFO] Compiling 405 source files to /data/hive-ptest/working/apache-github-source-source/serde/target/classes [WARNING] /data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/SerDe.java: Some input files use or override a deprecated API. [WARNING] /data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/SerDe.java: Recompile with -Xlint:deprecation for details. [WARNING]
[jira] [Updated] (HIVE-11669) OrcFileDump service should support directories
[ https://issues.apache.org/jira/browse/HIVE-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-11669: -- Labels: TODOC1.3 (was: ) OrcFileDump service should support directories -- Key: HIVE-11669 URL: https://issues.apache.org/jira/browse/HIVE-11669 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Labels: TODOC1.3 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11669.1.patch The orcfiledump service does not support directories. If a directory is specified, the program should iterate through all the files in the directory and perform a file dump on each. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11320) ACID enable predicate pushdown for insert-only delta file
[ https://issues.apache.org/jira/browse/HIVE-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720895#comment-14720895 ] Eugene Koifman commented on HIVE-11320: --- I'm not sure I follow, can you explain? ACID enable predicate pushdown for insert-only delta file - Key: HIVE-11320 URL: https://issues.apache.org/jira/browse/HIVE-11320 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 1.3.0 Attachments: HIVE-11320.patch Given an ACID table T against which some Insert/Update/Delete has been executed but not Major Compaction, this table will have some number of delta files (and possibly base files). Given a query: select * from T where c1 = 5; the OrcRawRecordMerger() constructor currently disables predicate pushdown in ORC to the delta file via eventOptions.searchArgument(null, null); When a delta file is known to contain only Insert events, we can safely push the predicate. ORC maintains stats in a footer with counts of insert/update/delete events in the file; these can be used to determine that a given delta file only has Insert events. See OrcRecordUpdater.parseAcidStats() This will enable PPD for Streaming Ingest (HIVE-5687) use cases, which by definition only generate Insert events. PPD for deltas with arbitrary types of events can be achieved, but it is more complicated and will be addressed separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
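The safety condition the description states — push the predicate only when the delta holds nothing but Insert events — can be sketched in a few lines. `AcidCounts` below is a hypothetical stand-in for the insert/update/delete counts ORC keeps in its footer (what parseAcidStats reads); it is not Hive's real class.

```java
// Hypothetical sketch of the decision described in HIVE-11320. AcidCounts
// stands in for the per-delta event counts from the ORC footer.
public final class DeltaPpd {
    static final class AcidCounts {
        final long inserts, updates, deletes;
        AcidCounts(long inserts, long updates, long deletes) {
            this.inserts = inserts;
            this.updates = updates;
            this.deletes = deletes;
        }
    }

    // Pushing a predicate into an insert-only delta is safe: filtering rows
    // out cannot hide an update or delete event that the record merger would
    // later need to apply. Any update/delete count > 0 forces the current
    // behavior (no pushdown, i.e. searchArgument(null, null)).
    static boolean canPushPredicate(AcidCounts c) {
        return c.updates == 0 && c.deletes == 0;
    }
}
```

A streaming-ingest delta (inserts only, per HIVE-5687) always passes this check, which is exactly the use case the issue targets.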
[jira] [Updated] (HIVE-11689) minor changes to ORC split generation
[ https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11689: Attachment: HIVE-11689.patch Patch minor changes to ORC split generation - Key: HIVE-11689 URL: https://issues.apache.org/jira/browse/HIVE-11689 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11689.patch There are two changes that would help future work on split PPD into HBase metastore. 1) Move non-HDFS split strategy determination logic into main thread from threadpool. 2) Instead of iterating thru the futures and waiting, use CompletionService to get futures in order of completion. That might be useful by itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
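Change (2) above — consuming futures in completion order rather than submission order — is standard `java.util.concurrent` usage. A self-contained sketch (not the patch's actual code): with a plain list of Futures, one slow task at the head of the list blocks consumption of everything behind it; an ExecutorCompletionService hands back whichever result finishes first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the CompletionService pattern described in HIVE-11689, using
// plain JDK types rather than Hive's split-generation classes.
public final class CompletionOrder {
    public static List<Integer> drain(List<Callable<Integer>> tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
            for (Callable<Integer> t : tasks) {
                cs.submit(t);
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                // take() blocks until *some* task completes, regardless of
                // submission order, so a slow task never stalls the rest.
                results.add(cs.take().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Results arrive in completion order, so a caller that needs deterministic output (as split generation would) must reorder or tag them afterwards.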
[jira] [Commented] (HIVE-10924) add support for MERGE statement
[ https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720975#comment-14720975 ] Lefty Leverenz commented on HIVE-10924: --- Nice doc, [~ekoifman]. But the MERGE statement has a THEN THEN typo. add support for MERGE statement --- Key: HIVE-10924 URL: https://issues.apache.org/jira/browse/HIVE-10924 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor, Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman add support for MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720974#comment-14720974 ] Wei Zheng commented on HIVE-11510: -- Test failure not related. Metatool updateLocation warning on views Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch If views are present in a hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this: ... Warning: Found records with bad LOCATION in SDS table.. bad location URI: null bad location URI: null bad location URI: null Based on the source code for Metatool, it looks like there would then be a bad location URI: null message for every view and it also appears this is happening simply because the 'sds' table in the hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11678) Add AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/HIVE-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11678: Attachment: HIVE-11678.2.patch With updated golden files. Add AggregateProjectMergeRule - Key: HIVE-11678 URL: https://issues.apache.org/jira/browse/HIVE-11678 Project: Hive Issue Type: New Feature Components: CBO, Logical Optimizer Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11678.2.patch, HIVE-11678.patch This will help to get rid of extra projects on top of Aggregation, thus compacting query plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720978#comment-14720978 ] Lefty Leverenz commented on HIVE-11595: --- Any doc needed for this? refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 2.0.0 Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch If the ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11689) minor flow changes to ORC split generation
[ https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11689: Summary: minor flow changes to ORC split generation (was: minor changes to ORC split generation) minor flow changes to ORC split generation -- Key: HIVE-11689 URL: https://issues.apache.org/jira/browse/HIVE-11689 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11689.patch There are two changes that would help future work on split PPD into HBase metastore. 1) Move non-HDFS split strategy determination logic into main thread from threadpool. 2) Instead of iterating thru the futures and waiting, use CompletionService to get futures in order of completion. That might be useful by itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11689) minor flow changes to ORC split generation
[ https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720898#comment-14720898 ] Sergey Shelukhin commented on HIVE-11689: - [~prasanth_j] can you review https://reviews.apache.org/r/37917/ minor flow changes to ORC split generation -- Key: HIVE-11689 URL: https://issues.apache.org/jira/browse/HIVE-11689 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11689.patch There are two changes that would help future work on split PPD into HBase metastore. 1) Move non-HDFS split strategy determination logic into main thread from threadpool. 2) Instead of iterating thru the futures and waiting, use CompletionService to get futures in order of completion. That might be useful by itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11320) ACID enable predicate pushdown for insert-only delta file
[ https://issues.apache.org/jira/browse/HIVE-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720886#comment-14720886 ] Sergey Shelukhin commented on HIVE-11320: - I wonder if it would also apply to stripe elimination during split generation... ACID enable predicate pushdown for insert-only delta file - Key: HIVE-11320 URL: https://issues.apache.org/jira/browse/HIVE-11320 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 1.3.0 Attachments: HIVE-11320.patch Given an ACID table T against which some Insert/Update/Delete has been executed but not Major Compaction, this table will have some number of delta files (and possibly base files). Given a query: select * from T where c1 = 5; the OrcRawRecordMerger() constructor currently disables predicate pushdown in ORC to the delta file via eventOptions.searchArgument(null, null); When a delta file is known to contain only Insert events, we can safely push the predicate. ORC maintains stats in a footer with counts of insert/update/delete events in the file; these can be used to determine that a given delta file only has Insert events. See OrcRecordUpdater.parseAcidStats() This will enable PPD for Streaming Ingest (HIVE-5687) use cases, which by definition only generate Insert events. PPD for deltas with arbitrary types of events can be achieved, but it is more complicated and will be addressed separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11670) Strip out password information from TezSessionState configuration
[ https://issues.apache.org/jira/browse/HIVE-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720918#comment-14720918 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11670: -- The failures are unrelated to the change. Thanks Hari Strip out password information from TezSessionState configuration - Key: HIVE-11670 URL: https://issues.apache.org/jira/browse/HIVE-11670 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11670.1.patch Remove password information from configuration copy that is sent to Yarn/Tez. We don't need it there. The config entries can potentially be visible to other users. HIVE-10508 had the fix which removed this in certain places, however, when I initiated a session via Hive Cli, I could still see the password information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720936#comment-14720936 ] Hive QA commented on HIVE-11510: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12753071/HIVE-11510.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5107/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5107/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5107/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12753071 - PreCommit-HIVE-TRUNK-Build Metatool updateLocation warning on views Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch If views are present in a hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this: ... Warning: Found records with bad LOCATION in SDS table.. 
bad location URI: null bad location URI: null bad location URI: null Based on the source code for Metatool, it looks like there would then be a bad location URI: null message for every view and it also appears this is happening simply because the 'sds' table in the hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11684) Implement limit pushdown through outer join in CBO
[ https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720969#comment-14720969 ] Hive QA commented on HIVE-11684: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12753056/HIVE-11684.patch {color:red}ERROR:{color} -1 due to 125 failed/errored test(s), 9381 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguitycheck org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_pad_convert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_interval_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_vc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_double org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_ints org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_macro org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch_threshold org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_num_op_type_conv org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quote2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reduce_deduplicate_extended org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regex_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_str_to_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_windowing_expressions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_type_cast_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_type_widening org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_context_aware org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_example_add org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_in_file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_isnull_isnotnull org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_top_level org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_date_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_cast org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_round 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_elt org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_if_expr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_string_concat org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_decimal_date org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_div0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
[jira] [Commented] (HIVE-11669) OrcFileDump service should support directories
[ https://issues.apache.org/jira/browse/HIVE-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720981#comment-14720981 ] Lefty Leverenz commented on HIVE-11669: --- Doc note: This should be documented, with version information, in the ORC File Dump Utility section of the ORC doc. * [ORC Files -- ORC File Dump Utility | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility] By the way, I don't see anything about the file dump utility in the ORC project documentation (https://orc.apache.org/docs/). OrcFileDump service should support directories -- Key: HIVE-11669 URL: https://issues.apache.org/jira/browse/HIVE-11669 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Labels: TODOC1.3 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11669.1.patch The orcfiledump service does not support directories. If a directory is specified, the program should iterate through all the files in the directory and perform a file dump on each. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
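The requested behavior — expand a directory argument into the regular files underneath and dump each — can be sketched with plain JDK file APIs. This is only an illustration of the expansion step; the actual patch works against a Hadoop FileSystem path, not java.nio.file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of the directory handling HIVE-11669 adds to orcfiledump, using
// java.nio.file as a stand-in for Hadoop's FileSystem API.
public final class DumpTargets {
    public static List<Path> expand(Path p) throws IOException {
        if (!Files.isDirectory(p)) {
            // Single file: behave exactly as before and dump just this file.
            return Collections.singletonList(p);
        }
        // Directory: collect every regular file beneath it, in a stable order,
        // so the dump can then be run on each file in turn.
        try (Stream<Path> s = Files.walk(p)) {
            return s.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```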
[jira] [Comment Edited] (HIVE-11689) minor flow changes to ORC split generation
[ https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720898#comment-14720898 ] Sergey Shelukhin edited comment on HIVE-11689 at 8/29/15 1:50 AM: -- [~prasanth_j] can you review https://reviews.apache.org/r/37917/ the code in determineSplitStrategy is entirely old, except for list of files variable being renamed was (Author: sershe): [~prasanth_j] can you review https://reviews.apache.org/r/37917/ minor flow changes to ORC split generation -- Key: HIVE-11689 URL: https://issues.apache.org/jira/browse/HIVE-11689 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11689.patch There are two changes that would help future work on split PPD into HBase metastore. 1) Move non-HDFS split strategy determination logic into main thread from threadpool. 2) Instead of iterating thru the futures and waiting, use CompletionService to get futures in order of completion. That might be useful by itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10924) add support for MERGE statement
[ https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10924: -- Component/s: Transactions add support for MERGE statement --- Key: HIVE-10924 URL: https://issues.apache.org/jira/browse/HIVE-10924 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor, Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman add support for MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix
[ https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720281#comment-14720281 ] Siddharth Seth commented on HIVE-10615: --- That sounds about right. ContainerId string was changed in Hadoop 2.6 iirc. Having multiple versions of the jar in the classpath will cause such issues. Running against clusters with a different version of the client libs could also cause such problems. LLAP: Invalid containerId prefix Key: HIVE-10615 URL: https://issues.apache.org/jira/browse/HIVE-10615 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran I encountered this error when I ran a simple query in llap mode today. {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.IllegalArgumentException: Invalid ContainerId prefix: at org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211) at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178) at org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) at org.apache.hadoop.ipc.Client.call(Client.java:1468) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244) at com.sun.proxy.$Proxy14.heartbeat(Unknown Source) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted while waiting for task to complete. 
Interrupting task 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] INFO task.TezTaskRunner : Encounted an error while executing task: attempt_1430816501738_0034_1_00_00_0 java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at
[jira] [Updated] (HIVE-11672) Hive Streaming API handles bucketing incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11672: -- Component/s: Transactions Hive HCatalog Hive Streaming API handles bucketing incorrectly Key: HIVE-11672 URL: https://issues.apache.org/jira/browse/HIVE-11672 Project: Hive Issue Type: Bug Components: HCatalog, Hive, Transactions Affects Versions: 1.2.1 Reporter: Raj Bains Assignee: Roshan Naik Priority: Critical The Hive Streaming API allows clients to get a random bucket and then insert data into it. However, this leads to incorrect bucketing, as Hive expects data to be distributed into buckets based on a hash function applied to the bucket key. The data is inserted randomly by the clients right now. They have no way of (1) knowing what bucket a row (tuple) belongs to, or (2) asking for a specific bucket. There are optimizations such as Sort Merge Join and Bucket Map Join that rely on the data being correctly distributed across buckets; these will produce incorrect read results if the data is not distributed correctly. There are two obvious design choices: (1) the Hive Streaming API should fix this internally by distributing the data correctly, or (2) it should expose the data distribution scheme to the clients and allow them to distribute the data correctly. The first option means every client thread will write to many buckets, causing many small files in each bucket and too many open connections; this does not seem feasible. The second option pushes more functionality into the client of the Hive Streaming API, but can maintain high throughput and write well-sized ORC files. This option seems preferable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
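Option (2) amounts to telling the client how Hive maps a key to a bucket. A minimal sketch of that mapping, assuming a plain Java hashCode stands in for Hive's own key hashing (Hive actually hashes through object inspectors, and `BucketRouter` is a hypothetical name, not part of the Streaming API):

```java
// Hypothetical sketch of the distribution scheme a streaming client would
// need: deterministically map a bucket key to a bucket index so every row
// with the same key lands in the bucket Hive's readers expect.
public final class BucketRouter {
    public static int bucketFor(Object bucketKey, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be > 0");
        }
        // Mask off the sign bit so negative hash codes still map into the
        // valid range [0, numBuckets).
        return (bucketKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }
}
```

With such a mapping exposed, each client thread can keep one open writer per bucket it actually routes to, preserving the throughput and file-size benefits the description argues for.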
[jira] [Resolved] (HIVE-10615) LLAP: Invalid containerId prefix
[ https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved HIVE-10615. --- Resolution: Not A Problem LLAP: Invalid containerId prefix Key: HIVE-10615 URL: https://issues.apache.org/jira/browse/HIVE-10615 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran I encountered this error when I ran a simple query in llap mode today. {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.IllegalArgumentException: Invalid ContainerId prefix: at org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211) at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178) at org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) at org.apache.hadoop.ipc.Client.call(Client.java:1468) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244) at com.sun.proxy.$Proxy14.heartbeat(Unknown Source) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted while waiting for task to complete. Interrupting task 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] INFO task.TezTaskRunner : Encounted an error while executing task: attempt_1430816501738_0034_1_00_00_0 java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at
[jira] [Commented] (HIVE-10978) Document fs.trash.interval wrt Hive and HDFS Encryption
[ https://issues.apache.org/jira/browse/HIVE-10978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720235#comment-14720235 ] Eugene Koifman commented on HIVE-10978: --- I think mentioning this in the Drop Table/Partition section is a good idea, but the most critical part is that fs.trash.interval has to be set in core-site.xml (i.e. the Hadoop config file), not in hive-site.xml or at the CLI. Document fs.trash.interval wrt Hive and HDFS Encryption --- Key: HIVE-10978 URL: https://issues.apache.org/jira/browse/HIVE-10978 Project: Hive Issue Type: Bug Components: Documentation, Security Affects Versions: 1.2.0 Reporter: Eugene Koifman Priority: Critical Labels: TODOC1.2 This should be documented in the 1.2.1 Release Notes. When HDFS is encrypted (TDE is enabled), DROP TABLE and DROP PARTITION have unexpected behavior when the Hadoop Trash feature is enabled. The latter is enabled by setting fs.trash.interval > 0 in core-site.xml. When Trash is enabled, the data file for the table should be moved to the Trash bin. If the table is inside an Encryption Zone, this move operation is not allowed. There are 2 ways to deal with this: 1. use PURGE, as in DROP TABLE blah PURGE. This skips the Trash bin even if enabled. 2. set fs.trash.interval = 0. It is critical that this config change is done in core-site.xml. Setting it in hive-site.xml may lead to very strange behavior where the table metadata is deleted but the data file remains. This will lead to data corruption if a table with the same name is later created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
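For the second workaround described above, the change belongs in core-site.xml, which on a typical installation looks like the fragment below (a sketch; the property name and semantics are standard Hadoop, but file locations vary by distribution):

```xml
<!-- core-site.xml (the Hadoop config file, NOT hive-site.xml).
     A value of 0 disables the Trash feature; any value > 0 enables it
     (the value is the checkpoint interval in minutes). -->
<property>
  <name>fs.trash.interval</name>
  <value>0</value>
</property>
```

Setting this in hive-site.xml instead is exactly the misconfiguration the description warns about: the metadata drop succeeds while the data move to Trash still fails.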
[jira] [Updated] (HIVE-11672) Hive Streaming API handles bucketing incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11672: -- Fix Version/s: (was: 1.2.2) Hive Streaming API handles bucketing incorrectly Key: HIVE-11672 URL: https://issues.apache.org/jira/browse/HIVE-11672 Project: Hive Issue Type: Bug Components: HCatalog, Hive, Transactions Affects Versions: 1.2.1 Reporter: Raj Bains Assignee: Roshan Naik Priority: Critical Hive Streaming API allows the clients to get a random bucket and then insert data into it. However, this leads to incorrect bucketing, as Hive expects data to be distributed into buckets based on a hash function applied to the bucket key. The data is inserted randomly by the clients right now. They have no way of:
# Knowing what bucket a row (tuple) belongs to
# Asking for a specific bucket
There are optimizations such as Sort Merge Join and Bucket Map Join that rely on the data being correctly distributed across buckets, and these will produce incorrect read results if the data is not distributed correctly. There are two obvious design choices:
# Hive Streaming API should fix this internally by distributing the data correctly
# Hive Streaming API should expose the data distribution scheme to the clients and allow them to distribute the data correctly
The first option means every client thread will write to many buckets, causing many small files in each bucket and too many open connections; this does not seem feasible. The second option pushes more functionality into the client of the Hive Streaming API, but can maintain high throughput and write good-sized ORC files. This option seems preferable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11670) Strip out password information from TezSessionState configuration
[ https://issues.apache.org/jira/browse/HIVE-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720230#comment-14720230 ] Hive QA commented on HIVE-11670: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752863/HIVE-11670.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9378 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5101/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5101/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5101/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752863 - PreCommit-HIVE-TRUNK-Build Strip out password information from TezSessionState configuration - Key: HIVE-11670 URL: https://issues.apache.org/jira/browse/HIVE-11670 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11670.1.patch Remove password information from configuration copy that is sent to Yarn/Tez. We don't need it there. The config entries can potentially be visible to other users. 
HIVE-10508 had the fix which removed this in certain places; however, when I initiated a session via Hive CLI, I could still see the password information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix
[ https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720248#comment-14720248 ] Daniel Dai commented on HIVE-10615: --- It turns out there is a hadoop-2.5.jar in my classpath. So this is no longer an issue for me. LLAP: Invalid containerId prefix Key: HIVE-10615 URL: https://issues.apache.org/jira/browse/HIVE-10615 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran I encountered this error when I ran a simple query in llap mode today. {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.IllegalArgumentException: Invalid ContainerId prefix: at org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211) at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178) at org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) at 
org.apache.hadoop.ipc.Client.call(Client.java:1468) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244) at com.sun.proxy.$Proxy14.heartbeat(Unknown Source) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted while waiting for task to complete. Interrupting task 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] INFO task.TezTaskRunner : Encounted an error while executing task: attempt_1430816501738_0034_1_00_00_0 java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix
[ https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720275#comment-14720275 ] Daniel Dai commented on HIVE-10615: --- Note my cluster is on Hadoop 2.7.1. LLAP: Invalid containerId prefix Key: HIVE-10615 URL: https://issues.apache.org/jira/browse/HIVE-10615 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran I encountered this error when I ran a simple query in llap mode today. {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.IllegalArgumentException: Invalid ContainerId prefix: at org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211) at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178) at org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) at org.apache.hadoop.ipc.Client.call(Client.java:1468) at 
org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244) at com.sun.proxy.$Proxy14.heartbeat(Unknown Source) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184) at org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted while waiting for task to complete. Interrupting task 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] INFO task.TezTaskRunner : Encounted an error while executing task: attempt_1430816501738_0034_1_00_00_0 java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at
[jira] [Resolved] (HIVE-11682) LLAP: Merge master into branch
[ https://issues.apache.org/jira/browse/HIVE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-11682. -- Resolution: Fixed Fix Version/s: llap LLAP: Merge master into branch -- Key: HIVE-11682 URL: https://issues.apache.org/jira/browse/HIVE-11682 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720352#comment-14720352 ] Wei Zheng commented on HIVE-11510: -- [~sushanth] Can you review? Metatool updateLocation warning on views Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng Attachments: HIVE-11510.1.patch If views are present in a Hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this:
...
Warning: Found records with bad LOCATION in SDS table..
bad location URI: null
bad location URI: null
bad location URI: null
Based on the source code for Metatool, it looks like there would then be a "bad location URI: null" message for every view, and it also appears this is happening simply because the 'sds' table in the Hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4198) Move HCatalog code into Hive
[ https://issues.apache.org/jira/browse/HIVE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Sabharwal updated HIVE-4198: -- Fix Version/s: 0.11.0 Move HCatalog code into Hive Key: HIVE-4198 URL: https://issues.apache.org/jira/browse/HIVE-4198 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.11.0 The HCatalog code needs to be moved into Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4198) Move HCatalog code into Hive
[ https://issues.apache.org/jira/browse/HIVE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Sabharwal updated HIVE-4198: -- Fix Version/s: (was: 0.11.0) Move HCatalog code into Hive Key: HIVE-4198 URL: https://issues.apache.org/jira/browse/HIVE-4198 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates The HCatalog code needs to be moved into Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720412#comment-14720412 ] Sushanth Sowmyan commented on HIVE-11668: - This patch will still have an issue, as observed by [~wzheng] earlier today: {noformat} Caused by: org.datanucleus.api.jdo.exceptions.TransactionNotActiveException: Transaction is not active. You either need to define a transaction around this, or run your PersistenceManagerFactory with 'NontransactionalRead' and 'NontransactionalWrite' set to 'true' FailedObject:org.datanucleus.exceptions.TransactionNotActiveException: Transaction is not active. You either need to define a transaction around this, or run your PersistenceManagerFactory with 'NontransactionalRead' and 'NontransactionalWrite' set to 'true' at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:396) at org.datanucleus.api.jdo.JDOTransaction.rollback(JDOTransaction.java:186) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:196) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.init(MetaStoreDirectSql.java:137) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:335) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:286) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:57) at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:601) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:579) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:632) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:468) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.init(RetryingHMSHandler.java:66) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5815) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:203) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.init(SessionHiveMetaStoreClient.java:74) ... 19 more {noformat} The issue here is this. Earlier, the runDbCheck() function was instantiating a transaction if it wasn't already open. So, as long as we were determining the db type by using runDbCheck, we were opening the txn as a side-effect (ugh). Now, by determining the product name by the jdbc provider, we're not calling runDbCheck, and thus, the txn is never opened. You need the following in your chain, hopefully in a more sane place than in runDbCheck():
{noformat}
 Transaction tx = pm.currentTransaction();
+if (!tx.isActive()) {
+  tx.begin();
+}
{noformat}
make sure directsql calls pre-query init when needed Key: HIVE-11668 URL: https://issues.apache.org/jira/browse/HIVE-11668 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11668.patch See comments in HIVE-11123 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
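The guard suggested in the comment above can be sketched as follows. This is an illustration of the pattern (begin the transaction only when it is not already active), not the actual HIVE-11668 patch; the `Txn` interface below is a minimal stand-in for javax.jdo.Transaction so the sketch is self-contained.

```java
// Sketch of the "begin only if not active" guard, so direct SQL's
// pre-query init never runs outside a transaction. Txn is a stand-in
// for javax.jdo.Transaction (illustration only).
public class TxnGuardDemo {
    public interface Txn {
        boolean isActive();
        void begin();
    }

    /** Returns true if this call started the transaction, so the caller
     *  knows it owns the matching commit/rollback. */
    public static boolean ensureActive(Txn tx) {
        if (!tx.isActive()) {
            tx.begin();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Minimal in-memory transaction to exercise the guard.
        Txn tx = new Txn() {
            boolean active = false;
            public boolean isActive() { return active; }
            public void begin() { active = true; }
        };
        assert ensureActive(tx);   // first call starts the txn
        assert tx.isActive();
        assert !ensureActive(tx);  // second call is a no-op
    }
}
```

Returning whether the call actually began the transaction matters in a chain like ensureDbInit: only the frame that opened the transaction should commit or roll it back.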
[jira] [Commented] (HIVE-11636) NPE in stats conversion with HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720364#comment-14720364 ] Sergey Shelukhin commented on HIVE-11636: - Thanks! NPE in stats conversion with HBase metastore Key: HIVE-11636 URL: https://issues.apache.org/jira/browse/HIVE-11636 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11636.01.patch, HIVE-11636.patch NO PRECOMMIT TESTS {noformat} 2015-08-24T20:37:22,285 ERROR [main]: ql.Driver (SessionState.java:printError(963)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:740) at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:731) at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:186) at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:139) at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:127) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:110) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78) at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:249) at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:123) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10219) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:212) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310) {noformat} Fails after importing some databases from regular metastore and running TPCDS Q27. Simple select-where-limit query (not FetchTask) appears to run fine. With standalone Hbase metastore (might be the same issue): {noformat} 2015-08-25 14:41:04,793 ERROR [pool-6-thread-53] server.TThreadPoolServer: Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! 
Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:393) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at
[jira] [Commented] (HIVE-11217) CTAS statements throws error, when the table is stored as ORC File format and select clause has NULL/VOID type column
[ https://issues.apache.org/jira/browse/HIVE-11217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720387#comment-14720387 ] Prasanth Jayachandran commented on HIVE-11217: -- OrcProto is a generated file and should never be modified. Also, I don't think VOID is a valid supported type in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types IMO, it should be handled at the Hive level. Hive should not pass any type other than the ones supported. CTAS statements throws error, when the table is stored as ORC File format and select clause has NULL/VOID type column -- Key: HIVE-11217 URL: https://issues.apache.org/jira/browse/HIVE-11217 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Gaurav Kohli Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-11217.1.patch, HIVE-11217.2.patch If you try to use a create-table-as-select (CTAS) statement to create an ORC file format based table, then you can't use NULL as a column value in the select clause: CREATE TABLE empty (x int); CREATE TABLE orc_table_with_null STORED AS ORC AS SELECT x, null FROM empty; Error: {quote} 347084 [main] ERROR hive.ql.exec.DDLTask - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Unknown primitive type VOID at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:643) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4242) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:285) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952) at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633) at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323) at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39) at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.IllegalArgumentException: Unknown primitive type VOID at org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:530) at 
org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.init(OrcStruct.java:195) at org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:534) at org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:106) at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:519) at
[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720402#comment-14720402 ] Sushanth Sowmyan commented on HIVE-11123: - Also, this patch broke Hive running against MySQL and potentially other databases - I will follow up with comments on HIVE-11668. Testing with Derby alone in unit test mode is problematic. Sorry I didn't catch this before it was committed. Fix how to confirm the RDBMS product name at Metastore. --- Key: HIVE-11123 URL: https://issues.apache.org/jira/browse/HIVE-11123 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0 Environment: PostgreSQL Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch, HIVE-11123.3.patch, HIVE-11123.4.patch, HIVE-11123.4a.patch I use PostgreSQL for the Hive Metastore, and I saw the following messages in the PostgreSQL log. {code} 2015-06-26 10:58:15.488 JST ERROR: syntax error at or near @@ at character 5 2015-06-26 10:58:15.488 JST STATEMENT: SET @@session.sql_mode=ANSI_QUOTES 2015-06-26 10:58:15.489 JST ERROR: relation v$instance does not exist at character 21 2015-06-26 10:58:15.489 JST STATEMENT: SELECT version FROM v$instance 2015-06-26 10:58:15.490 JST ERROR: column version does not exist at character 10 2015-06-26 10:58:15.490 JST STATEMENT: SELECT @@version {code} When the Hive CLI or Beeline in embedded mode is run, these messages are output to the PostgreSQL log. These queries are called from MetaStoreDirectSql#determineDbType. If we use MetaStoreDirectSql#getProductName instead, we do not need to call these queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
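The getProductName approach the description proposes can be sketched as follows. The class and method names here are hypothetical, not Hive's actual MetaStoreDirectSql code; the idea is to classify the backend once from the JDBC-reported product name (the value of Connection.getMetaData().getDatabaseProductName()) instead of firing vendor-specific probe queries like SELECT @@version at every RDBMS.

```java
import java.util.Locale;

// Hypothetical sketch: map the JDBC-reported database product name to a known
// backend type, so no vendor-specific SQL ever has to be sent as a probe.
public class DbProduct {
    public enum Type { DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, OTHER }

    public static Type fromProductName(String productName) {
        if (productName == null) {
            return Type.OTHER;
        }
        String name = productName.toLowerCase(Locale.ROOT);
        if (name.contains("derby")) return Type.DERBY;
        if (name.contains("mysql")) return Type.MYSQL;
        if (name.contains("postgres")) return Type.POSTGRES;
        if (name.contains("oracle")) return Type.ORACLE;
        if (name.contains("sql server")) return Type.SQLSERVER;
        return Type.OTHER;
    }

    public static void main(String[] args) {
        // PostgreSQL's JDBC driver reports "PostgreSQL" as the product name.
        System.out.println(fromProductName("PostgreSQL"));
    }
}
```

Matching on the metadata string keeps the detection read-only and backend-neutral, which is exactly what avoids the errors in the PostgreSQL log above.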
[jira] [Updated] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11510: - Attachment: HIVE-11510.1.patch Metatool updateLocation warning on views Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng Attachments: HIVE-11510.1.patch If views are present in a Hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this: ... Warning: Found records with bad LOCATION in SDS table.. bad location URI: null bad location URI: null bad location URI: null Based on the Metatool source code, it looks like there would then be a bad location URI: null message for every view. This appears to happen simply because the 'sds' table in the Hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
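The fix implied by the report can be sketched as a null-aware location check; the class and method names are illustrative, not the actual HiveMetaTool code. Since the SDS location column is NULL precisely for views, a NULL value should be treated as expected rather than reported as a bad URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: a NULL location is expected for views and should be
// skipped, not reported as "bad location URI: null"; only genuinely
// malformed location strings count as bad records.
public class LocationCheck {
    public static boolean isBadLocation(String locationUri) {
        if (locationUri == null) {
            return false;  // a view: no storage location, not a bad record
        }
        try {
            new URI(locationUri);
            return false;  // parses as a URI, so the location is usable
        } catch (URISyntaxException e) {
            return true;   // genuinely malformed location
        }
    }
}
```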
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720380#comment-14720380 ] Hive QA commented on HIVE-11587: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752866/HIVE-11587.01.patch {color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_correlationoptimizer1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_nulls org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_nullsafe_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join0 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5102/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5102/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5102/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 20 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752866 - PreCommit-HIVE-TRUNK-Build Fix memory estimates for mapjoin hashtable -- Key: HIVE-11587 URL: https://issues.apache.org/jira/browse/HIVE-11587 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Wei Zheng Attachments: HIVE-11587.01.patch Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big probe and hope for the best with regard to future data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memUsage argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. 
There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for probe size (in longs) is row count (assuming key per row, i.e. maximum possible number of keys) divided by load factor, plus some very small number to round up. That is for flat case. For hybrid case it may be more complex due to skew, but that is still a good upper bound for the total probe capacity of all segments. 2) Rename memUsage to maxProbeSize, or something, make sure it's passed correctly based on estimates
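The bound in item 1.5 can be written out as a small helper. The method name is illustrative, not the actual Hive API: assuming at most one key per row, row count divided by the load factor (rounded up) is the largest probe capacity ever needed, which per the description also upper-bounds the total probe capacity across hybrid segments.

```java
// Hypothetical sketch of the upper bound in item 1.5 above: with at most one
// key per row, the probe array never needs more than
// ceil(rowCount / loadFactor) slots.
public class ProbeSizeBound {
    public static long maxProbeCapacity(long rowCount, float loadFactor) {
        if (rowCount <= 0) {
            return 1L;  // degenerate input: fall back to a minimal table
        }
        return (long) Math.ceil(rowCount / (double) loadFactor);
    }

    public static void main(String[] args) {
        // e.g. 1,000,000 rows at a 0.75 load factor
        System.out.println(maxProbeCapacity(1_000_000L, 0.75f));
    }
}
```

Capping the initial allocation at this value is what prevents the probe from being over-allocated regardless of what the planner's data-size estimates say.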
[jira] [Commented] (HIVE-11217) CTAS statements throws error, when the table is stored as ORC File format and select clause has NULL/VOID type column
[ https://issues.apache.org/jira/browse/HIVE-11217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720391#comment-14720391 ] Prasanth Jayachandran commented on HIVE-11217: -- For insert queries, Hive maps null values to the destination column types. For CTAS, maybe it should default to some type (string?). CTAS statements throws error, when the table is stored as ORC File format and select clause has NULL/VOID type column -- Key: HIVE-11217 URL: https://issues.apache.org/jira/browse/HIVE-11217 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Gaurav Kohli Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-11217.1.patch, HIVE-11217.2.patch If you try to use a create-table-as-select (CTAS) statement to create an ORC file format based table, then you can't use NULL as a column value in the select clause: CREATE TABLE empty (x int); CREATE TABLE orc_table_with_null STORED AS ORC AS SELECT x, null FROM empty; Error: {quote} 347084 [main] ERROR hive.ql.exec.DDLTask - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Unknown primitive type VOID at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:643) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4242) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:285) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633) at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323) at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39) at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.IllegalArgumentException: Unknown primitive type VOID at org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:530) at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.init(OrcStruct.java:195) at org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:534) at 
org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:106) at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:519) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:345) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:292) at
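Prasanth's suggestion in the comment above (defaulting VOID to some type, e.g. string, for CTAS) can be sketched as a schema rewrite step. This is a hypothetical illustration, not Hive's actual fix; the idea is to rewrite VOID column types before handing the schema to a format such as ORC that has no VOID type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: replace VOID column types with a default (string)
// before table creation, so the storage format never sees VOID.
public class CtasVoidDefault {
    public static List<String> defaultVoidToString(List<String> columnTypes) {
        List<String> result = new ArrayList<>(columnTypes.size());
        for (String type : columnTypes) {
            result.add("void".equalsIgnoreCase(type) ? "string" : type);
        }
        return result;
    }
}
```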
[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
[ https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720752#comment-14720752 ] Sergio Peña commented on HIVE-11504: Thanks [~Ferd] for the patch. I committed the patch from HIVE-11618 that uses INTEGER/LONG. Could you take a look at the current commit and make the changes for FLOAT/DOUBLE? I think that patch looks easier. Predicate pushing down doesn't work for float type for Parquet -- Key: HIVE-11504 URL: https://issues.apache.org/jira/browse/HIVE-11504 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11504.1.patch, HIVE-11504.1.patch, HIVE-11504.2.patch, HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch The predicate builder should use the PrimitiveTypeName type on the Parquet side to construct the predicate leaf, instead of the type provided by PredicateLeaf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720762#comment-14720762 ] Jesus Camacho Rodriguez commented on HIVE-11652: [~hsubramaniyan]/[~ashutoshc], could you take a look? Thanks Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
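The cost described above comes from how java.util.ArrayList.removeAll works: it calls contains() on its argument once per list element, so passing another List makes the whole call O(n*m). The actual patch rewrites the walker logic to avoid the call entirely, but the generic shape of the problem (and the cheap mitigation of hashing the argument) can be illustrated:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Illustration of the removeAll cost, not the actual HIVE-11652 patch.
// ArrayList.removeAll(c) calls c.contains(e) per element; wrapping the
// argument in a HashSet makes each contains() O(1) instead of O(m).
public class RemoveAllCost {
    public static List<Integer> removeAllFast(List<Integer> from, List<Integer> toRemove) {
        List<Integer> result = new ArrayList<>(from);
        result.removeAll(new HashSet<>(toRemove));
        return result;
    }
}
```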
[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11587: - Attachment: HIVE-11587.02.patch Attach patch 2 for testing. Fix memory estimates for mapjoin hashtable -- Key: HIVE-11587 URL: https://issues.apache.org/jira/browse/HIVE-11587 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Wei Zheng Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big probe and hope for the best with regard to future data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memUsage argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for probe size (in longs) is row count (assuming key per row, i.e. maximum possible number of keys) divided by load factor, plus some very small number to round up. That is for flat case. 
For hybrid case it may be more complex due to skew, but that is still a good upper bound for the total probe capacity of all segments. 2) Rename memUsage to maxProbeSize, or something, make sure it's passed correctly based on estimates that take into account both probe and data size, esp. in hybrid case. 3) Make sure that memory estimation for hybrid case also doesn't come up with numbers that are too small, like 1-byte hashtable. I am not very familiar with that code but it has happened in the past. Other issues we have seen: 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you should not allocate large array in advance. Even if some estimate passes 500Mb or 40Mb or whatever, it doesn't make sense to allocate that. 5) For hybrid, don't pre-allocate WBs - only allocate on write. 6) Change everywhere rounding up to power of two is used to rounding down, at least for hybrid case (?) I wanted to put all of these items in single JIRA so we could keep track of fixing all of them. I think there are JIRAs for some of these already, feel free to link them to this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11657) HIVE-2573 introduces some issues during metastore init (and CLI init)
[ https://issues.apache.org/jira/browse/HIVE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11657: Summary: HIVE-2573 introduces some issues during metastore init (and CLI init) (was: HIVE-2573 introduces some issues) HIVE-2573 introduces some issues during metastore init (and CLI init) - Key: HIVE-11657 URL: https://issues.apache.org/jira/browse/HIVE-11657 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-11657.patch HIVE-2573 introduced a static reloadFunctions call. It has a few problems: 1) When the metastore client is initialized using an externally supplied config (i.e. Hive.get(HiveConf)), it still gets called during static init using the main service config. In my case, even though I have uris in the supplied config to connect to a remote MS (which eventually happens), the static call creates an ObjectStore, which is undesirable. 2) It breaks compatibility - old metastores do not support this call, so new clients will fail, and there's no workaround like not using a new feature, because the static call is always made -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720789#comment-14720789 ] Sergey Shelukhin commented on HIVE-11587: - Left some feedback. Mostly, such internals as wbs should not be exposed externally Fix memory estimates for mapjoin hashtable -- Key: HIVE-11587 URL: https://issues.apache.org/jira/browse/HIVE-11587 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Wei Zheng Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big probe and hope for the best with regard to future data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memUsage argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for probe size (in longs) is row count (assuming key per row, i.e. maximum possible number of keys) divided by load factor, plus some very small number to round up. 
That is for flat case. For hybrid case it may be more complex due to skew, but that is still a good upper bound for the total probe capacity of all segments. 2) Rename memUsage to maxProbeSize, or something, make sure it's passed correctly based on estimates that take into account both probe and data size, esp. in hybrid case. 3) Make sure that memory estimation for hybrid case also doesn't come up with numbers that are too small, like 1-byte hashtable. I am not very familiar with that code but it has happened in the past. Other issues we have seen: 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you should not allocate large array in advance. Even if some estimate passes 500Mb or 40Mb or whatever, it doesn't make sense to allocate that. 5) For hybrid, don't pre-allocate WBs - only allocate on write. 6) Change everywhere rounding up to power of two is used to rounding down, at least for hybrid case (?) I wanted to put all of these items in single JIRA so we could keep track of fixing all of them. I think there are JIRAs for some of these already, feel free to link them to this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled
[ https://issues.apache.org/jira/browse/HIVE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720785#comment-14720785 ] Hive QA commented on HIVE-10021: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12753007/HIVE-10021.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5105/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5105/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5105/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12753007 - PreCommit-HIVE-TRUNK-Build Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled -- Key: HIVE-10021 URL: https://issues.apache.org/jira/browse/HIVE-10021 Project: Hive Issue Type: Bug Components: HiveServer2, Indexing Affects Versions: 0.13.1, 2.0.0 Environment: CDH 5.3.2 Reporter: Richard Williams Assignee: Aihua Xu Attachments: HIVE-10021.2.patch, HIVE-10021.patch When HiveServer2 is configured to authorize submitted queries and statements through Sentry, any attempt to issue an alter index rebuild statement fails with a SemanticException caused by a NullPointerException. 
This occurs regardless of whether the index is a compact or bitmap index. The root cause of the problem appears to be the fact that the static createRootTask function in org.apache.hadoop.hive.ql.optimizer.IndexUtils creates a new org.apache.hadoop.hive.ql.Driver object to compile the index builder query, and this new Driver object, unlike the one used by HiveServer2 to compile the submitted statement, is used without having its userName field initialized with the submitting user's username. Adding null checks to the Sentry code is insufficient to solve this problem, because Sentry needs the userName to determine whether or not the submitting user should be able to execute the index rebuild statement. Example stack trace from the HiveServer2 logs: {noformat} FAILED: NullPointerException null java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) at org.apache.hadoop.security.Groups.getGroups(Groups.java:161) at org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46) at org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370) at org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440) at org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258) at org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149) at org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117) at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173) at
[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720790#comment-14720790 ] Sergey Shelukhin commented on HIVE-11595: - Will commit after HiveQA. refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch If the ORC footer is read from the cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf's parseFrom bytes, but apparently there's a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10113) LLAP: reducers running in LLAP starve out map retries
[ https://issues.apache.org/jira/browse/HIVE-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10113. - Resolution: Done Fix Version/s: llap That was fixed elsewhere long ago. LLAP: reducers running in LLAP starve out map retries - Key: HIVE-10113 URL: https://issues.apache.org/jira/browse/HIVE-10113 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Siddharth Seth Fix For: llap When query 17 is run, some mappers from Map 1 currently fail (due to an unwrap issue, and also due to HIVE-10112). This query has 1000+ reducers; if they are run in LLAP, they all queue up, and the query locks up. If only the mappers run in LLAP, the query completes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720810#comment-14720810 ] Ashutosh Chauhan commented on HIVE-11652: - +1 LGTM, while your committing it will be good to add comments describing role of following data structures in walker: * opStack * opQueue * toWalk Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11688) OrcRawRecordMerger does not close primary reader if not fully consumed
[ https://issues.apache.org/jira/browse/HIVE-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheesh Katkam updated HIVE-11688: --- Attachment: HIVE-11688.patch OrcRawRecordMerger does not close primary reader if not fully consumed -- Key: HIVE-11688 URL: https://issues.apache.org/jira/browse/HIVE-11688 Project: Hive Issue Type: Bug Components: File Formats Reporter: Sudheesh Katkam Assignee: Sudheesh Katkam Labels: orc Attachments: HIVE-11688.patch If {{OrcRawRecordMerger#close}} is called before fully reading an orc file, the {{primary}} reader is not closed. The {{primary}} reader is assigned using {{readers.pollFirstEntry()}} which deletes the reader from {{readers}}, and currently the {{OrcRawRecordMerger#close}} method only closes readers in the map. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
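The leak described above is easy to reproduce in miniature. A hedged Python sketch (hypothetical names, not the actual OrcRawRecordMerger code): polling the first entry removes it from the map, so a close() that only walks the map misses the primary reader unless it is closed explicitly:

```python
# Illustrative sketch of the bug: pollFirstEntry removes the primary
# reader from the sorted map, so close() must close it separately.
class FakeReader:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class Merger:
    def __init__(self, readers_by_key):
        self.readers = dict(sorted(readers_by_key.items()))
        first = next(iter(self.readers))        # pollFirstEntry analogue:
        self.primary = self.readers.pop(first)  # the entry leaves the map here

    def close(self):
        if self.primary is not None:  # the fix: close the polled reader too
            self.primary.close()
            self.primary = None
        for r in self.readers.values():  # readers still in the map
            r.close()
```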
[jira] [Updated] (HIVE-11657) HIVE-2573 introduces some issues
[ https://issues.apache.org/jira/browse/HIVE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11657: Attachment: HIVE-11657.patch This changes the reloadFunctions call to be done once globally, but on the object, so that it is done after the proper config is set. It also improves retry logic to not retry on some non-recoverable errors, like a missing method. [~gopalv] [~ashutoshc] can you take a look HIVE-2573 introduces some issues Key: HIVE-11657 URL: https://issues.apache.org/jira/browse/HIVE-11657 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-11657.patch HIVE-2573 introduced static reload functions call. It has a few problems: 1) When metastore client is initialized using an externally supplied config (i.e. Hive.get(HiveConf)), it still gets called during static init using the main service config. In my case, even though I have uris in the supplied config to connect to remote MS (which eventually happens), the static call creates objectstore, which is undesirable. 2) It breaks compat - old metastores do not support this call so new clients will fail, and there's no workaround like not using a new feature because the static call is always made -- This message was sent by Atlassian JIRA (v6.3.4#6332)
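The "don't retry non-recoverable errors" idea from the patch note can be sketched as follows. This is a hypothetical illustration, not the RetryingMetaStoreClient API; `MissingMethodError` stands in for something like a NoSuchMethodError raised when an old metastore lacks a new call:

```python
# Hedged sketch: transient failures are retried, but a non-recoverable
# error such as a missing remote method fails fast instead of retrying.
class MissingMethodError(Exception):
    """Stand-in for e.g. a missing-method failure from an old metastore."""

def call_with_retry(fn, attempts=3):
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except MissingMethodError:
            raise              # retrying cannot make the method appear
        except Exception as e:
            last_error = e     # transient; try again
    raise last_error
```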
[jira] [Commented] (HIVE-11688) OrcRawRecordMerger does not close primary reader if not fully consumed
[ https://issues.apache.org/jira/browse/HIVE-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720723#comment-14720723 ] Sudheesh Katkam commented on HIVE-11688: Review board [link|https://reviews.apache.org/r/37909/]. OrcRawRecordMerger does not close primary reader if not fully consumed -- Key: HIVE-11688 URL: https://issues.apache.org/jira/browse/HIVE-11688 Project: Hive Issue Type: Bug Components: File Formats Reporter: Sudheesh Katkam Labels: orc Attachments: HIVE-11688.patch If {{OrcRawRecordMerger#close}} is called before fully reading an orc file, the {{primary}} reader is not closed. The {{primary}} reader is assigned using {{readers.pollFirstEntry()}} which deletes the reader from {{readers}}, and currently the {{OrcRawRecordMerger#close}} method only closes readers in the map. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11510: - Attachment: HIVE-11510.2.patch Thanks [~sushanth]. Updated patch as suggested. Metatool updateLocation warning on views Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch If views are present in a hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this: ... Warning: Found records with bad LOCATION in SDS table.. bad location URI: null bad location URI: null bad location URI: null Based on the source code for Metatool, it looks like there would then be a bad location URI: null message for every view and it also appears this is happening simply because the 'sds' table in the hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11668: Attachment: HIVE-11668.01.patch Cleans up the behavior in initial checks - makes sure tx is always open, and that it only commits a tx when it has opened it. make sure directsql calls pre-query init when needed Key: HIVE-11668 URL: https://issues.apache.org/jira/browse/HIVE-11668 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11668.01.patch, HIVE-11668.patch See comments in HIVE-11123 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
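The "only commit a tx when it has opened it" pattern from the patch note can be sketched as below. Names are hypothetical, not the actual MetaStoreDirectSql code:

```python
# Sketch of transaction ownership: open a tx only if one is not already
# active, and commit or roll back only the tx this code path opened.
class Tx:
    def __init__(self):
        self.active = False
        self.commits = 0
    def begin(self):
        self.active = True
    def commit(self):
        self.active = False
        self.commits += 1
    def rollback(self):
        self.active = False

def run_with_tx(tx, work):
    opened_here = not tx.active
    if opened_here:
        tx.begin()          # make sure a tx is always open
    try:
        result = work()
        if opened_here:
            tx.commit()     # commit only what we opened
        return result
    except Exception:
        if opened_here:
            tx.rollback()
        raise
```

A caller that already holds an open transaction keeps full control of it; the helper never commits on its behalf.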
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720660#comment-14720660 ] Hive QA commented on HIVE-11652: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752973/HIVE-11652.02.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5104/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5104/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5104/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752973 - PreCommit-HIVE-TRUNK-Build Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11678) Add AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/HIVE-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720503#comment-14720503 ] Hive QA commented on HIVE-11678: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752925/HIVE-11678.patch {color:red}ERROR:{color} -1 due to 232 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_not_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_not_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_count org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_distinct_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_gby_star org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby5_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby5_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_distinct_samekey org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_update org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_partition_metadataonly org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin2
[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720534#comment-14720534 ] Sergey Shelukhin commented on HIVE-11668: - Hmm... make sure directsql calls pre-query init when needed Key: HIVE-11668 URL: https://issues.apache.org/jira/browse/HIVE-11668 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11668.patch See comments in HIVE-11123 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720537#comment-14720537 ] Sergey Shelukhin commented on HIVE-11668: - Actually SQL helpers already ensure txn is always there in all other cases. It's just the DB init that requires it. I guess it rolls back on failure so there should be a txn make sure directsql calls pre-query init when needed Key: HIVE-11668 URL: https://issues.apache.org/jira/browse/HIVE-11668 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11668.patch See comments in HIVE-11123 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11510) Metatool updateLocation warns on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11510: - Summary: Metatool updateLocation warns on views (was: Metatool updateLocation fails on views) Metatool updateLocation warns on views -- Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng If views are present in a hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this: ... Warning: Found records with bad LOCATION in SDS table.. bad location URI: null bad location URI: null bad location URI: null Based on the source code for Metatool, it looks like there would then be a bad location URI: null message for every view and it also appears this is happening simply because the 'sds' table in the hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11510: - Summary: Metatool updateLocation warning on views (was: Metatool updateLocation warns on views) Metatool updateLocation warning on views Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng If views are present in a hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this: ... Warning: Found records with bad LOCATION in SDS table.. bad location URI: null bad location URI: null bad location URI: null Based on the source code for Metatool, it looks like there would then be a bad location URI: null message for every view and it also appears this is happening simply because the 'sds' table in the hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11621) Fix TestMiniTezCliDriver test failures when HBase Metastore is used
[ https://issues.apache.org/jira/browse/HIVE-11621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-11621. --- Resolution: Fixed Patch 2 committed. Thanks Daniel. Fix TestMiniTezCliDriver test failures when HBase Metastore is used --- Key: HIVE-11621 URL: https://issues.apache.org/jira/browse/HIVE-11621 Project: Hive Issue Type: Sub-task Components: HBase Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Daniel Dai Fix For: hbase-metastore-branch Attachments: HIVE-11621.1.patch, HIVE-11621.2.patch As a first step, Fix hbase-metastore unit tests with TestMiniTezCliDriver, so we can test LLAP and hbase-metastore together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11654) After HIVE-10289, HBase metastore tests failing
[ https://issues.apache.org/jira/browse/HIVE-11654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-11654. --- Resolution: Fixed Fix Version/s: hbase-metastore-branch Patch committed. Thanks Daniel. After HIVE-10289, HBase metastore tests failing --- Key: HIVE-11654 URL: https://issues.apache.org/jira/browse/HIVE-11654 Project: Hive Issue Type: Bug Components: HBase Metastore Reporter: Alan Gates Assignee: Daniel Dai Priority: Blocker Fix For: hbase-metastore-branch Attachments: HIVE-11654.1.patch After the latest merge from trunk a number of the HBase unit tests are failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11357) ACID enable predicate pushdown for insert-only delta file 2
[ https://issues.apache.org/jira/browse/HIVE-11357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720326#comment-14720326 ] Alan Gates commented on HIVE-11357: --- +1 ACID enable predicate pushdown for insert-only delta file 2 --- Key: HIVE-11357 URL: https://issues.apache.org/jira/browse/HIVE-11357 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11357.2.patch, HIVE-11357.patch HIVE-11320 missed a case. That fix enabled PPD for insert-only delta files when a base file is present. It won't work if only delta files are present. see {{OrcInputFormat.getReader(InputSplit inputSplit, Options options)}} which only calls {{setSearchArgument()}} if there is a base file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11595: Attachment: HIVE-11595.04.patch Some more small fixes from the metastore branch. ORC allocates a new buffer, so the patch 03 code works on the normal path, but it breaks in the case of a non-0 position (i.e. when the footer comes from an HBase response w/o a copy). This fixes these issues. refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch If ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
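The non-zero-position pitfall described in the update can be shown with plain byte slices. This is an illustrative sketch of the bug class, not the ORC reader code itself:

```python
# Reading from offset 0 works only when the payload starts at position 0;
# honoring the current position also handles a footer that sits inside a
# larger, uncopied response buffer.
def footer_bytes_assuming_zero(buf, pos, length):
    return buf[:length]            # breaks when pos != 0

def footer_bytes_honoring_position(buf, pos, length):
    return buf[pos:pos + length]   # correct for any starting position
```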
[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720332#comment-14720332 ] Prasanth Jayachandran commented on HIVE-11595: -- +1 on the new change refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch If ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10924) add support for MERGE statement
[ https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720334#comment-14720334 ] Eugene Koifman commented on HIVE-10924: --- h3. Feature design notes Hive supports [multi-insert statement|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries]. The idea is that you can execute a select statement and split the result stream into several to write to multiple targets. This matches very closely what the MERGE statement needs to do. When modeling MERGE as multi-insert, we'd split the stream into two streams, one for the insert part and one for the update part, but write both results to the same table. Section 14.12 of ISO/IEC 9075-2:2011(E) (SQL 2011) defines the MERGE statement. Suppose we have tables {code:SQL} CREATE TABLE target(a int, b int, c int); CREATE TABLE source(x int, y int, z int); {code} Then an example that covers most possibilities might look like this {code:SQL} MERGE INTO target USING source ON b = y WHEN MATCHED AND c + 1 + z > 0 THEN UPDATE SET a = 1, c = z WHEN NOT MATCHED AND z IS NULL THEN INSERT(a,b) VALUES(z, 7) {code} \\ \\ And is interpreted as follows \\ \\ || Line || Statement Part || Notes || | 1 | {code:SQL} MERGE INTO target {code} | Specifies the table being modified | | 2 | {code:SQL} USING source {code} | specifies the source of the data which may be a table or expression such as SELECT … FROM … | | 3 | {code:SQL} ON b = y {code} | is interpreted exactly like an ON clause of a JOIN between source and target. | | 4 | {code:SQL} WHEN MATCHED {code} | Applies if expr in ON is true | | 5 | {code:SQL} AND c + 1 + z > 0 {code} | Additional predicate to test before performing the action. | | 6 | {code:SQL} THEN UPDATE SET a = 1, c = z {code} | May be UPDATE or DELETE. The latter deletes the row from target. The SET clause is exactly like in a regular UPDATE stmt. 
| | 7 | {code:SQL} WHEN NOT MATCHED {code} | Applies if expr in ON is false | | 8 | {code:SQL} AND z IS NULL {code} | Additional predicate to test before performing the action. | | 9 | {code:SQL} THEN INSERT(a,b) VALUES(z, 7){code} | Insert to perform on target. | \\ \\ Then the equivalent _multi-insert statement_ looks like this: \\ \\ || Statement Part || Reference to previous table || | {code:SQL} FROM (SELECT * FROM target RIGHT OUTER JOIN SOURCE ON b = y) {code} | Lines 1 - 3 | | {code:SQL} INSERT INTO target(a,c) SELECT 1, z {code} | This represents the update part of merge; Line 6 | | {code:SQL} WHERE c + 1 + z > 0 {code} | Line 5 | | {code:SQL} AND b = y {code} | Only include ‘matched’ rows; Line 4 | | {code:SQL} INSERT INTO target(a,b) SELECT z, 7 {code} | This represents the ‘insert’ part of merge; Line 9 | | {code:SQL} WHERE z IS NULL {code} | Line 8 | | {code:SQL} AND a IS NULL AND b IS NULL AND c IS NULL; {code} | Only include ‘not matched’ rows; Line 7 | h4. Some caveats # Current multi-insert doesn’t support writing to the same table more than once. Can we fix this? # This requires the same change as for multi-statement txn, that is, to support multiple delta files per transaction. (HIVE-11030) # Requires annotating each insert (of multi-insert) with whether it’s doing update/delete or insert Since Hive can already (almost) compile an operator pipeline for such a _multi-insert statement_, support for MERGE doesn't require additional operators. Also, Update/Delete are actually compiled into Insert statements. add support for MERGE statement --- Key: HIVE-10924 URL: https://issues.apache.org/jira/browse/HIVE-10924 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor, Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman add support for MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
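To make the clause-by-clause mapping in the design notes concrete, here is the example MERGE assembled as a single multi-insert string. The exact text is illustrative, not Hive's planner output, and the not-matched test is written with IS NULL null checks:

```python
# The MERGE example rewritten as one multi-insert statement; the comments
# name the MERGE clause each line corresponds to.
merge_as_multi_insert = "\n".join([
    "FROM (SELECT * FROM target RIGHT OUTER JOIN source ON b = y)",
    "INSERT INTO target(a, c) SELECT 1, z",        # WHEN MATCHED ... UPDATE
    "  WHERE c + 1 + z > 0 AND b = y",
    "INSERT INTO target(a, b) SELECT z, 7",        # WHEN NOT MATCHED ... INSERT
    "  WHERE z IS NULL AND a IS NULL AND b IS NULL AND c IS NULL",
])
```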
[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO
[ https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11684: --- Component/s: CBO Implement limit pushdown through outer join in CBO -- Key: HIVE-11684 URL: https://issues.apache.org/jira/browse/HIVE-11684 Project: Hive Issue Type: New Feature Components: CBO Affects Versions: 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO
[ https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11684: --- Attachment: HIVE-11684.patch Implement limit pushdown through outer join in CBO -- Key: HIVE-11684 URL: https://issues.apache.org/jira/browse/HIVE-11684 Project: Hive Issue Type: New Feature Components: CBO Affects Versions: 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11684.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO
[ https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11684: --- Attachment: HIVE-11684.patch Implement limit pushdown through outer join in CBO -- Key: HIVE-11684 URL: https://issues.apache.org/jira/browse/HIVE-11684 Project: Hive Issue Type: New Feature Components: CBO Affects Versions: 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11684.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720427#comment-14720427 ] Sergey Shelukhin commented on HIVE-11668: - Hmm. make sure directsql calls pre-query init when needed Key: HIVE-11668 URL: https://issues.apache.org/jira/browse/HIVE-11668 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11668.patch See comments in HIVE-11123 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720434#comment-14720434 ] Sushanth Sowmyan commented on HIVE-11510: - With the current patch, the metastore will do a LOG.debug for every single null record, which can be a lot of output and will also slow down that process considerably. Would it be possible to simply update the UpdateMStorageDescriptorTblURIRetVal class with an int numNullRecords initialized to zero and incremented each time you get a null? Also, in that case, I would imagine that we shouldn't add that location to badRecords, since that would bloat up the size of badRecords unnecessarily. After we do that, we can then do a singular log in HiveMetaTool.printTblURIUpdateSummary along with the other statistics, mentioning how many null records we found, and that this is okay if the user has that many indexes/views. Metatool updateLocation warning on views Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng Attachments: HIVE-11510.1.patch If views are present in a hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this: ... Warning: Found records with bad LOCATION in SDS table.. bad location URI: null bad location URI: null bad location URI: null Based on the source code for Metatool, it looks like there would then be a bad location URI: null message for every view and it also appears this is happening simply because the 'sds' table in the hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
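The suggested change can be sketched as below; the names and the URI-validity check are hypothetical, not the actual HiveMetaTool code:

```python
# Count NULL locations instead of logging each one, keep only genuinely
# bad URIs, and report a single summary line at the end.
def scan_locations(locations, is_valid):
    num_null_records = 0
    bad_records = []
    for loc in locations:
        if loc is None:
            num_null_records += 1    # views/indexes: NULL location is expected
        elif not is_valid(loc):
            bad_records.append(loc)  # worth reporting individually
    summary = ("Found %d records with NULL location (okay if these are "
               "views/indexes) and %d bad URIs"
               % (num_null_records, len(bad_records)))
    return num_null_records, bad_records, summary
```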
[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO
[ https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11684: --- Attachment: (was: HIVE-11684.patch) Implement limit pushdown through outer join in CBO -- Key: HIVE-11684 URL: https://issues.apache.org/jira/browse/HIVE-11684 Project: Hive Issue Type: New Feature Components: CBO Affects Versions: 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11553) use basic file metadata cache in ETLSplitStrategy-related paths
[ https://issues.apache.org/jira/browse/HIVE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720813#comment-14720813 ] Sergey Shelukhin commented on HIVE-11553: - Actually, I misread some code. Something is easier to do than I thought. I may yet update this patch. It does work now, still :) use basic file metadata cache in ETLSplitStrategy-related paths --- Key: HIVE-11553 URL: https://issues.apache.org/jira/browse/HIVE-11553 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11553.01.patch, HIVE-11553.02.patch, HIVE-11553.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11660) LLAP: TestTaskExecutorService is flaky
[ https://issues.apache.org/jira/browse/HIVE-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-11660: -- Attachment: HIVE-11660.1.txt Attaching a patch to fix the tests. Have run 100 iterations of both on a Linux box - where the failures are normally seen - with all of them passing. There are some real bugs that were causing TestLlapTaskSchedulerService to fail. The last allocateTaskRequest for a DAG could've ended up being ignored. Also, in TaskScheduler, the waitQueue can be improved - filed a separate jira for this. [~sershe] - please review. LLAP: TestTaskExecutorService is flaky -- Key: HIVE-11660 URL: https://issues.apache.org/jira/browse/HIVE-11660 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Siddharth Seth Attachments: HIVE-11660.1.txt {noformat} java.lang.Exception: test timed out after 1 milliseconds at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.awaitCompletion(TestTaskExecutorService.java:244) at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.access$000(TestTaskExecutorService.java:208) at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption(TestTaskExecutorService.java:168) {noformat} Cannot repro locally. See HIVE-11642 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11660) LLAP: TestTaskExecutorService is flaky
[ https://issues.apache.org/jira/browse/HIVE-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720833#comment-14720833 ] Siddharth Seth commented on HIVE-11660: --- On the TaskExecutor, this mainly moves some code around: removing the scheduled task from the waitQueue is now in the same synchronized block, instead of being in separate synchronized blocks.
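The race the comment above describes can be sketched generically. This is not Hive's actual code (the class and method names here are illustrative): the point is that peeking at and removing a task from the wait queue must happen under one lock acquisition, so no concurrent thread can observe or act on the task between the two steps.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the fix pattern: the peek and the removal happen in
// ONE synchronized block. Before the fix, doing them in separate synchronized
// blocks left a window in which another thread could change the queue state.
public class WaitQueueSketch {
    private final Deque<String> waitQueue = new ArrayDeque<>();
    private final Object lock = new Object();

    public void offer(String task) {
        synchronized (lock) {
            waitQueue.addLast(task);
        }
    }

    public String scheduleNext() {
        synchronized (lock) {
            String task = waitQueue.peekFirst();
            if (task != null) {
                waitQueue.removeFirst(); // same sync block as the peek above
            }
            return task;
        }
    }
}
```

Moving both steps under the same monitor makes the peek-then-remove pair atomic with respect to other threads using the same lock, which is the kind of consolidation the patch performs.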
[jira] [Commented] (HIVE-11660) LLAP: TestTaskExecutorService is flaky
[ https://issues.apache.org/jira/browse/HIVE-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720859#comment-14720859 ] Sergey Shelukhin commented on HIVE-11660: - Can you post an RB link?
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720870#comment-14720870 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11652: -- +1, this should nullify HIVE-11341. Thanks, Hari Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take a very long time, as it has to go through the list looking for each of the nodes. We get rid of this call by rewriting the logic in the walker.
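Why removeAll on a list is expensive can be seen with a small illustration (not Hive's DefaultGraphWalker code, just the underlying collection behavior): ArrayList.removeAll(c) calls c.contains(e) once per element, so when the argument is itself a list each membership check is a linear scan, giving O(n * m) total. One standard mitigation, shown here as a sketch, is hash-based lookup for the argument collection.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Illustrative only: both methods compute the same result, but the cost of
// the membership checks inside removeAll differs.
public class RemoveAllSketch {
    // O(n * m): contains() on a List argument scans the whole list each time.
    public static List<Integer> slowRemove(List<Integer> nodes, List<Integer> done) {
        List<Integer> copy = new ArrayList<>(nodes);
        copy.removeAll(done);
        return copy;
    }

    // O(n) on average: contains() on a HashSet is constant time.
    public static List<Integer> fastRemove(List<Integer> nodes, List<Integer> done) {
        List<Integer> copy = new ArrayList<>(nodes);
        copy.removeAll(new HashSet<>(done));
        return copy;
    }
}
```

The patch goes further than swapping collection types (it restructures the walker so the call is not needed at all), but the quadratic behavior above is the cost it avoids.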
[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720883#comment-14720883 ] Hive QA commented on HIVE-11595: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12753038/HIVE-11595.04.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5106/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5106/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5106/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12753038 - PreCommit-HIVE-TRUNK-Build refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch If the ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom(bytes), but apparently there's a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar.
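The goal described above, parsing cached footer bytes without a reader or an open file, can be sketched generically with java.nio. This is a hypothetical illustration, not the ORC API: the field layout below is made up for the demo, whereas a real ORC footer is a protobuf message that would be parsed from the cached bytes (e.g. via a parseFrom overload) rather than hand-decoded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a "footer" serialized as [long rowCount][int nameLen][nameBytes],
// parsed straight from a ByteBuffer that came out of a cache. The caller never
// touches the filesystem or constructs a Reader.
public class FooterSketch {
    public final long rowCount;
    public final String writerName;

    private FooterSketch(long rowCount, String writerName) {
        this.rowCount = rowCount;
        this.writerName = writerName;
    }

    // Parse directly from the buffer; no file handle needed.
    public static FooterSketch parseFrom(ByteBuffer buf) {
        long rows = buf.getLong();
        int len = buf.getInt();
        byte[] name = new byte[len];
        buf.get(name);
        return new FooterSketch(rows, new String(name, StandardCharsets.UTF_8));
    }

    // Helper to build a cached footer blob for the demo.
    public static ByteBuffer serialize(long rows, String writer) {
        byte[] name = writer.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + name.length);
        buf.putLong(rows).putInt(name.length).put(name);
        buf.flip();
        return buf;
    }
}
```

The refactor in the patch is about exposing exactly this kind of bytes-in, metadata-out entry point, so that a cache hit bypasses the file-open path entirely.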