[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
[ https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700277#comment-14700277 ] Ashutosh Chauhan commented on HIVE-11375: - Yeah.. that seems like a bug to me. Broken processing of queries containing NOT (x IS NOT NULL and x <> 0) -- Key: HIVE-11375 URL: https://issues.apache.org/jira/browse/HIVE-11375 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 2.0.0 Reporter: Mariusz Sakowski Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11375.2.patch, HIVE-11375.patch When running a query like this: {code}explain select * from test where (val is not null and val <> 0);{code} Hive will simplify the expression in parentheses and omit the is-not-null check: {code} Filter Operator predicate: (val <> 0) (type: boolean) {code} which is fine. But if we negate the condition using the NOT operator: {code}explain select * from test where not (val is not null and val <> 0);{code} Hive will also simplify things, but now it breaks them: {code} Filter Operator predicate: (not (val <> 0)) (type: boolean) {code} because the valid predicate should be *val == 0 or val is null*, while the above is equivalent to *val == 0* only, filtering away rows where val is null. A simple example: {code}
CREATE TABLE example ( val bigint );
INSERT INTO example VALUES (1), (NULL), (0);

-- returns 2 rows - NULL and 0
select * from example where (val is null or val == 0);

-- returns 1 row - 0
select * from example where not (val is not null and val <> 0);
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
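The bug above comes down to SQL's three-valued logic: `NOT (val IS NOT NULL AND val <> 0)` keeps NULL rows, while the simplified `NOT (val <> 0)` drops them. A small Python sketch (illustrative only, not Hive's optimizer code) that models the three-valued evaluation makes the difference concrete:

```python
# Model SQL three-valued logic with Python's True/False/None (None = SQL NULL).

def sql_not(x):
    return None if x is None else (not x)

def sql_and(a, b):
    if a is False or b is False:   # FALSE AND anything = FALSE
        return False
    if a is None or b is None:     # otherwise NULL poisons the result
        return None
    return True

def neq(val, k):                   # val <> k is NULL when val is NULL
    return None if val is None else (val != k)

def original(val):                 # NOT (val IS NOT NULL AND val <> 0)
    return sql_not(sql_and(val is not None, neq(val, 0)))

def broken(val):                   # NOT (val <> 0) -- the bad simplification
    return sql_not(neq(val, 0))

rows = [1, None, 0]
# A WHERE clause keeps a row only when the predicate is TRUE (not NULL).
kept_original = [v for v in rows if original(v) is True]
kept_broken = [v for v in rows if broken(v) is True]
```

`kept_original` contains both `None` and `0` (the two rows the JIRA says should survive), while `kept_broken` contains only `0`, because for a NULL value the simplified predicate evaluates to NULL and the row is filtered out.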
[jira] [Commented] (HIVE-11526) LLAP: implement LLAP UI as a separate service
[ https://issues.apache.org/jira/browse/HIVE-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700723#comment-14700723 ] Kai Sasaki commented on HIVE-11526: --- [~sershe] Thank you so much for the detailed information. Now I can see where we should start. I'll break down this JIRA into a few tasks as you suggested. * Create LLAP Monitor Daemon * Running script and Slider integration * Selecting metrics * Polish the UI LLAP: implement LLAP UI as a separate service - Key: HIVE-11526 URL: https://issues.apache.org/jira/browse/HIVE-11526 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Kai Sasaki The specifics are vague at this point. Hadoop metrics can be output, as well as metrics we collect and output in jmx, as well as those we collect per fragment and log right now. This service can do LLAP-specific views, and per-query aggregation. [~gopalv] may have some information on how to reuse existing solutions for part of the work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10810) Document Beeline/CLI changes
[ https://issues.apache.org/jira/browse/HIVE-10810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700767#comment-14700767 ] Shannon Ladymon commented on HIVE-10810: Thanks! I'll watch this jira. Document Beeline/CLI changes Key: HIVE-10810 URL: https://issues.apache.org/jira/browse/HIVE-10810 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Xuefu Zhang Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699300#comment-14699300 ] Hive QA commented on HIVE-11383:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750765/HIVE-11383.9.patch

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 9370 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_exists
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join13
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4984/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4984/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4984/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750765 - PreCommit-HIVE-TRUNK-Build

Upgrade Hive to Calcite 1.4 --- Key: HIVE-11383 URL: https://issues.apache.org/jira/browse/HIVE-11383 Project: Hive Issue Type: Bug Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, HIVE-11383.8.patch, HIVE-11383.9.patch CLEAR LIBRARY CACHE Upgrade Hive to Calcite 1.4.0-incubating. There is currently a snapshot release, which is close to what will be in 1.4. I have checked that Hive compiles against the new snapshot, fixing one issue. The patch is attached. Next step is to validate that Hive runs against the new Calcite, and post any issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], can you please do that? [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in the new Calcite version.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11424) Improve HivePreFilteringRule performance
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699392#comment-14699392 ] Hive QA commented on HIVE-11424:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750767/HIVE-11424.01.patch

{color:red}ERROR:{color} -1 due to 35 failed/errored test(s), 9370 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_deep_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join3
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4985/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4985/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4985/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 35 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12750767 - PreCommit-HIVE-TRUNK-Build Improve HivePreFilteringRule performance Key: HIVE-11424 URL: https://issues.apache.org/jira/browse/HIVE-11424 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, HIVE-11424.patch We create a rule that will transform OR clauses into IN clauses (when possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
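The description above mentions a rule that transforms OR clauses into IN clauses where possible. As a rough illustration of the idea (a toy Python sketch, not Hive's actual HivePreFilteringRule implementation), equality disjuncts on the same column can be factored into a single IN predicate:

```python
from collections import defaultdict

def or_to_in(disjuncts):
    """Toy rewrite: group ('col', '=', const) disjuncts of an OR into IN clauses.
    Predicates are modeled as (column, operator, operand) tuples; anything that
    is not a simple equality is passed through unchanged."""
    by_col = defaultdict(list)
    rest = []
    for d in disjuncts:
        if isinstance(d, tuple) and len(d) == 3 and d[1] == "=":
            col, _, const = d
            by_col[col].append(const)
        else:
            rest.append(d)
    out = []
    for col, consts in by_col.items():
        if len(consts) > 1:
            out.append((col, "IN", tuple(consts)))   # factored IN clause
        else:
            out.append((col, "=", consts[0]))        # single equality stays as-is
    return out + rest

# val = 1 OR val = 2 OR val = 3 OR other > 0
pred = [("val", "=", 1), ("val", "=", 2), ("val", "=", 3), ("other", ">", 0)]
rewritten = or_to_in(pred)
```

Here `rewritten` becomes `[("val", "IN", (1, 2, 3)), ("other", ">", 0)]`; the IN form is both more compact and cheaper to evaluate repeatedly than a chain of OR comparisons.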
[jira] [Updated] (HIVE-11591) change thrift generation to use undated annotations
[ https://issues.apache.org/jira/browse/HIVE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11591: Attachment: HIVE-11591.WIP.patch The patch. We can commit it now, and then presumably get the benefit when someone upgrades to 0.9.3 and re-gens the files, or we can wait and regen/commit after 0.9.3 upgrade. change thrift generation to use undated annotations --- Key: HIVE-11591 URL: https://issues.apache.org/jira/browse/HIVE-11591 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11591.WIP.patch Thrift has added class annotations to generated classes; these contain generation date. Because of this, all the Java thrift files change on every re-gen, even if you only make a small change that should not affect bazillion files. This depends on upgrading to Thrift 0.9.3, which doesn't exist yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11341) Avoid expensive resizing of ASTNode tree
[ https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700514#comment-14700514 ] Hive QA commented on HIVE-11341:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750916/HIVE-11341.8.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9370 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby3_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby3_map_multi_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_empty
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_1
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4990/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4990/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4990/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750916 - PreCommit-HIVE-TRUNK-Build

Avoid expensive resizing of ASTNode tree - Key: HIVE-11341 URL: https://issues.apache.org/jira/browse/HIVE-11341 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11341.1.patch, HIVE-11341.2.patch, HIVE-11341.3.patch, HIVE-11341.4.patch, HIVE-11341.5.patch, HIVE-11341.6.patch, HIVE-11341.7.patch, HIVE-11341.8.patch

{code}
Stack Trace                                                                          Sample Count  Percentage(%)
parse.BaseSemanticAnalyzer.analyze(ASTNode, Context)                                 1,605         90
parse.CalcitePlanner.analyzeInternal(ASTNode)                                        1,605         90
parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext)     1,605         90
parse.CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)             1,604         90
parse.SemanticAnalyzer.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)           1,604         90
parse.SemanticAnalyzer.genPlan(QB)                                                   1,604         90
parse.SemanticAnalyzer.genPlan(QB, boolean)                                          1,604         90
parse.SemanticAnalyzer.genBodyPlan(QB, Operator, Map)                                1,604         90
parse.SemanticAnalyzer.genFilterPlan(ASTNode, QB, Operator, Map, boolean)            1,603         90
parse.SemanticAnalyzer.genFilterPlan(QB, ASTNode, Operator, boolean)                 1,603         90
parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, boolean)                1,603         90
parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)           1,603         90
parse.SemanticAnalyzer.genAllExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)        1,603         90
parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx)                        1,603         90
parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx, TypeCheckProcFactory)  1,603         90
lib.DefaultGraphWalker.startWalking(Collection, HashMap)                             1,579         89
lib.DefaultGraphWalker.walk(Node)                                                    1,571         89
java.util.ArrayList.removeAll(Collection)                                            1,433         81
{code}
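The bottom frame of the profile, java.util.ArrayList.removeAll(Collection), accounts for 81% of samples inside DefaultGraphWalker.walk. The reason is that removeAll does a linear contains() scan over its argument for every element of the list, which is quadratic when both collections are lists. A small Python sketch (illustrative, not Hive's code) of the quadratic pattern versus a set-based equivalent:

```python
def remove_all_list(work, done):
    # Mirrors ArrayList.removeAll with a list argument: each membership test
    # is a linear scan over 'done' -> O(len(work) * len(done)) overall.
    return [x for x in work if x not in done]

def remove_all_set(work, done):
    # Same result, but membership tests hit a hash set -> ~O(len(work)).
    done = set(done)
    return [x for x in work if x not in done]

work = list(range(2000))
done = list(range(0, 2000, 2))   # remove the even elements
assert remove_all_list(work, done) == remove_all_set(work, done) == list(range(1, 2000, 2))
```

Both functions return the odd elements; only the asymptotic cost differs, which is why switching the "done" collection to a hash-based structure (or avoiding removeAll entirely) is the usual fix for this kind of hotspot.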
[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700516#comment-14700516 ] Ferdinand Xu commented on HIVE-10304: - I am working on the smoke test and waiting for the fix of HIVE-11579. I think we can begin the process of merging beeline-cli into master in 1-2 weeks if the smoke test goes well. Add deprecation message to HiveCLI -- Key: HIVE-10304 URL: https://issues.apache.org/jira/browse/HIVE-10304 Project: Hive Issue Type: Sub-task Components: CLI Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC1.2 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch As Beeline is now the recommended command line tool for Hive, we should add a message to HiveCLI to indicate that it is deprecated and redirect users to Beeline. This is not suggesting removing HiveCLI for now; it is just a helpful pointer so users know to focus their attention on Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700593#comment-14700593 ] Xuefu Zhang commented on HIVE-11579: [~Ferd], thanks for looking into this. I'm not quite sure of the root cause. Error output is part of the client, so why is the error output closed when the statement is closed? I'm wondering if we should fix that part instead. Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11579-beeline-cli.patch We can easily reproduce the bug with the following steps: {code}
hive> set system:xx=yy;
hive> lss;
hive>
{code} The error output disappears because the err output stream is closed when the Hive statement is closed. This bug also occurs upstream when using the embedded mode, as the new CLI does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700458#comment-14700458 ] Sergey Shelukhin commented on HIVE-11552: - I was going to re-gen thrift files after master merge to make patch less huge, but due to issue discussed in HIVE-11591 the re-gen is still going to touch all Java files. nogen patch is ready for review, attaching a new generated patch after merge implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.01.patch, HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11591) change thrift generation to use undated annotations
[ https://issues.apache.org/jira/browse/HIVE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11591: Description: Thrift has added class annotations to generated classes; these contain generation date. Because of this, all the Java thrift files change on every re-gen, even if you only make a small change that should not affect bazillion files. We should use undated annotations to avoid this problem. This depends on upgrading to Thrift 0.9.3, which doesn't exist yet. was: Thrift has added class annotations to generated classes; these contain generation date. Because of this, all the Java thrift files change on every re-gen, even if you only make a small change that should not affect bazillion files. This depends on upgrading to Thrift 0.9.3, which doesn't exist yet. change thrift generation to use undated annotations --- Key: HIVE-11591 URL: https://issues.apache.org/jira/browse/HIVE-11591 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11591.WIP.patch Thrift has added class annotations to generated classes; these contain generation date. Because of this, all the Java thrift files change on every re-gen, even if you only make a small change that should not affect bazillion files. We should use undated annotations to avoid this problem. This depends on upgrading to Thrift 0.9.3, which doesn't exist yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11579: --- Comment: was deleted (was: When I was debugging, I saw that the error output is closed after invoking the closeClientOperation method in class HiveStatement. The main purpose of this method is to send a request to the server to close the handler. You can see that the error output is redirected to the standard error output. When the SQLOperation closes a driver, it closes the driver with resStream closed. In non-embedded mode it works well, since the two output streams are not the same one, while it fails in embedded mode.) Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11579-beeline-cli.patch We can easily reproduce the bug with the following steps: {code}
hive> set system:xx=yy;
hive> lss;
hive>
{code} The error output disappears because the err output stream is closed when the Hive statement is closed. This bug also occurs upstream when using the embedded mode, as the new CLI does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700689#comment-14700689 ] Xuefu Zhang commented on HIVE-11579: [~Ferd], is there any resource leak if the statement isn't closed? Here the problem seems to be that error output shouldn't be closed when a statement is closed. If we fix that by not closing the statement, I'm not sure of any inadvertent consequences. Any further thoughts? Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11579-beeline-cli.patch We can easily reproduce the bug with the following steps: {code}
hive> set system:xx=yy;
hive> lss;
hive>
{code} The error output disappears because the err output stream is closed when the Hive statement is closed. This bug also occurs upstream when using the embedded mode, as the new CLI does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
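The embedded-mode failure discussed in this thread boils down to two components holding the *same* stream object, so closing it on one side silently breaks the other. A minimal Python sketch of that sharing pattern (hypothetical model, not Hive's HiveStatement code):

```python
import io

class Statement:
    """Hypothetical model of the embedded-mode bug: the 'server' side holds
    the very stream object the 'client' uses for error output."""
    def __init__(self, err_stream):
        self.err = err_stream        # shared reference, not a copy

    def close(self):
        self.err.close()             # closing the statement closes the client's stream too

client_err = io.StringIO()           # stands in for the CLI's error output
stmt = Statement(client_err)         # embedded mode: same object on both sides
stmt.close()

try:
    client_err.write("later error output")
    still_usable = True
except ValueError:                   # StringIO raises ValueError once closed
    still_usable = False
# still_usable is False: any subsequent error output just disappears
```

In a non-embedded (client/server) setup the two sides would hold distinct stream objects, which is why the bug only shows up in embedded mode; the fix direction debated above is to stop the close from propagating to the shared stream.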
[jira] [Updated] (HIVE-11589) Invalid value such as '-1' should be checked for 'hive.txn.timeout'.
[ https://issues.apache.org/jira/browse/HIVE-11589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takahiko Saito updated HIVE-11589: -- Description: When a user accidentally sets an invalid value such as '-1' for 'hive.txn.timeout', the query simply fails, throwing 'NoSuchLockException':
{noformat}
2015-08-16 23:25:43,149 ERROR [HiveServer2-Background-Pool: Thread-206]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such lock: 40)
	at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
	at org.apache.hadoop.hive.metastore.txn.TxnHandler.unlock(TxnHandler.java:501)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.unlock(HiveMetaStore.java:5571)
	at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy7.unlock(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.unlock(HiveMetaStoreClient.java:1876)
	at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
	at com.sun.proxy.$Proxy8.unlock(Unknown Source)
	at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.unlock(DbLockManager.java:134)
	at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.releaseLocks(DbLockManager.java:153)
	at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:1038)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1208)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
	at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
A better way to handle such an invalid value is to check it beforehand instead of throwing NoSuchLockException.
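The up-front check suggested here can be sketched as a small validator that rejects the value at configuration time, producing a clear error instead of the late NoSuchLockException. This is an illustrative Python sketch; the helper name and behavior are assumptions, not Hive's actual config code:

```python
def validate_txn_timeout(value_ms):
    """Hypothetical up-front check for a setting like hive.txn.timeout:
    reject non-numeric or non-positive values with a clear message instead
    of letting the lock heartbeat fail later with NoSuchLockException."""
    try:
        timeout = int(value_ms)
    except (TypeError, ValueError):
        raise ValueError(f"hive.txn.timeout must be an integer, got {value_ms!r}")
    if timeout <= 0:
        raise ValueError(f"hive.txn.timeout must be positive, got {timeout}")
    return timeout

assert validate_txn_timeout("300000") == 300000   # a sane value passes through
try:
    validate_txn_timeout("-1")                    # the value from this report
    rejected = False
except ValueError:
    rejected = True
```

With such a check, setting '-1' fails immediately at the point of misconfiguration, where the user can actually see and fix it.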
[jira] [Commented] (HIVE-11555) Beeline sends password in clear text if we miss -ssl=true flag in the connect string
[ https://issues.apache.org/jira/browse/HIVE-11555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700412#comment-14700412 ] Thejas M Nair commented on HIVE-11555: -- HIVE-11581 addresses this to an extent, but it is applicable only when zookeeper HA mode is enabled. Beeline sends password in clear text if we miss -ssl=true flag in the connect string Key: HIVE-11555 URL: https://issues.apache.org/jira/browse/HIVE-11555 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0 Reporter: bharath v {code}
I used tcpdump to display the network traffic:

[root@fe01 ~]# beeline
Beeline version 0.13.1-cdh5.3.2 by Apache Hive
beeline> !connect jdbc:hive2://fe01.sectest.poc:10000/default
Connecting to jdbc:hive2://fe01.sectest.poc:10000/default
Enter username for jdbc:hive2://fe01.sectest.poc:10000/default: tdaranyi
Enter password for jdbc:hive2://fe01.sectest.poc:10000/default: * (I entered cleartext as the password)

The tcpdump in a different window:

tdara...@fe01.sectest.poc:~$ sudo tcpdump -n -X -i lo port 10000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
(...)
10:25:16.329974 IP 192.168.32.102.54322 > 192.168.32.102.ndmp: Flags [P.], seq 11:35, ack 1, win 512, options [nop,nop,TS val 2412851969 ecr 2412851969], length 24
0x0000: 4500 004c 3dd3 4000 4006 3abc c0a8 2066 E..L=.@.@.:f
0x0010: c0a8 2066 d432 2710 714c 0edc b45c 9268 ...f.2'.qL...\.h
0x0020: 8018 0200 c25b 0101 080a 8fd1 3301 .[3.
0x0030: 8fd1 3301 0500 1300 7464 6172 616e ..3...tdaran
0x0040: 7969 0063 6c65 6172 7465 7874 yi.cleartext
(...)
{code} We rely on the user-supplied configuration to decide whether to open an SSL socket or a plain one. Instead we can negotiate this information from the HS2 and connect accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
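The capture above shows the username and password bytes verbatim on the wire. Assuming the plain (non-SSL) connection uses something like standard SASL PLAIN (RFC 4616), that is exactly what the mechanism specifies: the credentials are sent as NUL-separated fields with no transformation, so anything short of a TLS wrapper leaves them readable to a sniffer. A small Python sketch of that message format:

```python
def sasl_plain_initial_response(authzid, authcid, password):
    """SASL PLAIN initial response per RFC 4616: [authzid] NUL authcid NUL passwd.
    The password travels verbatim -- readable on the wire unless the whole
    exchange is wrapped in TLS."""
    return b"\x00".join([authzid.encode(), authcid.encode(), password.encode()])

# Credentials from the report above: user 'tdaranyi', password 'cleartext'.
msg = sasl_plain_initial_response("", "tdaranyi", "cleartext")
assert b"cleartext" in msg   # the literal password bytes, as tcpdump showed
```

This is why forgetting the SSL flag is silently dangerous: the handshake still succeeds, just over a socket where the NUL-delimited credentials are visible to `tcpdump`, matching the `tdaran`/`yi.cleartext` bytes in the hex dump.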
[jira] [Updated] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11552: Attachment: HIVE-11552.01.patch implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.01.patch, HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11588) merge master into branch
[ https://issues.apache.org/jira/browse/HIVE-11588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700326#comment-14700326 ] Sergey Shelukhin edited comment on HIVE-11588 at 8/17/15 10:20 PM: --- Need to pick up HIVE-11542, also log4j-related fixes for tests was (Author: sershe): Need to pick up HIVE-11542 merge master into branch Key: HIVE-11588 URL: https://issues.apache.org/jira/browse/HIVE-11588 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11588) merge master into branch
[ https://issues.apache.org/jira/browse/HIVE-11588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-11588. - Resolution: Fixed pushed to branch merge master into branch Key: HIVE-11588 URL: https://issues.apache.org/jira/browse/HIVE-11588 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700569#comment-14700569 ] Siddharth Seth commented on HIVE-11515: --- [~navis] - Do you have more details on how the query fails in the situations where you see this occur? Is there an exception, does the query hang, or is there some other manifestation? I think the patch is primarily moving the registerForVertexNotifications into the constructor - which is fine. However, the condition where an event is received before registering for vertex success notifications is already handled in checkForSourceCompletion {code} int expectedEvents = numExpectedEventsPerSource.get(name).getValue(); if (expectedEvents < 0) { // Expected events not updated yet - vertex SUCCESS notification not received. return; } else { {code} Even if all the events were to come in before registering for the notification, when prune is finally called - and a notification is received, these events will be processed. On the Tez side, it makes sure to send in events only after the Initializer has been constructed - that's the HiveSplitGenerator. Still some possible race condition in DynamicPartitionPruner Key: HIVE-11515 URL: https://issues.apache.org/jira/browse/HIVE-11515 Project: Hive Issue Type: Bug Components: Query Processor, Tez Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-11515.1.patch.txt Even after HIVE-9976, I could see a race condition in DPP sometimes. It is hard to reproduce, but it seems related to the fact that prune() is called by a thread pool. With some delay in the queue, events from fast tasks arrive before prune() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
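The guard quoted in the comment can be sketched as a small state machine: events that arrive before the vertex-success notification are buffered, and completion is only declared once the expected count is known. A simplified Python model of that logic (hypothetical class and method names, not the actual Tez/Hive API):

```python
class PrunerSource:
    """Simplified model of the checkForSourceCompletion guard in the pruner."""
    def __init__(self):
        self.expected = -1      # unknown until the vertex SUCCESS notification
        self.received = []

    def on_event(self, ev):
        self.received.append(ev)
        return self._check()

    def on_vertex_success(self, expected_events):
        self.expected = expected_events
        return self._check()

    def _check(self):
        if self.expected < 0:
            return False        # notification not received yet - keep buffering
        return len(self.received) >= self.expected

src = PrunerSource()
src.on_event("e1")                      # events from fast tasks arrive early
src.on_event("e2")
done = src.on_vertex_success(2)         # a late notification still completes
assert done
```

This is why early events are not lost even without moving the registration: they sit in the buffer until the expected-count update triggers a re-check.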
[jira] [Commented] (HIVE-10810) Document Beeline/CLI changes
[ https://issues.apache.org/jira/browse/HIVE-10810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700655#comment-14700655 ] Ferdinand Xu commented on HIVE-10810: - Hi [~sladymon], this is the jira tracking the wiki(https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline ). Yours, Ferd Document Beeline/CLI changes Key: HIVE-10810 URL: https://issues.apache.org/jira/browse/HIVE-10810 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Xuefu Zhang Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11245) LLAP: Fix the LLAP to ORC APIs
[ https://issues.apache.org/jira/browse/HIVE-11245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-11245. - Resolution: Done Fix Version/s: llap All the encoded path related code has been moved to orc.encoded package as described above. Main ORC package doesn't have any encoded dependencies anymore. LLAP: Fix the LLAP to ORC APIs -- Key: HIVE-11245 URL: https://issues.apache.org/jira/browse/HIVE-11245 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Sergey Shelukhin Priority: Blocker Fix For: llap Currently the LLAP branch has refactored the ORC code to have different code paths depending on whether the data is coming from the cache or a FileSystem. We need to introduce a concept of a DataSource that is responsible for getting the necessary bytes regardless of whether they are coming from a FileSystem, in memory cache, or both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700649#comment-14700649 ] Ferdinand Xu commented on HIVE-11579: - While debugging, I see the error output is closed after invoking the closeClientOperation method in the HiveStatement class. The main purpose of this method is to send a request to the server to close the handler. You can see that the error output is redirected to the standard error output. When the SQLOperation closes a driver, it closes the driver with resStream closed. In non-embedded mode this works well, since the two output streams are not the same one, while it fails in embedded mode. Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11579-beeline-cli.patch We can easily reproduce the bug by the following steps: {code} hive> set system:xx=yy; hive> lss; hive> {code} The error output disappeared since the err outputstream is closed when closing the Hive statement. This bug also occurs upstream when using the embedded mode, as the new CLI does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
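The embedded-mode failure comes down to two handles aliasing one underlying stream: closing the operation's result stream also closes the process's standard error. A language-neutral sketch of the two situations in Python:

```python
import io

# Non-embedded mode: server and client hold *different* streams.
server_err = io.StringIO()
client_err = io.StringIO()
server_err.close()              # closing the server-side stream...
client_err.write("still ok")    # ...leaves the client's stderr usable

# Embedded mode: both names point at the *same* stream object.
shared_stderr = io.StringIO()
res_stream = shared_stderr      # the result stream aliases stderr
res_stream.close()              # closing one closes the other
try:
    shared_stderr.write("lost") # any later error output now fails
    closed = False
except ValueError:
    closed = True
assert closed
```

So the fix direction is to avoid closing the shared handle (or to dup it) when the statement is closed in embedded mode.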
[jira] [Commented] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion
[ https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700683#comment-14700683 ] Swarnim Kulkarni commented on HIVE-10697: - [~hsubramaniyan] Would you mind reviewing the patch? ObjectInspectorConvertors#UnionConvertor does a faulty conversion - Key: HIVE-10697 URL: https://issues.apache.org/jira/browse/HIVE-10697 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-10697.1.patch.txt Currently the UnionConvertor in the ObjectInspectorConvertors class has an issue with the convert method where it attempts to convert the objectinspector itself instead of converting the field.[1]. This should be changed to convert the field itself. This could result in a ClassCastException as shown below: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518) ... 9 more {code} [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
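The fix described above is to hand the inner converter the union's current field value rather than the object inspector itself. A minimal sketch of the two behaviors (hypothetical stand-in classes, not the actual Hive serde types):

```python
class TextConverter:
    """Stand-in for PrimitiveObjectInspectorConverter$TextConverter."""
    def convert(self, value):
        if not isinstance(value, str):
            # stand-in for the ClassCastException in the stack trace above
            raise TypeError(f"cannot convert {type(value).__name__}")
        return value.upper()

class UnionInspector:
    """Stand-in for LazyUnionObjectInspector; wraps the active field."""
    def __init__(self, field):
        self.field = field

conv = TextConverter()
union = UnionInspector("hello")

# Buggy behavior: converting the inspector object itself fails with a type error
try:
    conv.convert(union)
    buggy_succeeded = True
except TypeError:
    buggy_succeeded = False

# Fixed behavior: convert the field the inspector wraps
assert not buggy_succeeded and conv.convert(union.field) == "HELLO"
```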
[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
[ https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700322#comment-14700322 ] Hive QA commented on HIVE-11375: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750897/HIVE-11375.2.patch {color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 9371 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_when org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_testxpath2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_unquote_not org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_isnull_isnotnull org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_size org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_udf org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_udf org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_filter_join_breaktask org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode 
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4989/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4989/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4989/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 19 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12750897 - PreCommit-HIVE-TRUNK-Build Broken processing of queries containing NOT (x IS NOT NULL and x <> 0) -- Key: HIVE-11375 URL: https://issues.apache.org/jira/browse/HIVE-11375 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 2.0.0 Reporter: Mariusz Sakowski Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11375.2.patch, HIVE-11375.patch When running a query like this: {code}explain select * from test where (val is not null and val <> 0);{code} hive will simplify the expression in parentheses and omit the is-not-null check: {code} Filter Operator predicate: (val <> 0) (type: boolean) {code} which is fine.
but if we negate the condition using the NOT operator: {code}explain select * from test where not (val is not null and val <> 0);{code} hive will also simplify it, but now it breaks the query: {code} Filter Operator predicate: (not (val <> 0)) (type: boolean) {code} The valid predicate should be *val == 0 or val is null*, while the predicate above is equivalent to *val == 0* only, filtering away rows where val is null. A simple example: {code} CREATE TABLE example ( val bigint ); INSERT INTO example VALUES (1), (NULL), (0); -- returns 2 rows - NULL and 0 select * from example where (val is null or val == 0); -- returns 1 row - 0 select * from example where not (val is not null and val <> 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
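The correct rewrite follows from SQL's three-valued logic: the negated predicate must evaluate to true when `val` is NULL, which the optimizer's plain negation of the comparison loses (a NULL comparison stays NULL, and WHERE drops non-true rows). A quick check of both rewrites in Python, treating `None` as SQL NULL (illustrative model only):

```python
def sql_not(x):                # three-valued NOT: NULL stays NULL
    return None if x is None else (not x)

def sql_and(a, b):             # three-valued AND: False dominates NULL
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def original(val):             # NOT (val IS NOT NULL AND val <> 0)
    return sql_not(sql_and(val is not None, None if val is None else val != 0))

def correct_rewrite(val):      # val IS NULL OR val == 0
    return val is None or val == 0

def broken_rewrite(val):       # NOT (val <> 0) -- what the optimizer produced
    return sql_not(None if val is None else val != 0)

for v in (1, None, 0):
    # WHERE keeps only rows where the predicate is True (not NULL, not False)
    assert (original(v) is True) == (correct_rewrite(v) is True)

# The broken rewrite drops the NULL row: the predicate is NULL, never True
assert broken_rewrite(None) is None and correct_rewrite(None) is True
```

This matches the reporter's observation: the correct form returns the NULL and 0 rows, the broken form returns only the 0 row.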
[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout
[ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700477#comment-14700477 ] Shannon Ladymon commented on HIVE-11317: Doc note: I have added/updated documentation for *hive.timedout.txn.reaper.start* and *hive.timedout.txn.reaper.interval* on the following pages in the wiki: * [Hive Transactions - Configuration | https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration] * [Configuration Properties - Transactions and Compactor | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor] If it looks okay, we can remove the TODOC1.3 label. ACID: Improve transaction Abort logic due to timeout Key: HIVE-11317 URL: https://issues.apache.org/jira/browse/HIVE-11317 Project: Hive Issue Type: Bug Components: Metastore, Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Labels: TODOC1.3, triage Fix For: 1.3.0 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.4.patch, HIVE-11317.5.patch, HIVE-11317.6.patch, HIVE-11317.patch The logic to abort transactions that have stopped heartbeating is in TxnHandler.timeOutTxns(), which is only called when DbTxnManager.getValidTxns() is called. So if there are a lot of txns that need to be timed out and there are no SQL clients talking to the system, nothing aborts the dead transactions, and thus compaction can't clean them up, so garbage accumulates in the system. Also, the streaming api doesn't call DbTxnManager at all. Need to move this logic into Initiator (or some other metastore-side thread). Also, make sure it is broken up into multiple small(er) transactions against the metastore DB. Also move timeOutLocks() there as well. See about adding a TXNS.COMMENT field which can be used for "Auto aborted due to timeout", for example.
The symptom of this is that the system keeps showing more and more Open transactions that don't seem to ever go away (and have no locks associated with them) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
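The metastore-side thread the issue asks for, with the abort work batched into small(er) DB transactions, can be sketched as follows (hypothetical structure for illustration; the real Initiator-based reaper works against the metastore DB, not an in-memory dict):

```python
import time

def reap_timed_out_txns(open_txns, timeout_s, batch_size, now=None):
    """Abort txns whose last heartbeat is older than timeout_s, in small
    batches so each metastore DB transaction stays short."""
    now = time.time() if now is None else now
    dead = [t for t, hb in open_txns.items() if now - hb > timeout_s]
    aborted = []
    for i in range(0, len(dead), batch_size):
        batch = dead[i:i + batch_size]   # one small DB txn per batch
        for t in batch:
            open_txns.pop(t)             # abort: stop showing it as Open
            aborted.append(t)
    return aborted

txns = {1: 0.0, 2: 90.0, 3: 95.0}        # txn id -> last heartbeat (epoch s)
assert reap_timed_out_txns(txns, timeout_s=60, batch_size=2, now=100.0) == [1]
assert set(txns) == {2, 3}               # only the dead txn was reaped
```

Run periodically (the `hive.timedout.txn.reaper.interval` setting documented above), this removes the ever-growing Open transactions described as the symptom, independent of whether any SQL client calls getValidTxns().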
[jira] [Commented] (HIVE-11542) port fileId support on shims and splits from llap branch
[ https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700315#comment-14700315 ] Sergey Shelukhin commented on HIVE-11542: - Will remove HdfsUtils for now, I guess port fileId support on shims and splits from llap branch Key: HIVE-11542 URL: https://issues.apache.org/jira/browse/HIVE-11542 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch, 2.0.0 Attachments: HIVE-11542.patch This is helpful for any kind of file-based cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11588) merge master into branch
[ https://issues.apache.org/jira/browse/HIVE-11588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700326#comment-14700326 ] Sergey Shelukhin commented on HIVE-11588: - Need to pick up HIVE-11542 merge master into branch Key: HIVE-11588 URL: https://issues.apache.org/jira/browse/HIVE-11588 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11590) AvroDeserializer is very chatty
[ https://issues.apache.org/jira/browse/HIVE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11590: --- Assignee: Swarnim Kulkarni AvroDeserializer is very chatty --- Key: HIVE-11590 URL: https://issues.apache.org/jira/browse/HIVE-11590 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni It seems like AvroDeserializer is currently very chatty with it logging tons of messages at INFO level in the mapreduce logs. It would be helpful to push down some of these to debug level to keep the logs clean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit
[ https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11592: - Attachment: HIVE-11592.1.patch Here is an initial patch based on Sergey's suggestion. [~gopalv]/[~sershe] can someone take a look at this patch? ORC metadata section can sometimes exceed protobuf message size limit - Key: HIVE-11592 URL: https://issues.apache.org/jira/browse/HIVE-11592 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11592.1.patch If there are too many small stripes with many columns, the overhead of storing metadata (column stats) can exceed the default protobuf message size of 64MB. Reading such files will throw the following exception: {code} Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755) at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811) at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.init(OrcProto.java:1331) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.init(OrcProto.java:1281) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.init(OrcProto.java:4887) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.init(OrcProto.java:4803) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.init(OrcProto.java:12925) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.init(OrcProto.java:12872) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.init(OrcProto.java:13599) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.init(OrcProto.java:13546) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635) at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630) at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.init(ReaderImpl.java:468) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.init(ReaderImpl.java:314) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228) at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {code} The only solution for this is to programmatically increase the CodeInputStream size limit. We should make this configurable via hive config so that the orc file is
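Why the 64 MB default limit is reachable: the metadata section grows roughly as stripes × columns × per-column stats size, so many small stripes over a wide schema multiply quickly. A back-of-the-envelope sketch (the per-column byte count is an illustrative assumption, not a measured ORC figure):

```python
DEFAULT_PROTOBUF_LIMIT = 64 * 1024 * 1024   # CodedInputStream default, bytes

def metadata_bytes(num_stripes, num_columns, stats_bytes_per_column=64):
    # Illustrative estimate: one ColumnStatistics entry per column per stripe
    return num_stripes * num_columns * stats_bytes_per_column

# Many small stripes x a wide schema can exceed the 64 MB limit
big = metadata_bytes(num_stripes=10_000, num_columns=200)
assert big > DEFAULT_PROTOBUF_LIMIT

# A file with a few large stripes stays comfortably under it
small = metadata_bytes(num_stripes=50, num_columns=200)
assert small < DEFAULT_PROTOBUF_LIMIT
```

This is why the reader-side fix has to raise the CodedInputStream size limit: the metadata is legitimate, just larger than protobuf's defensive default.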
[jira] [Updated] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
[ https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-11504: Attachment: HIVE-11504.3.patch Add float type to the predicate leaf. Predicate pushing down doesn't work for float type for Parquet -- Key: HIVE-11504 URL: https://issues.apache.org/jira/browse/HIVE-11504 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch Predicate builder should use PrimitiveTypeName type in parquet side to construct predicate leaf instead of the type provided by PredicateLeaf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700470#comment-14700470 ] Gunther Hagleitner commented on HIVE-11515: --- Thanks [~navis]. [~wzheng]/[~sseth] can you help me review this - you know the code too? Still some possible race condition in DynamicPartitionPruner Key: HIVE-11515 URL: https://issues.apache.org/jira/browse/HIVE-11515 Project: Hive Issue Type: Bug Components: Query Processor, Tez Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-11515.1.patch.txt Even after HIVE-9976, I could see race condition in DPP sometimes. Hard to reproduce but it seemed related to the fact that prune() is called by thread-pool. With some delay in queue, events from fast tasks are arrived before prune() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
[ https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700642#comment-14700642 ] Ferdinand Xu commented on HIVE-11504: - Hi [~spena], can you help me review this patch? Thanks! Predicate pushing down doesn't work for float type for Parquet -- Key: HIVE-11504 URL: https://issues.apache.org/jira/browse/HIVE-11504 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch Predicate builder should use PrimitiveTypeName type in parquet side to construct predicate leaf instead of the type provided by PredicateLeaf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700670#comment-14700670 ] Hive QA commented on HIVE-11579: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750815/HIVE-11579-beeline-cli.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9234 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BEELINE-Build/19/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BEELINE-Build/19/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-BEELINE-Build-19/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12750815 - PreCommit-HIVE-BEELINE-Build Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11579-beeline-cli.patch We can easily reproduce the bug by the following steps: {code} hive> set system:xx=yy; hive> lss; hive> {code} The error output disappeared since the err outputstream is closed when closing the Hive statement.
This bug also occurs upstream when using the embedded mode, as the new CLI does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700373#comment-14700373 ] Shannon Ladymon commented on HIVE-10304: Can I get an update on the current status of the deprecation of HiveCLI? I would like to add more information to the [Hive wiki | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli] about the deprecation, including a Hive version number for when it was/will be deprecated. Add deprecation message to HiveCLI -- Key: HIVE-10304 URL: https://issues.apache.org/jira/browse/HIVE-10304 Project: Hive Issue Type: Sub-task Components: CLI Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC1.2 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch As Beeline is now the recommended command line tool to Hive, we should add a message to HiveCLI to indicate that it is deprecated and redirect them to Beeline. This is not suggesting to remove HiveCLI for now, but just a helpful direction for user to know the direction to focus attention in Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.
[ https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700419#comment-14700419 ] Thejas M Nair commented on HIVE-11581: -- It also helps in making the use of hive more secure. As discussed in HIVE-11555, admins can enable SSL option for not sending clear text passwords by setting connection parameters in zookeeper. HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string. --- Key: HIVE-11581 URL: https://issues.apache.org/jira/browse/HIVE-11581 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 1.3.0, 2.0.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Currently, the client needs to specify several parameters based on which an appropriate connection is created with the server. In case of dynamic service discovery, when multiple HS2 instances are running, it is much more usable for the server to add its config parameters to ZK which the driver can use to configure the connection, instead of the jdbc/odbc user adding those in connection string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
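The mechanism described here is: each HS2 instance publishes its full connection configuration under its ZooKeeper znode, and the driver adopts those parameters instead of requiring them in the JDBC URL. A simplified model, with a plain dict standing in for the znode tree (hypothetical names; not the actual HS2 registry code):

```python
# Hypothetical sketch: znode payloads modeled as a dict keyed by instance name;
# the real implementation serializes these params into the HS2 znode.
registry = {}

def register_hs2(instance, host, port, use_ssl):
    # Server side: publish full connection params, not just host:port
    registry[instance] = {"host": host, "port": port, "ssl": use_ssl}

def build_client_conf(instance):
    # Client side: discover an instance and adopt its published params,
    # so the user's connection string never needs to carry ssl=true itself
    return dict(registry[instance])

register_hs2("hs2-1", "hs2-1.example.com", 10000, use_ssl=True)
conf = build_client_conf("hs2-1")
assert conf["ssl"] is True    # SSL is enforced by server-published config
```

This is the security win mentioned above: admins set `ssl` once on the server side, and clients cannot accidentally fall back to a cleartext connection by omitting a URL parameter.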
[jira] [Commented] (HIVE-11454) Using select columns vs. select * results in long running query
[ https://issues.apache.org/jira/browse/HIVE-11454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700444#comment-14700444 ] Russell Pierce commented on HIVE-11454: --- How sure are you that you are looking at a JDBC issue? On 0.13.1, HiveServer1 was still around, and I'm seeing slower query performance for select columns than select * without JDBC being involved (as far as I know; I'm a novice here, so I might be wrong). The slowdown, as far as I can tell, is associated with the time to start the Hadoop job; that is, select columns triggers a Hadoop job but select * does not. Using select columns vs. select * results in long running query --- Key: HIVE-11454 URL: https://issues.apache.org/jira/browse/HIVE-11454 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.13.1 Reporter: Steven Hawkins Originally logged as https://issues.jboss.org/browse/TEIID-3580 When using the JDBC jars for Hive 0.13.1 running on HDP 2.1, some queries executed against table 'default.sample_07' take approximately 20-30 seconds to return. The Hive JDBC jars for version 0.13.1 can be found here: https://github.com/vchintal/hive-jdbc-jars-archive SELECT g_0.code, g_0.description, g_0.total_emp, g_0.salary FROM sample_07 AS g_0 run from a standalone JDBC project results in a 20+ second delay. However SELECT * FROM sample_07 has no delay. The same 500 rows are returned either way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout
[ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700482#comment-14700482 ] Lefty Leverenz commented on HIVE-11317: --- Just curious: Why do the TimeValidators for these new parameters have MILLISECONDS while their default values are in seconds? Other parameters match the defaults to the TimeValidator unit. {code} +HIVE_TIMEDOUT_TXN_REAPER_START("hive.timedout.txn.reaper.start", "100s", + new TimeValidator(TimeUnit.MILLISECONDS), "Time delay of 1st reaper run after metastore start"), +HIVE_TIMEDOUT_TXN_REAPER_INTERVAL("hive.timedout.txn.reaper.interval", "180s", + new TimeValidator(TimeUnit.MILLISECONDS), "Time interval describing how often the reaper runs"), {code} ACID: Improve transaction Abort logic due to timeout Key: HIVE-11317 URL: https://issues.apache.org/jira/browse/HIVE-11317 Project: Hive Issue Type: Bug Components: Metastore, Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Labels: TODOC1.3, triage Fix For: 1.3.0 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.4.patch, HIVE-11317.5.patch, HIVE-11317.6.patch, HIVE-11317.patch The logic to abort transactions that have stopped heartbeating is in TxnHandler.timeOutTxns(), which is only called when DbTxnManager.getValidTxns() is called. So if there are a lot of txns that need to be timed out and there are no SQL clients talking to the system, nothing aborts the dead transactions, and thus compaction can't clean them up, so garbage accumulates in the system. Also, the streaming api doesn't call DbTxnManager at all. Need to move this logic into Initiator (or some other metastore-side thread). Also, make sure it is broken up into multiple small(er) transactions against the metastore DB. Also move timeOutLocks() there as well. See about adding a TXNS.COMMENT field which can be used for "Auto aborted due to timeout", for example.
The symptom of this is that the system keeps showing more and more open transactions that never seem to go away (and have no locks associated with them). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
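The reaper the description above asks for boils down to a periodic pass that aborts every transaction whose last heartbeat is older than a timeout. A minimal in-memory sketch of that pass follows; the class and method names are illustrative and not Hive's, and a real implementation would run as a metastore-side thread issuing small batched updates against the metastore DB rather than mutating a map.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a transaction reaper: abort (here, remove) every
// transaction whose last heartbeat is older than the configured timeout.
public class TxnReaperSketch {
    private final Map<Long, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final long timeoutMs;

    public TxnReaperSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    public void heartbeat(long txnId, long nowMs) {
        lastHeartbeat.put(txnId, nowMs);
    }

    // One reaper pass; returns the number of transactions aborted.
    public int reap(long nowMs) {
        int aborted = 0;
        Iterator<Map.Entry<Long, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > timeoutMs) {
                it.remove();  // a real reaper would mark the txn aborted in the DB
                aborted++;
            }
        }
        return aborted;
    }

    public int openTxns() {
        return lastHeartbeat.size();
    }
}
```

Running such a pass on a schedule, independent of client calls, is what removes the dependence on getValidTxns() being invoked.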
[jira] [Updated] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit
[ https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11592: - Description: If there are too many small stripes with many columns, the overhead for storing metadata (column stats) can exceed the default protobuf message size of 64MB. Reading such files will throw the following exception: {code} Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755) at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811) at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1331) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1281) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4887) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4803) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12925) at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12872) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13599) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13546) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630) at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:468) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:314) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228) at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {code} The only solution for this is to programmatically increase the CodedInputStream size limit. We should make this configurable via a Hive config so that the ORC file is readable. 
Alternatively, we can keep increasing the size limit until parsing succeeds.
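The "keep increasing the size limit until parsing succeeds" alternative is a simple retry loop: parse, and on failure double the limit up to a hard cap. The sketch below captures that logic generically; the `Parser` interface stands in for protobuf parsing with `CodedInputStream.setSizeLimit(limit)`, and all names are illustrative rather than Hive's.

```java
// Hypothetical retry-with-growing-limit sketch for the fallback described above.
public class GrowingLimitParse {

    // Stand-in for a protobuf parse attempt under a given size limit.
    public interface Parser {
        Object parse(int sizeLimit);
    }

    // Retry parsing, doubling the size limit after each failure, up to maxLimit.
    public static Object parseWithGrowingLimit(Parser p, int startLimit, int maxLimit) {
        int limit = startLimit;
        while (true) {
            try {
                return p.parse(limit);
            } catch (RuntimeException e) {  // e.g. a "message too large" failure
                if (limit >= maxLimit) {
                    throw e;  // give up once the cap is reached
                }
                limit = (int) Math.min((long) limit * 2, maxLimit);
            }
        }
    }
}
```

A cap is important: an unbounded loop would defeat the size limit's purpose of protecting against malicious or corrupt input.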
[jira] [Updated] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion
[ https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-10697: Attachment: HIVE-10697.1.patch.txt Patch attached. ObjectInspectorConvertors#UnionConvertor does a faulty conversion - Key: HIVE-10697 URL: https://issues.apache.org/jira/browse/HIVE-10697 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-10697.1.patch.txt Currently the UnionConvertor in the ObjectInspectorConvertors class has an issue with the convert method where it attempts to convert the objectinspector itself instead of converting the field.[1]. This should be changed to convert the field itself. This could result in a ClassCastException as shown below: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518) ... 9 more {code} [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
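The fix described above amounts to applying the per-tag field converter to the field *value* carried by the union, not to the inspector object itself. A simplified sketch, with plain stand-ins for Hive's ObjectInspector machinery (all names here are illustrative):

```java
import java.util.function.Function;

// Minimal model of a union converter: pick the converter by tag and
// convert the field value. Converting anything else (e.g. the inspector)
// is exactly the ClassCastException shown in the stack trace above.
public class UnionConvertSketch {

    public static class Union {
        final int tag;      // which branch of the union is populated
        final Object field; // the value of that branch
        public Union(int tag, Object field) {
            this.tag = tag;
            this.field = field;
        }
    }

    private final Function<Object, Object>[] fieldConverters;

    @SafeVarargs
    public UnionConvertSketch(Function<Object, Object>... fieldConverters) {
        this.fieldConverters = fieldConverters;
    }

    public Object convert(Union u) {
        // Convert the field value, not the union/inspector object itself.
        return fieldConverters[u.tag].apply(u.field);
    }
}
```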
[jira] [Commented] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion
[ https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700682#comment-14700682 ] Swarnim Kulkarni commented on HIVE-10697: - RB: https://reviews.apache.org/r/37563/ ObjectInspectorConvertors#UnionConvertor does a faulty conversion - Key: HIVE-10697 URL: https://issues.apache.org/jira/browse/HIVE-10697 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-10697.1.patch.txt Currently the UnionConvertor in the ObjectInspectorConvertors class has an issue with the convert method where it attempts to convert the objectinspector itself instead of converting the field.[1]. This should be changed to convert the field itself. This could result in a ClassCastException as shown below: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518) ... 9 more {code} [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11383: --- Attachment: HIVE-11383.10.patch Upgrade Hive to Calcite 1.4 --- Key: HIVE-11383 URL: https://issues.apache.org/jira/browse/HIVE-11383 Project: Hive Issue Type: Bug Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11383.1.patch, HIVE-11383.10.patch, HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, HIVE-11383.8.patch, HIVE-11383.9.patch CLEAR LIBRARY CACHE Upgrade Hive to Calcite 1.4.0-incubating. There is currently a snapshot release, which is close to what will be in 1.4. I have checked that Hive compiles against the new snapshot, fixing one issue. The patch is attached. Next step is to validate that Hive runs against the new Calcite, and post any issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], can you please do that. [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in the new Calcite version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11513) AvroLazyObjectInspector could handle empty data better
[ https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699580#comment-14699580 ] Swarnim Kulkarni commented on HIVE-11513: - [~xuefuz] Are we good to merge this? AvroLazyObjectInspector could handle empty data better -- Key: HIVE-11513 URL: https://issues.apache.org/jira/browse/HIVE-11513 Project: Hive Issue Type: Improvement Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11513.1.patch.txt, HIVE-11513.2.patch.txt, HIVE-11513.3.patch.txt Currently in the AvroLazyObjectInspector, it looks like we only handle the case when the data sent to deserialize is null [1]. It would be nice to handle the case when it is empty. [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
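The hardening being requested is small: treat empty input the same as null input before attempting deserialization. A minimal sketch, assuming a byte-array input for illustration (the real inspector works on Hive's lazy objects, and the names here are not Hive's):

```java
import java.nio.charset.StandardCharsets;

// Illustrative null-or-empty guard for a deserialize path.
public class EmptyDataGuard {
    public static Object deserialize(byte[] data) {
        if (data == null || data.length == 0) {
            return null;  // nothing to deserialize; avoid downstream parse errors
        }
        // Stand-in for the actual Avro deserialization.
        return new String(data, StandardCharsets.UTF_8);
    }
}
```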
[jira] [Updated] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-11579: Attachment: HIVE-11579-beeline-cli.patch For embedded mode, we should not close the statement. [~xuefuz] any suggestions for this? Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11579-beeline-cli.patch We can easily reproduce the bug by the following steps: {code} hive> set system:xx=yy; hive> lss; hive> {code} The error output disappears since the err output stream is closed when closing the Hive statement. This bug also occurs upstream when using the embedded mode, as the new CLI does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700775#comment-14700775 ] Ferdinand Xu commented on HIVE-11579: - This will cause operation handler leakage on the OperationManager side. The general process is as follows: {noformat} HiveStatement#close -> HiveStatement#closeClientOperation -> ThriftCLIService#closeOperation -> CLIService#closeOperation -> HiveSessionImpl#closeOperation -> HiveCommandOperation#close -> HiveCommandOperation#teardownIO {noformat} And you can see how IO is initialized in Operation: {noformat} private void setupSessionIO(SessionState sessionState) { try { LOG.info("Putting temp output to file " + sessionState.getTmpOutputFile().toString()); sessionState.in = null; // hive server's session input stream is not used // open a per-session file in auto-flush mode for writing temp results sessionState.out = new PrintStream(new FileOutputStream(sessionState.getTmpOutputFile()), true, "UTF-8"); // TODO: for hadoop jobs, progress is printed out to session.err, // we should find a way to feed back job progress to client sessionState.err = new PrintStream(System.err, true, "UTF-8"); } catch (IOException e) { LOG.error("Error in creating temp output file ", e); try { sessionState.in = null; sessionState.out = new PrintStream(System.out, true, "UTF-8"); sessionState.err = new PrintStream(System.err, true, "UTF-8"); } catch (UnsupportedEncodingException ee) { LOG.error("Error creating PrintStream", e); ee.printStackTrace(); sessionState.out = null; sessionState.err = null; } } } {noformat} And how it is closed: {noformat} private void tearDownSessionIO() { IOUtils.cleanup(LOG, parentSession.getSessionState().out); IOUtils.cleanup(LOG, parentSession.getSessionState().err); } {noformat} Another way to get rid of this is to add a tmp err file, like the tmp output file, instead of using the System.err file descriptor at the session level. 
But then no errors would be printed on the server side. Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11579-beeline-cli.patch We can easily reproduce the bug by the following steps: {code} hive> set system:xx=yy; hive> lss; hive> {code} The error output disappears since the err output stream is closed when closing the Hive statement. This bug also occurs upstream when using the embedded mode, as the new CLI does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
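One general technique for this class of problem, without losing server-side error output, is to hand the session a wrapper whose close() flushes but never closes the underlying System.err. This is only a sketch of that idea, not Hive's code:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

// Close-shield sketch: closing the returned stream flushes it but leaves
// the wrapped target (e.g. System.err) open for other users.
public class CloseShield {
    public static PrintStream shield(OutputStream target) {
        return new PrintStream(new FilterOutputStream(target) {
            @Override
            public void close() throws IOException {
                flush();  // flush buffered bytes, but do NOT close the target
            }
        }, true);
    }
}
```

With `sessionState.err = CloseShield.shield(System.err)`, tearing down the session's IO would no longer close the process-wide standard error descriptor.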
[jira] [Updated] (HIVE-11542) port fileId support on shims and splits from llap branch
[ https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-11542: -- Labels: TODOC2.0 (was: ) port fileId support on shims and splits from llap branch Key: HIVE-11542 URL: https://issues.apache.org/jira/browse/HIVE-11542 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: TODOC2.0 Fix For: 2.0.0 Attachments: HIVE-11542.patch This is helpful for any kind of file-based cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700771#comment-14700771 ] Shannon Ladymon commented on HIVE-10304: Thanks for the info! Add deprecation message to HiveCLI -- Key: HIVE-10304 URL: https://issues.apache.org/jira/browse/HIVE-10304 Project: Hive Issue Type: Sub-task Components: CLI Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC1.2 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch As Beeline is now the recommended command line tool to Hive, we should add a message to HiveCLI to indicate that it is deprecated and redirect them to Beeline. This is not suggesting to remove HiveCLI for now, but just a helpful direction for user to know the direction to focus attention in Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit
[ https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700724#comment-14700724 ] Hive QA commented on HIVE-11592: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750949/HIVE-11592.1.patch {color:green}SUCCESS:{color} +1 9370 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4991/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4991/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4991/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12750949 - PreCommit-HIVE-TRUNK-Build ORC metadata section can sometimes exceed protobuf message size limit - Key: HIVE-11592 URL: https://issues.apache.org/jira/browse/HIVE-11592 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11592.1.patch If there are too many small stripes with many columns, the overhead for storing metadata (column stats) can exceed the default protobuf message size of 64MB. Reading such files will throw the following exception: {code} Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. 
at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755) at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811) at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1331) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1281) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4887) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4803) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990) at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12925) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12872) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961) at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956) at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13599) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13546) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635) at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630) at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:468) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:314) at
[jira] [Commented] (HIVE-11542) port fileId support on shims and splits from llap branch
[ https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700754#comment-14700754 ] Lefty Leverenz commented on HIVE-11542: --- Doc note: This adds *hive.orc.splits.include.fileid* to HiveConf.java, so it needs to be documented in the ORC section of Configuration Properties for release 2.0.0. * [Configuration Properties -- ORC File Format | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat] (Typo alert: the parameter description says "thaty" for "that", which would be nice to fix sometime.) port fileId support on shims and splits from llap branch Key: HIVE-11542 URL: https://issues.apache.org/jira/browse/HIVE-11542 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: TODOC2.0 Fix For: 2.0.0 Attachments: HIVE-11542.patch This is helpful for any kind of file-based cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699625#comment-14699625 ] Hive QA commented on HIVE-11383: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750799/HIVE-11383.10.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9370 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join13 org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4986/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4986/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4986/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12750799 - PreCommit-HIVE-TRUNK-Build Upgrade Hive to Calcite 1.4 --- Key: HIVE-11383 URL: https://issues.apache.org/jira/browse/HIVE-11383 Project: Hive Issue Type: Bug Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11383.1.patch, HIVE-11383.10.patch, HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, HIVE-11383.8.patch, HIVE-11383.9.patch CLEAR LIBRARY CACHE Upgrade Hive to Calcite 1.4.0-incubating. There is currently a snapshot release, which is close to what will be in 1.4. I have checked that Hive compiles against the new snapshot, fixing one issue. The patch is attached. Next step is to validate that Hive runs against the new Calcite, and post any issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], can you please do that. [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in the new Calcite version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11580) ThriftUnionObjectInspector#toString throws NPE
[ https://issues.apache.org/jira/browse/HIVE-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-11580: --- Attachment: HIVE-11580.1.patch ThriftUnionObjectInspector#toString throws NPE -- Key: HIVE-11580 URL: https://issues.apache.org/jira/browse/HIVE-11580 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: HIVE-11580.1.patch ThriftUnionObjectInspector uses toString from StructObjectInspector, which accesses uninitialized member variable fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
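The bug class reported above, an inherited toString() that dereferences a field the subclass never initializes, and the obvious null-guard fix can be illustrated in a few lines. The names below are illustrative, not the actual ThriftUnionObjectInspector code:

```java
import java.util.Arrays;

// Sketch: a toString() that must tolerate uninitialized state instead of
// assuming a non-null field (the assumption that caused the reported NPE).
public class NullSafeToString {
    private String[] fields;  // may legitimately be uninitialized

    public void init(String[] fields) {
        this.fields = fields;
    }

    @Override
    public String toString() {
        // Guard against uninitialized state rather than throwing NPE.
        return fields == null ? "uninitialized" : Arrays.toString(fields);
    }
}
```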
[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes
[ https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699912#comment-14699912 ] Swarnim Kulkarni commented on HIVE-3725: [~leftylev] Mind giving me access to the wiki so I can document this and a couple others on the wiki? Add support for pulling HBase columns with prefixes --- Key: HIVE-3725 URL: https://issues.apache.org/jira/browse/HIVE-3725 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: TODOC12 Fix For: 0.12.0 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt Current HBase Hive integration supports reading many values from the same row by specifying a column family. And specifying just the column family can pull in all qualifiers within the family. We should add in support to be able to specify a prefix for the qualifier and all columns that start with the prefix would automatically get pulled in. A wildcard support would be ideal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
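The requested behaviour reduces to selecting, from a column family's qualifier-to-value map, every qualifier that starts with a given prefix. A minimal sketch of that selection follows; it is purely illustrative and uses neither the HBase client API nor Hive's column-mapping syntax:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative prefix selection over a family's qualifier -> value map.
public class PrefixSelect {
    public static List<String> qualifiersWithPrefix(Map<String, byte[]> family, String prefix) {
        List<String> matched = new ArrayList<>();
        for (String qualifier : family.keySet()) {
            if (qualifier.startsWith(prefix)) {
                matched.add(qualifier);  // pulled in automatically, per the proposal
            }
        }
        return matched;
    }
}
```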
[jira] [Commented] (HIVE-11569) Use PreOrderOnceWalker where feasible
[ https://issues.apache.org/jira/browse/HIVE-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699931#comment-14699931 ] Ashutosh Chauhan commented on HIVE-11569: - Failures are unrelated. [~jcamachorodriguez] Can you take a look? Use PreOrderOnceWalker where feasible - Key: HIVE-11569 URL: https://issues.apache.org/jira/browse/HIVE-11569 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11569.patch, HIVE-11569.patch Because of its early exit criteria it has better performance than {{PreOrderWalker}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes
[ https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699947#comment-14699947 ] Lefty Leverenz commented on HIVE-3725: -- Happy to, [~swarnim], but you need a Confluence username. See [How to get permission to edit | https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit]. Add support for pulling HBase columns with prefixes --- Key: HIVE-3725 URL: https://issues.apache.org/jira/browse/HIVE-3725 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: TODOC12 Fix For: 0.12.0 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt Current HBase Hive integration supports reading many values from the same row by specifying a column family. And specifying just the column family can pull in all qualifiers within the family. We should add in support to be able to specify a prefix for the qualifier and all columns that start with the prefix would automatically get pulled in. A wildcard support would be ideal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11513) AvroLazyObjectInspector could handle empty data better
[ https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699946#comment-14699946 ] Xuefu Zhang commented on HIVE-11513: +1 AvroLazyObjectInspector could handle empty data better -- Key: HIVE-11513 URL: https://issues.apache.org/jira/browse/HIVE-11513 Project: Hive Issue Type: Improvement Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11513.1.patch.txt, HIVE-11513.2.patch.txt, HIVE-11513.3.patch.txt Currently in the AvroLazyObjectInspector, it looks like we only handle the case when the data sent to deserialize is null [1]. It would be nice to handle the case when it is empty. [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes
[ https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699952#comment-14699952 ] Swarnim Kulkarni commented on HIVE-3725: Thanks [~leftylev]. My confluence username is swarnim Add support for pulling HBase columns with prefixes --- Key: HIVE-3725 URL: https://issues.apache.org/jira/browse/HIVE-3725 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: TODOC12 Fix For: 0.12.0 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt Current HBase Hive integration supports reading many values from the same row by specifying a column family. And specifying just the column family can pull in all qualifiers within the family. We should add in support to be able to specify a prefix for the qualifier and all columns that start with the prefix would automatically get pulled in. A wildcard support would be ideal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11572) Datanucleus loads Log4j1.x Logger from AppClassLoader
[ https://issues.apache.org/jira/browse/HIVE-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699889#comment-14699889 ] Prasanth Jayachandran commented on HIVE-11572: -- These test failures are handled in HIVE-11575. Datanucleus loads Log4j1.x Logger from AppClassLoader - Key: HIVE-11572 URL: https://issues.apache.org/jira/browse/HIVE-11572 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11572.patch As part of HIVE-11304, we moved from Log4j1.x to Log4j2. But DataNucleus log messages get logged to the console when launching the hive cli. The reason is that DataNucleus tries to load the Log4j1.x Logger by traversing its class loader. Although we use the log4j-1.2-api bridge, we are loading the log4j-1.2.16 jar that was pulled in by ZooKeeper. We should make sure that there is no log4j-1.2.16 in the DataNucleus classloader hierarchy (classpath). The DataNucleus logger has this {code} NucleusLogger.class.getClassLoader().loadClass("org.apache.log4j.Logger"); loggerClass = org.datanucleus.util.Log4JLogger.class; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
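A quick way to check which jar a logger class actually resolves to is to inspect its code source. This is a generic diagnostic sketch, not part of the Hive patch; in a Hive process one would pass `Class.forName("org.apache.log4j.Logger")` and verify the location points at the log4j-1.2-api bridge rather than the log4j-1.2.16 jar.

```java
import java.security.CodeSource;

// Diagnostic sketch: report where a class was loaded from.
class ClassOrigin {
    public static String originOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Bootstrap classes (e.g. java.lang.String) have no code source.
        return src == null ? "bootstrap" : src.getLocation().toString();
    }
}
```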
[jira] [Commented] (HIVE-11569) Use PreOrderOnceWalker where feasible
[ https://issues.apache.org/jira/browse/HIVE-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699920#comment-14699920 ] Hive QA commented on HIVE-11569: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750831/HIVE-11569.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9370 tests executed *Failed tests:* {noformat} org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4987/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4987/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4987/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12750831 - PreCommit-HIVE-TRUNK-Build Use PreOrderOnceWalker where feasible - Key: HIVE-11569 URL: https://issues.apache.org/jira/browse/HIVE-11569 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11569.patch, HIVE-11569.patch Because of its early exit criteria it has better performance than {{PreOrderWalker}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
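The "visit once" early exit referenced above can be sketched with a small walker over a DAG. The names here are illustrative, not Hive's actual `lib.PreOrderOnceWalker` API: the point is that a shared subtree (node d below, reachable via both b and c) is processed a single time instead of once per incoming edge.

```java
import java.util.*;

// Sketch of a pre-order walker that visits each node at most once.
class OnceWalker {
    public static List<String> walk(Map<String, List<String>> children, String root) {
        List<String> order = new ArrayList<>();
        visit(children, root, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> children, String node,
                              Set<String> seen, List<String> order) {
        if (!seen.add(node)) {
            return;  // early exit: this shared subtree was already processed
        }
        order.add(node);  // pre-order: process the node before its children
        for (String child : children.getOrDefault(node, Collections.emptyList())) {
            visit(children, child, seen, order);
        }
    }

    // Demo over a diamond-shaped DAG: a -> {b, c}, b -> d, c -> d.
    public static String demo() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("a", Arrays.asList("b", "c"));
        g.put("b", Arrays.asList("d"));
        g.put("c", Arrays.asList("d"));
        return String.join(",", walk(g, "a"));
    }
}
```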
[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes
[ https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699959#comment-14699959 ] Lefty Leverenz commented on HIVE-3725: -- Done. Welcome to the Hive wiki team, Swarnim. Add support for pulling HBase columns with prefixes --- Key: HIVE-3725 URL: https://issues.apache.org/jira/browse/HIVE-3725 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: TODOC12 Fix For: 0.12.0 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt Current HBase Hive integration supports reading many values from the same row by specifying a column family. And specifying just the column family can pull in all qualifiers within the family. We should add in support to be able to specify a prefix for the qualifier and all columns that start with the prefix would automatically get pulled in. A wildcard support would be ideal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.
[ https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-11581: Affects Version/s: 2.0.0 1.3.0 HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string. --- Key: HIVE-11581 URL: https://issues.apache.org/jira/browse/HIVE-11581 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 1.3.0, 2.0.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Currently, the client needs to specify several parameters based on which an appropriate connection is created with the server. In the case of dynamic service discovery, when multiple HS2 instances are running, it is much more usable for the server to add its config parameters to ZK, which the driver can use to configure the connection, instead of the jdbc/odbc user adding those in the connection string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
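One plausible shape for this, sketched below purely for illustration (this is not the actual format HS2 ended up publishing), is a flat `key=value;key=value` string on the server's znode that the driver parses instead of requiring each parameter in the JDBC URL.

```java
import java.util.*;

// Illustrative sketch: serialize/parse connection params for a ZK znode.
class ZkConnParams {
    public static String publish(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append(';');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static Map<String, String> parse(String znodeData) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String kv : znodeData.split(";")) {
            int i = kv.indexOf('=');
            out.put(kv.substring(0, i), kv.substring(i + 1));
        }
        return out;
    }

    // Round-trip demo with two (real) HS2 config keys and made-up values.
    public static String demo() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("hive.server2.thrift.port", "10000");
        p.put("hive.server2.transport.mode", "binary");
        return parse(publish(p)).get("hive.server2.thrift.port");
    }
}
```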
[jira] [Updated] (HIVE-11569) Use PreOrderOnceWalker where feasible
[ https://issues.apache.org/jira/browse/HIVE-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11569: Attachment: HIVE-11569.patch Use PreOrderOnceWalker where feasible - Key: HIVE-11569 URL: https://issues.apache.org/jira/browse/HIVE-11569 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11569.patch, HIVE-11569.patch Because of its early exit criteria it has better performance than {{PreOrderWalker}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11383: --- Attachment: HIVE-11383.9.patch Upgrade Hive to Calcite 1.4 --- Key: HIVE-11383 URL: https://issues.apache.org/jira/browse/HIVE-11383 Project: Hive Issue Type: Bug Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, HIVE-11383.8.patch, HIVE-11383.9.patch CLEAR LIBRARY CACHE Upgrade Hive to Calcite 1.4.0-incubating. There is currently a snapshot release, which is close to what will be in 1.4. I have checked that Hive compiles against the new snapshot, fixing one issue. The patch is attached. Next step is to validate that Hive runs against the new Calcite, and post any issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], can you please do that. [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in the new Calcite version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: --- Attachment: HIVE-11424.01.patch Improve HivePreFilteringRule performance Key: HIVE-11424 URL: https://issues.apache.org/jira/browse/HIVE-11424 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, HIVE-11424.patch We create a rule that will transform OR clauses into IN clauses (when possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
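The OR-to-IN transformation mentioned in the description can be illustrated with a toy rewrite over strings. Hive's actual rule operates on Calcite `RexNode` expressions; everything below is a simplified stand-in to show the shape of the transformation only.

```java
import java.util.*;

// Toy sketch of the HivePreFilteringRule idea: collapse a chain of equality
// disjuncts on the same column into a single IN clause.
class OrToIn {
    // Input: disjuncts of the form "col=value".
    // Output: "col IN (v1, v2, ...)", or null when columns differ (keep the OR).
    public static String rewrite(List<String> disjuncts) {
        String col = null;
        List<String> values = new ArrayList<>();
        for (String d : disjuncts) {
            String[] parts = d.split("=");
            if (col == null) col = parts[0];
            else if (!col.equals(parts[0])) return null;  // mixed columns: no rewrite
            values.add(parts[1]);
        }
        return col + " IN (" + String.join(", ", values) + ")";
    }

    public static String demo() {
        return rewrite(Arrays.asList("val=1", "val=2", "val=3"));
    }
}
```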
[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699156#comment-14699156 ] Ferdinand Xu commented on HIVE-11579: - I came across this error when I did some smoke tests. The root cause is that the standard error output is closed when the Hive Statement is closed. I can also reproduce this bug using the upstream code. Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu We can easily reproduce the bug by the following steps: {code} hive> set system:xx=yy; hive> lss; hive> {code} The error output disappears because the error output stream is closed when closing the Hive statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699158#comment-14699158 ] Ferdinand Xu commented on HIVE-11579: - We can reproduce this error using a beeline package built from upstream. Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu We can easily reproduce the bug by the following steps: {code} hive> set system:xx=yy; hive> lss; hive> {code} The error output disappears because the error output stream is closed when closing the Hive statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
[ https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11375: Attachment: HIVE-11375.2.patch Broken processing of queries containing NOT (x IS NOT NULL and x <> 0) -- Key: HIVE-11375 URL: https://issues.apache.org/jira/browse/HIVE-11375 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 2.0.0 Reporter: Mariusz Sakowski Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11375.2.patch, HIVE-11375.patch When running a query like this: {code}explain select * from test where (val is not null and val <> 0);{code} hive will simplify the expression in parentheses and omit the is not null check: {code} Filter Operator predicate: (val <> 0) (type: boolean) {code} which is fine. But if we negate the condition using the NOT operator: {code}explain select * from test where not (val is not null and val <> 0);{code} hive will also simplify it, but now it breaks the query: {code} Filter Operator predicate: (not (val <> 0)) (type: boolean) {code} because the valid predicate should be *val == 0 or val is null*, while the row above is equivalent to *val == 0* only, filtering away rows where val is null. A simple example: {code} CREATE TABLE example ( val bigint ); INSERT INTO example VALUES (1), (NULL), (0); -- returns 2 rows - NULL and 0 select * from example where (val is null or val == 0); -- returns 1 row - 0 select * from example where not (val is not null and val <> 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
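The NULL-handling subtlety above can be checked outside Hive with a toy three-valued-logic evaluator. This sketch is not Hive code; it models SQL booleans as `Boolean` with `null` standing in for SQL NULL, and shows that `NOT (val IS NOT NULL AND val <> 0)` keeps both the NULL row and the 0 row, so simplifying it to `NOT (val <> 0)` alone is wrong.

```java
// SQL three-valued logic sketch for the HIVE-11375 predicate.
class ThreeValued {
    static Boolean ne(Long val, long lit) {            // val <> lit
        return val == null ? null : (val.longValue() != lit);
    }
    static Boolean and(Boolean a, Boolean b) {         // SQL AND: FALSE dominates NULL
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return false;
        if (a == null || b == null) return null;
        return true;
    }
    static Boolean not(Boolean a) {                    // SQL NOT: NOT NULL is NULL
        return a == null ? null : !a;
    }
    // WHERE not (val is not null and val <> 0): a row survives only on TRUE.
    public static boolean keeps(Long val) {
        Boolean pred = not(and(val != null, ne(val, 0)));
        return Boolean.TRUE.equals(pred);
    }
}
```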
[jira] [Commented] (HIVE-11580) ThriftUnionObjectInspector#toString throws NPE
[ https://issues.apache.org/jira/browse/HIVE-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700084#comment-14700084 ] Hive QA commented on HIVE-11580: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750851/HIVE-11580.1.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9370 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4988/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4988/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4988/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12750851 - PreCommit-HIVE-TRUNK-Build ThriftUnionObjectInspector#toString throws NPE -- Key: HIVE-11580 URL: https://issues.apache.org/jira/browse/HIVE-11580 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11580.1.patch ThriftUnionObjectInspector uses toString from StructObjectInspector, which accesses uninitialized member variable fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
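The NPE pattern described above (an inherited toString reading a field the subclass never initializes) can be sketched generically. Class and field names here are illustrative, not Hive's actual ThriftUnionObjectInspector; the point is the null guard in toString.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: a toString that null-guards a possibly-uninitialized field
// instead of assuming initialization and throwing NPE.
class SafeToString {
    private final List<String> fields;  // may be null if init was skipped

    SafeToString(List<String> fields) { this.fields = fields; }

    @Override
    public String toString() {
        return fields == null ? "union<>" : "union<" + String.join(",", fields) + ">";
    }

    public static String demoUninitialized() { return new SafeToString(null).toString(); }
    public static String demoInitialized() {
        return new SafeToString(Arrays.asList("i32", "string")).toString();
    }
}
```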
[jira] [Updated] (HIVE-11341) Avoid expensive resizing of ASTNode tree
[ https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11341: - Attachment: HIVE-11341.8.patch Avoid expensive resizing of ASTNode tree - Key: HIVE-11341 URL: https://issues.apache.org/jira/browse/HIVE-11341 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11341.1.patch, HIVE-11341.2.patch, HIVE-11341.3.patch, HIVE-11341.4.patch, HIVE-11341.5.patch, HIVE-11341.6.patch, HIVE-11341.7.patch, HIVE-11341.8.patch {code} Stack Trace | Sample Count | Percentage(%) parse.BaseSemanticAnalyzer.analyze(ASTNode, Context) 1,605 90 parse.CalcitePlanner.analyzeInternal(ASTNode) 1,605 90 parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext) 1,605 90 parse.CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext) 1,604 90 parse.SemanticAnalyzer.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext) 1,604 90 parse.SemanticAnalyzer.genPlan(QB) 1,604 90 parse.SemanticAnalyzer.genPlan(QB, boolean) 1,604 90 parse.SemanticAnalyzer.genBodyPlan(QB, Operator, Map) 1,604 90 parse.SemanticAnalyzer.genFilterPlan(ASTNode, QB, Operator, Map, boolean) 1,603 90 parse.SemanticAnalyzer.genFilterPlan(QB, ASTNode, Operator, boolean) 1,603 90 parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, boolean) 1,603 90 parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx) 1,603 90 parse.SemanticAnalyzer.genAllExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx) 1,603 90 parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx) 1,603 90 parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx, TypeCheckProcFactory) 1,603 90 lib.DefaultGraphWalker.startWalking(Collection, HashMap) 1,579 89 lib.DefaultGraphWalker.walk(Node) 1,571 89 java.util.ArrayList.removeAll(Collection) 1,433 81 
java.util.ArrayList.batchRemove(Collection, boolean) 1,433 81 java.util.ArrayList.contains(Object) 1,228 69 java.util.ArrayList.indexOf(Object) 1,228 69 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
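The profile above bottoms out in `ArrayList.removeAll` calling `contains`, which is O(n·m) because each membership test scans the whole argument list. A standard fix, sketched below as a generic illustration rather than the actual HIVE-11341 patch, is to copy the removal set into a `HashSet` so membership tests are O(1):

```java
import java.util.*;

// Sketch: removeAll with O(1) membership tests instead of linear scans.
class FastRemoveAll {
    public static List<Integer> removeAllFast(List<Integer> from, Collection<Integer> toRemove) {
        Set<Integer> remove = new HashSet<>(toRemove);  // O(1) contains
        List<Integer> out = new ArrayList<>();
        for (Integer x : from) {
            if (!remove.contains(x)) out.add(x);
        }
        return out;
    }

    public static String demo() {
        return removeAllFast(Arrays.asList(1, 2, 3, 4, 5), Arrays.asList(2, 4)).toString();
    }
}
```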
[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11587: Description: Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big and hope for the best on data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memUsage argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for probe size (in longs) is row count (assuming key per row, i.e. maximum possible number of keys) divided by load factor, plus some very small number to round up. That is for flat case. For hybrid case it may be more complex due to skew, but that is still a good upper bound for the total probe capacity of all segments. 2) Rename memSize to maxProbeSize, or something, make sure it's passed correctly based on estimates that take into account both probe and data size, esp. in hybrid case. 
3) Make sure that memory estimation for hybrid case also doesn't come up with numbers that are too small, like 1-byte hashtable. I am not very familiar with that code but it has happened in the past. Other issues we have seen: 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you should not allocate large array in advance. Even if some estimate passes 500Mb or 40Mb of whatever, it doesn't make sense to allocate that. 5) For hybrid, don't pre-allocate WBs - only pre-allocate on write. 6) Change everywhere rounding up to power of two is used to rounding down, at least for hybrid case (?) I wanted to put all of this in single JIRA so we could keep track of fixing all of this. I think there are JIRAs for some of this already, feel free to link them to this one. was: Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big and hope for the best on data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memSize argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 
2) Rename memSize to maxProbeSize, or something, make sure it's passed correctly based on estimates that take into account both probe and data size, esp. in hybrid case. 3) Cap single write buffer size to 8-16Mb. 4) For hybrid, don't pre-allocate WBs - only pre-allocate on write. 5) Change everywhere rounding up to power of two is used to rounding down, at least for hybrid case (?) Fix memory estimates for mapjoin hashtable -- Key: HIVE-11587 URL: https://issues.apache.org/jira/browse/HIVE-11587 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Wei Zheng Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to
[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11587: Description: Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big probe and hope for the best with regard to future data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memUsage argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for probe size (in longs) is row count (assuming key per row, i.e. maximum possible number of keys) divided by load factor, plus some very small number to round up. That is for flat case. For hybrid case it may be more complex due to skew, but that is still a good upper bound for the total probe capacity of all segments. 2) Rename memSize to maxProbeSize, or something, make sure it's passed correctly based on estimates that take into account both probe and data size, esp. in hybrid case. 
3) Make sure that memory estimation for hybrid case also doesn't come up with numbers that are too small, like 1-byte hashtable. I am not very familiar with that code but it has happened in the past. Other issues we have seen: 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you should not allocate large array in advance. Even if some estimate passes 500Mb or 40Mb of whatever, it doesn't make sense to allocate that. 5) For hybrid, don't pre-allocate WBs - only pre-allocate on write. 6) Change everywhere rounding up to power of two is used to rounding down, at least for hybrid case (?) I wanted to put all of this in single JIRA so we could keep track of fixing all of this. I think there are JIRAs for some of this already, feel free to link them to this one. was: Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big and hope for the best on data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memUsage argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 
1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for probe size (in longs) is row count (assuming key per row, i.e. maximum possible number of keys) divided by load factor, plus some very small number to round up. That is for flat case. For hybrid case it may be more complex due to skew, but that is still a good upper bound for the total probe capacity of all segments. 2) Rename memSize to maxProbeSize, or something, make sure it's passed correctly based on estimates that take into account both probe and data size, esp. in hybrid case. 3) Make sure that memory estimation for hybrid case also doesn't come up with numbers that are too small, like 1-byte hashtable. I am not very familiar with that code but it has happened in the past. Other issues we have seen: 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you should not allocate large array in advance. Even if some estimate passes 500Mb or 40Mb of whatever,
[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11587: Description: Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big probe and hope for the best with regard to future data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memUsage argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for probe size (in longs) is row count (assuming key per row, i.e. maximum possible number of keys) divided by load factor, plus some very small number to round up. That is for flat case. For hybrid case it may be more complex due to skew, but that is still a good upper bound for the total probe capacity of all segments. 2) Rename memUsage to maxProbeSize, or something, make sure it's passed correctly based on estimates that take into account both probe and data size, esp. in hybrid case. 
3) Make sure that memory estimation for hybrid case also doesn't come up with numbers that are too small, like 1-byte hashtable. I am not very familiar with that code but it has happened in the past. Other issues we have seen: 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you should not allocate large array in advance. Even if some estimate passes 500Mb or 40Mb of whatever, it doesn't make sense to allocate that. 5) For hybrid, don't pre-allocate WBs - only pre-allocate on write. 6) Change everywhere rounding up to power of two is used to rounding down, at least for hybrid case (?) I wanted to put all of this in single JIRA so we could keep track of fixing all of this. I think there are JIRAs for some of this already, feel free to link them to this one. was: Due to the legacy in in-memory mapjoin and conservative planning, the memory estimation code for mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account because unlike the probe, it's free to resize, so it's better for perf to allocate big probe and hope for the best with regard to future data size. It is not true for hybrid case. There's code to cap the initial allocation based on memory available (memUsage argument), but due to some code rot, the memory estimates from planning are not even passed to hashtable anymore (there used to be two config settings, hashjoin size fraction by itself, or hashjoin size fraction for group by case), so it never caps the memory anymore below 1 Gb. Initial capacity is estimated from input key count, and in hybrid join cache can exceed Java memory due to number of segments. There needs to be a review and fix of all this code. Suggested improvements: 1) Make sure initialCapacity argument from Hybrid case is correct given the number of segments. See how it's calculated from keys for regular case; it needs to be adjusted accordingly for hybrid case if not done already. 
1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for probe size (in longs) is row count (assuming key per row, i.e. maximum possible number of keys) divided by load factor, plus some very small number to round up. That is for flat case. For hybrid case it may be more complex due to skew, but that is still a good upper bound for the total probe capacity of all segments. 2) Rename memSize to maxProbeSize, or something, make sure it's passed correctly based on estimates that take into account both probe and data size, esp. in hybrid case. 3) Make sure that memory estimation for hybrid case also doesn't come up with numbers that are too small, like 1-byte hashtable. I am not very familiar with that code but it has happened in the past. Other issues we have seen: 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you should not allocate large array in advance. Even if some estimate passes
[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11587:

Description:
Due to the legacy of the in-memory mapjoin and conservative planning, the memory estimation code for the mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account, because unlike the data the probe is free to resize, so it is better for perf to allocate a big probe and hope for the best with regard to future data size. This is not true for the hybrid case. There is code to cap the initial allocation based on available memory (the memUsage argument), but due to some code rot the memory estimates from planning are not even passed to the hashtable anymore (there used to be two config settings: hashjoin size fraction by itself, or hashjoin size fraction for the group-by case), so it never caps the memory below 1 GB anymore. Initial capacity is estimated from the input key count, and in the hybrid join the cache can exceed Java memory due to the number of segments. There needs to be a review and fix of all this code. Suggested improvements:
1) Make sure the initialCapacity argument for the hybrid case is correct given the number of segments. See how it is calculated from keys for the regular case; it needs to be adjusted accordingly for the hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for the probe size (in longs) is the row count (assuming a key per row, i.e. the maximum possible number of keys) divided by the load factor, plus some very small number to round up. That is for the flat case. For the hybrid case it may be more complex due to skew, but it is still a good upper bound on the total probe capacity of all segments.
2) Rename memUsage to maxProbeSize, or something, and make sure it is passed correctly based on estimates that take into account both probe and data size, esp. in the hybrid case.
3) Make sure that memory estimation for the hybrid case also doesn't come up with numbers that are too small, like a 1-byte hashtable. I am not very familiar with that code, but it has happened in the past.
Other issues we have seen:
4) Cap the single write buffer size to 8-16 MB. The whole point of WBs is that you should not allocate a large array in advance. Even if some estimate passes 500 MB or 40 MB or whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only pre-allocate on write.
6) Change everywhere rounding up to a power of two is used to rounding down, at least for the hybrid case (?)
I wanted to put all of this in a single JIRA so we could keep track of fixing all of it. I think there are JIRAs for some of this already; feel free to link them to this one.

was:
Due to the legacy of the in-memory mapjoin and conservative planning, the memory estimation code for the mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account, because unlike the data the probe is free to resize, so it is better for perf to allocate a big probe and hope for the best with regard to future data size. This is not true for the hybrid case. There is code to cap the initial allocation based on available memory (the memUsage argument), but due to some code rot the memory estimates from planning are not even passed to the hashtable anymore (there used to be two config settings: hashjoin size fraction by itself, or hashjoin size fraction for the group-by case), so it never caps the memory below 1 GB anymore. Initial capacity is estimated from the input key count, and in the hybrid join the cache can exceed Java memory due to the number of segments. There needs to be a review and fix of all this code. Suggested improvements:
1) Make sure the initialCapacity argument for the hybrid case is correct given the number of segments. See how it is calculated from keys for the regular case; it needs to be adjusted accordingly for the hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for the probe size (in longs) is the row count (assuming a key per row, i.e. the maximum possible number of keys) divided by the load factor, plus some very small number to round up. That is for the flat case. For the hybrid case it may be more complex due to skew, but it is still a good upper bound on the total probe capacity of all segments.
2) Rename memUsage to maxProbeSize, or something, and make sure it is passed correctly based on estimates that take into account both probe and data size, esp. in the hybrid case.
3) Make sure that memory estimation for the hybrid case also doesn't come up with numbers that are too small, like a 1-byte hashtable. I am not very familiar with that code, but it has happened in the past.
Other issues we have seen:
4) Cap the single write buffer size to 8-16 MB. The whole point of WBs is that you should not allocate a large array in advance. Even if some estimate passes
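The upper bound in 1.5) is easy to sketch numerically. The function below is only an illustration of the arithmetic, not Hive's actual code; the 0.75 load factor and the round-up-to-power-of-two step are assumptions (and item 6 suggests rounding down instead for the hybrid case):

```python
import math

def max_probe_capacity(row_count, load_factor=0.75):
    """Upper bound on probe-array slots (in longs): one key per row is the
    worst case, so we never need more than ceil(rows / load_factor) slots,
    here rounded up to a power of two as hashtables typically do."""
    needed = math.ceil(row_count / load_factor)
    cap = 1
    while cap < needed:
        cap *= 2
    return cap

# e.g. 1,000,000 rows -> ceil(1000000 / 0.75) = 1,333,334 slots needed,
# rounded up to 2,097,152 (2^21)
print(max_probe_capacity(1_000_000))
```

The same bound applies to the hybrid case summed over all segments, which is why per-segment allocations that together exceed it indicate an estimation bug.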
[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
[ https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700256#comment-14700256 ] Aihua Xu commented on HIVE-11375: - [~ashutoshc] I notice that GenericUDFOPEqualOrGreaterThan's flip() is GenericUDFOPEqualOrLessThan() rather than GenericUDFOPLessThan(). The same holds for the other ops. Do you know what flip() here is really meant to do? Is that a bug or intentional?

Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
Key: HIVE-11375 URL: https://issues.apache.org/jira/browse/HIVE-11375 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 2.0.0 Reporter: Mariusz Sakowski Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11375.2.patch, HIVE-11375.patch

When running a query like this:
{code}explain select * from test where (val is not null and val <> 0);{code}
Hive will simplify the expression in parentheses and omit the IS NOT NULL check:
{code}
Filter Operator
  predicate: (val <> 0) (type: boolean)
{code}
which is fine. But if we negate the condition using the NOT operator:
{code}explain select * from test where not (val is not null and val <> 0);{code}
Hive will also simplify things, but now it breaks the query:
{code}
Filter Operator
  predicate: (not (val <> 0)) (type: boolean)
{code}
The valid predicate should be *val == 0 or val is null*, while the predicate above is equivalent to *val == 0* only, filtering away rows where val is null. Simple example:
{code}
CREATE TABLE example ( val bigint );
INSERT INTO example VALUES (1), (NULL), (0);
-- returns 2 rows - NULL and 0
select * from example where (val is null or val == 0);
-- returns 1 row - 0
select * from example where not (val is not null and val <> 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
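The distinction behind the flip() question can be sketched outside Hive. A flip swaps the operands while preserving the truth value, which is why the flip of EqualOrGreaterThan is EqualOrLessThan; a negation inverts the truth value, which is what a NOT rewrite needs, and that would be LessThan. A minimal illustration (the operator tables below are a generic sketch, not Hive's class hierarchy):

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "==": operator.eq, "!=": operator.ne}

# flip: swap the operands, keep the truth value, e.g. (a >= b) == (b <= a)
FLIP = {">=": "<=", "<=": ">=", ">": "<", "<": ">", "==": "==", "!=": "!="}

# negate: invert the truth value, e.g. NOT (a >= b) == (a < b)
NEGATE = {">=": "<", "<=": ">", ">": "<=", "<": ">=", "==": "!=", "!=": "=="}

def flip_holds(a, b, op):
    return OPS[op](a, b) == OPS[FLIP[op]](b, a)

def negate_holds(a, b, op):
    return OPS[op](a, b) == (not OPS[NEGATE[op]](a, b))
```

Using flip() where a negation table is needed would turn NOT (a >= b) into a <= b instead of a < b, which matches the kind of wrong simplification this issue describes.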
[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11587:

Description:
Due to the legacy of the in-memory mapjoin and conservative planning, the memory estimation code for the mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account, because unlike the data the probe is free to resize, so it is better for perf to allocate a big probe and hope for the best with regard to future data size. This is not true for the hybrid case. There is code to cap the initial allocation based on available memory (the memUsage argument), but due to some code rot the memory estimates from planning are not even passed to the hashtable anymore (there used to be two config settings: hashjoin size fraction by itself, or hashjoin size fraction for the group-by case), so it never caps the memory below 1 GB anymore. Initial capacity is estimated from the input key count, and in the hybrid join the cache can exceed Java memory due to the number of segments. There needs to be a review and fix of all this code. Suggested improvements:
1) Make sure the initialCapacity argument for the hybrid case is correct given the number of segments. See how it is calculated from keys for the regular case; it needs to be adjusted accordingly for the hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for the probe size (in longs) is the row count (assuming a key per row, i.e. the maximum possible number of keys) divided by the load factor, plus some very small number to round up. That is for the flat case. For the hybrid case it may be more complex due to skew, but it is still a good upper bound on the total probe capacity of all segments.
2) Rename memUsage to maxProbeSize, or something, and make sure it is passed correctly based on estimates that take into account both probe and data size, esp. in the hybrid case.
3) Make sure that memory estimation for the hybrid case also doesn't come up with numbers that are too small, like a 1-byte hashtable. I am not very familiar with that code, but it has happened in the past.
Other issues we have seen:
4) Cap the single write buffer size to 8-16 MB. The whole point of WBs is that you should not allocate a large array in advance. Even if some estimate passes 500 MB or 40 MB or whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only allocate on write.
6) Change everywhere rounding up to a power of two is used to rounding down, at least for the hybrid case (?)
I wanted to put all of these items in a single JIRA so we could keep track of fixing all of them. I think there are JIRAs for some of these already; feel free to link them to this one.

was:
Due to the legacy of the in-memory mapjoin and conservative planning, the memory estimation code for the mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account, because unlike the data the probe is free to resize, so it is better for perf to allocate a big probe and hope for the best with regard to future data size. This is not true for the hybrid case. There is code to cap the initial allocation based on available memory (the memUsage argument), but due to some code rot the memory estimates from planning are not even passed to the hashtable anymore (there used to be two config settings: hashjoin size fraction by itself, or hashjoin size fraction for the group-by case), so it never caps the memory below 1 GB anymore. Initial capacity is estimated from the input key count, and in the hybrid join the cache can exceed Java memory due to the number of segments. There needs to be a review and fix of all this code. Suggested improvements:
1) Make sure the initialCapacity argument for the hybrid case is correct given the number of segments. See how it is calculated from keys for the regular case; it needs to be adjusted accordingly for the hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for the probe size (in longs) is the row count (assuming a key per row, i.e. the maximum possible number of keys) divided by the load factor, plus some very small number to round up. That is for the flat case. For the hybrid case it may be more complex due to skew, but it is still a good upper bound on the total probe capacity of all segments.
2) Rename memUsage to maxProbeSize, or something, and make sure it is passed correctly based on estimates that take into account both probe and data size, esp. in the hybrid case.
3) Make sure that memory estimation for the hybrid case also doesn't come up with numbers that are too small, like a 1-byte hashtable. I am not very familiar with that code, but it has happened in the past.
Other issues we have seen:
4) Cap the single write buffer size to 8-16 MB. The whole point of WBs is that you should not allocate a large array in advance. Even if some estimate
[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable
[ https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700257#comment-14700257 ] Sergey Shelukhin commented on HIVE-11587: - [~mmokhtar] [~mmccline] [~gopalv] fyi

Fix memory estimates for mapjoin hashtable
Key: HIVE-11587 URL: https://issues.apache.org/jira/browse/HIVE-11587 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Wei Zheng

Due to the legacy of the in-memory mapjoin and conservative planning, the memory estimation code for the mapjoin hashtable is currently not very good. It allocates the probe erring on the side of more memory, not taking data into account, because unlike the data the probe is free to resize, so it is better for perf to allocate a big probe and hope for the best with regard to future data size. This is not true for the hybrid case. There is code to cap the initial allocation based on available memory (the memUsage argument), but due to some code rot the memory estimates from planning are not even passed to the hashtable anymore (there used to be two config settings: hashjoin size fraction by itself, or hashjoin size fraction for the group-by case), so it never caps the memory below 1 GB anymore. Initial capacity is estimated from the input key count, and in the hybrid join the cache can exceed Java memory due to the number of segments. There needs to be a review and fix of all this code. Suggested improvements:
1) Make sure the initialCapacity argument for the hybrid case is correct given the number of segments. See how it is calculated from keys for the regular case; it needs to be adjusted accordingly for the hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever need for the probe size (in longs) is the row count (assuming a key per row, i.e. the maximum possible number of keys) divided by the load factor, plus some very small number to round up. That is for the flat case. For the hybrid case it may be more complex due to skew, but it is still a good upper bound on the total probe capacity of all segments.
2) Rename memUsage to maxProbeSize, or something, and make sure it is passed correctly based on estimates that take into account both probe and data size, esp. in the hybrid case.
3) Make sure that memory estimation for the hybrid case also doesn't come up with numbers that are too small, like a 1-byte hashtable. I am not very familiar with that code, but it has happened in the past.
Other issues we have seen:
4) Cap the single write buffer size to 8-16 MB. The whole point of WBs is that you should not allocate a large array in advance. Even if some estimate passes 500 MB or 40 MB or whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only allocate on write.
6) Change everywhere rounding up to a power of two is used to rounding down, at least for the hybrid case (?)
I wanted to put all of these items in a single JIRA so we could keep track of fixing all of them. I think there are JIRAs for some of these already; feel free to link them to this one.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11585) Explicitly set pmf.setDetachAllOnCommit on metastore unless configured otherwise
[ https://issues.apache.org/jira/browse/HIVE-11585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700267#comment-14700267 ] Sergey Shelukhin commented on HIVE-11585: - There may be implications of that in the metastore code... IIRC all the MBlah objects become invalid when detached, which can be an epic PITA. Fixing this will need some testing.

Explicitly set pmf.setDetachAllOnCommit on metastore unless configured otherwise
Key: HIVE-11585 URL: https://issues.apache.org/jira/browse/HIVE-11585 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan

datanucleus.detachAllOnCommit has a default value of false. However, we've observed a number of objects (especially FieldSchema objects) being retained, which causes OOM issues on the metastore. Hive should default datanucleus.detachAllOnCommit to true, unless explicitly overridden by users.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
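Until such a default lands, the override being discussed could be expressed as a hive-site.xml fragment along these lines (a sketch of the setting, not a committed default; the description text is paraphrased from this issue):

```xml
<property>
  <name>datanucleus.detachAllOnCommit</name>
  <value>true</value>
  <description>Detach all persistent objects on transaction commit so that
  large object graphs (e.g. FieldSchema objects) are not retained by the
  PersistenceManager, avoiding metastore OOMs.</description>
</property>
```

As the comment above notes, detached MBlah objects become invalid, so flipping this needs testing against the metastore code paths that reuse them.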
[jira] [Assigned] (HIVE-8398) ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc
[ https://issues.apache.org/jira/browse/HIVE-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshay Goyal reassigned HIVE-8398: -- Assignee: Akshay Goyal (was: Zhichun Wu)

ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc
Key: HIVE-8398 URL: https://issues.apache.org/jira/browse/HIVE-8398 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Zhichun Wu Assignee: Akshay Goyal Attachments: HIVE-8398.2.patch, HIVE-8398.patch

The following explain statement fails in Hive 0.13 and trunk with an "ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc" exception:
{code}
create table test.t2 ( key string, value int );
explain select sum(u.value) value from test.t2 u group by u.key having sum(u.value) > 30;
{code}
The full stack trace:
{code}
java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc
  at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1067)
  at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
  at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:184)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:9561)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9517)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9488)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2314)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2295)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genHavingPlan(SemanticAnalyzer.java:2139)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8170)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8133)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8963)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9216)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
  at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}
I think it's due to HIVE-3107, which introduces an alternate mapping for a column in RowResolver. While mapping the having clause in TypeCheckProcFactory, it first maps value to col_1 (the output of the group-by clause), which has the type ExprNodeColumnDesc (before HIVE-3107, value was not recognized). When it comes to u.value, it finds that u is a table alias but fails to cast nodeOutputs[1] to ExprNodeConstantDesc. Here I think we can use the text attribute in the expr node as colAlias instead.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8398) ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc
[ https://issues.apache.org/jira/browse/HIVE-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699093#comment-14699093 ] Akshay Goyal commented on HIVE-8398: It seems this has been fixed as part of HIVE-9867. The explain statement mentioned by [~wzc1989] in the description passes with the latest master.

ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc
Key: HIVE-8398 URL: https://issues.apache.org/jira/browse/HIVE-8398 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Zhichun Wu Assignee: Akshay Goyal Attachments: HIVE-8398.2.patch, HIVE-8398.patch

The following explain statement fails in Hive 0.13 and trunk with an "ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc" exception:
{code}
create table test.t2 ( key string, value int );
explain select sum(u.value) value from test.t2 u group by u.key having sum(u.value) > 30;
{code}
The full stack trace:
{code}
java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc
  at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1067)
  at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
  at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:184)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:9561)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9517)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9488)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2314)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2295)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genHavingPlan(SemanticAnalyzer.java:2139)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8170)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8133)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8963)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9216)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
  at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}
I think it's due to HIVE-3107, which introduces an alternate mapping for a column in RowResolver. While mapping the having clause in TypeCheckProcFactory, it first maps value to col_1 (the output of the group-by clause), which has the type ExprNodeColumnDesc (before HIVE-3107, value was not recognized). When it comes to u.value, it finds that u is a table alias but fails to cast nodeOutputs[1] to ExprNodeConstantDesc. Here I think we can use the text attribute in the expr node as colAlias instead.
[jira] [Commented] (HIVE-11429) Increase default JDBC result set fetch size (# rows it fetches in one RPC call) to 1000 from 50
[ https://issues.apache.org/jira/browse/HIVE-11429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700179#comment-14700179 ] Thejas M Nair commented on HIVE-11429: -- The test failures are not related. Increase default JDBC result set fetch size (# rows it fetches in one RPC call) to 1000 from 50 --- Key: HIVE-11429 URL: https://issues.apache.org/jira/browse/HIVE-11429 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.2.1 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-11429.1.patch This is in addition to HIVE-10982 which plans to make the fetch size customizable. This just bumps the default to 1000. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
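The effect of the batch size is easy to see with Python's DB-API, where cursor.arraysize plays the same role as the JDBC fetch size: it is the number of rows a single fetchmany() call returns. sqlite3 here is just a local stand-in for a Hive connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5000)])

cur = conn.cursor()
cur.arraysize = 1000          # analogous to bumping the JDBC fetch size to 1000
cur.execute("SELECT x FROM t")
batch = cur.fetchmany()       # one "round trip" now yields 1000 rows, not 50
print(len(batch))
```

With a remote server each fetchmany() (or JDBC fetch) is an RPC, so a 20x larger default directly cuts the number of round trips for large result sets.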
[jira] [Commented] (HIVE-11575) Fix test failures in master due to log4j changes
[ https://issues.apache.org/jira/browse/HIVE-11575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700188#comment-14700188 ] Sergey Shelukhin commented on HIVE-11575: - +1

Fix test failures in master due to log4j changes
Key: HIVE-11575 URL: https://issues.apache.org/jira/browse/HIVE-11575 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11575.patch

Some of the recent commits related to HIVE-11304 are causing test failures in master. The following tests are failing:
{code}
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11530) push limit thru outer join
[ https://issues.apache.org/jira/browse/HIVE-11530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700218#comment-14700218 ] Sergey Shelukhin commented on HIVE-11530: - Done! Thanks for working on this. push limit thru outer join -- Key: HIVE-11530 URL: https://issues.apache.org/jira/browse/HIVE-11530 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Sergey Shelukhin Assignee: Yuya OZAWA When the query has a left or right outer join with limit, we can push the limit into the left/right side of the join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11530) push limit thru outer join
[ https://issues.apache.org/jira/browse/HIVE-11530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11530: Assignee: Yuya OZAWA push limit thru outer join -- Key: HIVE-11530 URL: https://issues.apache.org/jira/browse/HIVE-11530 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Sergey Shelukhin Assignee: Yuya OZAWA When the query has a left or right outer join with limit, we can push the limit into the left/right side of the join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
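The rewrite is sound because every left-side row of a left outer join produces at least one output row, so limiting the left input to N rows and re-applying the limit after the join cannot lose results. A toy sketch of that equivalence on plain Python lists (not Hive's optimizer code; rows are modeled as values joined on a key function):

```python
def left_outer_join(left, right, key):
    out = []
    for l in left:
        matches = [r for r in right if key(r) == key(l)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))   # unmatched left rows survive with NULLs
    return out

def limited_join(left, right, key, n):
    # LIMIT n applied to the full join output
    return left_outer_join(left, right, key)[:n]

def pushed_down(left, right, key, n):
    # push LIMIT n into the left input, then re-apply it after the join
    return left_outer_join(left[:n], right, key)[:n]
```

Since each of the first N left rows contributes at least one output row, the first N join results only ever depend on the first N left rows, so both plans agree. The mirror-image argument covers pushing a limit into the right side of a right outer join.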
[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
[ https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700221#comment-14700221 ] Ashutosh Chauhan commented on HIVE-11375: - Instead of negativeUDFMapping, another possibility is to use GenericUDF::flip()?

Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
Key: HIVE-11375 URL: https://issues.apache.org/jira/browse/HIVE-11375 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 2.0.0 Reporter: Mariusz Sakowski Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11375.2.patch, HIVE-11375.patch

When running a query like this:
{code}explain select * from test where (val is not null and val <> 0);{code}
Hive will simplify the expression in parentheses and omit the IS NOT NULL check:
{code}
Filter Operator
  predicate: (val <> 0) (type: boolean)
{code}
which is fine. But if we negate the condition using the NOT operator:
{code}explain select * from test where not (val is not null and val <> 0);{code}
Hive will also simplify things, but now it breaks the query:
{code}
Filter Operator
  predicate: (not (val <> 0)) (type: boolean)
{code}
The valid predicate should be *val == 0 or val is null*, while the predicate above is equivalent to *val == 0* only, filtering away rows where val is null. Simple example:
{code}
CREATE TABLE example ( val bigint );
INSERT INTO example VALUES (1), (NULL), (0);
-- returns 2 rows - NULL and 0
select * from example where (val is null or val == 0);
-- returns 1 row - 0
select * from example where not (val is not null and val <> 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
[ https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-11579:

Description:
We can easily reproduce the bug with the following steps:
{code}
hive> set system:xx=yy;
hive> lss;
hive>
{code}
The error output disappears because the err output stream is closed when the Hive statement is closed. This bug also occurs upstream when using the embedded mode, as the new CLI does.

was:
We can easily reproduce the bug with the following steps:
{code}
hive> set system:xx=yy;
hive> lss;
hive>
{code}
The error output disappears because the err output stream is closed when the Hive statement is closed.

Invoke the set command will close standard error output[beeline-cli]
Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu

We can easily reproduce the bug with the following steps:
{code}
hive> set system:xx=yy;
hive> lss;
hive>
{code}
The error output disappears because the err output stream is closed when the Hive statement is closed. This bug also occurs upstream when using the embedded mode, as the new CLI does.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
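The failure mode is generic: once the object wrapping a shared error stream is closed, later writes to it fail rather than print, so the error output from the invalid "lss" command above is silently lost. A minimal Python sketch of the hazard (not Hive's actual stream handling; StringIO stands in for the process's error stream):

```python
import io

err = io.StringIO()            # stands in for the shared error stream
err.write("first command: error output still visible\n")

err.close()                    # what closing the statement effectively does
try:
    err.write("second command: this error output is lost\n")
except ValueError as e:
    print("write after close failed:", e)
```

The usual fix is for the wrapper that owns the statement to flush but not close the underlying shared stream, so subsequent commands can still report errors.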