[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193738#comment-14193738 ]
Mithun Radhakrishnan commented on HIVE-8313:

I'll wait for Navis's approval for the updated patch. He's had a look at the first go.

Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
---
Key: HIVE-8313
URL: https://issues.apache.org/jira/browse/HIVE-8313
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
Priority: Critical
Labels: Performance
Fix For: 0.14.0
Attachments: HIVE-8313.1.patch, HIVE-8313.2.patch

Consider the following query:
{code:sql}
SELECT foo, bar, goo, id
FROM myTable
WHERE id IN ( 'A', 'B', 'C', 'D', ... , 'ZZ' );
{code}
One finds that when the IN clause has several thousand elements (and the table has several million rows), the query above takes orders-of-magnitude longer to run on Hive 0.12 than say Hive 0.10. I have a possibly incomplete fix.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194085#comment-14194085 ]
Mithun Radhakrishnan commented on HIVE-8313:

For the record, the second patch is near identical to the version Navis +1-ed in the comments above, save for the primitive-array change and the tests pass (barring the usual suspects). Let's see if [~navis] agrees with the change, hopefully in time to make 0.14.
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194136#comment-14194136 ]
Navis commented on HIVE-8313:

+1, No need to wait for me.
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194204#comment-14194204 ]
Mithun Radhakrishnan commented on HIVE-8313:

Thank you for reviewing, [~navis]! Much appreciated.
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193513#comment-14193513 ]
Mithun Radhakrishnan commented on HIVE-8313:

FWIW, the test failure doesn't look related to this change.
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193540#comment-14193540 ]
Gopal V commented on HIVE-8313:

[~mithun]: are you planning to include this for 0.14? This would be a good addition.
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193700#comment-14193700 ]
Gopal V commented on HIVE-8313:

[~hagleitn]: Can we include this into 0.14?
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193707#comment-14193707 ]
Mithun Radhakrishnan commented on HIVE-8313:

@[~gopalv]: Yes, I'd like this to be included in 0.14, if possible. There are tangible gains in performance with this fix. Pinging [~hagleitn]. I hope I haven't missed the boat with this one.
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193709#comment-14193709 ]
Gunther Hagleitner commented on HIVE-8313:

+1 for hive .14. Not yet [~mithun] - but is it ready to go?
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191832#comment-14191832 ]
Hive QA commented on HIVE-8313:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12678371/HIVE-8313.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6607 tests executed
*Failed tests:*
{noformat}
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1574/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1574/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1574/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12678371 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191124#comment-14191124 ]
Mithun Radhakrishnan commented on HIVE-8313:

Hello, [~navis]. Thanks for reviewing. (Apologies for getting to this so late.) I suppose I could change {{childrenNeedingPrepare}} to an array, but the size wouldn't be known until the end of {{initialize()}}. Would you recommend that I create a temp-list and convert that to an array?
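For readers following the thread, the temp-list-then-array approach asked about in this comment might look roughly like the sketch below. It is not taken from either attached patch: {{PrepareArraySketch}}, {{Child}}, {{needsPrepare()}} and {{collectChildrenNeedingPrepare()}} are hypothetical names invented for illustration, and only the field name {{childrenNeedingPrepare}} comes from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

final class PrepareArraySketch {
    // Hypothetical stand-in for a child evaluator; not Hive's ExprNodeEvaluator.
    interface Child {
        boolean needsPrepare();
    }

    // The final size is unknown until every child has been inspected, so
    // collect into a temporary list and convert to an array once at the end.
    static Child[] collectChildrenNeedingPrepare(Child[] allChildren) {
        List<Child> temp = new ArrayList<>();
        for (Child child : allChildren) {
            if (child.needsPrepare()) {
                temp.add(child);
            }
        }
        // The list is discarded after this call; only the array is kept
        // for the per-row loop.
        return temp.toArray(new Child[0]);
    }
}
```

The conversion cost is paid once during initialization, while the per-row loop then walks a plain array with no list or iterator overhead.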
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191132#comment-14191132 ]
Navis commented on HIVE-8313:

Yes. If it's accessed on a per-row basis, it would be better to minimize its footprint.
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191138#comment-14191138 ]
Navis commented on HIVE-8313:

Good. Let's see the result of test.
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161531#comment-14161531 ]
Navis commented on HIVE-8313:

My bad, +1. Could you change childrenNeedingPrepare to an array?
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153741#comment-14153741 ]
Mithun Radhakrishnan commented on HIVE-8313:

This seems to have to do with the changes introduced in HIVE-4209, to provide caching for evaluation of deterministic sub-expressions. In this particular case, the problem occurs in {{ExprNodeGenericFuncEvaluator::_evaluate()}}:
{code:title=ExprNodeGenericFuncEvaluator.java|borderStyle=solid}
@Override
protected Object _evaluate(Object row, int version) throws HiveException {
  rowObject = row;
  if (ObjectInspectorUtils.isConstantObjectInspector(outputOI) &&
      isDeterministic()) {
    // The output of this UDF is constant, so don't even bother evaluating.
    return ((ConstantObjectInspector)outputOI).getWritableConstantValue();
  }
  for (int i = 0; i < deferredChildren.length; i++) {
    deferredChildren[i].prepare(version);
  }
  return genericUDF.evaluate(deferredChildren);
}
{code}
In Hive 0.10, the {{deferredChildren[i].evaluate()}} would be skipped in its entirety, for non-eager evaluation. In Hive 0.12, that condition is checked within the {{prepare()}} function, on every invocation, for *each record*, with explosive effect.

A lot of this cost can be saved by skipping {{prepare()}} for {{ExprNodeEvaluator}}s which yield the same value regardless of the row, e.g. {{ExprNodeConstantEvaluator}} and {{ExprNodeNullEvaluator}}. I'll post a patch for this shortly.
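The idea described in this comment (decide once, at initialization, which children actually need {{prepare()}}, and skip the rest on every row) can be sketched as below. This is an illustration under stated assumptions, not the attached patch: the {{Sketch*}} classes are simplified stand-ins for Hive's evaluators, {{isRowInvariant()}} is a hypothetical hook, and {{prepareCalls}} exists only to make the effect observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Hive's evaluator classes (not the real API).
abstract class SketchEvaluator {
    int prepareCalls = 0; // instrumentation for this sketch only

    // Assumed hook: true when the evaluator returns the same value for every row.
    abstract boolean isRowInvariant();

    void prepare(int version) {
        prepareCalls++;
    }
}

final class SketchConstantEvaluator extends SketchEvaluator {
    @Override
    boolean isRowInvariant() { return true; }
}

final class SketchColumnEvaluator extends SketchEvaluator {
    @Override
    boolean isRowInvariant() { return false; }
}

final class SketchFuncEvaluator {
    private final List<SketchEvaluator> childrenNeedingPrepare = new ArrayList<>();

    // Done once at initialization: remember only the children that vary by row.
    SketchFuncEvaluator(List<SketchEvaluator> children) {
        for (SketchEvaluator child : children) {
            if (!child.isRowInvariant()) {
                childrenNeedingPrepare.add(child);
            }
        }
    }

    // Done once per row: row-invariant children are never touched here.
    void prepareForRow(int version) {
        for (SketchEvaluator child : childrenNeedingPrepare) {
            child.prepare(version);
        }
    }
}
```

For an IN clause with several thousand literals and one column reference, the per-row loop in this sketch shrinks from thousands of {{prepare()}} calls to one.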
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154404#comment-14154404 ]
Hive QA commented on HIVE-8313:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12672134/HIVE-8313.1.patch

{color:green}SUCCESS:{color} +1 6378 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1064/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1064/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1064/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12672134