[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048517#comment-14048517 ]

Lefty Leverenz commented on HIVE-6492:

*hive.limit.query.max.table.partition* is documented in the wiki here:

* [Configuration Properties: hive.limit.query.max.table.partition | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.limit.query.max.table.partition]

I also added a comment to HIVE-6586 so *hive.limit.query.max.table.partition* won't get lost in the shuffle when HIVE-6037 changes HiveConf.java.

limit partition number involved in a table scan
-----------------------------------------------

Key: HIVE-6492
URL: https://issues.apache.org/jira/browse/HIVE-6492
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang
Fix For: 0.13.0
Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt, HIVE-6492.7.parch.txt
Original Estimate: 24h
Remaining Estimate: 24h

To protect the cluster, a new configuration variable, hive.limit.query.max.table.partition, is added to the Hive configuration to limit the number of table partitions involved in a table scan. The default value is -1, which means there is no limit by default. This variable does not affect metadata-only queries.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951150#comment-13951150 ]

Selina Zhang commented on HIVE-6492:

[~leftylev] Thanks for the reminder! We can put:

"This controls how many partitions can be scanned for each partitioned table. The default value -1 means no limit."

What do you think?
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951695#comment-13951695 ]

Lefty Leverenz commented on HIVE-6492:

That's good, if that's enough information for users. Though I'm curious what happens when a query exceeds the limit ... oh ... you explained that in your March 4th comment, the query fails with an error message.
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948934#comment-13948934 ]

Lefty Leverenz commented on HIVE-6492:

This adds *hive.limit.query.max.table.partition* to HiveConf.java but it needs a description. There's plenty of description in the comments, but a release note would be helpful. Then I could put it in the wiki, and make sure the description goes into the new HiveConf.java (via HIVE-6586) after HIVE-6037 gets committed.
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948609#comment-13948609 ]

Alan Gates commented on HIVE-6492:

Ran the tests locally, all looks good.
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947223#comment-13947223 ]

Ashutosh Chauhan commented on HIVE-6492:

I left the following comment:

bq. I thought you wanted this limit to be applied to the cumulative partition count, or is the limit meant to be per TSOperator? I see you didn't update that part of the code. Was it intentional?

Currently, the limit is considered per TSOperator (table), not across all tables referred to in the query. Either way is fine with me; I just want to confirm you intend the limit to be per table, not across all tables in the query. Other than that, looks good.
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947410#comment-13947410 ]

Selina Zhang commented on HIVE-6492:

Thanks, Ashutosh! Yes, it intentionally limits partitions per table scan. It is based on the assumption that most queries involve only one table instance. And it is more like a supplement to strict mode.
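To make the per-table versus cumulative distinction discussed above concrete, here is a minimal Java sketch. It is not taken from the patch: the class and method names are hypothetical, and treating every negative limit as "no limit" is an assumption based on the documented default of -1.

```java
import java.util.Map;

/** Illustrative only: none of these names come from the Hive code base. */
final class PartitionLimitScope {

    /** Per-table check: each table scan is compared with the limit on its own. */
    static boolean perTableOk(Map<String, Integer> partitionsPerScan, int limit) {
        if (limit < 0) {
            return true; // assumption: a negative value (default -1) disables the check
        }
        for (int count : partitionsPerScan.values()) {
            if (count > limit) {
                return false;
            }
        }
        return true;
    }

    /** Cumulative check: the alternative raised in review, summing over all scans. */
    static boolean cumulativeOk(Map<String, Integer> partitionsPerScan, int limit) {
        if (limit < 0) {
            return true; // same assumption as above
        }
        int total = 0;
        for (int count : partitionsPerScan.values()) {
            total += count;
        }
        return total <= limit;
    }
}
```

With two tables that each scan 3 partitions and a limit of 5, the per-table check accepts the query while the cumulative check would reject it, which is the difference the reviewer asked to confirm.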
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947433#comment-13947433 ]

Ashutosh Chauhan commented on HIVE-6492:

Ok. +1
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942548#comment-13942548 ]

Selina Zhang commented on HIVE-6492:

Review request is here: https://reviews.apache.org/r/19373/
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924459#comment-13924459 ]

Selina Zhang commented on HIVE-6492:

[~hagleitn] Thank you for the suggestions! I will work on suggestion 2 and move the code to the driver. Because I am currently working on a patch to shorten the execution time of metadata-only queries (which is important for BI tools), I prefer to leave metadata-only queries out of this limitation. What do you think?
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923257#comment-13923257 ]

Gunther Hagleitner commented on HIVE-6492:

[~selinazh] limit_partition_3.q: The query (select count(*) from part) will succeed if you turn ON compute.query.using.stats and will fail if you turn it off. That's because in the first case it doesn't do a table scan, while in the second it does. (the limit_partition_3.q.out is hard to read, but you can see it there).

The (select hr from srcpart) ... yeah you're right. I missed that. Let me take another look.
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923657#comment-13923657 ]

Gunther Hagleitner commented on HIVE-6492:

Looked at it some more. Finally get what you were saying about metadata only. I think we can go two ways:

a) use patch as is, since metadata only still launches a job with potentially a lot of tasks (one split per file it seems).
b) fix it like you were, but change the variable to count files not partitions (you don't have access to partitions anymore in the lower layers) and move the code to driver so it works for both mr and tez.

[~selinazh] - what works better for you? since i sent you on this wild goose chase, i can take another shot at updating it...
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920146#comment-13920146 ]

Gunther Hagleitner commented on HIVE-6492:

[~selinazh] - sorry if i messed up the metadata only part. Can you give me an example where the patch doesn't work?
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920237#comment-13920237 ]

Selina Zhang commented on HIVE-6492:

In the new test case limit_partition_2.q, the query

select distinct hr from srcpart;

should be allowed to pass because hr is the partition key. With the new patch, it is blocked:

FAILED: SemanticException Number of partitions scanned (=4) on table srcpart exceeds limit (=1). This is controlled by hive.limit.query.max.table.partition.
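For readers who want to see how a check could produce the message quoted above, the following is a rough Java sketch. The message text is copied from the comment; everything else is assumed for illustration: the class name, the method, the use of IllegalStateException in place of Hive's SemanticException, and the rule that any negative limit means "no limit".

```java
/** Illustrative only: not the exception type or method used by Hive itself. */
final class PartitionLimitCheck {

    static final String PROPERTY = "hive.limit.query.max.table.partition";

    /**
     * Throws if a single table scan would touch more partitions than allowed.
     * A negative limit (the default is -1) is assumed to disable the check.
     */
    static void check(String table, int partitionsScanned, int limit) {
        if (limit < 0 || partitionsScanned <= limit) {
            return;
        }
        throw new IllegalStateException(
            "Number of partitions scanned (=" + partitionsScanned + ") on table " + table
                + " exceeds limit (=" + limit + "). This is controlled by " + PROPERTY + ".");
    }
}
```

Calling check("srcpart", 4, 1) reproduces the wording of the failure reported for limit_partition_2.q. The sketch says nothing about when the check should be skipped for metadata-only queries, which is the point under discussion in this comment.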
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920358#comment-13920358 ]

Selina Zhang commented on HIVE-6492:

The following test case in limit_partition_3.q should also pass, because it does not need a table scan:

set hive.compute.query.using.stats=true;
set hive.limit.query.max.table.partition=1;
select count(*) from part;
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920579#comment-13920579 ]

Hive QA commented on HIVE-6492:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12632689/HIVE-6492.5.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5358 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1623/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1623/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12632689
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918793#comment-13918793 ]

Gunther Hagleitner commented on HIVE-6492:

[~selinazh] can you open a reviewboard request for this? I have a few more comments:

- Can you add a test for stats optimizer? I think since you're checking for explicit limit on fetch operator that would still bail (i.e.: select count(*) from foo with stats available and hive.compute.query.using.stats = true)
- Your patch only works in MR (since you're computing access at the physical level)
- We already have the pruned list of partitions available at the logical level

If you move your code to right after we call Optimizer.optimize in the SemanticAnalyzer you can make both cases work. Logic should be:

- If there is a fetch operator at this level let it pass (no mapreduce job will be launched)
- Otherwise go through parse context's top ops and use opToPartPruner to find out how many partitions are going to be accessed.

Does that make sense?
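The two-step logic proposed in the comment above can be sketched roughly as follows. This is a simplified model, not Hive code: PlannedScan and LogicalLimitCheck are hypothetical stand-ins, the pruned partition count is passed in directly instead of being derived through opToPartPruner, and a negative limit is assumed to mean "no limit".

```java
import java.util.List;

/** Illustrative only: a simplified stand-in for one top-level table scan after pruning. */
final class PlannedScan {
    final String table;
    final int prunedPartitionCount;

    PlannedScan(String table, int prunedPartitionCount) {
        this.table = table;
        this.prunedPartitionCount = prunedPartitionCount;
    }
}

/** Illustrative only: the check as it might look at the logical level. */
final class LogicalLimitCheck {

    /** Returns the first table that exceeds the limit, or null if the query may proceed. */
    static String firstViolation(boolean hasFetchOperator, List<PlannedScan> topScans, int limit) {
        if (limit < 0) {
            return null; // assumption: negative (default -1) means no limit
        }
        if (hasFetchOperator) {
            return null; // step 1: a fetch-only plan launches no MapReduce job, so let it pass
        }
        for (PlannedScan scan : topScans) {
            // step 2: compare each scan's pruned partition count with the limit
            if (scan.prunedPartitionCount > limit) {
                return scan.table;
            }
        }
        return null;
    }
}
```

Under this model a stats-answered count(*) that is planned as a fetch passes even with a limit of 1, while a scan of 4 partitions of srcpart with the same limit is reported as a violation.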
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918886#comment-13918886 ]

Hive QA commented on HIVE-6492:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12632353/HIVE-6492.3.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5239 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1608/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1608/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12632353
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917363#comment-13917363 ]

Hive QA commented on HIVE-6492:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12631911/HIVE-6492.2.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5202 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1584/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1584/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12631911
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915648#comment-13915648 ]

Gunther Hagleitner commented on HIVE-6492:

[~selinazh]: There's a similar safety variable already present in hive: HiveConf.ConfVars.HIVEMAPREDMODE / hive.mapred.mode

When turned on, it enforces that every query has a condition that prunes partitions from the table it's running against. It's not the same but very similar and might satisfy your requirements. The assumption is that if a user has added a pruning condition, they have thought about properly limiting the amount of data to be scanned. Does that work for you?
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916310#comment-13916310 ]

Selina Zhang commented on HIVE-6492:

Strict mode disables types of queries that we cannot disable. We need to:

1. Enable queries on small tables without partition filters;
2. Enable select * from table queries issued from Tableau, because we must let Tableau connect to Hive Server directly through the ODBC driver;
3. Enable aggregation on partition keys without partition limits.

Thanks for reviewing the changes!
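The difference between the two safeguards compared in this exchange can be illustrated with a small Java sketch. It models only what the thread states: strict mode requires a partition-pruning condition, and the new variable caps the number of partitions scanned. The class and method names are hypothetical, and a negative limit is assumed to mean "no limit".

```java
/** Illustrative only: a toy model of the two policies compared in this thread. */
final class ScanPolicy {

    /** Strict mode as described here: the query must carry a partition-pruning condition. */
    static boolean strictModeAllows(boolean hasPartitionFilter) {
        return hasPartitionFilter;
    }

    /** Partition limit: the query passes if it scans no more partitions than allowed. */
    static boolean partitionLimitAllows(int partitionsScanned, int limit) {
        // assumption: a negative limit (default -1) disables the check
        return limit < 0 || partitionsScanned <= limit;
    }
}
```

An unfiltered select * over a small table with 3 partitions is rejected by the strict-mode rule but accepted with a limit of 10, which is requirement 1 above. Conversely, a filtered query that still scans 500 partitions satisfies strict mode but not the limit.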
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916358#comment-13916358 ] Gunther Hagleitner commented on HIVE-6492: -- Thanks, Selina. Just trying to understand the requirements to see what's the best way to get this in. One question is whether you can deploy different configs in these scenarios. E.g: use a different site file is someone is starting hive on the console v tools. Or use an alias to add a --hiveconf on the node where users start hive. You're trying to protect the cluster from large jobs - in your case you seem to want to turn this on for certain interfaces and off for others, but for other deployments that might not make much sense (the interface (ODBC/JDBC/CLI) doesn't say if it's a human, tool, etc). But specifically: 1) What's small? Sounds like if it's a query doesn't submit a job you want to let it go through? Or only if there's an explicit limit clause? 2) That's the same as 1 - if you just check for no job started 3) Aggregation on partition key right now will scan the entire table in a massive map-red job. Definitely something that should be fixed - but there's no optimization for that yet afaik. Allowing this query seems to defeat the purpose of the this flag doesn't it? Seems like again you just want to check for no job started. With that - it would make sense to update/extend the hive.mapred.mode variable to allow for queries that don't actually start a job (and allow jobs only with explicit partition pruning). That change + different config for different interfaces you should get all that you want and would be simpler. Correct? 
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916765#comment-13916765 ] Selina Zhang commented on HIVE-6492: The original patch actually included two tasks: 1. limit the number of partitions when a table scan happens; 2. a hack to identify queries from Tableau and handle them specially. As we discussed, the second task is just a hack and committing it to trunk is probably not helpful. So I created a new patch which contains only the first task. The reason for introducing this configuration variable is that we want to limit the number of partitions involved when a table scan is performed. As for metadata-only queries, since HIVE-1003 added an optimization for this type of query, the table scan is no longer a problem. limit partition number involved in a table scan --- Key: HIVE-6492 URL: https://issues.apache.org/jira/browse/HIVE-6492 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.12.0 Reporter: Selina Zhang Fix For: 0.13.0 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt Original Estimate: 24h Remaining Estimate: 24h To protect the cluster, a new configure variable hive.limit.query.max.table.partition is added to hive configuration to limit the table partitions involved in a table scan. The default value will be set to -1 which means there is no limit by default. This variable will not affect metadata only query. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
[ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13913795#comment-13913795 ] Selina Zhang commented on HIVE-6492: It is not rare for a table to have 1000+ partitions. To keep people from issuing a query without knowing how many partitions it will scan, one more configuration variable, hive.limit.query.max.table.partition, is introduced so that system admins can protect the grid. The default value is set to -1, which means no limit. This variable will be ignored in the following cases: 1. Simple fetch query with limit: select * from table limit n; 2. Metadata-only query: select distinct partition_key from partition_table; There is one special case: sometimes BI tools such as Tableau (connected through the ODBC driver) will issue select * from table at the initial stage to figure out table metadata. This will not hurt the grid because Tableau cancels the query after it receives one or two rows. To allow Tableau to keep working, code is added to mark the query client type, such as CLIDriver and JDBC, and to let only ODBC-sourced queries go through. limit partition number involved in a table scan --- Key: HIVE-6492 URL: https://issues.apache.org/jira/browse/HIVE-6492 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.12.0 Reporter: Selina Zhang Fix For: 0.13.0 Original Estimate: 24h Remaining Estimate: 24h To protect the cluster, a new configure variable hive.limit.query.max.table.partition is added to hive configuration to limit the table partitions involved in a table scan. The default value will be set to -1 which means there is no limit by default. This variable will not affect metadata only query. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
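For readers following the thread, the rule described in the comment above (a value of -1 disables the check; simple fetch queries with a limit and metadata-only queries are exempt) can be modelled roughly as below. This is a toy sketch in Python, not the code from any of the attached patches: the function and parameter names are hypothetical, and treating the cap as inclusive is an assumption the thread does not settle.

```python
NO_LIMIT = -1  # default described in the issue: -1 means no limit


def scan_allowed(partitions_to_scan, max_partitions=NO_LIMIT,
                 metadata_only=False, simple_fetch_with_limit=False):
    """Return True if a table scan would pass the partition cap.

    Toy model only; all names here are hypothetical, not Hive internals.
    """
    if max_partitions == NO_LIMIT:
        return True  # no cap configured
    if metadata_only or simple_fetch_with_limit:
        return True  # the two exempt query shapes listed in the comment
    # Assumption: a scan touching exactly max_partitions is still allowed.
    return partitions_to_scan <= max_partitions
```

With a cap of 1000, for example, `scan_allowed(1001, 1000)` is rejected, while the same scan is accepted when `metadata_only=True`.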