[jira] [Created] (IMPALA-8807) OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented

2019-07-29 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8807:
-

    Summary: OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented
        Key: IMPALA-8807
        URL: https://issues.apache.org/jira/browse/IMPALA-8807
    Project: IMPALA
 Issue Type: Bug
 Components: Docs
   Reporter: Tim Armstrong
   Assignee: Tim Armstrong


This came up here 
https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1=1

Our docs for OPTIMIZE_PARTITION_KEY_SCANS say:

{quote}
This optimization does not apply if the queries contain any WHERE, GROUP BY, or 
HAVING clause. The relevant queries should only compute the minimum, maximum, 
or number of distinct values for the partition key columns across the whole 
table.
{quote}

This is false. Here's a query showing the optimization applied with all three clauses (WHERE, GROUP BY, and HAVING) present:
{noformat}
[localhost:21000] default> set optimize_partition_key_scans=true; explain 
select max(ss_sold_date_sk) from tpcds_parquet.store_sales where 
ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > 
1000;
OPTIMIZE_PARTITION_KEY_SCANS set to true
Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where 
ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > 
1000
+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 |
| Per-Host Resource Estimates: Memory=10MB                   |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 01:AGGREGATE [FINALIZE]                                    |
| |  output: max(ss_sold_date_sk)                            |
| |  group by: ss_sold_date_sk                               |
| |  having: max(ss_sold_date_sk) > 1000                     |
| |  row-size=8B cardinality=182                             |
| |                                                          |
| 00:UNION                                                   |
|    constant-operands=182                                   |
|    row-size=4B cardinality=182                             |
+------------------------------------------------------------+
Fetched 15 row(s) in 0.11s
{noformat}
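
One way to read the plan above: there is no HDFS scan node, only a UNION of 182 constant operands feeding the aggregation, so the query appears to be answered from partition key values alone. The Python sketch below models that reading. It is only an illustration, not Impala's implementation: the function name and the partition key values are hypothetical, and it assumes the WHERE predicate references only the partition key.

```python
from collections import defaultdict

def max_per_group_over_partition_keys(partition_keys, where, having):
    """Model of: SELECT max(k) FROM t WHERE ... GROUP BY k HAVING ...
    evaluated over partition key values only, without reading row data."""
    groups = defaultdict(list)
    for key in partition_keys:       # one constant operand per partition
        if where(key):               # WHERE ss_sold_date_sk % 10 = 0
            groups[key].append(key)  # GROUP BY ss_sold_date_sk
    maxima = [max(vals) for vals in groups.values()]  # max(ss_sold_date_sk)
    return sorted(m for m in maxima if having(m))     # HAVING max(...) > 1000

# Hypothetical partition key values, for illustration only.
keys = [990, 1000, 1005, 1010, 1020, 1023]
result = max_per_group_over_partition_keys(
    keys,
    where=lambda k: k % 10 == 0,
    having=lambda m: m > 1000,
)
print(result)  # [1010, 1020]
```

Nothing in this model needs the whole table to be aggregated as a single group, which is consistent with the plan showing the optimization alongside WHERE, GROUP BY, and HAVING.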

We should reword this passage in the docs so that it matches the actual behavior.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

