[ 
https://issues.apache.org/jira/browse/DRILL-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-1691:
---------------------------------------

    Assignee: Arina Ielchiieva

> ConvertCountToDirectScan rule should be applicable for 2 or more COUNT 
> aggregates
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-1691
>                 URL: https://issues.apache.org/jira/browse/DRILL-1691
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.6.0
>            Reporter: Aman Sinha
>            Assignee: Arina Ielchiieva
>             Fix For: 1.12.0
>
>
> The ConvertCountToDirectScan rule currently applies only when the query has a
> single COUNT(*) or COUNT(column) aggregate and no GROUP BY. The rule should be
> extended to cover multiple such aggregates: it relies on the underlying
> ParquetGroupScan to supply the correct column value count, and retrieving
> that count for several columns should work the same way. However, if even
> one of those columns lacks statistics, the rule should not be applied.
> Here is an example sequence:
> First, run a CTAS so that statistics are guaranteed to be present for the
> table (the original Parquet data may not have stats):
> {code:sql}
> 0: jdbc:drill:zk=local> create table nation3 as select * from 
> cp.`tpch/nation.parquet`;
> +------------+---------------------------+
> |  Fragment  | Number of records written |
> +------------+---------------------------+
> | 0_0        | 25                        |
> +------------+---------------------------+
> {code}
> The EXPLAIN output below shows that the count is retrieved directly from the Scan:
> {code:sql}
> 0: jdbc:drill:zk=local> explain plan for select count(n_regionkey) as x from 
> nation3;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0])
> 00-02        Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@5db6cb92])
> {code}
> The following query, which computes two COUNT aggregates, causes an
> unnecessary StreamAgg to be introduced in the plan:
> {code:sql}
> 0: jdbc:drill:zk=local> explain plan for select count(n_regionkey) as x, 
> count(n_nationkey) as y from nation3;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1])
> 00-02        StreamAgg(group=[{}], x=[COUNT($0)], y=[COUNT($1)])
> 00-03          Project(n_regionkey=[$1], n_nationkey=[$0])
> 00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/nation3]], selectionRoot=/tmp/nation3, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
> {code}
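
The "all columns or nothing" condition described above can be sketched in isolation. This is only an illustration of the proposed check, not Drill's actual rule code: the class name, the `resolveCounts` helper, and the plain `Map` standing in for the per-column value counts that ParquetGroupScan would supply are all hypothetical, and the counts in `main` are illustrative values. COUNT(*) is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MultiCountDirectScanSketch {

    /**
     * Hypothetical helper: returns one count per requested COUNT(column),
     * or Optional.empty() if any requested column has no statistics.
     * The map is an assumed stand-in for metadata from the group scan;
     * a missing entry means "no statistics for this column".
     */
    static Optional<List<Long>> resolveCounts(List<String> countColumns,
                                              Map<String, Long> columnValueCounts) {
        List<Long> counts = new ArrayList<>();
        for (String column : countColumns) {
            Long count = columnValueCounts.get(column);
            if (count == null) {
                // One column without statistics disqualifies the whole rewrite.
                return Optional.empty();
            }
            counts.add(count);
        }
        return Optional.of(counts);
    }

    public static void main(String[] args) {
        // Illustrative statistics for two columns of the example table.
        Map<String, Long> stats = new LinkedHashMap<>();
        stats.put("n_regionkey", 25L);
        stats.put("n_nationkey", 25L);

        // Both columns have statistics: the direct-scan rewrite could apply.
        System.out.println(resolveCounts(Arrays.asList("n_regionkey", "n_nationkey"), stats));

        // One column has no statistics: the rewrite must not apply.
        System.out.println(resolveCounts(Arrays.asList("n_regionkey", "n_comment"), stats));
    }
}
```

Returning an empty `Optional` rather than a partial list keeps the decision binary, which matches the requirement that the rule be skipped entirely when any one column lacks statistics.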



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
