[ https://issues.apache.org/jira/browse/DRILL-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arina Ielchiieva reassigned DRILL-1691:
---------------------------------------

    Assignee: Arina Ielchiieva

> ConvertCountToDirectScan rule should be applicable for 2 or more COUNT aggregates
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-1691
>                 URL: https://issues.apache.org/jira/browse/DRILL-1691
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.6.0
>            Reporter: Aman Sinha
>            Assignee: Arina Ielchiieva
>             Fix For: 1.12.0
>
>
> The ConvertCountToDirectScan rule currently applies only if there is a single
> COUNT(*) or COUNT(column) aggregate without a GROUP BY. The rule should be
> extended to apply to multiple such aggregates: it depends on the underlying
> ParquetGroupScan providing the correct column value count, and retrieving that
> count for several columns should work just as well as for one. However, if even
> one of the counted columns does not have statistics, the rule should not be applied.
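The applicability condition described above can be sketched in Java, the language Drill's planner is written in. This is a minimal, hypothetical model and not Drill's actual rule: `CountToDirectScanSketch`, `CountCall`, and `directScanRow` are illustrative names, per-column value counts are passed in as a plain map in which a missing entry stands for "no statistics", and the table row count is assumed to always be known. The sketch returns one precomputed value per COUNT only when there is no GROUP BY and every aggregate can be answered from metadata; otherwise it returns `Optional.empty()`, meaning the StreamAgg stays in the plan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the check proposed in DRILL-1691; not Drill's planner API.
public class CountToDirectScanSketch {

    /** One aggregate: COUNT(*) when column is null, otherwise COUNT(column). */
    public static final class CountCall {
        private final String column;

        private CountCall(String column) {
            this.column = column;
        }

        public static CountCall star() {
            return new CountCall(null);
        }

        public static CountCall of(String column) {
            return new CountCall(column);
        }

        boolean isStar() {
            return column == null;
        }
    }

    /**
     * Returns one precomputed count per aggregate (in SELECT order) when every
     * aggregate can be answered from metadata, or Optional.empty() when the
     * rule must not fire.
     *
     * @param hasGroupBy        the rule only handles aggregates without GROUP BY
     * @param calls             the COUNT aggregates, in SELECT order
     * @param rowCount          total row count from the group scan (assumed always known)
     * @param columnValueCounts per-column value counts; a missing key means "no statistics"
     */
    public static Optional<List<Long>> directScanRow(boolean hasGroupBy,
                                                     List<CountCall> calls,
                                                     long rowCount,
                                                     Map<String, Long> columnValueCounts) {
        if (hasGroupBy || calls.isEmpty()) {
            return Optional.empty();
        }
        List<Long> row = new ArrayList<>(calls.size());
        for (CountCall call : calls) {
            if (call.isStar()) {
                row.add(rowCount); // COUNT(*) comes straight from the row count
                continue;
            }
            Long count = columnValueCounts.get(call.column);
            if (count == null) {
                // A single column without statistics disables the rule for the whole query.
                return Optional.empty();
            }
            row.add(count);
        }
        return Optional.of(row);
    }

    public static void main(String[] args) {
        // Made-up statistics mirroring the 25-row nation3 table.
        Map<String, Long> stats = Map.of("n_regionkey", 25L, "n_nationkey", 25L);

        // count(n_regionkey), count(n_nationkey): both columns have statistics
        System.out.println(directScanRow(false,
                List.of(CountCall.of("n_regionkey"), CountCall.of("n_nationkey")), 25L, stats));
        // prints Optional[[25, 25]]

        // count(n_regionkey), count(n_comment): n_comment has no statistics in this map
        System.out.println(directScanRow(false,
                List.of(CountCall.of("n_regionkey"), CountCall.of("n_comment")), 25L, stats));
        // prints Optional.empty
    }
}
```

Returning all counts or none follows the requirement in the description: if only some counts came from metadata, an aggregate operator would still be needed, so one column without statistics is enough to leave the plan unchanged.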
> Here's an example sequence.
> First, run a CTAS to ensure that statistics are present for the table (the
> original Parquet data may not have stats):
> {code:sql}
> 0: jdbc:drill:zk=local> create table nation3 as select * from cp.`tpch/nation.parquet`;
> +------------+---------------------------+
> |  Fragment  | Number of records written |
> +------------+---------------------------+
> | 0_0        | 25                        |
> +------------+---------------------------+
> {code}
> The EXPLAIN below shows that the count is retrieved directly from the Scan:
> {code:sql}
> 0: jdbc:drill:zk=local> explain plan for select count(n_regionkey) as x from nation3;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0])
> 00-02        Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@5db6cb92])
> {code}
> The following query, which computes two aggregates, introduces a StreamAgg into
> the plan even though it is not needed:
> {code:sql}
> 0: jdbc:drill:zk=local> explain plan for select count(n_regionkey) as x, count(n_nationkey) as y from nation3;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1])
> 00-02        StreamAgg(group=[{}], x=[COUNT($0)], y=[COUNT($1)])
> 00-03          Project(n_regionkey=[$1], n_nationkey=[$0])
> 00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/nation3]], selectionRoot=/tmp/nation3, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)