I did some research and find out that the logic plan not good if contains count(distinct ) clause.
I have to change my sql to select count(*) from (select aa, sum() from tt group by aa) t to avoid this issue. 2016-08-09 14:24 GMT+08:00 qiang li <[email protected]>: > Hi > > We use drill to query hbase. > > Recently I check one of our query's plan: > 0: jdbc:drill:drillbit=rfdc2> explain plan for SELECT '2016-07-27 00:00' > as key, sum(convert_from(action.`v`.`c`,'INT_BE')) AS pv,COUNT(DISTINCT > BYTE_SUBSTR(action.row_key,-8,8)) AS uv, > SUM(convert_from(action.`v`.`s`,'INT_BE')) > AS v from hbase.`action_combine` as action where action.row_key > >'020160727pay.search' and action.row_key < '020160727pay.searci' ; > > the result is : > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(key=[$0], pv=[$1], uv=[$2], v=[$3]) > 00-02 Project(key=['2016-07-27 00:00'], pv=[$0], uv=[$2], v=[$1]) > 00-03 NestedLoopJoin(condition=[true], joinType=[inner]) > 00-05 StreamAgg(group=[{}], pv=[SUM($0)], v=[SUM($1)]) > 00-07 StreamAgg(group=[{}], pv=[SUM($0)], v=[SUM($2)]) > 00-08 Project($f0=[CONVERT_FROMINT_BE(ITEM($1, 'c'))], > $f1=[BYTE_SUBSTR($0, -8, 8)], $f2=[CONVERT_FROMINT_BE(ITEM($1, 's'))]) > 00-09 Scan(groupscan=[HBaseGroupScan > [HBaseScanSpec=HBaseScanSpec [tableName=action_combine, > startRow=020160727pay.search\x00, stopRow=020160727pay.searci, > filter=null], columns=[`*`]]]) > 00-04 StreamAgg(group=[{}], uv=[$SUM0($0)]) > 00-06 UnionExchange > 01-01 StreamAgg(group=[{}], uv=[COUNT($0)]) > 01-02 HashAgg(group=[{0}]) > 01-03 Project($f1=[$0]) > 01-04 HashToRandomExchange(dist0=[[$0]]) > 02-01 UnorderedMuxExchange > 03-01 Project($f1=[$0], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) > 03-02 HashAgg(group=[{0}]) > 03-03 Project($f1=[BYTE_SUBSTR($0, -8, 8)]) > 03-04 Scan(groupscan=[HBaseGroupScan > [HBaseScanSpec=HBaseScanSpec [tableName=action_combine, > startRow=020160727pay.search\x00, stopRow=020160727pay.searci, > filter=null], columns=[`*`]]]) > > My question is as there have two scan in the plan , does this mean's the > query will execute twice? > >
