Hi ,
I recently meet a issue that can not query the correct data from hbase with
sql by drill, can anybody help me.
I test with the drill 1.6.
My hbase scheme:
rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
cf : v
qualifier: v, e0, e1
The wrong result only happened when I use group by clause.
This sql will not return correct result:
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
Part of explain of this sql is:
0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`,
'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
a.row_key > '0' group by a.`v`.`e0`;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(k=[$0], p=[$1])
00-02 UnionExchange
01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02 HashAgg(group=[{0}], p=[$SUM0($1)])
01-03 Project($f0=[$0], p=[$1])
01-04 HashToRandomExchange(dist0=[[$0]])
02-01 UnorderedMuxExchange
03-01 Project($f0=[$0], p=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02 HashAgg(group=[{0}], p=[COUNT($0)])
03-03 Project($f0=[ITEM($1, 'e0')])
03-04 Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
stopRow=, filter=null], columns=[`*`]]])
The data return very quickly , the result of this sql is :
+------+--------+
| k | p |
+------+--------+
| pay | 12180 |
+------+--------
But I have millons of data in the table.
I tried to change the physical plan. if I change the json explain *"columns"
: [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ] *, it will return the correct
result.
It seems the physical plan is not correct.
I also try to debug the sql parser to find out the reason, but its too
complicate. Can anyone help me.
Also this sql have the same issue.
select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k,
count(a.row_key) p from hbase.browser_action2 a group by
BYTE_SUBSTR(a.row_key, 1 , 9);
I change the json explain *"columns" : [ "`*`" ]* to *"columns" : [
"`row_key`" ] *, it will return the correct result.