So I tried to create the table using HBase API (with no data inserted into
table) and I got the query plan for drill 1.7.0
Drill 1.7.0-SNAPSHOT commit ID : 09b26277
0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
+--------------+------------+--------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+--------------+------------+--------------+
| row_key | ANY | NO |
| v | MAP | NO |
+--------------+------------+--------------+
2 rows selected (1.665 seconds)
Table creation Java program
{noformat}
public class PutIntDataToHBase {
public static void main(String args[]) throws IOException {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.property.clientPort","5181");
HBaseAdmin admin = new HBaseAdmin(conf);
if (admin.tableExists("browser_action2")) {
admin.disableTable("browser_action2");
admin.deleteTable("browser_action2");
}
byte[][] SPLIT_KEYS =
{{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
HTableDescriptor tableDesc = new
HTableDescriptor(TableName.valueOf("browser_action2"));
tableDesc.addFamily(new HColumnDescriptor("v"));
admin.createTable(tableDesc,SPLIT_KEYS);
}
}
{noformat}
Query plan for the query that was reported as returning wrong results.
{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select
CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(k=[$0], p=[$1])
00-02 UnionExchange
01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02 HashAgg(group=[{0}], p=[$SUM0($1)])
01-03 Project($f0=[$0], p=[$1])
01-04 HashToRandomExchange(dist0=[[$0]])
02-01 UnorderedMuxExchange
03-01 Project($f0=[$0], p=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02 HashAgg(group=[{0}], p=[COUNT($0)])
03-03 Project($f0=[ITEM($1, 'e0')])
03-04 Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
stopRow=, filter=null], columns=[`*`]]])
{noformat}
and the query plan for the other problem query mentioned in the first email.
{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select
CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k,
. . . . . . . . . . . . . . > count(a.row_key) p from hbase.browser_action2
a group by
. . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(k=[$0], p=[$1])
00-02 UnionExchange
01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02 HashAgg(group=[{0}], p=[$SUM0($1)])
01-03 Project($f0=[$0], p=[$1])
01-04 HashToRandomExchange(dist0=[[$0]])
02-01 UnorderedMuxExchange
03-01 Project($f0=[$0], p=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02 HashAgg(group=[{0}], p=[COUNT($1)])
03-03 Project($f0=[BYTE_SUBSTR($0, 1, 9)],
row_key=[$0])
03-04 Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
stopRow=null, filter=null], columns=[`*`]]])
{noformat}
Thanks,
Khurram
On Wed, May 18, 2016 at 7:01 AM, qiang li <[email protected]> wrote:
> Yes.
> I use hbase API to create it.
>
> The main code is:
>
> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'},
> {'7'},{'8'}, {'9'} };
> TableName tableName = TableName.valueOf("browser_action2");
>
> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> tableDesc.addFamily(columnDesc);
>
> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>
> admin.createTable(tableDesc, SPLIT_KEYS);
>
>
>
>
> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <[email protected]>:
>
>> Can you provide the CREATE TABLE statement you used to reproduce this
>> problem so we can try to reproduce it on our end.
>>
>> Thanks.
>>
>> -- Zelaine
>>
>> On Tue, May 17, 2016 at 4:50 AM, qiang li <[email protected]> wrote:
>>
>> > Hi ,
>> >
>> > I recently meet a issue that can not query the correct data from hbase
>> with
>> > sql by drill, can anybody help me.
>> >
>> > I test with the drill 1.6.
>> > My hbase scheme:
>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>> > cf : v
>> > qualifier: v, e0, e1
>> >
>> > The wrong result only happened when I use group by clause.
>> >
>> > This sql will not return correct result:
>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>> > hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>> > Part of explain of this sql is:
>> >
>> > 0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`,
>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>> > a.row_key > '0' group by a.`v`.`e0`;
>> > +------+------+
>> > | text | json |
>> > +------+------+
>> > | 00-00 Screen
>> > 00-01 Project(k=[$0], p=[$1])
>> > 00-02 UnionExchange
>> > 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > 01-02 HashAgg(group=[{0}], p=[$SUM0($1)])
>> > 01-03 Project($f0=[$0], p=[$1])
>> > 01-04 HashToRandomExchange(dist0=[[$0]])
>> > 02-01 UnorderedMuxExchange
>> > 03-01 Project($f0=[$0], p=[$1],
>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > 03-02 HashAgg(group=[{0}], p=[COUNT($0)])
>> > 03-03 Project($f0=[ITEM($1, 'e0')])
>> > 03-04 Scan(groupscan=[HBaseGroupScan
>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
>> > stopRow=, filter=null], columns=[`*`]]])
>> >
>> > The data return very quickly , the result of this sql is :
>> > +------+--------+
>> > | k | p |
>> > +------+--------+
>> > | pay | 12180 |
>> > +------+--------
>> >
>> > But I have millons of data in the table.
>> >
>> > I tried to change the physical plan. if I change the json explain
>> > *"columns"
>> > : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ] *, it will return the
>> correct
>> > result.
>> >
>> > It seems the physical plan is not correct.
>> > I also try to debug the sql parser to find out the reason, but its too
>> > complicate. Can anyone help me.
>> >
>> > Also this sql have the same issue.
>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k,
>> > count(a.row_key) p from hbase.browser_action2 a group by
>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>> > I change the json explain *"columns" : [ "`*`" ]* to *"columns" : [
>> > "`row_key`" ] *, it will return the correct result.
>> >
>>
>
>