I upgraded to the latest CDH version, which is HBase 1.2.0-cdh5.7.0, and tested
it again. The results are correct now. Thanks for the help.
Even so, I think the query plan can still be optimized.
Here is what I think can be improved:
a. specify only the columns that are needed
b. remove the RowFilter when the scan's startRow and stopRow already cover it.
For example: select row_key from hbase.browser_action a where
a.row_key > '0' and a.row_key < '1'. I think SQL like this would query
faster without the RowFilter.
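To illustrate (b): HBase orders rowkeys lexicographically as unsigned bytes, so a scan range of [startRow, stopRow) with startRow = '0' + 0x00 and stopRow = '1' already implies the predicate row_key > '0' AND row_key < '1', and a RowFilter re-checking it per row is redundant work. Here is a minimal standalone sketch of that equivalence (the class and helper names are made up for illustration; this is not Drill or HBase code):

```java
import java.nio.charset.StandardCharsets;

public class RangeVsFilter {
    // Lexicographic unsigned-byte comparison, the order HBase uses for rowkeys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // What a [startRow, stopRow) scan range already guarantees for every row.
    static boolean inScanRange(byte[] row, byte[] start, byte[] stop) {
        return compare(row, start) >= 0 && compare(row, stop) < 0;
    }

    // What a redundant RowFilter (lower < row < upper) would re-check per row.
    static boolean filterAccepts(byte[] row, byte[] lower, byte[] upper) {
        return compare(row, lower) > 0 && compare(row, upper) < 0;
    }

    public static void main(String[] args) {
        // startRow = '0' + 0x00 is the smallest key strictly greater than '0',
        // matching the startRow=0\x00 seen in the Drill plan.
        byte[] start = "0\u0000".getBytes(StandardCharsets.UTF_8);
        byte[] stop  = "1".getBytes(StandardCharsets.UTF_8);
        byte[] lower = "0".getBytes(StandardCharsets.UTF_8);
        for (String key : new String[]{"020160404pay", "0x", "0", "1"}) {
            byte[] row = key.getBytes(StandardCharsets.UTF_8);
            // The two checks agree on every row, so the filter adds no selectivity.
            System.out.println(key
                + " range=" + inScanRange(row, start, stop)
                + " filter=" + filterAccepts(row, lower, stop));
        }
    }
}
```

For every key the range check and the filter check agree, which is why dropping the RowFilter should not change results, only skip a per-row comparison.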
2016-05-24 9:29 GMT+08:00 qiang li <[email protected]>:
> Yes, it seems like the same issue. I will upgrade it and test again. But
> do you think we can update the physical plan too? If we only want to query
> one qualifier, then the columns of the plan should contain only that
> qualifier instead of "*". Maybe that plan would query faster. Am I right?
>
> 2016-05-23 23:38 GMT+08:00 Krystal Nguyen <[email protected]>:
>
>> Hi Qiang,
>>
>> Looks like you might be encountering this issue:
>> https://issues.apache.org/jira/browse/DRILL-4271
>>
>> Thanks
>>
>> On Sun, May 22, 2016 at 8:38 PM, qiang li <[email protected]> wrote:
>>
>> > I tested it step by step again, and finally found out that the issue
>> > happens only if the number of qualifiers is more than 3.
>> >
>> > It's weird, but this is the result I got.
>> >
>> > I tested about 10 thousand rows of data. The length of the event array
>> > is 6; the code I used to test is like below:
>> >
>> > String[] earr = action.getEvent().geteArr();
>> > for (int i = 0; i < 6; i++) {
>> >     put.addColumn(family, Bytes.toBytes("e" + i), Bytes.toBytes(earr[i]));
>> > }
>> >
>> > Then I test step by step like below :
>> >
>> > put.addColumn(family, Bytes.toBytes("e0"), Bytes.toBytes("e0"));
>> > put.addColumn(family, Bytes.toBytes("e1"), Bytes.toBytes("e1"));
>> > put.addColumn(family, Bytes.toBytes("e2"), Bytes.toBytes("e2"));
>> > put.addColumn(family, Bytes.toBytes("e3"), Bytes.toBytes("e3"));
>> >
>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>> > group by a.`v`.`e0`;
>> > +-----+-------+
>> > | k | p |
>> > +-----+-------+
>> > | e0 | 3856 |
>> > +-----+-------+
>> >
>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
>> > +-----+-------+
>> > | k | p |
>> > +-----+-------+
>> > | e0 | 9094 |
>> > +-----+-------+
>> >
>> >
>> > If I put the qualifier "e3" into the table (that is, more than 3
>> > qualifiers), then the issue is reproduced.
>> >
>> >
>> > 2016-05-23 9:23 GMT+08:00 qiang li <[email protected]>:
>> >
>> > > Sorry for the late reply.
>> > >
>> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
>> > >
>> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <[email protected]>:
>> > >
>> > >> Qiang, can you please let us know the HBase version and Hadoop
>> > >> distribution version that you are using.
>> > >>
>> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <
>> [email protected]>
>> > >> wrote:
>> > >>
>> > >> > Can you please let us know the hbase version and hadoop
>> distribution
>> > >> > version that you are using.
>> > >> >
>> > >> >
>> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li <[email protected]>
>> > wrote:
>> > >> >
>> > >> >> Khurram, I am sending the mail again; the last mail forgot to cc
>> > >> >> [email protected]
>> > >> >>
>> > >> >> The main process is the same, but my rowkey is more complicated.
>> > >> >> Here is the detail I tested.
>> > >> >> rowkey is like this: [salt 1-byte string] + [day 8-byte string] +
>> > >> >> [event] + [uid long] + [ts long]
>> > >> >> I also have other qualifiers; only qualifier v:v is an integer, the
>> > >> >> others are strings.
>> > >> >>
>> > >> >> example:
>> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
>> > >> >> ROW: 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
>> > >> >>   column=v:e0, timestamp=1461839343076, value=pay
>> > >> >>   column=v:e1, timestamp=1461839343076, value=bijia
>> > >> >>   column=v:e2, timestamp=1461839343076, value=browser
>> > >> >>   column=v:e3, timestamp=1461839343076, value=*
>> > >> >>   column=v:e4, timestamp=1461839343076, value=*
>> > >> >>   column=v:e5, timestamp=1461839343076, value=*
>> > >> >>   column=v:h, timestamp=1459771200000, value=20
>> > >> >>   column=v:m, timestamp=1459771200000, value=0
>> > >> >>   column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
>> > >> >> 1 row(s) in 0.0410 seconds
>> > >> >>
>> > >> >>
>> > >> >> Here is an example of how the issue looks:
>> > >> >>
>> > >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW => '0'}
>> > >> >> ........
>> > >> >> 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029448, value=visit
>> > >> >> 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029217, value=visit
>> > >> >> 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029295, value=visit
>> > >> >> 9994 row(s) in 123.8650 seconds
>> > >> >>
>> > >> >> The Drill result:
>> > >> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > >> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>> > >> >> group by a.`v`.`e0`;
>> > >> >> +--------+-------+
>> > >> >> | k | p |
>> > >> >> +--------+-------+
>> > >> >> | visit | 1216 |
>> > >> >> +--------+-------+
>> > >> >>
>> > >> >>
>> > >> >> I found that the issue appears when the row count is larger than
>> > >> >> 10000. The result is right with fewer than 1000 rows, but not always.
>> > >> >> What I can confirm is that if I update the columns in the physical
>> > >> >> plan and query through the web UI, the result is correct.
>> > >> >>
>> > >> >>
>> > >> >> Thanks
>> > >> >>
>> > >> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <[email protected]>:
>> > >> >>
>> > >> >> > Qiang, can you please take a look at DRILL-4686 and confirm if the
>> > >> >> > data set used in my repro is the same as the one you have used. If
>> > >> >> > the data set is different, please let us know the type of data that
>> > >> >> > you have used in your table.
>> > >> >> >
>> > >> >> > Aman - I will try to repro the problem on Drill 1.6.0 and share
>> > >> results.
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> > Khurram
>> > >> >> >
>> > >> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <
>> [email protected]
>> > >
>> > >> >> wrote:
>> > >> >> >
>> > >> >> >> Khurram, DRILL-4686 seems like a different issue...it is reporting
>> > >> >> >> an error, whereas the original problem from qiang was an incorrect
>> > >> >> >> result. Can you use the same version (1.6) that he was using? Also,
>> > >> >> >> is the data set similar? If you are unable to repro the exact same
>> > >> >> >> issue, perhaps qiang should file a JIRA with a smaller repro if
>> > >> >> >> possible.
>> > >> >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <
>> > >> [email protected]>
>> > >> >> >> wrote:
>> > >> >> >>
>> > >> >> >> > Hello Qiang,
>> > >> >> >> >
>> > >> >> >> > DRILL-4686 is reported to track this problem.
>> > >> >> >> >
>> > >> >> >> > Thanks,
>> > >> >> >> > Khurram
>> > >> >> >> >
>> > >> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <
>> [email protected]
>> > >
>> > >> >> wrote:
>> > >> >> >> >
>> > >> >> >> >> Ok, Thanks very much.
>> > >> >> >> >>
>> > >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <
>> > [email protected]
>> > >> >:
>> > >> >> >> >>
>> > >> >> >> >>> Hello Qiang,
>> > >> >> >> >>>
>> > >> >> >> >>> Someone from our Drill team (in San Jose) will get back to you
>> > >> >> >> >>> soon. I work from the India lab and am in a different time zone
>> > >> >> >> >>> from the San Jose office, so someone from the MapR San Jose
>> > >> >> >> >>> office will get back to you as soon as possible.
>> > >> >> >> >>>
>> > >> >> >> >>> Thanks,
>> > >> >> >> >>> Khurram
>> > >> >> >> >>>
>> > >> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <
>> > [email protected]
>> > >> >
>> > >> >> >> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>>> Hi Khurram, thanks very much for reproducing it. What's the
>> > >> >> >> >>>> conclusion?
>> > >> >> >> >>>>
>> > >> >> >> >>>> Any idea how to solve it?
>> > >> >> >> >>>>
>> > >> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <
>> > >> [email protected]>:
>> > >> >> >> >>>>
>> > >> >> >> >>>>> So I tried to create the table using the HBase API (with no
>> > >> >> >> >>>>> data inserted into the table), and I got the query plan for
>> > >> >> >> >>>>> Drill 1.7.0.
>> > >> >> >> >>>>> Drill 1.7.0-SNAPSHOT commit ID : 09b26277
>> > >> >> >> >>>>> Drill 1.7.0-SNAPSHOT commit ID : 09b26277
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> | row_key | ANY | NO |
>> > >> >> >> >>>>> | v | MAP | NO |
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> 2 rows selected (1.665 seconds)
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Table creation Java program
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> public class PutIntDataToHBase {
>> > >> >> >> >>>>>     public static void main(String args[]) throws IOException {
>> > >> >> >> >>>>>         Configuration conf = HBaseConfiguration.create();
>> > >> >> >> >>>>>         conf.set("hbase.zookeeper.property.clientPort", "5181");
>> > >> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>> > >> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
>> > >> >> >> >>>>>             admin.disableTable("browser_action2");
>> > >> >> >> >>>>>             admin.deleteTable("browser_action2");
>> > >> >> >> >>>>>         }
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>         byte[][] SPLIT_KEYS =
>> > >> >> >> >>>>>             {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>> > >> >> >> >>>>>         HTableDescriptor tableDesc =
>> > >> >> >> >>>>>             new HTableDescriptor(TableName.valueOf("browser_action2"));
>> > >> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>> > >> >> >> >>>>>         admin.createTable(tableDesc, SPLIT_KEYS);
>> > >> >> >> >>>>>     }
>> > >> >> >> >>>>> }
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Query plan for the query that was reported as returning
>> > >> >> >> >>>>> wrong results:
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> > >> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`)
>> p
>> > >> from
>> > >> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0' group by
>> > >> >> a.`v`.`e0`;
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | text | json |
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | 00-00 Screen
>> > >> >> >> >>>>> 00-01 Project(k=[$0], p=[$1])
>> > >> >> >> >>>>> 00-02 UnionExchange
>> > >> >> >> >>>>> 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > >> >> >> >>>>> 01-02 HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>> 01-03 Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>> 01-04 HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>> 02-01 UnorderedMuxExchange
>> > >> >> >> >>>>> 03-01            Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>> 03-02              HashAgg(group=[{0}], p=[COUNT($0)])
>> > >> >> >> >>>>> 03-03                Project($f0=[ITEM($1, 'e0')])
>> > >> >> >> >>>>> 03-04                  Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> And the query plan for the other problem query mentioned in
>> > >> >> >> >>>>> the first email:
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> > >> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
>> > >> >> >> >>>>> count(a.row_key) p from hbase.browser_action2 a group by
>> > >> >> >> >>>>> BYTE_SUBSTR(a.row_key, 1, 9);
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | text | json |
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | 00-00 Screen
>> > >> >> >> >>>>> 00-01 Project(k=[$0], p=[$1])
>> > >> >> >> >>>>> 00-02 UnionExchange
>> > >> >> >> >>>>> 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > >> >> >> >>>>> 01-02 HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>> 01-03 Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>> 01-04 HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>> 02-01 UnorderedMuxExchange
>> > >> >> >> >>>>> 03-01            Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>> 03-02              HashAgg(group=[{0}], p=[COUNT($1)])
>> > >> >> >> >>>>> 03-03                Project($f0=[BYTE_SUBSTR($0, 1, 9)], row_key=[$0])
>> > >> >> >> >>>>> 03-04                  Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Thanks,
>> > >> >> >> >>>>> Khurram
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <
>> > >> [email protected]>
>> > >> >> >> >>>>> wrote:
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>> Yes.
>> > >> >> >> >>>>>> I use hbase API to create it.
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> The main code is:
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> byte[][] SPLIT_KEYS =
>> > >> >> >> >>>>>>     { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'}, {'8'}, {'9'} };
>> > >> >> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> > >> >> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> > >> >> >> >>>>>> tableDesc.addFamily(columnDesc);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> > >> >> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <
>> [email protected]
>> > >:
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
>> > >> >> reproduce
>> > >> >> >> this
>> > >> >> >> >>>>>>> problem so we can try to reproduce it on our end.
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> Thanks.
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> -- Zelaine
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
>> > >> >> [email protected]>
>> > >> >> >> >>>>>>> wrote:
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> > Hi ,
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I recently met an issue where I cannot query the correct
>> > >> >> >> >>>>>>> > data from HBase with SQL through Drill. Can anybody help
>> > >> >> >> >>>>>>> > me?
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I tested with Drill 1.6.
>> > >> >> >> >>>>>>> > My hbase scheme:
>> > >> >> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg:
>> > 120160411visituidts
>> > >> >> >> >>>>>>> > cf : v
>> > >> >> >> >>>>>>> > qualifier: v, e0, e1
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > The wrong result only happens when I use a group by
>> > >> >> >> >>>>>>> > clause.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > This SQL will not return the correct result:
>> > >> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > >> >> >> >>>>>>> > count(a.`v`.`e0`) p from hbase.browser_action2 a where
>> > >> >> >> >>>>>>> > a.row_key > '0' group by a.`v`.`e0`;
>> > >> >> >> >>>>>>> > Part of the explain output for this SQL is:
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>> > >> >> >> >>>>>>> > CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>> > >> >> >> >>>>>>> > from hbase.browser_action2 a where a.row_key > '0' group
>> > >> >> >> >>>>>>> > by a.`v`.`e0`;
>> > >> >> >> >>>>>>> > +------+------+
>> > >> >> >> >>>>>>> > | text | json |
>> > >> >> >> >>>>>>> > +------+------+
>> > >> >> >> >>>>>>> > | 00-00 Screen
>> > >> >> >> >>>>>>> > 00-01 Project(k=[$0], p=[$1])
>> > >> >> >> >>>>>>> > 00-02 UnionExchange
>> > >> >> >> >>>>>>> > 01-01        Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > >> >> >> >>>>>>> > 01-02          HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>>>> > 01-03            Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>>>> > 01-04              HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>>>> > 02-01                UnorderedMuxExchange
>> > >> >> >> >>>>>>> > 03-01                  Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>>>> > 03-02                    HashAgg(group=[{0}], p=[COUNT($0)])
>> > >> >> >> >>>>>>> > 03-03                      Project($f0=[ITEM($1, 'e0')])
>> > >> >> >> >>>>>>> > 03-04                        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > The data returns very quickly; the result of this SQL is:
>> > >> >> >> >>>>>>> > +------+--------+
>> > >> >> >> >>>>>>> > | k | p |
>> > >> >> >> >>>>>>> > +------+--------+
>> > >> >> >> >>>>>>> > | pay | 12180 |
>> > >> >> >> >>>>>>> > +------+--------+
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > But I have millons of data in the table.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I tried to change the physical plan. If I change the JSON
>> > >> >> >> >>>>>>> > plan's *"columns" : [ "`*`" ]* to
>> > >> >> >> >>>>>>> > *"columns" : [ "`v`.`e0`" ]*, it will return the correct
>> > >> >> >> >>>>>>> > result.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > It seems the physical plan is not correct.
>> > >> >> >> >>>>>>> > I also tried to debug the SQL parser to find out the
>> > >> >> >> >>>>>>> > reason, but it's too complicated. Can anyone help me?
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > This SQL also has the same issue:
>> > >> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8')
>> > >> >> >> >>>>>>> > as k, count(a.row_key) p from hbase.browser_action2 a
>> > >> >> >> >>>>>>> > group by BYTE_SUBSTR(a.row_key, 1, 9);
>> > >> >> >> >>>>>>> > If I change the JSON plan's *"columns" : [ "`*`" ]* to
>> > >> >> >> >>>>>>> > *"columns" : [ "`row_key`" ]*, it will return the correct
>> > >> >> >> >>>>>>> > result.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>
>> > >> >> >> >>>>
>> > >> >> >> >>>
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>