Can you please let us know the HBase version and Hadoop distribution version that you are using?
On Fri, May 20, 2016 at 1:35 AM, qiang li <[email protected]> wrote:

> Khurram, I am sending the mail again; the last mail forgot to cc
> [email protected]
>
> The main process is the same, but my rowkey is more complicated.
> Here is the detail of what I tested.
> The rowkey is like this: [salt 1-byte string] + [day 8-byte string] + [event] +
> [uid long] + [ts long]
> I also have other qualifiers; only qualifier v:v is an integer, the others are
> strings.
>
> Example:
> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> ROW                                                                     COLUMN+CELL
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e0, timestamp=1461839343076, value=pay
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e1, timestamp=1461839343076, value=bijia
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e2, timestamp=1461839343076, value=browser
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e3, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e4, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e5, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:h, timestamp=1459771200000, value=20
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:m, timestamp=1459771200000, value=0
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
> 1 row(s) in 0.0410 seconds
>
> Here is an example of how the issue looks:
>
> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW => '0'}
> ........
> 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00   column=v:e0, timestamp=1463723029448, value=visit
> 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00   column=v:e0, timestamp=1463723029217, value=visit
> 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00   column=v:e0, timestamp=1463723029295, value=visit
> 9994 row(s) in 123.8650 seconds
>
> The Drill result:
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
> group by a.`v`.`e0`;
> +--------+-------+
> |   k    |   p   |
> +--------+-------+
> | visit  | 1216  |
> +--------+-------+
>
> I find that if the row count is larger than 10000, the issue appears. The
> result is right with fewer than 1000 rows, but not always.
> What I can confirm is that if I update the columns in the physical plan and
> run the query through the web UI, the result is correct.
>
> Thanks
>
> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <[email protected]>:
>
> > Qiang, can you please take a look at DRILL-4686 and confirm whether the
> > data set used in my repro is the same as the one you have used. If the
> > data set is different, please let us know the type of data that you have
> > used in your table.
> >
> > Aman - I will try to repro the problem on Drill 1.6.0 and share results.
> >
> > Thanks,
> > Khurram
> >
> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <[email protected]> wrote:
> >
> >> Khurram, DRILL-4686 seems like a different issue... it is reporting an
> >> error, whereas the original problem from qiang was an incorrect result.
> >> Can you use the same version (1.6) that he was using? Also, is the data
> >> set similar? If you are unable to repro the exact same issue, perhaps
> >> qiang should file a JIRA with a smaller repro if possible.
> >>
> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <[email protected]> wrote:
> >>
> >> > Hello Qiang,
> >> >
> >> > DRILL-4686 is reported to track this problem.
> >> >
> >> > Thanks,
> >> > Khurram
> >> >
> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <[email protected]> wrote:
> >> >
> >> >> OK, thanks very much.
> >> >>
> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <[email protected]>:
> >> >>
> >> >>> Hello Qiang,
> >> >>>
> >> >>> Someone from our Drill team (in San Jose) will get back to you soon.
> >> >>> I work from the India lab and I am in a different time zone compared
> >> >>> to the San Jose office; someone from MapR San Jose will get back to
> >> >>> you as soon as possible.
> >> >>>
> >> >>> Thanks,
> >> >>> Khurram
> >> >>>
> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <[email protected]> wrote:
> >> >>>
> >> >>>> Hi Khurram, thanks very much for reproducing it. So what's the
> >> >>>> conclusion?
> >> >>>>
> >> >>>> Any idea how to solve it?
> >> >>>>
> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <[email protected]>:
> >> >>>>
> >> >>>>> So I tried to create the table using the HBase API (with no data
> >> >>>>> inserted into the table) and got the query plan on Drill 1.7.0.
> >> >>>>> Drill 1.7.0-SNAPSHOT commit ID: 09b26277
> >> >>>>>
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> | row_key      | ANY        | NO           |
> >> >>>>> | v            | MAP        | NO           |
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> 2 rows selected (1.665 seconds)
> >> >>>>>
> >> >>>>> Table creation Java program:
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> public class PutIntDataToHBase {
> >> >>>>>   public static void main(String args[]) throws IOException {
> >> >>>>>     Configuration conf = HBaseConfiguration.create();
> >> >>>>>     conf.set("hbase.zookeeper.property.clientPort", "5181");
> >> >>>>>     HBaseAdmin admin = new HBaseAdmin(conf);
> >> >>>>>     if (admin.tableExists("browser_action2")) {
> >> >>>>>       admin.disableTable("browser_action2");
> >> >>>>>       admin.deleteTable("browser_action2");
> >> >>>>>     }
> >> >>>>>
> >> >>>>>     byte[][] SPLIT_KEYS =
> >> >>>>>         {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> >> >>>>>     HTableDescriptor tableDesc = new
> >> >>>>>         HTableDescriptor(TableName.valueOf("browser_action2"));
> >> >>>>>
> >> >>>>>     tableDesc.addFamily(new HColumnDescriptor("v"));
> >> >>>>>     admin.createTable(tableDesc, SPLIT_KEYS);
> >> >>>>>   }
> >> >>>>> }
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> Query plan for the query that was reported as returning wrong results.
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> >> >>>>> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
> >> >>>>> +------+------+
> >> >>>>> | text | json |
> >> >>>>> +------+------+
> >> >>>>> | 00-00 Screen
> >> >>>>> 00-01   Project(k=[$0], p=[$1])
> >> >>>>> 00-02     UnionExchange
> >> >>>>> 01-01       Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>> 01-02         HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>> 01-03           Project($f0=[$0], p=[$1])
> >> >>>>> 01-04             HashToRandomExchange(dist0=[[$0]])
> >> >>>>> 02-01               UnorderedMuxExchange
> >> >>>>> 03-01                 Project($f0=[$0], p=[$1],
> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>> 03-02                   HashAgg(group=[{0}], p=[COUNT($0)])
> >> >>>>> 03-03                     Project($f0=[ITEM($1, 'e0')])
> >> >>>>> 03-04                       Scan(groupscan=[HBaseGroupScan
> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> And the query plan for the other problem query mentioned in the first
> >> >>>>> email.
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from hbase.browser_action2 a group by
> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1, 9);
> >> >>>>> +------+------+
> >> >>>>> | text | json |
> >> >>>>> +------+------+
> >> >>>>> | 00-00 Screen
> >> >>>>> 00-01   Project(k=[$0], p=[$1])
> >> >>>>> 00-02     UnionExchange
> >> >>>>> 01-01       Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>> 01-02         HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>> 01-03           Project($f0=[$0], p=[$1])
> >> >>>>> 01-04             HashToRandomExchange(dist0=[[$0]])
> >> >>>>> 02-01               UnorderedMuxExchange
> >> >>>>> 03-01                 Project($f0=[$0], p=[$1],
> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>> 03-02                   HashAgg(group=[{0}], p=[COUNT($1)])
> >> >>>>> 03-03                     Project($f0=[BYTE_SUBSTR($0, 1, 9)],
> >> >>>>> row_key=[$0])
> >> >>>>> 03-04                       Scan(groupscan=[HBaseGroupScan
> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Khurram
> >> >>>>>
> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <[email protected]> wrote:
> >> >>>>>
> >> >>>>>> Yes.
> >> >>>>>> I use the HBase API to create it.
> >> >>>>>>
> >> >>>>>> The main code is:
> >> >>>>>>
> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'}, {'8'}, {'9'} };
> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
> >> >>>>>>
> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> >> >>>>>> tableDesc.addFamily(columnDesc);
> >> >>>>>>
> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
> >> >>>>>>
> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
> >> >>>>>>
> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <[email protected]>:
> >> >>>>>>
> >> >>>>>>> Can you provide the CREATE TABLE statement you used to reproduce
> >> >>>>>>> this problem so we can try to reproduce it on our end?
> >> >>>>>>>
> >> >>>>>>> Thanks.
> >> >>>>>>>
> >> >>>>>>> -- Zelaine
> >> >>>>>>>
> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <[email protected]> wrote:
> >> >>>>>>>
> >> >>>>>>> > Hi,
> >> >>>>>>> >
> >> >>>>>>> > I recently hit an issue where I cannot query the correct data from
> >> >>>>>>> > HBase with SQL through Drill. Can anybody help me?
> >> >>>>>>> >
> >> >>>>>>> > I am testing with Drill 1.6.
> >> >>>>>>> > My HBase schema:
> >> >>>>>>> > rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
> >> >>>>>>> > cf: v
> >> >>>>>>> > qualifiers: v, e0, e1
> >> >>>>>>> >
> >> >>>>>>> > The wrong result only happens when I use a group by clause.
> >> >>>>>>> >
> >> >>>>>>> > This SQL does not return the correct result:
> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
> >> >>>>>>> > Part of the explain plan for this SQL is:
> >> >>>>>>> >
> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`,
> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
> >> >>>>>>> > a.row_key > '0' group by a.`v`.`e0`;
> >> >>>>>>> > +------+------+
> >> >>>>>>> > | text | json |
> >> >>>>>>> > +------+------+
> >> >>>>>>> > | 00-00 Screen
> >> >>>>>>> > 00-01   Project(k=[$0], p=[$1])
> >> >>>>>>> > 00-02     UnionExchange
> >> >>>>>>> > 01-01       Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>>>> > 01-02         HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>>>> > 01-03           Project($f0=[$0], p=[$1])
> >> >>>>>>> > 01-04             HashToRandomExchange(dist0=[[$0]])
> >> >>>>>>> > 02-01               UnorderedMuxExchange
> >> >>>>>>> > 03-01                 Project($f0=[$0], p=[$1],
> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>>>> > 03-02                   HashAgg(group=[{0}], p=[COUNT($0)])
> >> >>>>>>> > 03-03                     Project($f0=[ITEM($1, 'e0')])
> >> >>>>>>> > 03-04                       Scan(groupscan=[HBaseGroupScan
> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
> >> >>>>>>> >
> >> >>>>>>> > The data returns very quickly; the result of this SQL is:
> >> >>>>>>> > +------+--------+
> >> >>>>>>> > |  k   |   p    |
> >> >>>>>>> > +------+--------+
> >> >>>>>>> > | pay  | 12180  |
> >> >>>>>>> > +------+--------+
> >> >>>>>>> >
> >> >>>>>>> > But I have millions of rows in the table.
> >> >>>>>>> >
> >> >>>>>>> > I tried to change the physical plan.
> >> >>>>>>> > If I change the JSON plan's *"columns" : [ "`*`" ]* to
> >> >>>>>>> > *"columns" : [ "`v`.`e0`" ]*, it returns the correct result.
> >> >>>>>>> >
> >> >>>>>>> > It seems the physical plan is not correct.
> >> >>>>>>> > I also tried to debug the SQL parser to find out the reason, but it's
> >> >>>>>>> > too complicated. Can anyone help me?
> >> >>>>>>> >
> >> >>>>>>> > This SQL also has the same issue:
> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1, 9);
> >> >>>>>>> > If I change the JSON plan's *"columns" : [ "`*`" ]* to
> >> >>>>>>> > *"columns" : [ "`row_key`" ]*, it returns the correct result.
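
For anyone trying to reproduce the data shape described in this thread, here is a minimal sketch (not from the thread) of writing one row with the layout qiang describes: [salt 1-byte string] + [day 8-byte string] + [event] + [uid long] + [ts long], string qualifiers v:e0..e5, v:h, v:m, plus the 4-byte integer v:v. The salt, uid, ts values and the class name are illustrative assumptions; it uses the standard HBase client Connection/Table/Put API rather than the older HTable/HBaseAdmin calls shown above.

{noformat}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteSampleRow {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.property.clientPort", "5181");

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("browser_action2"))) {

      // Rowkey layout from the thread: [salt 1-byte string] + [day 8-byte string]
      // + [event] + [uid long] + [ts long]. The salt, uid and ts here are made up.
      byte[] rowKey = Bytes.add(
          Bytes.add(Bytes.toBytes("0"), Bytes.toBytes("20160404"), Bytes.toBytes("pay.bijia.browser")),
          Bytes.add(Bytes.toBytes(123456789L), Bytes.toBytes(1459771200000L)));

      Put put = new Put(rowKey);
      byte[] v = Bytes.toBytes("v");
      // String-encoded qualifiers, as in the scan output earlier in the thread.
      put.addColumn(v, Bytes.toBytes("e0"), Bytes.toBytes("pay"));
      put.addColumn(v, Bytes.toBytes("e1"), Bytes.toBytes("bijia"));
      put.addColumn(v, Bytes.toBytes("e2"), Bytes.toBytes("browser"));
      put.addColumn(v, Bytes.toBytes("h"), Bytes.toBytes("20"));
      put.addColumn(v, Bytes.toBytes("m"), Bytes.toBytes("0"));
      // v:v is the only integer-encoded qualifier (4-byte big-endian int, here 0x17 = 23).
      put.addColumn(v, Bytes.toBytes("v"), Bytes.toBytes(23));

      table.put(put);
    }
  }
}
{noformat}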

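A related sketch for cross-checking the suspect aggregate through Drill's JDBC driver, using the jdbc:drill:zk=rfdc5 connect string that appears in the thread. The first query is the one reported as returning wrong results; the plain count used for comparison and the class name are illustrative additions, not taken from the thread.

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CheckGroupByCounts {
  public static void main(String[] args) throws Exception {
    // Standard Drill JDBC driver class; the zk connect string matches the thread.
    Class.forName("org.apache.drill.jdbc.Driver");
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=rfdc5");
         Statement stmt = conn.createStatement()) {

      // The aggregate that was reported to undercount when grouped.
      ResultSet grouped = stmt.executeQuery(
          "select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p "
              + "from hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`");
      long groupedTotal = 0;
      while (grouped.next()) {
        System.out.println(grouped.getString("k") + " -> " + grouped.getLong("p"));
        groupedTotal += grouped.getLong("p");
      }
      grouped.close();

      // A plain count over the same predicate, for comparison with the row count
      // seen from the hbase shell scan earlier in the thread.
      ResultSet total = stmt.executeQuery(
          "select count(a.`v`.`e0`) c from hbase.browser_action2 a where a.row_key > '0'");
      total.next();
      System.out.println("grouped total = " + groupedTotal
          + ", plain count = " + total.getLong("c"));
      total.close();
    }
  }
}
{noformat}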