Can you please let us know the hbase version and hadoop distribution
version that you are using.


On Fri, May 20, 2016 at 1:35 AM, qiang li <[email protected]> wrote:

> Khurram , I send the mail again, the last mail forget to cc to
> [email protected]
>
> The main process is the same, but my rowkey is more complicate,
> Here is the detail I tested.
> rowkey is like this : [salt 1byte string] + [day 8byte string] + [event] +
> [uid long] + [ts long]
> also I have other qualifiers, only qualifier v:v is integer, the others are
> string.
>
> example:
> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> ROW                                                          COLUMN+CELL
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e0,
> timestamp=1461839343076, value=pay
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e1,
> timestamp=1461839343076, value=bijia
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e2,
> timestamp=1461839343076, value=browser
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e3,
> timestamp=1461839343076, value=*
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e4,
> timestamp=1461839343076, value=*
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e5,
> timestamp=1461839343076, value=*
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:h,
> timestamp=1459771200000, value=20
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:m,
> timestamp=1459771200000, value=0
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:v,
> timestamp=1459771200000, value=\x00\x00\x00\x17
>
>
>  1$\xD2\x00
>
>
>
> 1 row(s) in 0.0410 seconds
>
>
> Here is the example I how the issue look like:
>
> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
> STARTROW=> '0'}
> ........
>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\ column=v:e0,
> timestamp=1463723029448, value=visit
>
>
>  x01T\x00\x0A\xFA\x00
>
>
>
>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\ column=v:e0,
> timestamp=1463723029217, value=visit
>
>
>  x01T\x00\x0A\xFA\x00
>
>
>
>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\ column=v:e0,
> timestamp=1463723029295, value=visit
>
>
>  x01T\x00\x0A\xFA\x00
>
>
>
> 9994 row(s) in 123.8650 seconds
>
> the drill result:
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>  group by a.`v`.`e0`;
> +--------+-------+
> |   k    |   p   |
> +--------+-------+
> | visit  | 1216  |
> +--------+-------+
>
>
> I find out that if the row size larger than 10000  will have the issue. The
> result is right if less that 1000 rows. But not always that way.
> What I can make sure is if I updated the columns in the physical plan and
> query by web UI , the result will be correct.
>
>
> Thanks
>
> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <[email protected]>:
>
> > Qiang, can you please take a look at DRILL-4686 and confirm if the data
> > set used in my repro is the same as the one you have used. If the data
> set
> > is different please let us know the type of data that you have used in
> your
> > table.
> >
> > Aman - I will try to repro the problem on Drill 1.6.0 and share results.
> >
> > Thanks,
> > Khurram
> >
> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <[email protected]>
> wrote:
> >
> >> Khurram,  DRILL-4686 seems like a different issue...it is reporting an
> >> error whereas the original problem from qiang was an incorrect result.
> >> Can
> >> you use the same version (1.6) that he was using.  Also, is the data set
> >> similar ? If you are unable to repro the exact same issue,  perhaps
> qiang
> >> should file a JIRA with a smaller repro if possible.
> >>
> >>
> >>
> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <[email protected]>
> >> wrote:
> >>
> >> > Hello Qiang,
> >> >
> >> > DRILL-4686 is reported to track this problem.
> >> >
> >> > Thanks,
> >> > Khurram
> >> >
> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <[email protected]>
> wrote:
> >> >
> >> >> Ok, Thanks very much.
> >> >>
> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <[email protected]>:
> >> >>
> >> >>> Hello Qiang,
> >> >>>
> >> >>> Someone from our Drill team (in San Jose) will get back to you
> soon. I
> >> >>> work from the India lab and I am in a different time zone as
> compared
> >> to
> >> >>> San Jose office, some one from MapR San Jose will get back to you as
> >> soon
> >> >>> as possible.
> >> >>>
> >> >>> Thanks,
> >> >>> Khurram
> >> >>>
> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <[email protected]>
> >> wrote:
> >> >>>
> >> >>>> Hi Khurram, Thanks very much to reproduce it, so what's the
> >> >>>> conclusion?
> >> >>>>
> >> >>>> Any idea how to sovle it?
> >> >>>>
> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <[email protected]>:
> >> >>>>
> >> >>>>> So I tried to create the table using HBase API (with no data
> >> inserted
> >> >>>>> into table) and I got the query plan for drill 1.7.0
> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
> >> >>>>>
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> | row_key      | ANY        | NO           |
> >> >>>>> | v            | MAP        | NO           |
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> 2 rows selected (1.665 seconds)
> >> >>>>>
> >> >>>>> Table creation Java program
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> public class PutIntDataToHBase {
> >> >>>>>     public static void main(String args[]) throws IOException {
> >> >>>>>         Configuration conf = HBaseConfiguration.create();
> >> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
> >> >>>>>         if (admin.tableExists("browser_action2")) {
> >> >>>>>             admin.disableTable("browser_action2");
> >> >>>>>             admin.deleteTable("browser_action2");
> >> >>>>>         }
> >> >>>>>
> >> >>>>>         byte[][] SPLIT_KEYS =
> >> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> >> >>>>>         HTableDescriptor tableDesc = new
> >> >>>>>
>  HTableDescriptor(TableName.valueOf("browser_action2"));
> >> >>>>>
> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
> >> >>>>>
> >> >>>>>     }
> >> >>>>> }
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> Query plan for the query that was reported as returning wrong
> >> results.
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
> a.`v`.`e0`;
> >> >>>>> +------+------+
> >> >>>>> | text | json |
> >> >>>>> +------+------+
> >> >>>>> | 00-00    Screen
> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> >> >>>>> 00-02        UnionExchange
> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >>>>> 02-01                  UnorderedMuxExchange
> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> startRow=0\x00,
> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> and the query plan for the other problem query mentioned in the
> >> first
> >> >>>>> email.
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
> >> >>>>> hbase.browser_action2 a group by
> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
> >> >>>>> +------+------+
> >> >>>>> | text | json |
> >> >>>>> +------+------+
> >> >>>>> | 00-00    Screen
> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> >> >>>>> 00-02        UnionExchange
> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >>>>> 02-01                  UnorderedMuxExchange
> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
> >> >>>>> row_key=[$0])
> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> startRow=null,
> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Khurram
> >> >>>>>
> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <[email protected]>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> Yes.
> >> >>>>>> I use hbase API to create it.
> >> >>>>>>
> >> >>>>>> The main code is:
> >> >>>>>>
> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
> >> {'6'}, {'7'},{'8'}, {'9'} };
> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
> >> >>>>>>
> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> >> >>>>>> tableDesc.addFamily(columnDesc);
> >> >>>>>>
> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
> >> >>>>>>
> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <[email protected]>:
> >> >>>>>>
> >> >>>>>>> Can you provide the CREATE TABLE statement you used to reproduce
> >> this
> >> >>>>>>> problem so we can try to reproduce it on our end.
> >> >>>>>>>
> >> >>>>>>> Thanks.
> >> >>>>>>>
> >> >>>>>>> -- Zelaine
> >> >>>>>>>
> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <[email protected]
> >
> >> >>>>>>> wrote:
> >> >>>>>>>
> >> >>>>>>> > Hi ,
> >> >>>>>>> >
> >> >>>>>>> > I recently meet a issue that can not query the correct data
> from
> >> >>>>>>> hbase with
> >> >>>>>>> > sql by drill, can anybody help me.
> >> >>>>>>> >
> >> >>>>>>> > I test with the drill 1.6.
> >> >>>>>>> > My hbase scheme:
> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
> >> >>>>>>> > cf : v
> >> >>>>>>> > qualifier: v, e0, e1
> >> >>>>>>> >
> >> >>>>>>> > The wrong result only happened when I use group by clause.
> >> >>>>>>> >
> >> >>>>>>> > This sql will not return correct result:
> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`)
> >> p
> >> >>>>>>> from
> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
> >> a.`v`.`e0`;
> >> >>>>>>> > Part of explain of this sql is:
> >> >>>>>>> >
> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a
> >> >>>>>>> where
> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
> >> >>>>>>> > +------+------+
> >> >>>>>>> > | text | json |
> >> >>>>>>> > +------+------+
> >> >>>>>>> > | 00-00    Screen
> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
> >> >>>>>>> > 00-02        UnionExchange
> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >>>>>>> > 02-01                  UnorderedMuxExchange
> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
> >> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> >>>>>>> startRow=0\x00,
> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
> >> >>>>>>> >
> >> >>>>>>> > The data return very quickly , the result of this sql is :
> >> >>>>>>> > +------+--------+
> >> >>>>>>> > |  k   |   p    |
> >> >>>>>>> > +------+--------+
> >> >>>>>>> > | pay  | 12180  |
> >> >>>>>>> > +------+--------
> >> >>>>>>> >
> >> >>>>>>> > But I have millons of data in the table.
> >> >>>>>>> >
> >> >>>>>>> > I tried to change the physical plan.  if I change the json
> >> explain
> >> >>>>>>> > *"columns"
> >> >>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return
> >> the
> >> >>>>>>> correct
> >> >>>>>>> > result.
> >> >>>>>>> >
> >> >>>>>>> > It seems the physical plan is not correct.
> >> >>>>>>> > I also try to debug the sql parser to find out the reason, but
> >> its
> >> >>>>>>> too
> >> >>>>>>> > complicate. Can anyone help me.
> >> >>>>>>> >
> >> >>>>>>> > Also this sql have the same issue.
> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as
> >> k,
> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
> >> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to
> >> *"columns" :
> >> >>>>>>> [
> >> >>>>>>> > "`row_key`" ] *, it will return the correct result.
> >> >>>>>>> >
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >
> >
>

Reply via email to