Qiang, can you please take a look at DRILL-4686 and confirm if the data set
used in my repro is the same as the one you have used. If the data set is
different please let us know the type of data that you have used in your
table.

Aman - I will try to repro the problem on Drill 1.6.0 and share results.

Thanks,
Khurram

On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org> wrote:

> Khurram,  DRILL-4686 seems like a different issue...it is reporting an
> error whereas the original problem from qiang was an incorrect result.  Can
> you use the same version (1.6) that he was using.  Also, is the data set
> similar ? If you are unable to repro the exact same issue,  perhaps qiang
> should file a JIRA with a smaller repro if possible.
>
>
>
> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com>
> wrote:
>
> > Hello Qiang,
> >
> > DRILL-4686 is reported to track this problem.
> >
> > Thanks,
> > Khurram
> >
> > On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com> wrote:
> >
> >> Ok, Thanks very much.
> >>
> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
> >>
> >>> Hello Qiang,
> >>>
> >>> Someone from our Drill team (in San Jose) will get back to you soon. I
> >>> work from the India lab and I am in a different time zone as compared
> to
> >>> San Jose office, some one from MapR San Jose will get back to you as
> soon
> >>> as possible.
> >>>
> >>> Thanks,
> >>> Khurram
> >>>
> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com>
> wrote:
> >>>
> >>>> Hi Khurram, Thanks very much to reproduce it, so what's the
> >>>> conclusion?
> >>>>
> >>>> Any idea how to sovle it?
> >>>>
> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
> >>>>
> >>>>> So I tried to create the table using HBase API (with no data inserted
> >>>>> into table) and I got the query plan for drill 1.7.0
> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
> >>>>>
> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> >>>>> +--------------+------------+--------------+
> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >>>>> +--------------+------------+--------------+
> >>>>> | row_key      | ANY        | NO           |
> >>>>> | v            | MAP        | NO           |
> >>>>> +--------------+------------+--------------+
> >>>>> 2 rows selected (1.665 seconds)
> >>>>>
> >>>>> Table creation Java program
> >>>>>
> >>>>> {noformat}
> >>>>> public class PutIntDataToHBase {
> >>>>>     public static void main(String args[]) throws IOException {
> >>>>>         Configuration conf = HBaseConfiguration.create();
> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
> >>>>>         if (admin.tableExists("browser_action2")) {
> >>>>>             admin.disableTable("browser_action2");
> >>>>>             admin.deleteTable("browser_action2");
> >>>>>         }
> >>>>>
> >>>>>         byte[][] SPLIT_KEYS =
> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> >>>>>         HTableDescriptor tableDesc = new
> >>>>>             HTableDescriptor(TableName.valueOf("browser_action2"));
> >>>>>
> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
> >>>>>
> >>>>>     }
> >>>>> }
> >>>>> {noformat}
> >>>>>
> >>>>> Query plan for the query that was reported as returning wrong
> results.
> >>>>>
> >>>>> {noformat}
> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
> >>>>> +------+------+
> >>>>> | text | json |
> >>>>> +------+------+
> >>>>> | 00-00    Screen
> >>>>> 00-01      Project(k=[$0], p=[$1])
> >>>>> 00-02        UnionExchange
> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >>>>> 02-01                  UnorderedMuxExchange
> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> startRow=0\x00,
> >>>>> stopRow=, filter=null], columns=[`*`]]])
> >>>>> {noformat}
> >>>>>
> >>>>> and the query plan for the other problem query mentioned in the first
> >>>>> email.
> >>>>>
> >>>>> {noformat}
> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
> >>>>> hbase.browser_action2 a group by
> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
> >>>>> +------+------+
> >>>>> | text | json |
> >>>>> +------+------+
> >>>>> | 00-00    Screen
> >>>>> 00-01      Project(k=[$0], p=[$1])
> >>>>> 00-02        UnionExchange
> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >>>>> 02-01                  UnorderedMuxExchange
> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
> >>>>> row_key=[$0])
> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> startRow=null,
> >>>>> stopRow=null, filter=null], columns=[`*`]]])
> >>>>> {noformat}
> >>>>>
> >>>>> Thanks,
> >>>>> Khurram
> >>>>>
> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Yes.
> >>>>>> I use hbase API to create it.
> >>>>>>
> >>>>>> The main code is:
> >>>>>>
> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
> {'6'}, {'7'},{'8'}, {'9'} };
> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
> >>>>>>
> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> >>>>>> tableDesc.addFamily(columnDesc);
> >>>>>>
> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
> >>>>>>
> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
> >>>>>>
> >>>>>>> Can you provide the CREATE TABLE statement you used to reproduce
> this
> >>>>>>> problem so we can try to reproduce it on our end.
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> -- Zelaine
> >>>>>>>
> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> > Hi ,
> >>>>>>> >
> >>>>>>> > I recently meet a issue that can not query the correct data from
> >>>>>>> hbase with
> >>>>>>> > sql by drill, can anybody help me.
> >>>>>>> >
> >>>>>>> > I test with the drill 1.6.
> >>>>>>> > My hbase scheme:
> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
> >>>>>>> > cf : v
> >>>>>>> > qualifier: v, e0, e1
> >>>>>>> >
> >>>>>>> > The wrong result only happened when I use group by clause.
> >>>>>>> >
> >>>>>>> > This sql will not return correct result:
> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
> >>>>>>> from
> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
> a.`v`.`e0`;
> >>>>>>> > Part of explain of this sql is:
> >>>>>>> >
> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a
> >>>>>>> where
> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
> >>>>>>> > +------+------+
> >>>>>>> > | text | json |
> >>>>>>> > +------+------+
> >>>>>>> > | 00-00    Screen
> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
> >>>>>>> > 00-02        UnionExchange
> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
> >>>>>>> > 02-01                  UnorderedMuxExchange
> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >>>>>>> startRow=0\x00,
> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
> >>>>>>> >
> >>>>>>> > The data return very quickly , the result of this sql is :
> >>>>>>> > +------+--------+
> >>>>>>> > |  k   |   p    |
> >>>>>>> > +------+--------+
> >>>>>>> > | pay  | 12180  |
> >>>>>>> > +------+--------
> >>>>>>> >
> >>>>>>> > But I have millons of data in the table.
> >>>>>>> >
> >>>>>>> > I tried to change the physical plan.  if I change the json
> explain
> >>>>>>> > *"columns"
> >>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return
> the
> >>>>>>> correct
> >>>>>>> > result.
> >>>>>>> >
> >>>>>>> > It seems the physical plan is not correct.
> >>>>>>> > I also try to debug the sql parser to find out the reason, but
> its
> >>>>>>> too
> >>>>>>> > complicate. Can anyone help me.
> >>>>>>> >
> >>>>>>> > Also this sql have the same issue.
> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns"
> :
> >>>>>>> [
> >>>>>>> > "`row_key`" ] *, it will return the correct result.
> >>>>>>> >
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Reply via email to