Re: best way to query hbase dynamic columns

Steven Phillips Fri, 20 Feb 2015 09:57:12 -0800

The combination of kvgen and flatten would be the best approach. kvgen
returns a repeated map, flatten will convert the repeated map into separate
single maps. You can then aggregate or filter on the key or value.


One problem with this, however, is that in this particular example, it
looks like you the hbase table is an extremely wide and sparse table. Drill
will treat each unique column qualifier as its own column, and it looks
like in this table the qualifiers are all unique, so if you scan thousands
of records, you will end up with thousands of columns, each only containing
a single non-null value.  This will most likely result in very poor
performance. I think what we need to add to handle this is the ability to
essentially push the kvgen operator into the scan. That way we only have to
create a single, repeated column.

On Fri, Feb 20, 2015 at 9:24 AM, Carol McDonald <[email protected]>
wrote:

> I did use kvgen, but not sure how.  The price value is a long , how can I
> calculate an average for this , when the column names are variable?
>
> 0: jdbc:drill:> select convert_from(row_key, 'UTF8') as tid, kvgen(t.price)
> as price from dfs.`/tables/trades_flat` t limit 5;
> +------------+------------+
> |    tid     |   price    |
> +------------+------------+
> | AMZN_2013102107 | [{"key":"3901713","value":"AAAAAAAAJUA="}] |
> | AMZN_2013102108 | [{"key":"4600159","value":"AAAAAAAAE6o="}] |
> | AMZN_2013102109 |
>
> [{"key":"3136026","value":"AAAAAAAAHL4="},{"key":"3448092","value":"AAAAAAAAJjI="},{"key":"3926121","value":"AAAAAAAAHq0="}]
> |
> | AMZN_2013102111 |
>
> [{"key":"1149689","value":"AAAAAAAAIuo="},{"key":"3023456","value":"AAAAAAAAHRs="}]
> |
> | AMZN_2013102112 |
>
> [{"key":"0705787","value":"AAAAAAAAInM="},{"key":"4007774","value":"AAAAAAAAFUM="}]
> |
>
> On Fri, Feb 20, 2015 at 3:02 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Would kvgen work on t.price?
> >
> >
> >
> > On Thu, Feb 19, 2015 at 12:59 PM, Carol McDonald <[email protected]
> >
> > wrote:
> >
> > > What is the best way to query an hbase table that has dynamic column
> > names
> > > ? For example this table is similar to the opentsdb table, the rowkey
> is
> > a
> > > stocksymbol followed by the date and hour , the Price column family
> > column
> > > names are the seconds offset from the hour, so each row contains a
> bucket
> > > of  hours columns , the values are the price the stock sold for at that
> > > time.
> > > How can the dynamic names be used in queries ?
> > >
> > > 0: jdbc:drill:> select convert_from(row_key, 'UTF8') as tid, t.price as
> > > price from dfs.`/tables/trades_flat` t limit 5;
> > > +------------+------------+
> > > |    tid     |   price    |
> > > +------------+------------+
> > > | AMZN_2013102107 | {"3901713":"AAAAAAAAJUA="} |
> > > | AMZN_2013102108 | {"4600159":"AAAAAAAAE6o="} |
> > > | AMZN_2013102109 | {"3136026":"AAAAAAAAHL4=",
> > > "3448092":"AAAAAAAAJjI=","3926121":"AAAAAAAAHq0="} |
> > > | AMZN_2013102111 |
> {"1149689":"AAAAAAAAIuo=","3023456":"AAAAAAAAHRs="} |
> > > | AMZN_2013102112 |
> {"0705787":"AAAAAAAAInM=","4007774":"AAAAAAAAFUM="} |
> > > +------------+------------+
> > > 5 rows selected (0.766 seconds)
> > > 0: jdbc:drill:> select convert_from(row_key, 'UTF8') as tid,
> > kvgen(t.price)
> > > as price from dfs.`/tables/trades_flat` t limit 5;
> > > +------------+------------+
> > > |    tid     |   price    |
> > > +------------+------------+
> > > | AMZN_2013102107 | [{"key":"3901713","value":"AAAAAAAAJUA="}] |
> > > | AMZN_2013102108 | [{"key":"4600159","value":"AAAAAAAAE6o="}] |
> > > | AMZN_2013102109 |
> > >
> > >
> >
> [{"key":"3136026","value":"AAAAAAAAHL4="},{"key":"3448092","value":"AAAAAAAAJjI="},{"key":"3926121","value":"AAAAAAAAHq0="}]
> > > |
> > > | AMZN_2013102111 |
> > >
> > >
> >
> [{"key":"1149689","value":"AAAAAAAAIuo="},{"key":"3023456","value":"AAAAAAAAHRs="}]
> > > |
> > > | AMZN_2013102112 |
> > >
> > >
> >
> [{"key":"0705787","value":"AAAAAAAAInM="},{"key":"4007774","value":"AAAAAAAAFUM="}]
> > > |
> > > +------------+------------+
> > >
> >
>



-- 
 Steven Phillips
 Software Engineer

 mapr.com

Re: best way to query hbase dynamic columns

Reply via email to