The combination of kvgen and flatten would be the best approach. kvgen returns a repeated map, flatten will convert the repeated map into separate single maps. You can then aggregate or filter on the key or value.
One problem with this, however, is that in this particular example, it looks like you the hbase table is an extremely wide and sparse table. Drill will treat each unique column qualifier as its own column, and it looks like in this table the qualifiers are all unique, so if you scan thousands of records, you will end up with thousands of columns, each only containing a single non-null value. This will most likely result in very poor performance. I think what we need to add to handle this is the ability to essentially push the kvgen operator into the scan. That way we only have to create a single, repeated column. On Fri, Feb 20, 2015 at 9:24 AM, Carol McDonald <[email protected]> wrote: > I did use kvgen, but not sure how. The price value is a long , how can I > calculate an average for this , when the column names are variable? > > 0: jdbc:drill:> select convert_from(row_key, 'UTF8') as tid, kvgen(t.price) > as price from dfs.`/tables/trades_flat` t limit 5; > +------------+------------+ > | tid | price | > +------------+------------+ > | AMZN_2013102107 | [{"key":"3901713","value":"AAAAAAAAJUA="}] | > | AMZN_2013102108 | [{"key":"4600159","value":"AAAAAAAAE6o="}] | > | AMZN_2013102109 | > > [{"key":"3136026","value":"AAAAAAAAHL4="},{"key":"3448092","value":"AAAAAAAAJjI="},{"key":"3926121","value":"AAAAAAAAHq0="}] > | > | AMZN_2013102111 | > > [{"key":"1149689","value":"AAAAAAAAIuo="},{"key":"3023456","value":"AAAAAAAAHRs="}] > | > | AMZN_2013102112 | > > [{"key":"0705787","value":"AAAAAAAAInM="},{"key":"4007774","value":"AAAAAAAAFUM="}] > | > > On Fri, Feb 20, 2015 at 3:02 AM, Ted Dunning <[email protected]> > wrote: > > > Would kvgen work on t.price? > > > > > > > > On Thu, Feb 19, 2015 at 12:59 PM, Carol McDonald <[email protected] > > > > wrote: > > > > > What is the best way to query an hbase table that has dynamic column > > names > > > ? For example this table is similar to the opentsdb table, the rowkey > is > > a > > > stocksymbol followed by the date and hour , the Price column family > > column > > > names are the seconds offset from the hour, so each row contains a > bucket > > > of hours columns , the values are the price the stock sold for at that > > > time. > > > How can the dynamic names be used in queries ? > > > > > > 0: jdbc:drill:> select convert_from(row_key, 'UTF8') as tid, t.price as > > > price from dfs.`/tables/trades_flat` t limit 5; > > > +------------+------------+ > > > | tid | price | > > > +------------+------------+ > > > | AMZN_2013102107 | {"3901713":"AAAAAAAAJUA="} | > > > | AMZN_2013102108 | {"4600159":"AAAAAAAAE6o="} | > > > | AMZN_2013102109 | {"3136026":"AAAAAAAAHL4=", > > > "3448092":"AAAAAAAAJjI=","3926121":"AAAAAAAAHq0="} | > > > | AMZN_2013102111 | > {"1149689":"AAAAAAAAIuo=","3023456":"AAAAAAAAHRs="} | > > > | AMZN_2013102112 | > {"0705787":"AAAAAAAAInM=","4007774":"AAAAAAAAFUM="} | > > > +------------+------------+ > > > 5 rows selected (0.766 seconds) > > > 0: jdbc:drill:> select convert_from(row_key, 'UTF8') as tid, > > kvgen(t.price) > > > as price from dfs.`/tables/trades_flat` t limit 5; > > > +------------+------------+ > > > | tid | price | > > > +------------+------------+ > > > | AMZN_2013102107 | [{"key":"3901713","value":"AAAAAAAAJUA="}] | > > > | AMZN_2013102108 | [{"key":"4600159","value":"AAAAAAAAE6o="}] | > > > | AMZN_2013102109 | > > > > > > > > > [{"key":"3136026","value":"AAAAAAAAHL4="},{"key":"3448092","value":"AAAAAAAAJjI="},{"key":"3926121","value":"AAAAAAAAHq0="}] > > > | > > > | AMZN_2013102111 | > > > > > > > > > [{"key":"1149689","value":"AAAAAAAAIuo="},{"key":"3023456","value":"AAAAAAAAHRs="}] > > > | > > > | AMZN_2013102112 | > > > > > > > > > [{"key":"0705787","value":"AAAAAAAAInM="},{"key":"4007774","value":"AAAAAAAAFUM="}] > > > | > > > +------------+------------+ > > > > > > -- Steven Phillips Software Engineer mapr.com
