Hbase/MaprDB: row-key pushdown when performing joins

Andy Pernsteiner Fri, 12 Jun 2015 07:12:13 -0700

A commonly known limitation of HBASE is that one only gets a single row-key
as an index.  Performing any queries which attempt to use a non-row-key
column in the predicate will wind up causing a full key-scan of the table
(which is not ideal).


therefore, many HBASE applications have been constructed with the concept
of a 'lookup table', where you have another table which you perform a
single-row-get (or short range scan) to find the key(s) you are looking
for, which can then be leveraged to perform a single-row get against the
primary table.

Consider the following schema:

'lookup' (lookup table):

order_attribute, order_id

'table orders'

order_id, prod_id, order_total


Now, if I were an operator who wanted to find the prod_id for a order w/
order_attribute=ABCDEF, I would first need to figure out the order_id for
the order from the 'lookup' table, then use the resulting value as the key
to filter on in the 'orders' table.

This is simple enough to do in a native HBASE application, however I've yet
to find a way to do this in SQL-on-Drill which results in limiting the
scans.

Here are some attempts, if anyone has ideas on how to re-write the query,
I'm all ears :)

*select cast(t2.details['prod_id'] as varchar(50)) as pid*
*from*
*`drill/lookup.csv` t1, `tables/orders` t2 where t1.columns[1] = t2.row_key*
*and t1.columns[0] = 'ABCDEF';*


Other attempts:

*select *
*    cast(t2.details['prod_id'] as varchar(50)) as pid*
*from*
*`drill/lookup.csv` t1 JOIN `tables/orders` t2 ON cast(t2.row_key as int) =
cast(t1.columns[1] as int)*
*WHERE cast(t1.columns[0] as varchar(255)) = 'ABCDEF';*



*select*
*    cast(t2.`details`['prod_id'] as varchar(50)) as `pid`*
*from*
*`tables/orders` t2 JOIN *
*(select cast(`columns`[1] as varchar(255)) as row_key from
`drill/lookup.csv` WHERE `columns`[0] = 'ABCDEF') x on cast(t2.row_key as
varchar(255)) = x.row_key*



Note that in all cases, we're seeing in the explain and physical plan that
there are 'null' start and stop keys defined:

 Scan(groupscan=[MapRDBGroupScan [HBaseScanSpec=HBaseScanSpec
[tableName=maprfs:///user/apernsteiner/tables/orders, startRow=null,
stopRow=null, filter=null], columns=[`row_key`, `details`.`prod_id`]]]) :
rowType = RecordType(ANY details, ANY row_key):


Is this something possible in drill?  Is there another way I should be
approaching this?





-- 
 Andy Pernsteiner
 Manager, Field Enablement
ph: 206.228.0737

www.mapr.com

Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Hbase/MaprDB: row-key pushdown when performing joins

Reply via email to