A commonly known limitation of HBASE is that one only gets a single row-key as an index. Performing any queries which attempt to use a non-row-key column in the predicate will wind up causing a full key-scan of the table (which is not ideal).
therefore, many HBASE applications have been constructed with the concept of a 'lookup table', where you have another table which you perform a single-row-get (or short range scan) to find the key(s) you are looking for, which can then be leveraged to perform a single-row get against the primary table. Consider the following schema: 'lookup' (lookup table): order_attribute, order_id 'table orders' order_id, prod_id, order_total Now, if I were an operator who wanted to find the prod_id for a order w/ order_attribute=ABCDEF, I would first need to figure out the order_id for the order from the 'lookup' table, then use the resulting value as the key to filter on in the 'orders' table. This is simple enough to do in a native HBASE application, however I've yet to find a way to do this in SQL-on-Drill which results in limiting the scans. Here are some attempts, if anyone has ideas on how to re-write the query, I'm all ears :) *select cast(t2.details['prod_id'] as varchar(50)) as pid* *from* *`drill/lookup.csv` t1, `tables/orders` t2 where t1.columns[1] = t2.row_key* *and t1.columns[0] = 'ABCDEF';* Other attempts: *select * * cast(t2.details['prod_id'] as varchar(50)) as pid* *from* *`drill/lookup.csv` t1 JOIN `tables/orders` t2 ON cast(t2.row_key as int) = cast(t1.columns[1] as int)* *WHERE cast(t1.columns[0] as varchar(255)) = 'ABCDEF';* *select* * cast(t2.`details`['prod_id'] as varchar(50)) as `pid`* *from* *`tables/orders` t2 JOIN * *(select cast(`columns`[1] as varchar(255)) as row_key from `drill/lookup.csv` WHERE `columns`[0] = 'ABCDEF') x on cast(t2.row_key as varchar(255)) = x.row_key* Note that in all cases, we're seeing in the explain and physical plan that there are 'null' start and stop keys defined: Scan(groupscan=[MapRDBGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=maprfs:///user/apernsteiner/tables/orders, startRow=null, stopRow=null, filter=null], columns=[`row_key`, `details`.`prod_id`]]]) : rowType = RecordType(ANY details, ANY row_key): Is this something possible in drill? Is there another way I should be approaching this? -- Andy Pernsteiner Manager, Field Enablement ph: 206.228.0737 www.mapr.com Now Available - Free Hadoop On-Demand Training <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
