Performance: hive+hbase integration query against the row_key

Shengjie Min Tue, 11 Sep 2012 06:57:27 -0700

Hi,

I am trying to get hive working on top of my hbase table following the
guide below:
https://cwiki.apache.org/Hive/hbaseintegration.html


CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:a,cf:b,cf:c")
TBLPROPERTIES ("hbase.table.name" = "test");

This table definition makes my mapping look roughly like this:

Hive (hive_hbase_test)   -   HBase (test)
key                      -   row_key
column a                 -   cf:a
column b                 -   cf:b
column c                 -   cf:c

From my understanding of how HBaseStorageHandler works, it is supposed to
take advantage of the HBase row_key index as much as possible. So I would
expect:

1. If you run a Hive query against the row key, such as "select * from
hive_hbase_test where key='blabla'", it would use the HBase row_key index
and return a very quick, nearly real-time response, just as HBase does.

2. Of course, if you run a Hive query against a specific column, such as
"select * from hive_hbase_test where a='blabla'", it probably uses
MapReduce, because there is nothing on the HBase side that can be utilized.

From my test, query 1 doesn't seem fast at all; it still takes ages:

select * from hive_hbase_test where key='blabla'   36 secs
vs
get 'test', 'blabla'                               less than 1 sec

That is still a huge difference.

Has anybody tried this before? Is there any way I can do some sort of query
plan analysis on a Hive query? Or am I not mapping the Hive table to the
HBase table correctly?
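
For the query plan question, one possible starting point is the minimal sketch below, assuming the installed Hive version supports the standard EXPLAIN statement. It prints the plan stages for query 1 without running it, which should show whether a MapReduce stage is launched for the row key lookup.

```sql
-- Sketch only: show the plan for the row key lookup on the table defined above.
EXPLAIN
SELECT * FROM hive_hbase_test WHERE key = 'blabla';

-- EXPLAIN EXTENDED prints additional detail for the same plan.
EXPLAIN EXTENDED
SELECT * FROM hive_hbase_test WHERE key = 'blabla';
```

If the plan shows a map stage that scans the whole table and only then applies the key filter, that would suggest the predicate is not being pushed down to HBase, which could account for the gap compared with the HBase shell get.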
