Hi,
I am trying to get Hive working on top of my HBase table, following the
guide below:
https://cwiki.apache.org/Hive/hbaseintegration.html
CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:a,cf:b,cf:c")
TBLPROPERTIES ("hbase.table.name" = "test");
This table definition gives me roughly the following mapping between the
Hive table hive_hbase_test and the HBase table test:
Hive key      - HBase row key
Hive column a - HBase cf:a
Hive column b - HBase cf:b
Hive column c - HBase cf:c
From my understanding of how HBaseStorageHandler works, it is supposed to
take advantage of the HBase row key index as much as possible. So I would
expect:
1. A Hive query against the row key, such as "select * from
hive_hbase_test where key='blabla'", would use the HBase row key index and
return a quick, nearly real-time response, just as HBase itself does.
2. A Hive query against a column, such as "select * from hive_hbase_test
where a='blabla'", filters on a specific column, so it probably runs as a
MapReduce job because there is nothing on the HBase side it can use.
In my test, query 1 does not seem fast at all and still takes ages:
select * from hive_hbase_test where key='blabla'    36 secs
vs
get 'test', 'blabla'    less than 1 sec
That is still a huge difference.
Has anybody tried this before? Is there any way to do some sort of query
plan analysis on a Hive query? Or am I not mapping the Hive table to the
HBase table correctly?
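
On the query plan point, my assumption is that Hive's EXPLAIN statement is
the place to start, since it prints the plan for a query without running
it. A minimal sketch of what I would run for the two cases above ('blabla'
is just a placeholder value, and I am only assuming that the output makes
it clear whether the key predicate reaches HBase):

```sql
-- Case 1: filter on the column mapped to the HBase row key (:key).
-- EXPLAIN only prints the plan; it does not execute the query.
EXPLAIN
SELECT * FROM hive_hbase_test WHERE key = 'blabla';

-- Case 2: filter on a regular column (cf:a), for comparison.
EXPLAIN
SELECT * FROM hive_hbase_test WHERE a = 'blabla';

-- EXPLAIN EXTENDED prints additional detail about the same plan.
EXPLAIN EXTENDED
SELECT * FROM hive_hbase_test WHERE key = 'blabla';
```

Comparing the plans for case 1 and case 2 should at least show whether
Hive treats the row key filter any differently from the column filter.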