Why HBase integration with Hive makes Hive slow

Hao Ren Thu, 01 Aug 2013 07:46:40 -0700

Hi,

I have a cluster (1 master + 3 slaves) running Hive, HBase, and Hadoop.

To run a daily row-level update routine, we need to integrate HBase with Hive, but the performance is not good.


E.g. there are 2 tables in Hive:
    hbase_table: an HBase table created via Hive
    hive_table: a native Hive table
Both hold the same data set.

When running:
    select count(*) from hbase_table; ===> takes 500 s
    select count(*) from hive_table; ===> takes 6 s

I have tried a lot of queries on the two tables, but hbase_table is always very slow.


To be clear, I created hbase_table as below:

CREATE TABLE hbase_table (
    idvisite string,
    client_list array<string>,
    nb_client int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,clients:id_list,clients:nb")
TBLPROPERTIES ("hbase.table.name" = "table_test");

And my HBase is running in pseudo-distributed mode.

My guess is that, at the beginning of a Hive query execution, Hive loads the data from HBase, and the SerDe step takes a long time.


Could someone tell me how to improve this poor performance?
Is this caused by a wrongly configured integration?
Is fully-distributed mode needed here?

Thank you in advance for your time.

Hao.


--
Hao Ren
ClaraVista
www.claravista.fr
