Update:
I got some of the queries wrong in my first message. Here are the correct ones:
CREATE TABLE hbase_table (
material_id int,
new_id_client int,
last_purchase_date int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,cf1:idclt,cf1:dt_last_purchase")
TBLPROPERTIES("hbase.table.name" = "test");
INSERT OVERWRITE TABLE hbase_table
SELECT * FROM test; -- takes a long time (about 8 hours)
# bin/hadoop dfs -dus /user/hive/warehouse/test
hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/user/hive/warehouse/test    1318012108
The Hive table 'test' is only about 1.3 GB.
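To put that in perspective, here is a quick back-of-the-envelope calculation. It assumes the 8 hours is wall-clock time for the whole insert and that the HDFS size above roughly matches the amount of data written. The function name is only illustrative.

```python
# Size reported by `hadoop dfs -dus` and the approximate load time quoted above.
SIZE_BYTES = 1318012108
ELAPSED_HOURS = 8  # approximate; assumed to be wall-clock time for the insert


def throughput_bytes_per_sec(size_bytes: int, hours: float) -> float:
    """Average throughput in bytes per second over the given duration."""
    return size_bytes / (hours * 3600)


rate = throughput_bytes_per_sec(SIZE_BYTES, ELAPSED_HOURS)
print(f"{rate / 1024:.1f} KiB/s")  # prints "44.7 KiB/s"
```

That is an average of well under 50 KB per second, which is why I suspect a configuration problem rather than the data volume.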
On 19/08/2013 10:40, Hao Ren wrote:
Hi,
I am running Hive and HBase on the same Amazon EC2 cluster, where HBase
is in pseudo-distributed mode.
After integrating HBase with Hive, I find that an "insert overwrite" query
run from Hive to load data into the related HBase table takes a very
long time.
The data is only about 1.3 GB, so I don't think this is normal.
Maybe there is something wrong with my configuration.
Here are some queries:
CREATE TABLE hbase_table (
material_id int,
new_id_client int,
last_purchase_date int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,cf1:idclt,cf1:dt_last_purchase")
TBLPROPERTIES("hbase.table.name" = "test");
INSERT OVERWRITE TABLE t_LIGNES_DERN_VENTES
SELECT * FROM test; -- takes a long time (about 8 hours)
Here are some configuration files for my cluster:
# cat hive/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ip-10-159-41-177.ec2.internal</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>/root/hive/build/dist/lib/hive-hbase-handler-0.9.0-amplab-4.jar,/root/hive/build/dist/lib/hbase-0.92.0.jar,/root/hive/build/dist/lib/zookeeper-3.4.3.jar,/root/hive/build/dist/lib/guava-r09.jar</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>
</configuration>
# cat hbase-0.92.0/conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ip-10-159-41-177.ec2.internal</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>
</configuration>
Any help is highly appreciated!
Thank you.
Hao
--
Hao Ren
ClaraVista
www.claravista.fr