Hi,

I am running Hive and HBase on the same Amazon EC2 cluster, with HBase in pseudo-distributed mode.

After integrating HBase with Hive, I find that running an "insert overwrite" query from Hive to load data into the corresponding HBase table takes a very long time.

The data is only about 1.3 GB, so I don't think this is normal.

Maybe there is something wrong with my configuration.

Here are some queries:

CREATE TABLE hbase_table (
material_id int,
new_id_client int,
last_purchase_date int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:idclt,cf1:dt_last_purchase")
TBLPROPERTIES("hbase.table.name" = "test");

INSERT OVERWRITE TABLE t_LIGNES_DERN_VENTES
SELECT * FROM test;  -- takes a long time (about 8 hours)
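
To put those numbers in perspective, here is a rough throughput estimate. It is only a sketch: it assumes that 1.3 GB means 1.3 * 1024^3 bytes and that the job ran for exactly 8 hours, both of which are approximations.

```python
# Rough effective throughput of the insert above.
# Both inputs are approximations taken from the figures in this message.
data_bytes = 1.3 * 1024**3      # about 1.3 GB of data
elapsed_seconds = 8 * 3600      # about 8 hours

throughput_kb_per_s = data_bytes / elapsed_seconds / 1024
print(f"{throughput_kb_per_s:.1f} KB/s")
```

That works out to roughly 47 KB/s, which is why the run time looks abnormal to me.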


Here are some configuration files for my cluster:

# cat hive/conf/hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ip-10-159-41-177.ec2.internal</value>
    </property>

    <property>
        <name>hive.aux.jars.path</name>
        <value>/root/hive/build/dist/lib/hive-hbase-handler-0.9.0-amplab-4.jar,/root/hive/build/dist/lib/hbase-0.92.0.jar,/root/hive/build/dist/lib/zookeeper-3.4.3.jar,/root/hive/build/dist/lib/guava-r09.jar</value>
    </property>

    <property>
        <name>hbase.client.scanner.caching</name>
        <value>10000</value>
    </property>

</configuration>

# cat hbase-0.92.0/conf/hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/hbase</value>
    </property>

    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>

    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ip-10-159-41-177.ec2.internal</value>
    </property>

    <property>
        <name>hbase.client.scanner.caching</name>
        <value>10000</value>
    </property>

</configuration>

Any help is highly appreciated!

Thank you.

Hao

--
Hao Ren
ClaraVista
www.claravista.fr
