Hi,

We have done this kind of thing with HBase 0.92.1 + Pig, but we eventually had to limit the size of the tables and move the biggest data to HDFS. Loading data directly from HBase is much slower than loading it from HDFS, and doing it with M/R overloads the HBase region servers, since several map tasks scan table regions at the same time. So the bigger your tables are, the higher the load (Pig usually creates one map per region; I don't know about Hive). A sketch of what such a scan-and-dump job can look like follows below.
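For illustration, here is a minimal map-only sketch of that kind of job, dumping an HBase table to tab-separated files on HDFS (the table name "mytable", column family "cf" and qualifier "qual" are placeholders, not from any real setup). The scan tuning matters a lot when every region gets its own map:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TableToHdfs {

      // Emits one tab-separated line per row; with no reducers these
      // lines go straight to HDFS, one output file per region/map.
      static class DumpMapper extends TableMapper<Text, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result result, Context ctx)
            throws IOException, InterruptedException {
          byte[] v = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
          ctx.write(new Text(Bytes.toString(row.get()) + "\t" + Bytes.toString(v)),
              NullWritable.get());
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "mytable-to-hdfs");
        job.setJarByClass(TableToHdfs.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // bigger batches, fewer RPCs per map
        scan.setCacheBlocks(false);  // keep full scans out of the block cache

        // One map per region: N regions means N scanners hitting the
        // region servers at once, which is where the load comes from.
        TableMapReduceUtil.initTableMapperJob("mytable", scan, DumpMapper.class,
            Text.class, NullWritable.class, job);

        job.setNumReduceTasks(0);  // map-only: dump rows straight to HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }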

This may not be an issue if your HBase cluster is dedicated to this kind of job, but if you also have to maintain good random-read latency at the same time, forget it.
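Once the data (or the aggregates) sit on HDFS as delimited files, the HDFS-to-RDBMS leg JC asks about below is just a file load. A hedged sketch with plain JDBC, assuming the tab-separated output of the job above (the JDBC URL, credentials, and the "daily_counts" table with its columns are all invented; a bulk tool like Sqoop's "sqoop export" would do the same thing at scale):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsToRdbms {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical warehouse connection; driver, URL and table
        // are placeholders for whatever RDBMS the BI tools sit on.
        Class.forName("org.postgresql.Driver");
        Connection db = DriverManager.getConnection(
            "jdbc:postgresql://dbhost/bi", "etl", "secret");
        db.setAutoCommit(false);
        PreparedStatement stmt = db.prepareStatement(
            "INSERT INTO daily_counts (dim_key, cnt) VALUES (?, ?)");

        // Walk the part-* files the M/R job wrote under args[0].
        for (FileStatus f : fs.listStatus(new Path(args[0]))) {
          if (!f.getPath().getName().startsWith("part-")) continue;
          BufferedReader in = new BufferedReader(
              new InputStreamReader(fs.open(f.getPath())));
          String line;
          while ((line = in.readLine()) != null) {
            String[] cols = line.split("\t");
            stmt.setString(1, cols[0]);
            stmt.setLong(2, Long.parseLong(cols[1]));
            stmt.addBatch();  // batch inserts, one executeBatch per file
          }
          in.close();
          stmt.executeBatch();
        }
        db.commit();
        stmt.close();
        db.close();
        fs.close();
      }
    }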

Regards,

On 11/11/2013 13:10, JC wrote:
We are looking to use HBase as a transformation engine: take data already loaded into HBase, run some large calculations/aggregations on that data, and then load the results back into an RDBMS for our BI analytics tools to use. I was curious what the community's experience with this is and whether there are some best practices. One idea we are kicking around is using MapReduce 2 and YARN, writing files to HDFS to be loaded into the RDBMS. I'm not sure what pieces are needed for the complete application, though.

Thanks in advance for your help,
JC


