Hi,

We do something like that programmatically. We read blobbed HBase data (qualifiers represent cross-sections such as country_product, and the blobs hold data such as clicks, impressions, etc.). We have several aggregation tasks (one per MySQL table) that aggregate the data and insert it into MySQL in batches.

I don't know how much data you want to scan and insert, but we scan, aggregate and insert approximately 7GB (~12M lines) from one HBase table into 9 MySQL tables, and that takes a little less than 2 hours. Our analysis shows that ~25% of that time is net HBase read, and most of the time is spent on MySQL inserts. Since we are in the process of building a new system, optimizing is not on our agenda, but I would definitely try writing to CSV and bulk loading into the RDBMS.
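A minimal sketch of the CSV route, not our actual code. It assumes the HBase cells have already been decoded into (country, product, clicks, impressions) tuples. The sample rows, the file name agg.csv, and the table name country_product_stats are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregate(rows):
    """Sum clicks and impressions per (country, product) cross-section."""
    totals = defaultdict(lambda: [0, 0])
    for country, product, clicks, impressions in rows:
        bucket = totals[(country, product)]
        bucket[0] += clicks
        bucket[1] += impressions
    return totals


def write_csv(totals, out):
    """Write one CSV line per cross-section, sorted by key."""
    writer = csv.writer(out, lineterminator="\n")
    for (country, product), (clicks, impressions) in sorted(totals.items()):
        writer.writerow([country, product, clicks, impressions])


# Hypothetical sample rows standing in for decoded HBase cells.
rows = [
    ("US", "shoes", 3, 100),
    ("US", "shoes", 2, 50),
    ("FR", "hats", 1, 10),
]

# An in-memory buffer keeps the sketch self-contained; a real job would
# write to a file on disk or on HDFS instead.
buf = io.StringIO()
write_csv(aggregate(rows), buf)
csv_text = buf.getvalue()

# MySQL statement that could then load the file in one pass.
# Table and column names are assumptions for illustration only.
LOAD_SQL = (
    "LOAD DATA LOCAL INFILE 'agg.csv' INTO TABLE country_product_stats "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
    "(country, product, clicks, impressions)"
)
```

The idea is to replace many batched INSERT statements with a single bulk load per table. Whether that actually cuts the MySQL share of the run time is something we have not measured.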
Hope that helps.

On Wed, Nov 13, 2013 at 9:11 AM, Vincent Barat <[email protected]> wrote:

> Hi,
>
> We have done this kind of thing using HBase 0.92.1 + Pig, but we finally had to limit the size of the tables and move the biggest data to HDFS: loading data directly from HBase is much slower than from HDFS, and doing it using M/R overloads the HBase region servers, since several map jobs scan table regions at the same time. So the bigger your tables are, the higher the load is (usually Pig creates 1 map per region; I don't know about Hive).
>
> This may not be an issue if your HBase cluster is dedicated to this kind of job, but if you also have to ensure a good random read latency at the same time, forget it.
>
> Regards,
>
> On 11/11/2013 13:10, JC wrote:
>
>> We are looking to use HBase as a transformation engine. In other words, take data already loaded into HBase, run some large calculation/aggregation on that data and then load it back into an RDBMS for our BI analytic tools to use. I was curious about what the community's experience is on this and if there are some best practices. One thought we are kicking around is using MapReduce 2 and YARN and writing files to HDFS to be loaded into the RDBMS. Not sure what all the pieces needed for the complete application are, though.
>>
>> Thanks in advance for your help,
>> JC
>>
>> --
>> View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-as-a-transformation-engine-tp4052670.html
>> Sent from the HBase User mailing list archive at Nabble.com.
