Are you reading using the HBase client, or do you have an InputFormat for reading HFiles?
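
If you end up scanning with MapReduce, the usual route is TableInputFormat wired up through TableMapReduceUtil. Here is a minimal sketch of that approach; the table, family, and qualifier names are made up, and it assumes the counters are stored as 8-byte longs:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class MetricsScanJob {

  // Emits (row key, clicks) pairs; "d" / "clicks" are placeholder names.
  static class MetricsMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      byte[] clicks = value.getValue(Bytes.toBytes("d"), Bytes.toBytes("clicks"));
      if (clicks != null) {
        context.write(new Text(Bytes.toString(row.get())),
                      new LongWritable(Bytes.toLong(clicks)));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "metrics-scan");
    job.setJarByClass(MetricsScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips on full scans
    scan.setCacheBlocks(false);  // don't churn the block cache from an MR scan

    TableMapReduceUtil.initTableMapperJob(
        "metrics",  // source table name (hypothetical)
        scan, MetricsMapper.class, Text.class, LongWritable.class, job);
    // ...wire up a reducer / output format for the aggregation step...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

High scanner caching and cache-blocks off are the two settings that matter most when a job scans whole regions.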
On Wednesday, November 13, 2013, Amit Sela wrote:

> Hi,
>
> We do something like that programmatically.
> We read blobbed HBase data (qualifiers represent cross-sections such as
> country_product, and blob data such as clicks, impressions, etc.).
> We have several aggregation tasks (one per MySQL table) that aggregate
> the data and insert it (in batches) into MySQL.
> I don't know how much data you want to scan and insert, but we scan,
> aggregate and insert approximately 7GB as ~12M lines from one HBase
> table into 9 MySQL tables, and that takes a little less than 2 hours.
> Our analysis shows that ~25% of that time is net HBase read, and most
> of the time is spent on MySQL inserts.
> Since we are in the process of building a new system, optimizing is not
> on our agenda, but I would definitely try writing to CSV and bulk
> loading into the RDBMS.
>
> Hope that helps.
>
> On Wed, Nov 13, 2013 at 9:11 AM, Vincent Barat <[email protected]> wrote:
>
> > Hi,
> >
> > We have done this kind of thing using HBase 0.92.1 + Pig, but we
> > finally had to limit the size of the tables and move the biggest data
> > to HDFS: loading data directly from HBase is much slower than from
> > HDFS, and doing it with M/R overloads the HBase region servers, since
> > several map jobs scan table regions at the same time. So the bigger
> > your tables are, the higher the load (usually Pig creates 1 map per
> > region; I don't know about Hive).
> >
> > This may not be an issue if your HBase cluster is dedicated to this
> > kind of job, but if you also have to ensure good random-read latency
> > at the same time, forget it.
> >
> > Regards,
> >
> > On 11/11/2013 13:10, JC wrote:
> >
> > > We are looking to use HBase as a transformation engine. In other
> > > words, take data already loaded into HBase, run some large
> > > calculation/aggregation on that data, and then load it back into an
> > > RDBMS for our BI analytics tools to use. I was curious what the
> > > community's experience is with this, and whether there are some
> > > best practices. One idea we are kicking around is using MapReduce 2
> > > and YARN and writing files to HDFS to be loaded into the RDBMS. Not
> > > sure what all the pieces needed for the complete application are,
> > > though.
> > >
> > > Thanks in advance for your help,
> > > JC
> > >
> > > --
> > > View this message in context:
> > > http://apache-hbase.679495.n3.nabble.com/HBase-as-a-transformation-engine-tp4052670.html
> > > Sent from the HBase User mailing list archive at Nabble.com.
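
On the "write to CSV and bulk load" idea above: once the aggregated output is sitting in a CSV file, one LOAD DATA LOCAL INFILE per table is usually far faster than millions of batched INSERTs. A minimal JDBC sketch, with hypothetical host, credentials, table and file names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoadCsv {
  public static void main(String[] args) throws Exception {
    // LOAD DATA LOCAL needs allowLoadLocalInfile on the Connector/J side
    // and local_infile enabled on the MySQL server.
    String url = "jdbc:mysql://dbhost:3306/analytics?allowLoadLocalInfile=true";
    try (Connection conn = DriverManager.getConnection(url, "etl", "secret");
         Statement stmt = conn.createStatement()) {
      // One server-side bulk load instead of ~12M individual INSERTs.
      stmt.execute(
          "LOAD DATA LOCAL INFILE '/tmp/clicks_by_country.csv' "
          + "INTO TABLE clicks_by_country "
          + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
    }
  }
}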
