What's the read/write mix in your workload? Have you looked at HBASE-10070, 'HBase read high-availability using timeline-consistent region replicas' (phase 1 has been merged for the upcoming 1.0 release)?
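If reads dominate, a rough sketch of how a timeline-consistent read might look with the phase-1 client API (the table name "file_meta" and the row key are just placeholders for illustration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("file_meta"))) { // hypothetical table
      Get get = new Get(Bytes.toBytes("some-file-id"));                 // hypothetical row key
      // Allow the read to be served by a secondary region replica;
      // the answer may be slightly stale (timeline consistency).
      get.setConsistency(Consistency.TIMELINE);
      Result result = table.get(get);
      // isStale() reports whether a secondary replica answered the read.
      System.out.println("stale read: " + result.isStale());
    }
  }
}
```

The table would also need REGION_REPLICATION > 1 set in its descriptor for secondary replicas to exist; see the HBASE-10070 design doc for the server-side configuration.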
Cheers

On Thu, Jul 31, 2014 at 8:17 AM, Wilm Schumacher <[email protected]> wrote:

> Hi,
>
> I have a "conceptual" question and would appreciate hints.
>
> My task is to save files to HDFS, to maintain some information about
> them in an HBase table, and then serve both to the application.
>
> Per file I have around 50 rows with 10 columns (in 2 column families),
> with string values of around 100 characters each.
>
> The files are of ordinary size (perhaps between a few kB and 100 MB or so).
>
> By this estimate the number of files is much smaller than the number of
> rows (times columns), but the files take far more space on disk than
> the HBase data does. I would further estimate that for every get of a
> file there will be on the order of hundreds of getRows against HBase.
>
> For the files I want to run a Hadoop cluster (obviously). The question
> now arises: should I run HBase on the same Hadoop cluster?
>
> The pro of running them together is obvious: I would only have to run
> one Hadoop cluster, which would save time, money and nerves.
>
> On the other hand, it wouldn't be possible to make special adjustments
> to optimize the cluster for one task or the other. E.g. if I want to
> make HBase more "distributed" by raising its replication (to, say, 6),
> I would have to use double the amount of disk for the "normal" files,
> too.
>
> So: what should I do?
>
> Do you have any comments or hints on this question?
>
> Best wishes,
>
> wilm
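Regarding the replication point quoted above: HDFS replication is a per-file attribute, so the plain files and the HBase-written files don't have to share one factor even on a shared cluster. A minimal sketch, assuming hypothetical example paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {
      // Keep an individual "normal" file at the default factor of 3
      // (path is a hypothetical example).
      fs.setReplication(new Path("/data/files/some-large-file.bin"), (short) 3);
      // The files HBase itself writes (under its root dir, e.g. /hbase)
      // pick up the dfs.replication value of the HBase process, which can
      // be set higher without touching the replication of the plain files.
    }
  }
}
```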
