Hi Zahoor,

I mean that the HDFS space taken by one replica of the table in HBase 0.90 was 90 GB, whereas the HDFS disk space taken by the same table in HBase 0.92 is 45 GB. So I am interested in knowing how HFilev2 takes around 50% less HDFS space. No compression was enabled for these tables, there were no schema changes, and the same data set is used.
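To put those one-replica figures in raw-disk terms, here is a minimal sketch of the arithmetic. It assumes the HDFS default replication factor of 3, which is not stated anywhere in this thread, and `raw_hdfs_gb` is only an illustrative helper name.

```python
def raw_hdfs_gb(one_replica_gb: float, replication: int = 3) -> float:
    """Raw HDFS disk consumed when every block is stored `replication` times.

    replication=3 is the HDFS default and an assumption here.
    """
    return one_replica_gb * replication


# One-replica sizes reported for the same table.
v1_raw = raw_hdfs_gb(90)  # HBase 0.90 (HFilev1)
v2_raw = raw_hdfs_gb(45)  # HBase 0.92 (HFilev2)

# Relative saving; independent of the replication factor chosen.
saving = 1 - v2_raw / v1_raw

print(f"HFilev1 raw: {v1_raw:.0f} GB, HFilev2 raw: {v2_raw:.0f} GB, saving: {saving:.0%}")
```

The relative saving stays at 50% whatever the replication factor is, but the absolute number of GB saved scales with it, which is what matters for sizing disks.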
Actually, I have to provide hardware estimates for an HBase cluster, and a 50% difference in disk usage between HFilev1 and HFilev2 makes a big difference to my estimates. So I am just trying to make sure that less disk space will be required if we use HFilev2.

Thanks,
Anil

On Tue, Aug 14, 2012 at 11:50 AM, jmozah <[email protected]> wrote:

> Hi
>
> I am not very sure about the storage savings you are talking about, but
> there are definitely savings in RAM, as there are block-level indexes and
> bloom filters instead of file-level ones. More here:
>
> http://www.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
> http://hbase.apache.org/book.html#d540e10937
>
> Was compression enabled in 0.90? Is it enabled now in 0.92?
>
> ./zahoor
>
>
> On 14-Aug-2012, at 11:45 PM, anil gupta <[email protected]> wrote:
>
> > Hi All,
> >
> > I recently updated my cluster from HBase 0.90 to HBase 0.92. One replica of
> > one table used to take 90 GB in 0.90 but the same table takes 45 GB in
> > 0.92 (HFilev2). The table has 1 column family and each row stores data of
> > 300-400 bytes (this is the size of values) in 20-30 columns.
> > I am interested in knowing of any disk usage optimization done in HFilev2?
> > Please share if you know of any relevant document to understand the
> > reduction in disk space usage?
> >
> > --
> > Thanks & Regards,
> > Anil Gupta

--
Thanks & Regards,
Anil Gupta
