On Mon, Aug 27, 2012 at 8:30 PM, anil gupta <[email protected]> wrote:
> Hi All,
>
> Here are the steps I followed to load the table in HFile v1 format:
> 1. Set the property hfile.format.version to 1.
> 2. Updated the conf across the cluster.
> 3. Restarted the cluster.
> 4. Ran the bulk loader.
>
> The table has 34 million records and one column family.
> Results:
> HDFS space for one replica of the table in HFile v2: 39.8 GB
> HDFS space for one replica of the table in HFile v1: 38.4 GB
>
> Ironically, as per the above results, HFile v1 takes 3.5% less space
> than HFile v2. I also skimmed through the code and saw references
> to "hfile.format.version" in the HFile.java class.
>
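For reference, the 3.5% figure can be checked from the two sizes quoted above. This is only a rough sketch: the helper name `relative_saving` is made up for illustration, and the per-record estimate assumes decimal gigabytes (10^9 bytes) and exactly 34 million records, neither of which the original message states.

```python
def relative_saving(larger_gb, smaller_gb):
    """Return how much smaller `smaller_gb` is than `larger_gb`, in percent."""
    return (larger_gb - smaller_gb) / larger_gb * 100.0


# Sizes reported in the quoted message (one replica each).
v2_gb = 39.8
v1_gb = 38.4
records = 34_000_000  # "34 million records", assumed exact here

saving_pct = relative_saving(v2_gb, v1_gb)

# Assumption: 1 GB = 10**9 bytes. With binary GiB the figure is slightly higher.
extra_bytes_per_record = (v2_gb - v1_gb) * 1e9 / records

print(f"v1 is {saving_pct:.1f}% smaller than v2")
print(f"v2 uses about {extra_bytes_per_record:.0f} extra bytes per record")
```

Under those assumptions the gap works out to a few tens of bytes per record, which may help narrow down where the extra space in v2 goes.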
It would be interesting to know what makes up the 3.5% difference. More metadata at the end of the file in v2?
St.Ack
