Hi All,

Here are the steps I followed to load the table in HFilev1 format:
1. Set the property hfile.format.version to 1.
2. Updated the conf across the cluster.
3. Restarted the cluster.
4. Ran the bulk loader.
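Step 1 above amounts to overriding this entry in hbase-site.xml (a sketch; the property name, its default of 2, and its description come from the hbase-default.xml excerpt quoted later in this thread):

```xml
<property>
  <name>hfile.format.version</name>
  <!-- default is 2; set to 1 to write HFilev1 for this comparison -->
  <value>1</value>
</property>
```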
Table has 34 million records and one column family.

Results:
HDFS space for one replica of the table in HFilev2: 39.8 GB
HDFS space for one replica of the table in HFilev1: 38.4 GB

So, per the above results, HFileV1 actually takes about 3.5% less space than the HFileV2 format. I also skimmed through the code and saw references to "hfile.format.version" in the HFile.java class.

Thanks,
Anil Gupta

On Mon, Aug 27, 2012 at 1:32 PM, Kevin O'dell <[email protected]> wrote:

> Anil,
>
> Please let us know how well this works.
>
> On Mon, Aug 27, 2012 at 4:19 PM, anil gupta <[email protected]> wrote:
>
>> Hi Guys,
>>
>> I was digging through the hbase-default.xml file and I found this
>> property related to HFile handling:
>>
>> <property>
>>   <name>hfile.format.version</name>
>>   <value>2</value>
>>   <description>
>>     The HFile format version to use for new files. Set this to 1 to
>>     test backwards-compatibility. The default value of this option
>>     should be consistent with FixedFileTrailer.MAX_VERSION.
>>   </description>
>> </property>
>>
>> I believe setting this to 1 would help me carry out my test. Now we
>> know how to store data in HFileV1 in HBase 0.92 :) . I'll post the
>> result once I try this out.
>>
>> Thanks,
>> Anil
>>
>> On Wed, Aug 15, 2012 at 5:09 AM, J Mohamed Zahoor <[email protected]> wrote:
>>
>>> Cool. Now we have something on the record :-)
>>>
>>> ./Zahoor@iPad
>>>
>>> On 15-Aug-2012, at 3:12 AM, Harsh J <[email protected]> wrote:
>>>
>>>> Not wanting this thread, too, to end up as a mystery result on the
>>>> web, I did some tests. I loaded 10k rows (of 100 KB random chars
>>>> each) into test tables on 0.90 and 0.92 both, flushed them,
>>>> major_compact'ed them (waited for completion and a drop in IO write
>>>> activity) and then measured them to find this:
>>>>
>>>> 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
>>>> 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
>>>>
>>>> So… not much of a difference. It is still your data that counts. I
>>>> believe what Anil may have had were merely additional, un-compacted
>>>> stores?
>>>>
>>>> P.S. Note that my 'test' table was all defaults. That is, merely
>>>> "create 'test', 'col1'", nothing else, so the block indexes must've
>>>> probably gotten created for every row, as that's at 64k by default,
>>>> while my rows are all 100k each.
>>>>
>>>> On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <[email protected]> wrote:
>>>>
>>>>> Hi Kevin,
>>>>>
>>>>> If it's not possible to store a table in HFilev1 in HBase 0.92, then
>>>>> my last option will be to store the data on a pseudo-distributed or
>>>>> standalone cluster for the comparison.
>>>>> The advantage of the current installation is that it's a fully
>>>>> distributed cluster with around 33 million records in a table, so it
>>>>> would give me a better estimate.
>>>>>
>>>>> Thanks,
>>>>> Anil Gupta
>>>>>
>>>>> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <[email protected]> wrote:
>>>>>
>>>>>> Do you not have a pseudo cluster for testing anywhere?
>>>>>>
>>>>>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Jerry,
>>>>>>>
>>>>>>> I am willing to do that, but the problem is that I wiped off the
>>>>>>> HBase 0.90 cluster. Is there a way to store a table in HFilev1 in
>>>>>>> HBase 0.92? If I can store a file in HFilev1 in 0.92, then I can
>>>>>>> do the comparison.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anil Gupta
>>>>>>>
>>>>>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Anil:
>>>>>>>>
>>>>>>>> Maybe you can try to compare the two HFile implementations directly?
> > > Let > > > >>>> say > > > >>>>> write 1000 rows into HFile v1 format and then into HFile v2 > format. > > > You > > > >>>> can > > > >>>>> then compare the size of the two directly? > > > >>>>> > > > >>>>> HTH, > > > >>>>> > > > >>>>> Jerry > > > >>>>> > > > >>>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta < > [email protected] > > > > > > >>>> wrote: > > > >>>>> > > > >>>>>> Hi Zahoor, > > > >>>>>> > > > >>>>>> Then it seems like i might have missed something when doing hdfs > > > >>> usage > > > >>>>>> estimation of HBase. I usually do hadoop fs -dus > > /hbase/$TABLE_NAME > > > >>> for > > > >>>>>> getting the hdfs usage of a table. Is this the right way? Since > i > > > >>> wiped > > > >>>>> of > > > >>>>>> the HBase0.90 cluster so now i cannot look into hdfs usage of > it. > > Is > > > >>> it > > > >>>>>> possible to store a table in HFileV1 instead of HFileV2 in > > > HBase0.92? > > > >>>>>> In this way i can do a fair comparison. > > > >>>>>> > > > >>>>>> Thanks, > > > >>>>>> Anil Gupta > > > >>>>>> > > > >>>>>> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <[email protected]> > > wrote: > > > >>>>>> > > > >>>>>>> Hi Anil, > > > >>>>>>> > > > >>>>>>> I really doubt that there is 50% drop in file sizes... As far > as > > i > > > >>>>> know.. > > > >>>>>>> there is no drastic space conserving feature in V2. Just as an > > > >>> after > > > >>>>>>> thought.. do a major compact and check the sizes. 
>>>>>>>>>>
>>>>>>>>>> ./Zahoor
>>>>>>>>>> http://blog.zahoor.in
>>>>>>>>>>
>>>>>>>>>> On 15-Aug-2012, at 12:31 AM, anil gupta <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> l
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks & Regards,
>>>>>>>>> Anil Gupta
>>>>>>>
>>>>>>> --
>>>>>>> Thanks & Regards,
>>>>>>> Anil Gupta
>>>>>>
>>>>>> --
>>>>>> Kevin O'Dell
>>>>>> Customer Operations Engineer, Cloudera
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Anil Gupta
>>>>
>>>> --
>>>> Harsh J
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera

--
Thanks & Regards,
Anil Gupta
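As a quick check on the numbers in this thread: the table sizes were read with `hadoop fs -dus` (as Anil describes above), and the quoted 3.5% follows directly from the two reported measurements. A sketch, with the cluster command left as a comment since it needs a live HDFS:

```shell
# On a live cluster, per-table usage (one replica) is read as in this thread:
#   hadoop fs -dus /hbase/$TABLE_NAME
# With the sizes reported above, HFilev1's relative saving over HFilev2 is:
awk 'BEGIN { v2 = 39.8; v1 = 38.4; printf "HFilev1 saves %.1f%%\n", (v2 - v1) / v2 * 100 }'
# prints: HFilev1 saves 3.5%
```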
