Thanks J-D,

Sorry for my bad English. What I meant about the space is this: because my data is almost immutable (i.e. almost no updates and no deletes), if I compact two tables of size S1 and S2, the size of the merged table will be almost S1+S2. Whereas, if I understand correctly how it works, if I had made a lot of deletes on the two original tables, the size of the merged table could be much less than S1+S2.
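To make the size argument concrete, here is a toy sketch (plain Python, not HBase code). It assumes a store file is just a list of (row key, timestamp, value) cells and that a value of None stands for a delete marker; the function name major_compact and the sample data are made up for illustration. Real HBase also tracks versions and column families, which this ignores.

```python
# Toy model only: a "store file" is a list of (row_key, timestamp, value) cells.
# A value of None is a hypothetical stand-in for a delete marker.

def major_compact(*files):
    """Merge files, keep the newest cell per row, and drop deleted rows."""
    newest = {}
    for cells in files:
        for key, ts, value in cells:
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    # Delete markers are discarded here, which is where space is reclaimed.
    return sorted((k, ts, v) for k, (ts, v) in newest.items() if v is not None)

# Almost immutable data: distinct rows, no deletes.
s1 = [("a", 1, "x"), ("b", 1, "y")]
s2 = [("c", 2, "z"), ("d", 2, "w")]
immutable_merged = major_compact(s1, s2)

# Delete-heavy data: the second file holds only delete markers.
d1 = [("a", 1, "x"), ("b", 1, "y")]
d2 = [("a", 2, None), ("b", 2, None)]
deleted_merged = major_compact(d1, d2)

print(len(immutable_merged), len(deleted_merged))
```

In the first case the merged result holds every cell from both inputs (size close to S1+S2); in the second it is empty, because every row was deleted.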

I do not have a disk space problem right now, but the 20% rule of thumb is good to know (we all end up filling our large disks ;) )

Thanks
TuX

On 18/05/10 18:27, Jean-Daniel Cryans wrote:
The equivalent of HBase minor compactions would be Bigtable's merging
compaction (minus the part where it also reads from memtable).

About your space problem, the recommended practice is to keep your
system with at least 20% free disk space else you can run into all
sorts of problems.
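
A quick way to check that rule on a single local volume could look like the sketch below. It only uses Python's standard shutil.disk_usage; the function names and the default path are placeholders, and a cluster storing its data in HDFS would need the file system's own capacity report rather than this local check.

```python
import shutil

def free_fraction(path="."):
    """Return the fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def has_headroom(path=".", minimum=0.20):
    """True if at least `minimum` (20% by default) of the volume is free."""
    return free_fraction(path) >= minimum

if __name__ == "__main__":
    print(f"free: {free_fraction():.1%}, enough headroom: {has_headroom()}")
```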

J-D

On Tue, May 18, 2010 at 4:06 AM, TuX RaceR<[email protected]>  wrote:
Thank you Jonathan for raising the Jira and attaching a patch

I was looking for more info on how major compactions and minor compactions
work and google found me this page:

http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture

After reading the wiki page and the Google Bigtable paper, it seems to me that
there is a difference between Google 'minor compactions' and HBase 'minor
compactions'.

In google, a minor compaction is (from the paper):
"5.4 Compactions
As write operations execute, the size of the memtable increases. When the
memtable size reaches a threshold, the memtable is frozen, a new memtable is
created, and the frozen memtable is converted to an SSTable and written to
GFS. This minor compaction process has two goals:
it shrinks the memory usage of the tablet server, and it reduces the amount
of data that has to be read from the commit log during recovery if this
server dies. Incoming read and write operations can continue while
compactions occur.
Every minor compaction creates a new SSTable. If this behavior continued
unchecked, read operations might need to merge updates from an arbitrary
number of SSTables."

On the other hand the Hbase wiki:
"Compactions: When the number of MapFiles exceeds a configurable threshold,
a minor compaction is performed which consolidates the most recently written
MapFiles."

So it seems that:
1) google minor compactions are equivalent to Hbase cache flushes
2) google major compactions are equivalent to Hbase major compactions
3) there is no equivalent of Hbase minor compactions in the google design.
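
The operations being compared can be sketched with a toy model. This is plain Python, not HBase or Bigtable code: the class ToyStore, its method names, and the use of None as a delete marker are all assumptions made for illustration. A flush plays the role of what the Bigtable paper calls a minor compaction, minor_compact merges only the most recent files, and major_compact rewrites everything into one file.

```python
class ToyStore:
    """Toy illustration of flush, minor compaction and major compaction."""

    def __init__(self):
        self.memtable = {}  # in-memory writes: key -> value (None = delete marker)
        self.files = []     # on-disk files, oldest first, each one a dict

    def put(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        self.memtable[key] = None

    def flush(self):
        """Write the memtable out as a new file and start an empty memtable."""
        if self.memtable:
            self.files.append(self.memtable)
            self.memtable = {}

    def minor_compact(self, n=2):
        """Merge the n most recent files into one, keeping delete markers."""
        if len(self.files) < 2:
            return
        recent, self.files = self.files[-n:], self.files[:-n]
        merged = {}
        for f in recent:  # oldest first, so newer values overwrite older ones
            merged.update(f)
        self.files.append(merged)

    def major_compact(self):
        """Merge all files into one and drop delete markers."""
        merged = {}
        for f in self.files:
            merged.update(f)
        self.files = [{k: v for k, v in merged.items() if v is not None}]


store = ToyStore()
store.put("a", 1); store.flush()
store.put("b", 2); store.flush()
store.delete("a"); store.flush()   # three files on disk now
store.minor_compact(2)             # two files; the delete marker survives
files_after_minor = [dict(f) for f in store.files]
store.major_compact()              # one file; row "a" is gone
```

With few deletes, the major compaction step reclaims almost no space; what both kinds of compaction still do is reduce the number of files a read has to consult.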

Can somebody confirm this?
As in my case my data is almost immutable (i.e. I do not have a lot of space
to reclaim from deleted rows, as there are few of them), I am wondering whether
compactions do more harm than good.

Thanks
TuX



On 17/05/10 23:12, Jonathan Gray wrote:
No there isn't.

I just opened a JIRA to make it so it can be set to 0 to disable.

https://issues.apache.org/jira/browse/HBASE-2559

Will put up a patch for trunk/0.21.

JG


-----Original Message-----
From: TuX RaceR [mailto:[email protected]]
Sent: Monday, May 17, 2010 1:47 PM
To: [email protected]
Subject: Re: Additional disk space required for Hbase compactions..

Hello List,


On 17/05/10 20:26, Jonathan Gray wrote:

   Same with major compactions (you would definitely need to turn them
   off and control them manually if you need them at all).


How would you turn the major compaction off?
The only major compaction related parameter is this one:

<property>
<name>hbase.hregion.majorcompaction</name>
<value>86400000</value>
<description>The time (in milliseconds) between 'major' compactions of all
      HStoreFiles in a region.  Default: 1 day.
</description>
</property>

Is there a cleaner way to turn it off than setting a ridiculously large
value?

Thanks
TuX


