Compactions in busy system(s)

Iulia Zidaru Tue, 05 Apr 2011 03:46:58 -0700

 Hi all,

I'm not sure I have understood the purpose of major compaction and how to handle it in a busy system. It is important to run major compaction when we have a lot of deleted data, as it removes the "marked as deleted" flags. There are also the "flush" and "minor compaction" operations associated with writing to disk. I understand that a minor compaction rewrites several of the files produced by flush operations into a single file. What is not clear is whether major compaction does the same operation (and so can be skipped if there are no deletes in the system), or whether it also does something that minor compaction does not, so that skipping it may affect performance or data volume.
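
To make the question concrete, here is a toy Python model of how I currently picture the three operations. It is not HBase code: the function names, the dict-based "store files" and the TOMBSTONE marker are all assumptions made up for illustration, and the model may well be wrong in exactly the places I am asking about.

```python
# Toy model only: every name here is hypothetical, none of this is an HBase API.
TOMBSTONE = object()  # stands in for a "marked as deleted" flag


def flush(memstore):
    """Write the in-memory edits out as one immutable 'store file' (a dict here)."""
    return dict(memstore)


def minor_compact(files):
    """Merge several store files (oldest first) into one; delete markers are kept."""
    merged = {}
    for store_file in files:
        merged.update(store_file)  # entries from newer files win over older ones
    return merged


def major_compact(files):
    """Merge all store files into one and drop delete markers with the data they hide."""
    merged = minor_compact(files)
    return {key: value for key, value in merged.items() if value is not TOMBSTONE}


# Two flushes: the second one deletes row2 and adds row3.
f1 = flush({"row1": "a", "row2": "b"})
f2 = flush({"row2": TOMBSTONE, "row3": "c"})

minor = minor_compact([f1, f2])  # one file, row2 still carries its delete marker
major = major_compact([f1, f2])  # one file, row2 is gone entirely
```

In this picture the only difference between the two compactions is the final filtering step, which is why I wonder whether major compaction can be skipped when there are no deletes.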

Another thing I'd like help clarifying is whether a major compaction of the whole dataset is simply the sum of the major compactions of all its regions. If so, it should be possible to major compact only some regions at a time, and the other regions at another time. I also don't understand whether the system can merge a region holding little data with another region and, if it does, which of the operations mentioned above might harm good system behavior (i.e. what NOT to do).

The last point concerns the files in HDFS (this might affect the volume). When is the data deleted from HDFS (in minor and major compaction)? Are the files deleted when a compaction is performed, or are they only marked as deleted?


Thank you,
Iulia



--
Iulia Zidaru
Java Developer

1&1 Internet AG - Bucharest/Romania - Web Components Romania
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
[email protected]
0040 31 223 9153
