Author: mduerig
Date: Fri Jan 19 15:39:43 2018
New Revision: 1821669
URL: http://svn.apache.org/viewvc?rev=1821669&view=rev
Log:
OAK-7075: Document oak-run compact arguments and system properties
merged r1821393
Modified:
jackrabbit/oak/branches/1.8/ (props changed)
jackrabbit/oak/branches/1.8/oak-doc/src/site/markdown/nodestore/segment/overview.md
Propchange: jackrabbit/oak/branches/1.8/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Fri Jan 19 15:39:43 2018
@@ -1,3 +1,3 @@
/jackrabbit/oak/branches/1.0:1665962
-/jackrabbit/oak/trunk:1820660-1820661,1820859,1820861,1820878,1820888,1820947,1821130,1821140-1821141,1821240,1821258,1821325,1821358,1821361-1821362,1821370,1821375,1821477,1821487,1821516
+/jackrabbit/oak/trunk:1820660-1820661,1820859,1820861,1820878,1820888,1820947,1821130,1821140-1821141,1821240,1821258,1821325,1821358,1821361-1821362,1821370,1821375,1821393,1821477,1821487,1821516
/jackrabbit/trunk:1345480
Modified:
jackrabbit/oak/branches/1.8/oak-doc/src/site/markdown/nodestore/segment/overview.md
URL:
http://svn.apache.org/viewvc/jackrabbit/oak/branches/1.8/oak-doc/src/site/markdown/nodestore/segment/overview.md?rev=1821669&r1=1821668&r2=1821669&view=diff
==============================================================================
---
jackrabbit/oak/branches/1.8/oak-doc/src/site/markdown/nodestore/segment/overview.md
(original)
+++
jackrabbit/oak/branches/1.8/oak-doc/src/site/markdown/nodestore/segment/overview.md
Fri Jan 19 15:39:43 2018
@@ -30,10 +30,12 @@
* [Was estimation cancelled?](#was-estimation-cancelled)
* [When did estimation complete?](#when-did-estimation-complete)
* [When did compaction start?](#when-did-compaction-start)
+ * [What is the compaction type?](#what-is-the-compaction-type)
* [Is compaction disabled?](#is-compaction-disabled)
* [Was compaction cancelled?](#was-compaction-cancelled)
* [When did compaction complete?](#when-did-compaction-complete)
* [How does compaction work with concurrent
writes?](#how-does-compaction-works-with-concurrent-writes)
+ * [How does compaction deal with
checkpoints?](#how-does-compaction-deal-with-checkpoints)
* [When did clean-up start?](#when-did-cleanup-start)
* [Was cleanup cancelled?](#was-cleanup-cancelled)
* [When did cleanup complete?](#when-did-cleanup-complete)
@@ -105,15 +107,12 @@ If there is not enough garbage to justif
If the output of this phase reports that the amount of garbage is beyond a
certain threshold, the system creates a new generation and goes on with the
next phase.
Compaction executes after a new generation is created.
-The purpose of compaction is to identify data that is currently used by the
user.
-Once the system has a clear picture of which pieces of data the user is
currently using, everything is copied to the new generation.
-This phase might be very time consuming depending on the size of the
repository.
-The bigger the repository, the more has to be copied to the new generation.
+The purpose of compaction is to create a compact representation of the current
generation. To this end, the current generation is copied to the new generation,
leaving out anything that is no longer reachable.
Starting with Oak 1.8, compaction can operate in either of two modes: full
compaction and tail compaction. Full compaction copies all revisions pertaining
to the current generation to the new generation. In contrast, tail compaction
only copies the most recent ones. The two compaction modes differ in their usage
of system resources and in how much time they consume. While full compaction is
more thorough overall, it usually requires much more time, disk space and disk IO
than tail compaction.
Cleanup is the last phase of garbage collection and kicks in as soon as
compaction is done.
Once relevant data is safe in the new generation, old and unused data from a
previous generation can be removed.
This phase locates outdated pieces of data from one of the oldest generations
and removes it from the system.
-This is the only phase where data is actually deleted and disk space is
finally freed.
+This is the only phase where data is actually deleted and disk space is
finally freed. The amount of freed disk space depends on the preceding
compaction operation. In general, cleanup can free less space after a tail
compaction than after a full compaction. However, this only becomes effective
after a further garbage collection cycle, because the system always retains a
total of two generations.
### <a name="offline-garbage-collection"/> Offline Garbage Collection
@@ -127,7 +126,7 @@ In such a case, the human operator has t
Since offline garbage collection requires human intervention to run, the
estimation phase is not executed at all.
The human operator who decides to run offline garbage collection does so
because he or she decided that the garbage in the repository is exceeding some
arbitrary threshold.
Since the decision comes from a human operator, offline garbage collection is
not in charge of implementing heuristics to decide if and when garbage
collection should be run.
-The offline garbage collection process consist of the compaction and cleanup
phases only.
+The offline garbage collection process consists of the compaction and cleanup
phases only. It always employs full compaction, with the subsequent cleanup
retaining a single generation.
The main drawback of offline garbage collection is that the process has to
take exclusive control of the repository.
Nevertheless, this is also a strength.
@@ -224,6 +223,20 @@ TarMK GC #2: compaction started, gc opti
The message includes a dump of the garbage collection options that are used
during the compaction phase.
+##### <a name="what-is-the-compaction-type"/> What is the compaction type?
+
+The type of the compaction phase is determined by the configuration. A log
message indicates which compaction type is used.
+
+```
+TarMK GC #2: running ${MODE} compaction
+```
+
+Here ${MODE} is either `full` or `tail`. Under some circumstances (e.g. on the
very first garbage collection run), a scheduled tail compaction cannot run and
the system needs to fall back to a full compaction. This is indicated in the
log via the following message:
+
+```
+TarMK GC #2: no base state available, running full compaction instead
+```
+
##### <a name="is-compaction-disabled"/> Is compaction disabled?
The compaction phase can be skipped by pausing the garbage collection process.
If compaction is paused, the following message is printed.
@@ -277,6 +290,18 @@ There is also a special message that is
TarMK GC #2: compaction interrupted
```
+##### <a name="how-does-compaction-deal-with-checkpoints"/> How does
compaction deal with checkpoints?
+
+Since checkpoints share a lot of common data among themselves and with the
actual content, compaction handles them individually, deduplicating as much
content as possible. The following messages are printed to the log during
the process.
+
+```
+TarMK GC #2: Found checkpoint 4b2ee46a-d7cf-45e7-93c3-799d538f85e6 created at
Wed Nov 29 15:31:43 CET 2017.
+TarMK GC #2: Found checkpoint 5c45ca7b-5863-4679-a7c5-6056a999a6cd created at
Wed Nov 29 15:31:43 CET 2017.
+TarMK GC #2: compacting checkpoints/4b2ee46a-d7cf-45e7-93c3-799d538f85e6/root.
+TarMK GC #2: compacting checkpoints/5c45ca7b-5863-4679-a7c5-6056a999a6cd/root.
+TarMK GC #2: compacting root.
+```
+
##### <a name="how-does-compaction-works-with-concurrent-writes"/> How does
compaction work with concurrent writes?
When compaction runs as part of online garbage collection, it has to work
concurrently with the rest of the system.
@@ -392,15 +417,6 @@ TarMK GC #1: current repository size is
```
After that, the cleanup phase will iterate through every TAR file and figure
out which segments are still in use and which ones can be reclaimed.
-Cleanup will print a sequence of messages like the following.
-
-```
-data00000a.tar: size of bulk references/reclaim set 0/6
-```
-
-The first part of the message is the TAR file analyzed last.
-The two numbers at the end give an idea of how many references to segments are
being (transitively) followed and how many of them point to bulk segments that
can be removed.
-
After the cleanup phase scanned the repository, TAR files are purged of unused
segments.
In some cases, a TAR file would end up containing no segments at all.
In this case, the TAR file is marked for deletion and the following message is
printed.