Author: amitj
Date: Thu Dec 15 06:55:14 2016
New Revision: 1774377

URL: http://svn.apache.org/viewvc?rev=1774377&view=rev
Log:
OAK-301: Document Oak

* Refine section on shared DataStore GC
* Add section for unregistration from shared datastore

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md?rev=1774377&r1=1774376&r2=1774377&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md Thu Dec 15 06:55:14 2016
@@ -150,9 +150,21 @@ The garbage collection can be triggered
  
 #### Shared DataStore Blob Garbage Collection (Since 1.2.0)
 
-On start of a repository configured with a shared DataStore, a unique repository id is registered. 
+##### Registration
+
+On start of a repository configured to use a shared DataStore (same path or S3 bucket), a unique repository id is 
+generated and registered in the NodeStore as well as the DataStore. 
 In the DataStore this repository id is registered as an empty file with the format `repository-[repository-id]` 
-(e.g. repository-988373a0-3efb-451e-ab4c-f7e794189273).
+(e.g. repository-988373a0-3efb-451e-ab4c-f7e794189273). This empty file is created under:
+
+* FileDataStore - Under the root directory configured for the datastore.
+* S3DataStore - Under `META` folder in the S3 bucket configured.
+
+On start/configuration of all the repositories sharing the data store it should be confirmed that the unique 
+repositoryId per repository is registered in the DataStore. Refer to the section below on [Checking Shared GC status](#check-shared-datastore-gc).
+
+##### Execution
+
 The high-level process for garbage collection is still the same as described above. 
 But to support blob garbage collection in a shared DataStore the Mark and Sweep phase can be
 run independently.
@@ -164,14 +176,15 @@ The details of the process are as follow
     * All the references are collected in the DataStore in a file with the format `references-[repository-id]` 
     (e.g. references-988373a0-3efb-451e-ab4c-f7e794189273).
 * On completion of the above process on all repositories, the sweep phase needs to be triggered.
-    * This can be executed by running `MarkSweepGarbageCollector#collectGarbage(false)` on one of the repositories,  
+    * This can be executed by running `MarkSweepGarbageCollector#collectGarbage(false)` on one of the repositories, 
     where false indicates to run sweep also. 
-    * The sweep process checks for availability of the references file from all registered repositories and aborts 
-    otherwise.
+    * The sweep process checks for availability of the references file from all registered repositories (all 
+    repositories corresponding to the `repository-[repositoryId]` files available) and aborts otherwise.
     * All the references available are collected.
     * All the blobs available in the DataStore are collected and deletion candidates identified by calculating all the 
     blobs available not appearing in the blobs referenced. Only blobs older than a specified time interval from the 
-    earliest available references file are deleted. (last modified say 24 hrs (default)).
+    earliest available references file are deleted. (last modified say 24 hrs (default)). The earliest references are
+     identified by means of a timestamp marker file (`markedTimestamp-[repositoryId]`) for each repository.
     
 The shared DataStore garbage collection is applicable for the following DataStore(s):
 
@@ -179,6 +192,7 @@ The shared DataStore garbage collection
 * SharedS3DataStore - Extends the S3DataStore to enable sharing of the data store with
                         multiple repositories                        
  
+<a name="check-shared-datastore-gc"></a>  
 ##### Checking GC status for Shared DataStore Garbage Collection
 
 The status of the GC operations on all the repositories connected to the DataStore can be checked by calling:
@@ -186,7 +200,8 @@ The status of the GC operations on all t
 * `MarkSweepGarbageCollector#getStats()` which returns a list of `GarbageCollectionRepoStats` objects having the 
 following fields:
     * repositoryId - The repositoryId of the repository
-    * local - Indicates whether the repositoryId is of local instance where the operation ran
+        * local - A repositoryId tagged with an asterisk (\*) indicates that it is the repositoryId of the local instance 
+        where the operation ran.
     * startTime - Start time of the mark operation on the repository
     * endTime - End time of the mark operation on the repository
     * length - Size of the references file created
@@ -266,6 +281,18 @@ public class GetGCStats {
 }
 ```
 
+##### Unregistration
+
+If a repository no longer shares the DataStore then it needs to be unregistered from the shared DataStore by following 
+the steps:
+
+* Identify the repositoryId for the repository using the steps above.
+* Remove the corresponding registered repository file (`repository-[repositoryId]`) from the DataStore
+    * FileDataStore - Remove the file from the data store root directory.
+    * S3DataStore - Remove the file from the `META` folder of the S3 bucket.
+* Remove other files corresponding to the particular repositoryId e.g. `markedTimestamp-[repositoryId]` or 
+`references-[repositoryId]`.
+
 #### Consistency Check
 The data store consistency check will report any data store binaries that are missing but are still referenced. The 
 consistency check can be triggered by:
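
Editorial illustration (not part of the commit): the Registration and Unregistration steps described in the diff above can be sketched for the FileDataStore case with plain `java.nio` calls. This is a minimal sketch, not Oak API. The class `SharedDataStoreFiles` and its methods are hypothetical names, and it assumes the marker files sit directly in the data store root directory and are named exactly `repository-[repositoryId]`, `markedTimestamp-[repositoryId]` and `references-[repositoryId]`. The S3DataStore case (files under the `META` folder) is not covered.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Oak: works on the marker files only.
class SharedDataStoreFiles {
    private static final String REPO_PREFIX = "repository-";
    // Assumed file-name prefixes, taken from the names quoted in the documentation.
    private static final String[] PREFIXES = {REPO_PREFIX, "markedTimestamp-", "references-"};

    // Lists the repository ids registered in the data store root, sorted.
    static List<String> listRegisteredIds(Path root) throws IOException {
        List<String> ids = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(root, REPO_PREFIX + "*")) {
            for (Path p : stream) {
                ids.add(p.getFileName().toString().substring(REPO_PREFIX.length()));
            }
        }
        Collections.sort(ids);
        return ids;
    }

    // Deletes the marker files of one repository id; returns the names removed.
    static List<String> unregister(Path root, String repositoryId) throws IOException {
        List<String> removed = new ArrayList<>();
        for (String prefix : PREFIXES) {
            Path file = root.resolve(prefix + repositoryId);
            if (Files.deleteIfExists(file)) {
                removed.add(file.getFileName().toString());
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: SharedDataStoreFiles <datastore-root> [repositoryId-to-unregister]");
            return;
        }
        Path root = Paths.get(args[0]);
        System.out.println("registered: " + listRegisteredIds(root));
        if (args.length > 1) {
            System.out.println("removed: " + unregister(root, args[1]));
        }
    }
}
```

Running it with only the root directory lists the registered ids without changing anything; passing a repositoryId as the second argument removes that repository's marker files. It would be prudent to run it only while no garbage collection is in progress on any repository sharing the data store.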