Author: chetanm
Date: Tue Mar 31 06:04:25 2015
New Revision: 1670263

URL: http://svn.apache.org/r1670263
Log:
OAK-301 - Document Oak

Document details around various caches used in Oak

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md?rev=1670263&r1=1670262&r2=1670263&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md Tue Mar 31 06:04:25 2015
@@ -367,6 +367,7 @@ of commit. For example when a node under
 of all ancestors also need to be updated to the commit revision. Such changes are accumulated
 and flushed periodically through an asynchronous job.
 
+<a name="bg-read"></a>
 ### Background Reads
 
 DocumentMK periodically picks up changes from other DocumentMK instances by polling the root node
@@ -455,8 +456,102 @@ cluster nodes:
       {$set: {readWriteMode:'readPreference=primary&w=majority'}}, 
       {multi: true})    
 
+<a name="cache"></a>
+Caching
+-------
+
+`DocumentNodeStore` maintains multiple caches to provide optimum performance.
+By default the cached instances are kept in heap memory, but some of the caches
+can be backed by a [persistent cache](persistent-cache.html).
+
+1. `documentCache` - Document cache is used for caching `NodeDocument`
+    instances. These are in-memory representations of the persistent state. For
+    example, in the case of Mongo it maps to a Mongo document in the `nodes` collection,
+    and for RDB it maps to a row in the `NODES` table.
+    
+    Depending on the `DocumentStore` implementation, different heuristics are
+    applied for invalidating the cache entries based on changes in the backend.
+    
+2. `docChildrenCache` - Document Children cache is used to cache the children
+    state for a given parent node. This is invalidated completely upon every
+    background read.
+    
+3. `nodeCache` - Node cache is used to cache the `DocumentNodeState` instances.
+    These are **immutable** views of a `NodeDocument` as seen at a given revision,
+    hence no consistency checks need to be performed for them.
+     
+4. `childrenCache` - Children cache is used to cache the children for a given
+    node. These are also **immutable** and represent the state of children for
+    a given parent at a certain revision.
+    
+5. `diffCache` - Caches the diff for the changes done between successive revisions.
+   For local changes the diff is added to the cache upon commit, while for
+   external changes the diff entries are added upon computation of the diff as part
+   of the observation call.
+   
+By default all the above caches are managed on heap. For the last three, `nodeCache`,
+`childrenCache` and `diffCache`, Oak provides support for a
+[persistent cache](persistent-cache.html). By enabling the persistent cache feature, Oak can manage
+a much larger cache off heap and thus free up heap memory for application
+usage.
+
+### Cache Invalidation
+
+`documentCache` and `docChildrenCache` contain mutable state, which requires
+consistency checks to be performed to keep them in sync with the backend persisted
+state. Oak uses an MVCC model under which it maintains a consistent view of a given
+node at a given revision. This allows using a local cache instead of a global
+clustered cache, because changes made by any other cluster node need not be instantly
+reflected on all other nodes.
+
+Each cluster node periodically performs [background reads](#bg-read) to pick up
+changes done by other cluster nodes. At that time it performs a [consistency check][OAK-1156]
+to ensure that the cached instance state reflects the backend
+persisted state. Performing the check takes time proportional to the
+number of entries present in the cache.
+    
+For the repository to work properly it's important to ensure that such background reads
+do not consume much time, and [work is underway][OAK-2646] to improve the current
+approach. _To ensure that such background operations (which include the cache
+invalidation checks) perform quickly, one should not set a large size for
+these caches_.
+
+All other caches consist of immutable state and hence no cache invalidation needs
+to be performed for them. For that reason those caches can be backed by the persistent
+cache, and even having a large number of entries in such caches would not be a matter
+of concern.
+
+### Cache Configuration
+
+In a default setup the [DocumentNodeStoreService][osgi-config]
+takes a single config for `cache` which is internally distributed among the
+various caches above in the following way:
+
+1. `nodeCache` - 25%
+2. `childrenCache` - 10% 
+3. `docChildrenCache` - 3% 
+4. `diffCache` - 5% 
+5. `documentCache` - Is given the rest i.e. 57%
+
+Lately [options are provided][OAK-2546] to have fine-grained control over the
+distribution. See [Cache Allocation][cache-allocation].
+
+While distributing, ensure that the cache left for `documentCache` is not very large,
+i.e. prefer to keep it at ~500 MB or lower, as a large `documentCache` can
+lead to an increase in the time taken to perform cache invalidation.
+
+Further, make use of the persistent cache. This reduces pressure on GC by keeping
+instances off heap, with a slight decrease in performance compared to keeping them
+on heap.
+
+
 [1]: http://docs.mongodb.org/manual/core/read-preference/
 [2]: http://docs.mongodb.org/manual/core/write-concern/
 [3]: http://docs.mongodb.org/manual/reference/connection-string/#read-preference-options
 [4]: http://docs.mongodb.org/manual/reference/connection-string/#write-concern-options
+[OAK-1156]: https://issues.apache.org/jira/browse/OAK-1156
+[OAK-2646]: https://issues.apache.org/jira/browse/OAK-2646
+[OAK-2546]: https://issues.apache.org/jira/browse/OAK-2546
+[osgi-config]: ../osgi_config.html#document-node-store
+[cache-allocation]: ../osgi_config.html#cache-allocation
 

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md?rev=1670263&r1=1670262&r2=1670263&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/osgi_config.md Tue Mar 31 06:04:25 2015
@@ -102,7 +102,24 @@ blobCacheSize
   
 persistentCache
 : Default "" (an empty string, meaning disabled)
-: The persistent cache, which is stored in the local file system.
+: The [persistent cache][persistent-cache], which is stored in the local file system.
+
+<a name="cache-allocation"></a>
+nodeCachePercentage
+: Default 25
+: Percentage of `cache` allocated for `nodeCache`. See [Caching][doc-cache]
+
+childrenCachePercentage
+: Default 10
+: Percentage of `cache` allocated for `childrenCache`. See [Caching][doc-cache]
+
+diffCachePercentage
+: Default 5
+: Percentage of `cache` allocated for `diffCache`. See [Caching][doc-cache]
+
+docChildrenCachePercentage
+: Default 3
+: Percentage of `cache` allocated for `docChildrenCache`. See [Caching][doc-cache]
 
 Example config file
 
@@ -119,7 +136,7 @@ All the configuration related to Mongo c
         mongodb://sysop:moon@localhost
     
 * **Read Preferences and Write Concern** - These can also be specified as part of the Mongo URI. Refer to 
-  [Read Preference and Write Concern](documentmk.html#rw-preference) section for more details. For
+  [Read Preference and Write Concern](./nodestore/documentmk.html#rw-preference) section for more details. For
   e.g. following would set _readPreference_ to _secondary_ and prefer replica with tag _dc:ny,rack:1_.
   It would also specify the write timeout to 10 sec
   
@@ -242,10 +259,6 @@ in both config file and framework proper
 For example by default Sling sets **repository.home** to _${sling.home}/repository_. So this value
 need not be specified in config files
 
-[1]: http://docs.mongodb.org/manual/reference/connection-string/
-[2]: http://jackrabbit.apache.org/api/2.4/org/apache/jackrabbit/core/data/FileDataStore.html
-[OAK-1645]: https://issues.apache.org/jira/browse/OAK-1645
-
 ### Solr Server Configuration
 Solr index requires some configuration to be properly used; in OSGi environments such configurations can be performed 
 via OSGi Configuration Admin.
@@ -320,3 +333,8 @@ The following configuration items can be
         #type of Solr server provider to be used, supported types are none, remote (RemoteSolrServerProvider) and embedded (EmbeddedSolrServerProvider)
         server.type = none
 
+[1]: http://docs.mongodb.org/manual/reference/connection-string/
+[2]: http://jackrabbit.apache.org/api/2.4/org/apache/jackrabbit/core/data/FileDataStore.html
+[OAK-1645]: https://issues.apache.org/jira/browse/OAK-1645
+[doc-cache]: ./nodestore/documentmk.html#cache
+[persistent-cache]: ./nodestore/persistent-cache.html
\ No newline at end of file
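
To illustrate how the single `cache` value from the Cache Configuration section might be split using the documented percentages (`nodeCachePercentage`, `childrenCachePercentage`, `docChildrenCachePercentage`, `diffCachePercentage`), here is a minimal Java sketch. The class and method names (`CacheAllocationSketch`, `distribute`) are hypothetical and not part of Oak; only the default percentages and the rule that `documentCache` receives the remainder come from the documentation in this commit. The actual Oak implementation may round or validate differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Oak code: mirrors the documented split of the
// single `cache` setting across the five DocumentNodeStore caches.
public class CacheAllocationSketch {

    static Map<String, Long> distribute(long totalBytes, int nodePct, int childrenPct,
                                        int docChildrenPct, int diffPct) {
        int used = nodePct + childrenPct + docChildrenPct + diffPct;
        if (nodePct < 0 || childrenPct < 0 || docChildrenPct < 0 || diffPct < 0 || used > 100) {
            throw new IllegalArgumentException("percentages must be non-negative and sum to at most 100");
        }
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("nodeCache", totalBytes * nodePct / 100);
        sizes.put("childrenCache", totalBytes * childrenPct / 100);
        sizes.put("docChildrenCache", totalBytes * docChildrenPct / 100);
        sizes.put("diffCache", totalBytes * diffPct / 100);
        // documentCache is given whatever percentage is left over (57% by default).
        sizes.put("documentCache", totalBytes * (100 - used) / 100);
        return sizes;
    }

    public static void main(String[] args) {
        long total = 256L * 1024 * 1024; // assumed total `cache` size, for illustration only
        // Documented defaults: nodeCache 25, childrenCache 10, docChildrenCache 3, diffCache 5.
        distribute(total, 25, 10, 3, 5).forEach((name, bytes) ->
                System.out.println(name + " = " + bytes / (1024 * 1024) + " MB"));
    }
}
```

Under this arithmetic, with the default split, a total `cache` above roughly 877 MB (500 / 0.57) would leave `documentCache` larger than the ~500 MB guideline, which is where raising the other percentages becomes useful.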

