Date: Thu Sep 15 09:43:59 2016
New Revision: 1760904
OAK-301 Document Oak - property index cost
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/property-index.md Thu Sep 15 09:43:59 2016
@@ -84,6 +84,15 @@ or to simplify you can use one of the ex
+Usually, reindexing is only needed if the configuration of an index is changed
+such that the index should contain more or different data.
+For example, reindexing is needed if the property to be indexed is changed,
+if a nodetype is added to __`declaringNodeTypes`__, or if __`includedPaths`__ is changed.
+It is not strictly needed if less data is to be indexed, for example if a nodetype is removed.
+However, to save space, it might make sense to reindex even in that case.
+Typically, if a query does not return the expected result, reindexing does not help;
+more likely, the reason is to be found somewhere else, and disabling the index
+should be tried first.
Reindexing a property index happens synchronously by setting the __`reindex`__
flag to __`true`__. This means that the
first #save call will generate a full repository traversal with the purpose of
building the index content and it might
take a long time.
@@ -106,3 +115,44 @@ Example:
+#### Cost Estimation
+When running a query, the property index reports its estimated cost to the query engine,
+and then the query engine picks the index with the lowest cost (cost-based query optimization).
+The algorithm to calculate the estimated cost is roughly as follows (a bit simplified):
+* The cost is infinity (so the index is never used)
+ if the condition contains a fulltext constraint,
+ no applicable restriction,
+ the wrong nodetype, or
+ if the path filtering (`includedPaths` / `excludedPaths`) does not match the query.
+* For the nodetype index, the cost is the sum of the cost for the `jcr:primaryType` lookup
+ (if the primary type is known),
+ plus the cost for the `jcr:mixinTypes` lookup (if that is known).
+* Otherwise, the cost is based on the overhead (which is 2),
+ plus the estimated number of entries.
+* For an "x is not null" condition,
+ the estimated number of entries is
+ either the configured `entryCount` or, if not set, the
+ approximate number of entries in the index.
+ The approximation is an "order of magnitude" estimation (Morris' algorithm).
+* For a unique index and "x = 1" condition,
+ the estimated number of entries is either 0 or 1
+ (depending on whether the key is found).
+* For a non-unique index and a "x = 1" condition,
+ if the `entryCount` and `keyCount` are set, those settings are used to estimate
+ the number of entries. If not, the
+ approximate number of entries for the key is read (maintained using Morris' algorithm, as above).
+ In addition to that, the path condition is used to scale down
+ the estimated count depending on the approximate number of nodes
+ in that subtree versus the approximate number of entries
+ in the repository, using approximation available via the `counter` index.
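
The "order of magnitude" estimation mentioned in the list can be illustrated with a textbook Morris counter. This is only a minimal sketch of the general technique, not Oak's actual implementation; the class name `MorrisCounter` and its methods are hypothetical and chosen for illustration.

```python
import random


class MorrisCounter:
    """Textbook Morris approximate counter (illustrative, not Oak's code)."""

    def __init__(self, rng=None):
        # Only a small exponent is stored instead of the exact count.
        self.exponent = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Bump the exponent with probability 2^-exponent, so that
        # increments become rarer as the counter grows.
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # Estimate of the number of increments seen so far.
        return 2 ** self.exponent - 1


counter = MorrisCounter(random.Random(42))
for _ in range(10_000):
    counter.increment()
# The result is a rough estimate of 10'000, not an exact count.
print(counter.estimate())
```

The estimate can be off by a noticeable factor, which is acceptable here because the cost only needs to rank indexes against each other.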
+For example, for a query with path restriction "/content/products/t-shirts" and property restriction
+"color = 'red'", if there is an index for the property "color", then
+the entry count approximation is read from the index. Let's say it is 10'000 for this value.
+Then the approximate number of nodes in the subtree "/content/products/t-shirts" is read
+(let's say it is 20'000), and the approximate number of nodes in the repository is read
+(let's say it is 1 million).
+Therefore, the estimated number of entries is scaled down (divided by 50) from 10'000 to 200.
+The estimated cost is therefore 202, due to the overhead of 2.
\ No newline at end of file
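
The arithmetic of the t-shirts example can be reproduced with a small sketch. It assumes the simplified rules described in the added documentation (an overhead of 2, plus the entry estimate scaled by the subtree's share of the repository); the function `property_index_cost` and its parameters are hypothetical names, not part of Oak's API.

```python
import math

OVERHEAD = 2  # fixed overhead stated in the documentation text


def property_index_cost(applicable, entry_estimate,
                        subtree_nodes=None, repository_nodes=None):
    """Simplified cost estimate for a non-unique property index."""
    if not applicable:
        # Fulltext constraint, no usable restriction, wrong nodetype, ...
        return math.inf
    estimate = float(entry_estimate)
    if subtree_nodes is not None and repository_nodes:
        # Scale down by the share of nodes below the path restriction.
        estimate = estimate * subtree_nodes / repository_nodes
    return OVERHEAD + estimate


# Values from the example: 10'000 entries for color = 'red',
# 20'000 nodes below /content/products/t-shirts, 1 million nodes overall.
cost = property_index_cost(True, 10_000, 20_000, 1_000_000)
print(cost)  # 202.0
```

Without a path restriction the same call would return 10'002, which shows how much the subtree scaling can favor this index over alternatives.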