Author: chetanm
Date: Fri Feb 26 05:47:00 2016
New Revision: 1732423

URL: http://svn.apache.org/viewvc?rev=1732423&view=rev
Log:
OAK-4031 - Clarify usage of includedPaths, excludedPath and queryPaths

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md
URL: 
http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md?rev=1732423&r1=1732422&r2=1732423&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/lucene.md Fri Feb 26 
05:47:00 2016
@@ -95,6 +95,7 @@ Below is the canonical index definition
       - compatVersion (long) = 2
       - includedPaths (string) multiple
       - excludedPaths (string) multiple
+      - queryPaths (string) multiple = ['/']
       - indexPath (string)
       - codec (string)
       + indexRules (nt:unstructured)
@@ -130,6 +131,11 @@ excludedPaths
 : Optional multi value property. Defaults to empty
 : List of paths which should be [excluded](#include-exclude) from indexing. 
 
+queryPaths
+: Optional multi value property. Defaults to '/'
+: List of paths for which the index can be used to perform queries. Refer to
+[Path Includes/Excludes](#include-exclude) for more details
+
 indexPath
 : Optional string property to specify [index path](#copy-on-write)
 : Path of the index definition in the repository. For e.g. if the index
@@ -416,6 +422,45 @@ In majority of case `excludedPaths` only
 it might be required to also specify explicit set of path which should be 
 indexed. In that case make use of `includedPaths`
 
+Note that `excludedPaths` and `includedPaths` *does not* affect the index
+selection logic for a query i.e. if a query has any path restriction specified
+then that would not be checked against the `excludedPaths` and `includedPaths`.
+
+<a name="query-paths"></a>
+**queryPaths**
+
+If you need to ensure that a given index only gets used for query with specific
+path restrictions then you need to specify those paths in `queryPaths`. 
+
+For example if `includedPaths` and `queryPaths` are set to _[ "/content/a", 
"/content/b" ]_. 
+The index would be used for queries below "/content/a" as well as for queries 
below 
+"/content/b". But not for queries without path restriction, or for queries 
below 
+"/content/c".
+
+**Usage**
+
+Key points to consider while using `excludedPaths`, `includedPaths` and 
`queryPaths`
+
+1. Reduce what gets indexed in global fulltext index - For 
+   setups where a global fulltext index is configured say at /oak:index/lucene 
which
+   indexes everything then `excludedPaths` can be used to avoid indexing 
transient
+   repository state like in '/var' or '/tmp'. This would help in improving 
indexing 
+   rate. By far this is the primary usecase
+   
+2. Reduce reindexing time - If its known that certain type of data is stored 
under specific
+   subtree only but the query is not specifying that path restriction then 
`includedPaths`
+   can be used to reduce reindexing time for existing content by ensuring that 
indexing
+   logic only traverses that path for building up the index
+   
+3. Use `excludedPaths`, `includedPaths` with caution - When paths are excluded 
or included
+   then query engine is not aware of that. If wrong paths get excluded then 
its possible
+   that nodes which should have been part of query result get excluded as they 
are not indexed.
+   So only exclude those paths which do not have node matching given nodeType 
or nodes which
+   are known to be not part of any query result
+
+In most cases use of `queryPaths` would not be required as index definition 
should not have
+any overlap. 
+    
 Refer to [OAK-2599][OAK-2599] for more details.
 
 <a name="aggregation"></a>
@@ -1157,6 +1202,14 @@ While defining the index definition do c
     only those properties. So `ordering`  should be enabled only when sorting 
is
     being performed for those properties and `evaluatePathRestrictions` should
     only be enabled if you are going to specify path restrictions.
+    
+8. **Avoid overlapping index definition** - Do not have overlapping index 
definition 
+    indexing same nodetype but having different `includedPaths` and 
`excludedPaths`.
+    Index selection logic does not make use of the `includedPaths` and 
`excludedPaths` 
+    for index selection. Index selection is done only on cost basis and 
`queryPaths`. 
+    Having multiple definition for same type would cause ambiguity in index 
selection 
+    and may lead to unexpected results. Instead have a single index definition 
for same 
+    type.
    
 Following analogy might be helpful to people coming from RDBMS world. Treat 
your
 nodetype as Table in your DB and all the direct or relative properties as 
columns


Reply via email to