Author: alexparvulescu
Date: Mon Sep 2 13:35:27 2013
New Revision: 1519440
URL: http://svn.apache.org/r1519440
Log:
https://issues.apache.org/jira/browse/OAK-301 - added some query docs

Added:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md
Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md
    jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md?rev=1519440&r1=1519439&r2=1519440&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/differences.md Mon Sep 2 13:35:27 2013
@@ -71,7 +71,7 @@ Oak does not index content by default as
 necessary, much like in traditional RDBMSs. If there is no index for a specific query then the
 repository will be traversed. That is, the query will still work but probably be very slow.
 
-See TODO for how to create a custom index.
+See the [query overview page](/query/) for how to create a custom index.
 
 Observation
 -----------

Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md?rev=1519440&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md (added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query.md Mon Sep 2 13:35:27 2013
@@ -0,0 +1,99 @@
+## Query
+
+Oak does not index content by default as does Jackrabbit 2. You need to create custom indexes when
+necessary, much like in traditional RDBMSs. If there is no index for a specific query then the
+repository will be traversed. That is, the query will still work but probably be very slow.
+
+Query Indices are defined under the `oak:index` node.
+
+### Cost calculation
+
+Each query index is expected to estimate the worst-case cost to query with the given filter.
+The returned value is between 1 (very fast; lookup of a unique node) and the estimated number of entries to traverse, if the cursor would be fully read, and if there could in theory be one network round-trip or disk read operation per node (this method may return a lower number if the data is known to be fully in memory).
+
+The returned value is supposed to be an estimate and doesn't have to be very accurate. Please note this method is called on each index whenever a query is run, so the method should be reasonably fast (not read any data itself, or at least not read too much data).
+
+If an index implementation cannot query the data, it has to return `Double.POSITIVE_INFINITY`.
+
+### Property index
+
+To define a property index on a subtree you have to add an index definition node that:
+
+* must be of type `oak:queryIndexDefinition`
+* must have the `type` property set to __`property`__
+* contains the `propertyNames` property that indicates what properties will be stored in the index.
+
+  `propertyNames` can be a list of properties, and it is optional. In case it is missing, the node name will be used as a property name reference value.
+
+_Optionally_ you can specify
+
+* a uniqueness constraint on a property index by setting the `unique` flag to `true`
+* that the property index only applies to a certain node type by setting the `declaringNodeTypes` property
+* the `reindex` flag which when set to `true`, triggers a full content re-index.
+
+Example:
+
+    {
+      NodeBuilder index = root.child("oak:index");
+      index.child("uuid")
+        .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
+        .setProperty("type", "property")
+        .setProperty("propertyNames", "jcr:uuid")
+        .setProperty("declaringNodeTypes", "mix:referenceable")
+        .setProperty("unique", true)
+        .setProperty("reindex", true);
+    }
+
+or to simplify you can use one of the existing `IndexUtils#createIndexDefinition` helper methods:
+
+    {
+      NodeBuilder index = IndexUtils.getOrCreateOakIndex(root);
+      IndexUtils.createIndexDefinition(index, "myProp", true, false, ImmutableList.of("myProp"), null);
+    }
+
+
+### Node type index
+
+The `NodeTypeIndex` implements a `QueryIndex` using `PropertyIndexLookup`s on `jcr:primaryType` and `jcr:mixinTypes` to evaluate a node type restriction on the filter.
+The cost for this index is the sum of the costs of the `PropertyIndexLookup` for queries on `jcr:primaryType` and `jcr:mixinTypes`.
+
+
+### Lucene full-text index
+
+The full-text index update is asynchronous via a background thread, see `Oak#withAsyncIndexing`.
+
+This means that some full-text searches will not work for a small window of time: the background thread runs every 5 seconds, plus the time it takes to run the diff and to run the text-extraction process.
The async update status is now reflected on the `oak:index` node with the help of a few properties, see [OAK-980](https://issues.apache.org/jira/browse/OAK-980)
+
+TODO Node aggregation [OAK-828](https://issues.apache.org/jira/browse/OAK-828)
+
+The index definition node for a lucene-based full-text index:
+
+* must be of type `oak:queryIndexDefinition`
+* must have the `type` property set to __`lucene`__
+* must contain the `async` property set to the value `async`, this is what sends the index update process to a background thread
+
+_Optionally_ you can add
+
+ * what subset of property types to be included in the index via the `includePropertyTypes` property
+ * a blacklist of property names: what property to be excluded from the index via the `excludePropertyNames` property
+ * the `reindex` flag which when set to `true`, triggers a full content re-index.
+
+Example:
+
+    {
+      NodeBuilder index = root.child("oak:index");
+      index.child("lucene")
+        .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
+        .setProperty("type", "lucene")
+        .setProperty("async", "async")
+        .setProperty(PropertyStates.createProperty("includePropertyTypes", ImmutableSet.of(
+            PropertyType.TYPENAME_STRING, PropertyType.TYPENAME_BINARY), Type.STRINGS))
+        .setProperty(PropertyStates.createProperty("excludePropertyNames", ImmutableSet.of(
+            "jcr:createdBy", "jcr:lastModifiedBy"), Type.STRINGS))
+        .setProperty("reindex", true);
+    }
+
+
+### Solr full-text index
+
+`TODO`

Modified: jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java?rev=1519440&r1=1519439&r2=1519440&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java (original)
+++ jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java Mon Sep 2 13:35:27 2013
@@ -100,16 +100,19 @@ import org.slf4j.LoggerFactory;
  * Under it follows the index definition node that:
  * <ul>
  * <li>must be of type <code>oak:queryIndexDefinition</code></li>
- * <li>must have the <code>type</code> property set to <b><code>lucene</code>
+ * <li>must have the <code>type</code> property set to <b><code>lucene</code></b></li>
+ * <li>must have the <code>async</code> property set to <b><code>async</code></b></li>
  * </b></li>
  * </ul>
  * </p>
- *
  * <p>
- * Note: <code>reindex<code> is a property that when set to <code>true</code>,
- * triggers a full content reindex.
+ * Optionally you can add
+ * <ul>
+ * <li>what subset of property types to be included in the index via the <code>includePropertyTypes<code> property</li>
+ * <li>a blacklist of property names: what property to be excluded from the index via the <code>excludePropertyNames<code> property</li>
+ * <li>the <code>reindex<code> flag which when set to <code>true<code>, triggers a full content re-index.</li>
+ * </ul>
  * </p>
- *
  * <pre>
  * <code>
  * {
@@ -117,6 +120,7 @@ import org.slf4j.LoggerFactory;
  * index.child("lucene")
  * .setProperty("jcr:primaryType", "oak:queryIndexDefinition", Type.NAME)
  * .setProperty("type", "lucene")
+ * .setProperty("async", "async")
  * .setProperty("reindex", "true");
  * }
  * </code>
