query: query.md solr.md

thomasm Wed, 25 Feb 2015 08:55:47 -0800

Author: thomasm
Date: Wed Feb 25 16:55:28 2015
New Revision: 1662260

URL: http://svn.apache.org/r1662260
Log:
OAK-301 : oak documentation


Added:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md
Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query.md
URL: 
http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query.md?rev=1662260&r1=1662259&r2=1662260&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/query.md Wed Feb 25 
16:55:28 2015
@@ -225,10 +225,12 @@ or to simplify you can use one of the ex
       IndexUtils.createIndexDefinition(index, "myProp", true, false, 
ImmutableList.of("myProp"), null);
     }
 
-__Note on `propertyNames`__ Adding a property index definition that contains 
two or more properties  will only
+(Not sure if this is correct) 
+__Note on `propertyNames`__ Adding a property index definition that contains 
two or more properties will only
 include nodes that have _all_ specified properties present. This is different 
than adding a dedicated property
 index for each and letting the query engine make use of them.
 
+(Not sure if this is correct) 
 __Note__ Is is currently not possible to add more than one property index on 
the same property name, even if it
 might be used in various combinations with other property names. This rule is 
not enforced in any way, but the
 behavior is undefined, one of the defined indexes will be updated while the 
others will simply be ignored by the
@@ -258,8 +260,24 @@ Example:
         .setProperty("reindex", true);
     }
 
+### The Lucene Index
+
+See [Lucene Index](lucene.html) for details.
+
+### The Solr Index
+
+See [Solr Index](lucene.html) for details.
+
+### The Node Type Index
+
+The `NodeTypeIndex` implements a `QueryIndex` using `PropertyIndexLookup`s on 
`jcr:primaryType` `jcr:mixinTypes` to evaluate a node type restriction on the 
filter.
+The cost for this index is the sum of the costs of the `PropertyIndexLookup` 
for queries on `jcr:primaryType` and `jcr:mixinTypes`.
+
 ### The Ordered Index
 
+NOTE: This index type has been deprecated. 
+Please use the Lucene Property Index instead, which offers the same features.
+
 Extension of the Property index will keep the order of the indexed
 property persisted in the repository.
 
@@ -302,129 +320,6 @@ _Caveats_
   define it as asynchronous by providing `async=async` in the index
   definition. This is to avoid cluster merges.
 
-### The Lucene Index
-
-Refer to [Lucene Index](lucene.html) for details.
-
-### The Solr Index
-
-The Solr index is mainly meant for full-text search (the 'contains' type of 
queries):
-
-    //*[jcr:contains(., 'text')]
-
-but is also able to search by path, property restrictions and primary type 
restrictions.
-This means the Solr index in Oak can be used for any type of JCR query.
-
-Even if it's not just a full-text index, it's recommended to use it 
asynchronously (see `Oak#withAsyncIndexing`)
-because, in most production scenarios, it'll be a 'remote' index, and 
therefore network eventual latency / errors would 
-have less impact on the repository performance.
-To set up the Solr index to be asynchronous that has to be defined inside the 
index definition, see [OAK-980](https://issues.apache.org/jira/browse/OAK-980)
-
-TODO Node aggregation.
-
-##### Index definition for Solr index
-<a name="solr-index-definition"></a>
-The index definition node for a Solr-based index:
-
- * must be of type `oak:QueryIndexDefinition`
- * must have the `type` property set to __`solr`__
- * must contain the `async` property set to the value `async`, this is what 
sends the 
-
-index update process to a background thread.
-
-_Optionally_ one can add
-
- * the `reindex` flag which when set to `true`, triggers a full content 
re-index.
-
-Example:
-
-    {
-      NodeBuilder index = root.child("oak:index");
-      index.child("solr")
-        .setProperty("jcr:primaryType", "oak:QueryIndexDefinition", Type.NAME)
-        .setProperty("type", "solr")
-        .setProperty("async", "async")
-        .setProperty("reindex", true);
-    }
-    
-#### Setting up the Solr server
-For the Solr index to work Oak needs to be able to communicate with a Solr 
instance / cluster.
-Apache Solr supports multiple deployment architectures: 
-
- * embedded Solr instance running in the same JVM the client runs into
- * single remote instance
- * master / slave architecture, eventually with multiple shards and replicas
- * SolrCloud cluster, with Zookeeper instance(s) to control a dynamic, 
resilient set of Solr servers for high 
- availability and fault tolerance
-
-The Oak Solr index can be configured to either use an 'embedded Solr server' 
or a 'remote Solr server' (being able to 
-connect to a single remote instance or to a SolrCloud cluster via Zookeeper).
-
-##### OSGi environment
-All the Solr configuration parameters are described in the 'Solr Server 
Configuration' section on the 
-[OSGi configuration](osgi_config.html) page.
-
-Create an index definition for the Solr index, as described 
[above](#solr-index-definition).
-Once the query index definition node has been created, access OSGi 
ConfigurationAdmin via e.g. Apache Felix WebConsole:
-
- 1. find the 'Oak Solr indexing / search configuration' item and eventually 
change configuration properties as needed
- 2. find either the 'Oak Solr embedded server configuration' or 'Oak Solr 
remote server configuration' items depending 
- on the chosen Solr architecture and eventually change configuration 
properties as needed
- 3. find the 'Oak Solr server provider' item and select the chosen provider 
('remote' or 'embedded') 
-
-##### Solr server configurations
-Depending on the use case, different Solr server configurations are 
recommended.
-
-###### Embedded Solr server
-The embedded Solr server is recommended for developing and testing the Solr 
index for an Oak repository. With that an 
-in-memory Solr instance is started in the same JVM of the Oak repository, 
without HTTP bindings (for security purposes 
-as it'd allow HTTP access to repository data independently of ACLs). 
-Configuring an embedded Solr server mainly consists of providing the path to a 
standard [Solr home dir](https://wiki.apache.org/solr/SolrTerminology) 
-(_solr.home.path_ Oak property) to be used to start Solr; this path can be 
either relative or absolute, if such a path 
-would not exist then the default configuration provided with _oak-solr-core_ 
artifact would be put in the given path.
-To start an embedded Solr server with a custom configuration (e.g. different 
schema.xml / solrconfig.xml than the default
- ones) the (modified) Solr home files would have to be put in a dedicated 
directory, according to Solr home structure, so 
- that the solr.home.path property can be pointed to that directory.
-
-###### Single remote Solr server
-A single (remote) Solr instance is the simplest possible setup for using the 
Oak Solr index in a production environment. 
-Oak will communicate to such a Solr server through Solr's HTTP APIs (via 
[SolrJ](http://wiki.apache.org/solr/Solrj) client).
-Configuring a single remote Solr instance consists of providing the URL to 
connect to in order to reach the [Solr core]
-(https://wiki.apache.org/solr/SolrTerminology) that will host the Solr index 
for the Oak repository via the _solr.http.url_
- property which will have to contain such a URL (e.g. 
_http://10.10.1.101:8983/solr/oak_). 
-All the configuration and tuning of Solr, other than what's described in 'Solr 
Server Configuration' section of the [OSGi 
-configuration](osgi_config.html) page, will have to be performed on the Solr 
side; [sample Solr configuration]
- 
(http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-solr-core/src/main/resources/solr/)
 files (schema.xml, 
- solrconfig.xml, etc.) to start with can be found in _oak-solr-core_ artifact.
-
-###### SolrCloud cluster
-A [SolrCloud](https://cwiki.apache.org/confluence/display/solr/SolrCloud) 
cluster is the recommended setup for an Oak 
-Solr index in production as it provides a scalable and fault tolerant 
architecture.
-In order to configure a SolrCloud cluster the host of the Zookeeper instance / 
ensemble managing the Solr servers has 
-to be provided in the _solr.zk.host_ property (e.g. _10.1.1.108:9983_) since 
the SolrJ client for SolrCloud communicates 
-directly with Zookeeper.
-The [Solr collection](https://wiki.apache.org/solr/SolrTerminology) to be used 
within Oak is named _oak_, having a replication
- factor of 2 and using 2 shards; this means in the default setup the SolrCloud 
cluster would have to be composed by at 
- least 4 Solr servers as the index will be split into 2 shards and each shard 
will have 2 replicas. Such parameters can 
- be changed, look for the 'Oak Solr remote server configuration' item on the 
[OSGi configuration](osgi_config.html) page.
-SolrCloud also allows the hot deploy of configuration files to be used for a 
certain collection so while setting up the 
- collection to be used for Oak with the needed files before starting the 
cluster, configuration files can also be uploaded 
- from a local directory, this is controlled by the _solr.conf.dir_ property of 
the 'Oak Solr remote server configuration'.
-For a detailed description of how SolrCloud works see the [Solr reference 
guide](https://cwiki.apache.org/confluence/display/solr/SolrCloud).
-
-#### Differences with the Lucene index
-As of Oak version 1.0.0:
-
-* Solr index doesn't support search using relative properties, see 
[OAK-1835](https://issues.apache.org/jira/browse/OAK-1835).
-* Solr configuration is mostly done on the Solr side via schema.xml / 
solrconfig.xml files.
-* Lucene can only be used for full-text queries, Solr can be used for 
full-text search _and_ for JCR queries involving
-path, property and primary type restrictions.
-
-### The Node Type Index
-
-The `NodeTypeIndex` implements a `QueryIndex` using `PropertyIndexLookup`s on 
`jcr:primaryType` `jcr:mixinTypes` to evaluate a node type restriction on the 
filter.
-The cost for this index is the sum of the costs of the `PropertyIndexLookup` 
for queries on `jcr:primaryType` and `jcr:mixinTypes`.
-
 ### Cost Calculation
 
 Each query index is expected to estimate the worst-case cost to query with the 
given filter. 

Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md
URL: 
http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md?rev=1662260&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md (added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/query/solr.md Wed Feb 25 
16:55:28 2015
@@ -0,0 +1,129 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+  -->
+
+The Solr index is mainly meant for full-text search (the 'contains' type of 
queries):
+
+    //*[jcr:contains(., 'text')]
+
+but is also able to search by path, property restrictions and primary type 
restrictions.
+This means the Solr index in Oak can be used for any type of JCR query.
+
+Even if it's not just a full-text index, it's recommended to use it 
asynchronously (see `Oak#withAsyncIndexing`)
+because, in most production scenarios, it'll be a 'remote' index, and 
therefore network eventual latency / errors would 
+have less impact on the repository performance.
+To set up the Solr index to be asynchronous that has to be defined inside the 
index definition, see [OAK-980](https://issues.apache.org/jira/browse/OAK-980)
+
+TODO Node aggregation.
+
+##### Index definition for Solr index
+<a name="solr-index-definition"></a>
+The index definition node for a Solr-based index:
+
+ * must be of type `oak:QueryIndexDefinition`
+ * must have the `type` property set to __`solr`__
+ * must contain the `async` property set to the value `async`, this is what 
sends the 
+
+index update process to a background thread.
+
+_Optionally_ one can add
+
+ * the `reindex` flag which when set to `true`, triggers a full content 
re-index.
+
+Example:
+
+    {
+      NodeBuilder index = root.child("oak:index");
+      index.child("solr")
+        .setProperty("jcr:primaryType", "oak:QueryIndexDefinition", Type.NAME)
+        .setProperty("type", "solr")
+        .setProperty("async", "async")
+        .setProperty("reindex", true);
+    }
+    
+#### Setting up the Solr server
+For the Solr index to work Oak needs to be able to communicate with a Solr 
instance / cluster.
+Apache Solr supports multiple deployment architectures: 
+
+ * embedded Solr instance running in the same JVM the client runs into
+ * single remote instance
+ * master / slave architecture, eventually with multiple shards and replicas
+ * SolrCloud cluster, with Zookeeper instance(s) to control a dynamic, 
resilient set of Solr servers for high 
+ availability and fault tolerance
+
+The Oak Solr index can be configured to either use an 'embedded Solr server' 
or a 'remote Solr server' (being able to 
+connect to a single remote instance or to a SolrCloud cluster via Zookeeper).
+
+##### OSGi environment
+All the Solr configuration parameters are described in the 'Solr Server 
Configuration' section on the 
+[OSGi configuration](osgi_config.html) page.
+
+Create an index definition for the Solr index, as described 
[above](#solr-index-definition).
+Once the query index definition node has been created, access OSGi 
ConfigurationAdmin via e.g. Apache Felix WebConsole:
+
+ 1. find the 'Oak Solr indexing / search configuration' item and eventually 
change configuration properties as needed
+ 2. find either the 'Oak Solr embedded server configuration' or 'Oak Solr 
remote server configuration' items depending 
+ on the chosen Solr architecture and eventually change configuration 
properties as needed
+ 3. find the 'Oak Solr server provider' item and select the chosen provider 
('remote' or 'embedded') 
+
+##### Solr server configurations
+Depending on the use case, different Solr server configurations are 
recommended.
+
+###### Embedded Solr server
+The embedded Solr server is recommended for developing and testing the Solr 
index for an Oak repository. With that an 
+in-memory Solr instance is started in the same JVM of the Oak repository, 
without HTTP bindings (for security purposes 
+as it'd allow HTTP access to repository data independently of ACLs). 
+Configuring an embedded Solr server mainly consists of providing the path to a 
standard [Solr home dir](https://wiki.apache.org/solr/SolrTerminology) 
+(_solr.home.path_ Oak property) to be used to start Solr; this path can be 
either relative or absolute, if such a path 
+would not exist then the default configuration provided with _oak-solr-core_ 
artifact would be put in the given path.
+To start an embedded Solr server with a custom configuration (e.g. different 
schema.xml / solrconfig.xml than the default
+ ones) the (modified) Solr home files would have to be put in a dedicated 
directory, according to Solr home structure, so 
+ that the solr.home.path property can be pointed to that directory.
+
+###### Single remote Solr server
+A single (remote) Solr instance is the simplest possible setup for using the 
Oak Solr index in a production environment. 
+Oak will communicate to such a Solr server through Solr's HTTP APIs (via 
[SolrJ](http://wiki.apache.org/solr/Solrj) client).
+Configuring a single remote Solr instance consists of providing the URL to 
connect to in order to reach the [Solr core]
+(https://wiki.apache.org/solr/SolrTerminology) that will host the Solr index 
for the Oak repository via the _solr.http.url_
+ property which will have to contain such a URL (e.g. 
_http://10.10.1.101:8983/solr/oak_). 
+All the configuration and tuning of Solr, other than what's described in 'Solr 
Server Configuration' section of the [OSGi 
+configuration](osgi_config.html) page, will have to be performed on the Solr 
side; [sample Solr configuration]
+ 
(http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-solr-core/src/main/resources/solr/)
 files (schema.xml, 
+ solrconfig.xml, etc.) to start with can be found in _oak-solr-core_ artifact.
+
+###### SolrCloud cluster
+A [SolrCloud](https://cwiki.apache.org/confluence/display/solr/SolrCloud) 
cluster is the recommended setup for an Oak 
+Solr index in production as it provides a scalable and fault tolerant 
architecture.
+In order to configure a SolrCloud cluster the host of the Zookeeper instance / 
ensemble managing the Solr servers has 
+to be provided in the _solr.zk.host_ property (e.g. _10.1.1.108:9983_) since 
the SolrJ client for SolrCloud communicates 
+directly with Zookeeper.
+The [Solr collection](https://wiki.apache.org/solr/SolrTerminology) to be used 
within Oak is named _oak_, having a replication
+ factor of 2 and using 2 shards; this means in the default setup the SolrCloud 
cluster would have to be composed by at 
+ least 4 Solr servers as the index will be split into 2 shards and each shard 
will have 2 replicas. Such parameters can 
+ be changed, look for the 'Oak Solr remote server configuration' item on the 
[OSGi configuration](osgi_config.html) page.
+SolrCloud also allows the hot deploy of configuration files to be used for a 
certain collection so while setting up the 
+ collection to be used for Oak with the needed files before starting the 
cluster, configuration files can also be uploaded 
+ from a local directory, this is controlled by the _solr.conf.dir_ property of 
the 'Oak Solr remote server configuration'.
+For a detailed description of how SolrCloud works see the [Solr reference 
guide](https://cwiki.apache.org/confluence/display/solr/SolrCloud).
+
+#### Differences with the Lucene index
+As of Oak version 1.0.0:
+
+* Solr index doesn't support search using relative properties, see 
[OAK-1835](https://issues.apache.org/jira/browse/OAK-1835).
+* Solr configuration is mostly done on the Solr side via schema.xml / 
solrconfig.xml files.
+* Lucene can only be used for full-text queries, Solr can be used for 
full-text search _and_ for JCR queries involving
+path, property and primary type restrictions.
+

svn commit: r1662260 - in /jackrabbit/oak/trunk/oak-doc/src/site/markdown/query: query.md solr.md

Reply via email to