This is an automated email from the ASF dual-hosted git repository.

thomasm pushed a commit to branch OAK-10262
in repository https://gitbox.apache.org/repos/asf/jackrabbit-oak.git

commit fa4faaad7557fd18bf18f37dfb64ded39f7be3a6
Author: Thomas Mueller <[email protected]>
AuthorDate: Wed May 24 15:50:31 2023 +0200

    OAK-10262 Document ASCIIFolder and OakAnalyzer
---
 oak-doc/src/site/markdown/query/lucene.md | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)
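
Note on the ASCII folding example documented in this commit: the sketch below
only illustrates the idea of folding accented characters such as umlauts to
plain ASCII. It is not Oak's or Lucene's implementation. The class name
`AsciiFoldSketch` and the method `fold` are hypothetical, and the approach
(Unicode NFD decomposition followed by stripping combining marks, using only
the JDK) is an assumption chosen to keep the example self-contained. Lucene's
real ASCII folding filter uses its own mapping table and covers more
characters, for example folding "ß" to "ss", which this sketch does not do.

```java
import java.text.Normalizer;

public class AsciiFoldSketch {

    // Approximation of ASCII folding: decompose each character into a base
    // letter plus combining marks (NFD), then drop the combining marks.
    // Characters without a canonical decomposition are left unchanged.
    public static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "M\u00fcller" is "Mueller" written with a u-umlaut
        System.out.println(fold("M\u00fcller"));
        // "caf\u00e9" is "cafe" written with an e-acute
        System.out.println(fold("caf\u00e9"));
    }
}
```

With folding applied at both index and query time, a search term written
without the umlaut can match an indexed term written with it.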

diff --git a/oak-doc/src/site/markdown/query/lucene.md b/oak-doc/src/site/markdown/query/lucene.md
index c50af4f9d7..711d0e3005 100644
--- a/oak-doc/src/site/markdown/query/lucene.md
+++ b/oak-doc/src/site/markdown/query/lucene.md
@@ -762,10 +762,15 @@ defaults to 5
 
 #### <a name="analyzers"></a>Analyzers
 
+If no analyzer is specified, then `OakAnalyzer` is used, which uses the
+Apache Lucene `StandardTokenizer`, the `LowerCaseFilter`,
+and the `WordDelimiterFilter` with the following options:
+`GENERATE_WORD_PARTS`, `STEM_ENGLISH_POSSESSIVE`, and `GENERATE_NUMBER_PARTS`.
+
 `@since Oak 1.5.5, 1.4.7, 1.2.19`
-Unless custom analyzer is configured (as documented below), in-built analyzer
-can be configured to include original term as well to be indexed. This is
-controlled by setting boolean property `indexOriginalTerm` on analyzers node.
+Unless custom analyzer is explicitly configured (as documented below), the built-in analyzer
+can be configured to include the original term as well (`PRESERVE_ORIGINAL`). This is
+controlled by setting boolean property `indexOriginalTerm` on analyzers node:
 
     /oak:index/assetType
       - jcr:primaryType = "oak:QueryIndexDefinition"
@@ -845,7 +850,17 @@ all the other components (e.g. `charFilters`, `Synonym`) are optional.
 
 #### Examples
 
-Adding stemming support
+To convert umlauts using ASCII folding, use:
+```
+    + analyzers
+      + default
+        + tokenizer
+          - name = "Standard"
+        + filters (nt:unstructured) // the filters needs to be ordered
+          + ASCIIFolding
+```
+
+For stemming support, use:
 ```
 1. Use an analyzer which has stemming included by default e.g. EnglishAnalyzer which has PorterStemFilter.
     + analyzers
