This is an automated email from the ASF dual-hosted git repository.

thomasm pushed a commit to branch OAK-10262
in repository https://gitbox.apache.org/repos/asf/jackrabbit-oak.git
commit fa4faaad7557fd18bf18f37dfb64ded39f7be3a6
Author: Thomas Mueller <[email protected]>
AuthorDate: Wed May 24 15:50:31 2023 +0200

    OAK-10262 Document ASCIIFolder and OakAnalyzer
---
 oak-doc/src/site/markdown/query/lucene.md | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/oak-doc/src/site/markdown/query/lucene.md b/oak-doc/src/site/markdown/query/lucene.md
index c50af4f9d7..711d0e3005 100644
--- a/oak-doc/src/site/markdown/query/lucene.md
+++ b/oak-doc/src/site/markdown/query/lucene.md
@@ -762,10 +762,15 @@ defaults to 5
 
 #### <a name="analyzers"></a>Analyzers
 
+If no analyzer is specified, then `OakAnalyzer` is used, which uses the
+Apache Lucene `StandardTokenizer`, the `LowerCaseFilter`,
+and the `WordDelimiterFilter` with the following options:
+`GENERATE_WORD_PARTS`, `STEM_ENGLISH_POSSESSIVE`, and `GENERATE_NUMBER_PARTS`.
+
 `@since Oak 1.5.5, 1.4.7, 1.2.19`
-Unless custom analyzer is configured (as documented below), in-built analyzer
-can be configured to include original term as well to be indexed. This is
-controlled by setting boolean property `indexOriginalTerm` on analyzers node.
+Unless custom analyzer is explicitly configured (as documented below), the built-in analyzer
+can be configured to include the original term as well (`PRESERVE_ORIGINAL`). This is
+controlled by setting boolean property `indexOriginalTerm` on analyzers node:
 
     /oak:index/assetType
       - jcr:primaryType = "oak:QueryIndexDefinition"
@@ -845,7 +850,17 @@ all the other components (e.g. `charFilters`, `Synonym`) are optional.
 
 #### Examples
 
-Adding stemming support
+To convert umlauts using ASCII folding, use:
+```
+    + analyzers
+      + default
+        + tokenizer
+          - name = "Standard"
+        + filters (nt:unstructured) // the filters needs to be ordered
+          + ASCIIFolding
+```
+
+For stemming support, use:
 ```
    1. Use an analyzer which has stemming included by default e.g. EnglishAnalyzer which has PorterStemFilter.
        + analyzers
