[
https://issues.apache.org/jira/browse/OAK-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vikas Saurabh updated OAK-3648:
-------------------------------
Fix Version/s: 1.0.34
> Use StandardTokenizer instead of ClassicTokenizer in OakAnalyzer
> ----------------------------------------------------------------
>
> Key: OAK-3648
> URL: https://issues.apache.org/jira/browse/OAK-3648
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Reporter: Vikas Saurabh
> Assignee: Vikas Saurabh
> Fix For: 1.4, 1.3.11, 1.2.19, 1.0.34
>
>
> This is related to OAK-3276, where the intent was to use {{StandardAnalyzer}}
> by default (instead of {{OakAnalyzer}}). As discussed there, we need a specific
> word delimiter configuration which isn't possible with {{StandardAnalyzer}}, so we
> should instead switch over to {{StandardTokenizer}} inside {{OakAnalyzer}} itself.
> A few motivations for doing that:
> * Better Unicode support
> * {{ClassicTokenizer}} is the old (pre-Lucene 3.1) implementation of the standard
> tokenizer
> One of the key differences between the classic and standard tokenizers is the way
> they delimit words ({{StandardTokenizer}} follows the Unicode text segmentation
> rules)... but that difference largely gets nullified as we apply our own
> {{WordDelimiterFilter}} on top (see the sketch below).
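> For reference, a minimal sketch of what the change amounts to, assuming the Lucene 4.x
> {{Analyzer}} API; the class name and the exact {{WordDelimiterFilter}} flags here are
> illustrative assumptions, not the actual patch:
> {code:java}
> import java.io.Reader;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.core.LowerCaseFilter;
> import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
> import org.apache.lucene.analysis.standard.StandardTokenizer;
> import org.apache.lucene.util.Version;
>
> public class OakAnalyzerSketch extends Analyzer {
>
>     private final Version matchVersion;
>
>     public OakAnalyzerSketch(Version matchVersion) {
>         this.matchVersion = matchVersion;
>     }
>
>     @Override
>     protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>         // StandardTokenizer delimits words per the Unicode text segmentation rules
>         // (the ClassicTokenizer it replaces keeps the old, pre-3.1 grammar).
>         StandardTokenizer source = new StandardTokenizer(matchVersion, reader);
>         TokenStream filter = new LowerCaseFilter(matchVersion, source);
>         // Our own word delimiting still runs on top of the tokenizer, which is why
>         // the segmentation difference between the two tokenizers mostly washes out.
>         filter = new WordDelimiterFilter(filter,
>                 WordDelimiterFilter.GENERATE_WORD_PARTS
>                         | WordDelimiterFilter.GENERATE_NUMBER_PARTS
>                         | WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE,
>                 null);
>         return new TokenStreamComponents(source, filter);
>     }
> }
> {code}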
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)