[ 
https://issues.apache.org/jira/browse/OAK-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430850#comment-15430850
 ] 

Vikas Saurabh edited comment on OAK-3648 at 8/22/16 3:03 PM:
-------------------------------------------------------------

Backported [r1714827|https://svn.apache.org/r1714827] to 1.2 in 
[r1757183|https://svn.apache.org/r1757183] and to 1.0 in 
[r1757189|https://svn.apache.org/r1757189].


was (Author: catholicon):
Backport to 1.2 in [r1757183|https://svn.apache.org/r1757183].

> Use StandardTokenizer instead of ClassicTokenizer in OakAnalyzer
> ----------------------------------------------------------------
>
>                 Key: OAK-3648
>                 URL: https://issues.apache.org/jira/browse/OAK-3648
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>             Fix For: 1.4, 1.3.11, 1.2.19, 1.0.34
>
>
> This is related to OAK-3276, where the intent was to use {{StandardAnalyzer}} 
> by default (instead of {{OakAnalyzer}}). As discussed there, we need a specific 
> word delimiter, which isn't possible with {{StandardAnalyzer}}, so we should 
> instead switch over to {{StandardTokenizer}} in {{OakAnalyzer}} itself.
> A few motivations to do that:
> * Better Unicode support
> * {{ClassicTokenizer}} is the old (~Lucene 3.1) implementation of the standard 
> tokenizer
> One of the key differences between the classic and standard tokenizers is the 
> way they delimit words (the standard tokenizer follows Unicode text 
> segmentation rules)... but that difference gets nullified as we have our own 
> {{WordDelimiterFilter}}.
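As a rough sketch of the change being described (names and filter chain are illustrative only; the actual {{OakAnalyzer}} component chain and flag set in Oak may differ), swapping the tokenizer while keeping the custom word-delimiter filter would look roughly like this against the Lucene 4.x API:

{code:java}
// Hypothetical sketch of an Analyzer.createComponents() after the switch;
// not the actual Oak implementation.
@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // StandardTokenizer follows Unicode text segmentation (UAX #29),
    // whereas ClassicTokenizer keeps the pre-3.1 behavior.
    Tokenizer source = new StandardTokenizer(matchVersion, reader);
    TokenStream filter = new LowerCaseFilter(matchVersion, source);
    // The custom WordDelimiterFilter supplies the word splitting, which is
    // why the classic-vs-standard delimiting difference gets nullified.
    filter = new WordDelimiterFilter(filter,
            WordDelimiterFilter.GENERATE_WORD_PARTS
          | WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE,
            null);
    return new TokenStreamComponents(source, filter);
}
{code}

Since the delimiting behavior lives in the filter rather than the tokenizer, the switch mainly buys the better Unicode handling noted above.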



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
