[ https://issues.apache.org/jira/browse/OAK-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Parvulescu resolved OAK-1022.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.10

Marking as done, thanks for the input.

> Add a custom Oak Lucene analyzer
> --------------------------------
>
>                 Key: OAK-1022
>                 URL: https://issues.apache.org/jira/browse/OAK-1022
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: oak-lucene
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
>             Fix For: 0.10
>
>
> Following OAK-1007, where I switched to a ClassicAnalyzer, I realized that it
> introduced some subtle changes in tokenization behavior.
> For example, there's a twist if the token contains a number.
> From the ClassicTokenizer API:
> bq. Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
> This means that a path token is split into two tokens if it contains no
> numbers:
> {code}
> /parent/child => 'parent', 'child'
> {code}
> but kept as a single token if it contains numbers:
> {code}
> /p12345/p23456 => '/p12345/p23456'
> {code}
> Also, I'd like to split alphanumeric tokens on '_' and on '.' as well.
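
To make the requested behavior concrete, here is a minimal sketch in plain Python of a tokenizer that always splits path tokens, whether or not they contain digits, and also splits on '_' and '.'. It is not the Oak or Lucene implementation: the `tokenize` helper is hypothetical, and treating every non-alphanumeric ASCII character as a separator is an assumption that goes slightly beyond the characters named in the issue.

```python
import re

# Hypothetical helper for illustration only; not part of Oak or Lucene.
# Assumption: any run of characters other than ASCII letters and digits
# acts as a separator, which covers '/', '_', '.' and '-'.
_SEPARATORS = re.compile(r"[^A-Za-z0-9]+")


def tokenize(text):
    """Split text into alphanumeric tokens, dropping empty pieces."""
    return [token for token in _SEPARATORS.split(text) if token]


if __name__ == "__main__":
    for sample in ("/parent/child", "/p12345/p23456", "my_file.v2"):
        print(sample, "=>", tokenize(sample))
```

With this rule, digits no longer change how a path is split: both example paths from the issue yield two tokens each.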