[jira] [Commented] (OAK-9145) OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order

Dave Hughes (Jira) Sun, 22 Nov 2020 10:43:06 -0800


    [ 
https://issues.apache.org/jira/browse/OAK-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236996#comment-17236996
 ]


Dave Hughes commented on OAK-9145:
----------------------------------

I opened this issue in July and emailed the dev mailing list in September, but 
it has not gained any traction. I may not have followed your contribution 
guidelines correctly; they weren't very clear when I looked for the process 
in July.

In a last-ditch effort, I'm mentioning several people who have tickets on the 
current agile board, in the hope that one of you can take this on or at least 
point me to the correct process. Thanks in advance.

[~thomasm] [~mreutegg] [~baedke] [~mattvryan] [~teofili] [~angela] [~adulceanu]

> OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order
> --------------------------------------------------------------------------
>
>                 Key: OAK-9145
>                 URL: https://issues.apache.org/jira/browse/OAK-9145
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: indexing, jcr, lucene
>         Environment: Discovered while performing DAM searches in Adobe 
> Experience Manager. 
> Searching for _savings_, the damAssetLucene index (which uses the default 
> OakAnalyzer) does not find an asset named _savingsAccount.svg_.
> Upon configuring the index's analyzers 
> (_/oak:index/damAssetLucene/analyzers_) to apply WordDelimiterFilter before 
> LowerCaseFilter, the correct behaviour was seen.
> {noformat}
> {
>   "jcr:primaryType": "nt:unstructured",
>   "default": {
>     "jcr:primaryType": "nt:unstructured",
>     "tokenizer": {
>       "jcr:primaryType": "nt:unstructured",
>       "name": "Standard"
>     },
>     "filters": {
>       "jcr:primaryType": "nt:unstructured",
>       "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
>       "LowerCase": {"jcr:primaryType": "nt:unstructured"}
>     }
>   }
> }
> {noformat}
>            Reporter: Dave Hughes
>            Priority: Minor
>              Labels: easyfix, pull-request-available
>
> I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the 
> wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS 
> flag, which splits camelCase/PascalCase tokens into multiple terms at each 
> case change. Because LowerCaseFilter is applied first, the mixed case is 
> already gone by the time WordDelimiterFilter runs, so the terms can't be 
> split.
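
The ordering problem described above can be illustrated with a minimal,
self-contained sketch. It does not use Lucene or OakAnalyzer: the class
FilterOrderSketch and its two methods are hypothetical stand-ins that only
model lowercasing and splitting at lower-to-upper case changes, which is a
small part of what the real WordDelimiterFilter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the two filters; not the Lucene or Oak API.
class FilterOrderSketch {

    // Stand-in for the camelCase part of GENERATE_WORD_PARTS:
    // split each token wherever a lowercase letter is followed by an uppercase one.
    public static List<String> splitOnCaseChange(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            int start = 0;
            for (int i = 1; i < token.length(); i++) {
                if (Character.isLowerCase(token.charAt(i - 1))
                        && Character.isUpperCase(token.charAt(i))) {
                    out.add(token.substring(start, i));
                    start = i;
                }
            }
            out.add(token.substring(start));
        }
        return out;
    }

    // Stand-in for LowerCaseFilter: lowercase every token.
    public static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("savingsAccount");

        // Lowercase first: no case change is left, so nothing is split.
        System.out.println(splitOnCaseChange(lowerCase(input))); // [savingsaccount]

        // Split first, then lowercase: "savings" becomes its own term.
        System.out.println(lowerCase(splitOnCaseChange(input))); // [savings, account]
    }
}
```

With the second ordering, a query for "savings" has a matching term, which is
consistent with the behaviour reported after reconfiguring the index's
analyzers.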



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
