[ https://issues.apache.org/jira/browse/OAK-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278123#comment-17278123 ]
Thomas Mueller commented on OAK-9145:
-------------------------------------

The patch itself looks good to me... But we can't merge it, I understand.

> OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order
> --------------------------------------------------------------------------
>
>                 Key: OAK-9145
>                 URL: https://issues.apache.org/jira/browse/OAK-9145
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: indexing, jcr, lucene
>         Environment: Discovered while performing DAM searches in Adobe Experience Manager.
>            Reporter: Dave Hughes
>            Assignee: Fabrizio Fortino
>            Priority: Minor
>              Labels: easyfix, pull-request-available
>
> I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the
> wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS
> flag, which splits camelCase/PascalCase into multiple terms, but since the
> LowerCaseFilter is applied first, the mixed-case is lost and the terms can't
> be split.
>
> Searching for savings, the damAssetLucene index (which uses the default
> OakAnalyzer) does not find an asset named savingsAccount.svg.
>
> Upon configuring the index's analyzers (/oak:index/damAssetLucene/analyzers)
> to apply WordDelimiterFilter before LowerCaseFilter, the correct behaviour
> was seen.
> {noformat}
> {
>   "jcr:primaryType": "nt:unstructured",
>   "default": {
>     "jcr:primaryType": "nt:unstructured",
>     "tokenizer": {
>       "jcr:primaryType": "nt:unstructured",
>       "name": "Standard"
>     },
>     "filters": {
>       "jcr:primaryType": "nt:unstructured",
>       "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
>       "LowerCase": {"jcr:primaryType": "nt:unstructured"}
>     }
>   }
> }
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
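
For illustration only, the ordering problem described in the issue can be reproduced with a small Python sketch. It does not use Lucene or Oak: `word_parts`, `split_words` and `lower_case` are hypothetical stand-ins that only roughly approximate what WordDelimiterFilter (with GENERATE_WORD_PARTS) and LowerCaseFilter do, and the sketch assumes the file name reaches the filters as a single token.

```python
import re


def word_parts(token):
    """Simplified stand-in for GENERATE_WORD_PARTS (an assumption, not
    Lucene's implementation): split on non-alphanumeric delimiters and on
    lower-to-upper case changes."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", token)


def split_words(tokens):
    """Apply word_parts to every token and flatten the result."""
    out = []
    for token in tokens:
        out.extend(word_parts(token))
    return out


def lower_case(tokens):
    """Simplified stand-in for LowerCaseFilter."""
    return [token.lower() for token in tokens]


tokens = ["savingsAccount.svg"]

# Order reported in the issue: lowercase first, then split.
# The case change is already gone, so "savings" never becomes its own term.
lower_then_split = split_words(lower_case(tokens))

# Order used in the workaround config: split first, then lowercase.
split_then_lower = lower_case(split_words(tokens))

print(lower_then_split)  # ['savingsaccount', 'svg']
print(split_then_lower)  # ['savings', 'account', 'svg']
```

With the second ordering a query for savings has a matching term, which is consistent with the behaviour reported after reordering the filters in the index's analyzer configuration.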