Dave Hughes created OAK-9145:
--------------------------------
Summary: OakAnalyzer applies LowerCaseFilter and
WordDelimiterFilter in wrong order
Key: OAK-9145
URL: https://issues.apache.org/jira/browse/OAK-9145
Project: Jackrabbit Oak
Issue Type: Bug
Components: indexing, jcr, lucene
Environment: Discovered while performing DAM searches in Adobe
Experience Manager.
When searching for _savings_, the damAssetLucene index (which uses the default
OakAnalyzer) does not find an asset named _savingsAccount.svg_.
After configuring the index's analyzers (_/oak:index/damAssetLucene/analyzers_)
as follows, so that WordDelimiterFilter is applied before LowerCaseFilter, the
search found the asset as expected.
{noformat}
{
  "jcr:primaryType": "nt:unstructured",
  "default": {
    "jcr:primaryType": "nt:unstructured",
    "tokenizer": {
      "jcr:primaryType": "nt:unstructured",
      "name": "Standard"
    },
    "filters": {
      "jcr:primaryType": "nt:unstructured",
      "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
      "LowerCase": {"jcr:primaryType": "nt:unstructured"}
    }
  }
}
{noformat}
Reporter: Dave Hughes
I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the
wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS flag,
which splits camelCase/PascalCase tokens into multiple terms at case
transitions. Because LowerCaseFilter is applied first, the case information is
already gone by the time WordDelimiterFilter runs, so those tokens are never
split.
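
To illustrate why the order matters, here is a minimal sketch that models the two filters in plain Python. It is not Lucene or Oak code: lower_case, word_parts and analyze are hypothetical helpers, and word_parts only approximates GENERATE_WORD_PARTS by splitting on lower-to-upper case transitions and on non-alphanumeric characters. It also assumes the file name reaches the filters as a single token.

```python
import re

def lower_case(tokens):
    # Stand-in for LowerCaseFilter: lowercase every token.
    return [t.lower() for t in tokens]

def word_parts(tokens):
    # Simplified stand-in for WordDelimiterFilter with GENERATE_WORD_PARTS
    # (an assumption, not Lucene's actual algorithm): split each token at
    # lower-to-upper case transitions and at non-alphanumeric characters.
    parts = []
    for token in tokens:
        spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token)
        parts.extend(p for p in re.split(r"[^A-Za-z0-9]+", spaced) if p)
    return parts

def analyze(text, filters):
    # Run the whole text as one token through the filters, in order.
    tokens = [text]
    for f in filters:
        tokens = f(tokens)
    return tokens

name = "savingsAccount.svg"

# Order described in this issue: lowercase first, then split.
reported_order = analyze(name, [lower_case, word_parts])

# Order used in the workaround: split first, then lowercase.
proposed_order = analyze(name, [word_parts, lower_case])

print(reported_order)   # ['savingsaccount', 'svg']
print(proposed_order)   # ['savings', 'account', 'svg']
```

In this model, the term _savings_ only appears in the token list when the splitting step runs before lowercasing, which matches the search behaviour described above.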