[jira] [Created] (OAK-9145) OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order

Dave Hughes (Jira) Mon, 20 Jul 2020 02:02:04 -0700

Dave Hughes created OAK-9145:
--------------------------------

             Summary: OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order
                 Key: OAK-9145
                 URL: https://issues.apache.org/jira/browse/OAK-9145
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: indexing, jcr, lucene
         Environment: Discovered while performing DAM searches in Adobe Experience Manager.


When searching for _savings_, the damAssetLucene index (which uses the default
OakAnalyzer) does not find an asset named _savingsAccount.svg_.

After configuring the index's analyzers (_/oak:index/damAssetLucene/analyzers_)
to apply WordDelimiterFilter before LowerCaseFilter, the search returned the
asset as expected:
{noformat}
{
  "jcr:primaryType": "nt:unstructured",
  "default": {
    "jcr:primaryType": "nt:unstructured",
    "tokenizer": {
      "jcr:primaryType": "nt:unstructured",
      "name": "Standard"
    },
    "filters": {
      "jcr:primaryType": "nt:unstructured",
      "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
      "LowerCase": {"jcr:primaryType": "nt:unstructured"}
    }
  }
}
{noformat}
            Reporter: Dave Hughes


I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the
wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS flag,
which splits camelCase/PascalCase words into multiple terms. However, because
LowerCaseFilter is applied first, the case changes that mark the word
boundaries are already gone, so the terms cannot be split.
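To make the ordering effect concrete, here is a simplified pure-Java model of the two filter chains. It is not Lucene or Oak code: `FilterOrderDemo`, `wordDelimiter` and `lowerCase` are hypothetical stand-ins that only mimic splitting on non-alphanumeric characters and on lower-to-upper case changes. It also assumes the Standard tokenizer has already emitted _savingsAccount.svg_ as a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Simplified model (hypothetical, NOT Lucene's WordDelimiterFilter) showing
 * why the order of the lower-casing and word-delimiting steps matters.
 */
public class FilterOrderDemo {

    // Hypothetical stand-in for word-part generation: splits each token on
    // non-alphanumeric characters and on lowercase -> uppercase transitions.
    static List<String> wordDelimiter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder part = new StringBuilder();
            for (int i = 0; i < token.length(); i++) {
                char c = token.charAt(i);
                if (!Character.isLetterOrDigit(c)) {
                    flush(part, out); // delimiter such as '.' ends the current part
                    continue;
                }
                if (i > 0 && Character.isLowerCase(token.charAt(i - 1))
                        && Character.isUpperCase(c)) {
                    flush(part, out); // case change marks a word boundary
                }
                part.append(c);
            }
            flush(part, out);
        }
        return out;
    }

    // Emits the accumulated part (if any) and resets the buffer.
    static void flush(StringBuilder part, List<String> out) {
        if (part.length() > 0) {
            out.add(part.toString());
            part.setLength(0);
        }
    }

    // Stand-in for LowerCaseFilter: lower-cases every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("savingsAccount.svg");
        // Default OakAnalyzer order as described in this issue.
        List<String> lowerFirst = wordDelimiter(lowerCase(tokens));
        // Order used in the reconfigured damAssetLucene analyzer.
        List<String> delimiterFirst = lowerCase(wordDelimiter(tokens));
        System.out.println("LowerCase -> WordDelimiter: " + lowerFirst);
        System.out.println("WordDelimiter -> LowerCase: " + delimiterFirst);
    }
}
```

Running it prints `[savingsaccount, svg]` for the lower-case-first order and `[savings, account, svg]` for the delimiter-first order, so in this model only the latter produces a term that a query for _savings_ can match.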



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
