[ https://issues.apache.org/jira/browse/OAK-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168547#comment-15168547 ]
Vikas Saurabh commented on OAK-4042:
------------------------------------

[~chetanm] pointed out offline that the issue isn't really about analysis of GB-18030 characters. Rather, queries generally [don't get analyzed|https://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F]. If we want to run the analyzer over the queried text, then we need to use [AnalyzingQueryParser|https://lucene.apache.org/core/4_7_1/queryparser/org/apache/lucene/queryparser/analyzing/AnalyzingQueryParser.html] instead. That comes with a few caveats; quoting from the javadoc:
{quote}
Warning: This class should only be used with analyzers that do not use stopwords or that add tokens. Also, several stemming analyzers are inappropriate: for example, GermanAnalyzer will turn Häuser into hau, but H?user will become h?user when using this parser and thus no match would be found (i.e. using this parser will be no improvement over QueryParser in such cases).
{quote}

Btw, regarding the issue not being specific to GB-18030: currently, querying {{192.168.1*}} won't find text containing {{192.168.1.1}}. Of course, that's a trivial example; the takeaway is the discrepancy between the query text, which is only broken on whitespace, and the analyzer, which breaks terms on more than just whitespace.

So, here's my suggestion: since OakAnalyzer is a fairly simple analyzer, we should use AnalyzingQueryParser whenever OakAnalyzer is in play. For custom analyzers, on the other hand, we could expose a boolean property on the {{analyzers}} config (default=false) that lets a custom-configured analyzer opt into AnalyzingQueryParser if necessary. ([~chetanm], [~teofili]... thoughts?)

As a side note, I tried using {{lucene-analyzers-smartcn->SmartChineseAnalyzer}}, which analyzes {{中文标题suffix}} as {{\[中文], \[标题], \[suffix]}}; consequently, the test case still doesn't pass.
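To make the query/index discrepancy concrete, here is a minimal stdlib-only Java sketch. It does not use Lucene or OakAnalyzer: {{analyze}} is a hypothetical stand-in analyzer that lowercases and splits on every character that is not a letter or digit, and the class and method names are made up for illustration. It shows an unanalyzed prefix missing both {{192.168.1.1}} and a term whose case the analyzer changed, and how running the same analyzer over the prefix chunk first (loosely what AnalyzingQueryParser does for each non-wildcard chunk) fixes the single-token case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Toy model (not Lucene, not OakAnalyzer) of why an unanalyzed prefix query
 * can miss text that the index-time analyzer tokenized differently.
 */
class PrefixAnalysisDemo {

    // Hypothetical index-time analyzer: lowercases, then splits on any run of
    // characters that are neither letters nor digits (so '.' and ' ' both split).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Plain prefix matching: the query prefix is compared to the indexed tokens
    // as-is, like a parser that only splits on whitespace and never analyzes terms.
    static boolean rawPrefixMatch(List<String> indexed, String prefix) {
        return indexed.stream().anyMatch(t -> t.startsWith(prefix));
    }

    // "Analyzing" prefix matching: run the same analyzer over the prefix first.
    // A single prefix term can only come from a chunk that analyzes to exactly
    // one token, so this toy rejects anything else.
    static boolean analyzedPrefixMatch(List<String> indexed, String prefix) {
        List<String> analyzed = analyze(prefix);
        if (analyzed.size() != 1) {
            throw new IllegalArgumentException("prefix analyzed to " + analyzed);
        }
        return rawPrefixMatch(indexed, analyzed.get(0));
    }

    public static void main(String[] args) {
        List<String> ip = analyze("server at 192.168.1.1");
        List<String> greeting = analyze("Hello World");
        System.out.println(ip);                                   // [server, at, 192, 168, 1, 1]
        System.out.println(rawPrefixMatch(ip, "192.168.1"));      // false
        System.out.println(rawPrefixMatch(greeting, "Hel"));      // false
        System.out.println(analyzedPrefixMatch(greeting, "Hel")); // true
    }
}
```

Note that analyzing {{192.168.1}} with this toy analyzer yields three tokens, so it is rejected rather than matched; that is the same kind of limitation the javadoc warning above hints at for analyzers that produce extra tokens.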
> Full text search doesn't work for prefix text containing GB-18030 characters
> ----------------------------------------------------------------------------
>
>                 Key: OAK-4042
>                 URL: https://issues.apache.org/jira/browse/OAK-4042
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>             Fix For: 1.6
>
>
> For a full text indexed field {{text}} and a node having
> {{/a/b/@text="some text normaltextsuffix and 中文标题suffix."}}, this node should
> be returned for:
> {{SELECT * from \[nt:base] WHERE CONTAINS([text], '中文标题*')}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)