[
https://issues.apache.org/jira/browse/OAK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chetan Mehrotra updated OAK-3879:
---------------------------------
Attachment: OAK-3879-v1.patch
[proposed patch|^OAK-3879-v1.patch] for the same.
(Copied from a comment on OAK-3769)
The Lucene query parser treats the [following characters as special|https://github.com/apache/lucene-solr/blob/lucene_solr_4_7_1/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParserBase.java#L983-L995]:
{code}
char c = s.charAt(i);
// These characters are part of the query syntax and must be escaped
if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')' || c == ':'
    || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c == '}' || c == '~'
    || c == '*' || c == '?' || c == '|' || c == '&' || c == '/') {
  sb.append('\\');
}
{code}
Refer to the Solr documentation for details on these operators:
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
For now I have limited the characters that get escaped to the following:
{code}
private static final char[] LUCENE_QUERY_OPERATORS = {':' , '/', '!', '&', '|'};
{code}
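To make the effect of this list concrete, here is a minimal sketch of how such an escape step could look. It is not the code from the attached patch: the class name {{EscapeSketch}} and the method {{escapeOperators}} are hypothetical, and only the {{LUCENE_QUERY_OPERATORS}} array is taken from above. The sketch also assumes the input has not been escaped already.

```java
public class EscapeSketch {
    // Taken from the comment above: the characters that get escaped.
    private static final char[] LUCENE_QUERY_OPERATORS = {':', '/', '!', '&', '|'};

    // Hypothetical helper: true if c is one of the characters in the list above.
    private static boolean isOperator(char c) {
        for (char op : LUCENE_QUERY_OPERATORS) {
            if (op == c) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: prefixes every listed character with a backslash
    // and copies all other characters (such as '?', '*' or '~') unchanged.
    // Assumption: the input contains no backslash escapes of its own.
    static String escapeOperators(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 4);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isOperator(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeOperators("abc!"));   // '!' is in the list and gets a backslash
        System.out.println(escapeOperators("te?t"));   // '?' is not in the list and is left alone
    }
}
```

With this list, the {{abc!}} term from the issue description would reach the query parser as {{abc\!}}, while wildcard input such as {{te?t}} passes through untouched.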
Note that JR2 only escaped ':'. So we are escaping a few more (some of which
might have been added after the 3.x release used by JR2). From the Lucene list,
I think the following are still useful:
*A - Supported operators* Should NOT be included in the escape list
# \? - {{te?t}}
# \* - {{te*t}}
# \~ - {{roam~1}} - Fuzzy Search and Proximity Search
# \^ - Boost
# \- - Excluding a term from
# \( \) - This *might* be useful to group terms
*B - Unsupported operators* Should be included in the escape list
That leaves the following:
# \+
# \[ \] - Used for range queries. {{mod_date:[20020101 TO 20030101]}}
# \{ \} - {{\{Aida TO Carmen\}}}
[~tmueller] [~teofili] [~catholicon] Can you have a look now so that we can
make a final call on what should and should not be escaped? Please also review
the patch. I would like to commit it by tomorrow.
> Lucene index / compatVersion 2: search for 'abc!' does not work
> ---------------------------------------------------------------
>
> Key: OAK-3879
> URL: https://issues.apache.org/jira/browse/OAK-3879
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Reporter: Thomas Mueller
> Assignee: Chetan Mehrotra
> Fix For: 1.4
>
> Attachments: OAK-3879-v1.patch
>
>
> When using a Lucene fulltext index with compatVersion 2, then the following
> query does not return any results. When using compatVersion 1, the correct
> result is returned.
> {noformat}
> SELECT * FROM [nt:unstructured] AS c
> WHERE CONTAINS(c.[jcr:description], 'abc!')
> AND ISDESCENDANTNODE(c, '/content')
> {noformat}
> With compatVersion 1 and 2, searching for just 'abc' works. Also, searching
> with '=' instead of 'contains' works.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)