Wildcard queries, especially one with a wildcard both _before_ and
_after_ the term, are going to be fairly slow for Solr to process
anyhow. (In fact, for some reason I thought wildcards weren't even
supported on both sides, just one or the other.)
Still, it's a bug in Lucene; it ought not to do that, true.
But there may be a better design for your actual use case, with much
better performance. The idea is to do something at indexing time:
tokenize into a separate field on individual letters (if you frequently
want to search on arbitrary individual characters), or simply index a
"1" or "0" in a field depending on whether the value includes a question
mark (if you specifically want to search on question marks all the time
and don't care about other letters). Or use some kind of more complex
n-gramming, if you want to be able to search efficiently on all sorts of
sub-strings. The trade-off is disk space for performance... but once you
have a lot of records, I predict that wildcard-on-both-sides query will
have unacceptable performance.
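As a rough illustration of the index-time options above, here is a minimal
Python sketch of preparing a document before sending it to Solr. The field
names (first_name_has_qmark, first_name_ngrams) and the helper functions are
hypothetical, and in a real setup the n-gram step would more likely be done
by an analyzer configured in the Solr schema than in client code.

```python
def has_question_mark(value):
    """Return "1" if the value contains a literal '?', else "0"."""
    return "1" if "?" in value else "0"


def char_ngrams(value, min_n=1, max_n=3):
    """Return the distinct lowercased substrings of length min_n..max_n."""
    text = value.lower()
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(text) - n + 1):
            grams.add(text[start:start + n])
    return sorted(grams)


def build_doc(doc_id, first_name):
    """Build a document dict with the extra index-time fields (names assumed)."""
    return {
        "id": doc_id,
        "first_name": first_name,
        # Flag field: cheap exact-match lookup instead of a wildcard scan.
        "first_name_has_qmark": has_question_mark(first_name),
        # N-gram field: larger index, but substring search becomes a term match.
        "first_name_ngrams": char_ngrams(first_name),
    }
```

With a flag field like this, finding names that contain a question mark
becomes an ordinary term query on that field (for example
first_name_has_qmark:1, again using the assumed field name) rather than a
wildcard query.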
Jonathan
Stephen Powis wrote:
Looking at the JIRA issue, looks like there's been a new patch related to
this. This is good news! We've re-written a portion of our web app to use
Solr instead of MySQL. This part of our app allows clients to construct
rules to match data within their account, and automatically apply actions to
those matched data points. So far our testing and then rollout has been
smooth, until we encountered the above rule/query. I guess I assumed since
these metacharacters were escaped that they would be parsed correctly under
any type of query.
What is the likelihood of this being included in the next release/bug fix
version of Solr? Are there docs available online with basic information
about rolling our own build of Solr that includes this patch?
I appreciate your help!
Thanks!
Stephen
On Thu, Nov 4, 2010 at 9:26 AM, Robert Muir <rcm...@gmail.com> wrote:
On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis <stephen.po...@pardot.com>
wrote:
I want to return any first name with a Question Mark in it
Query: first_name: *\?*
There is no way to escape the metacharacters * or ? for a wildcard
query (regardless of queryparser, even if you write your own).
See https://issues.apache.org/jira/browse/LUCENE-588
It's something we could fix, but in all honesty it seems one reason it
isn't fixed is because the bug is so old, yet there hasn't really been
any indication of demand for such a thing...
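
To make the escaping problem discussed above concrete, here is a small
analogy using Python's fnmatch module. This is not Lucene's WildcardQuery,
only a stand-in with similar * and ? semantics. It assumes that once the
backslash is dropped, the pattern effectively becomes *?*, where ? means
"any single character" rather than a literal question mark.

```python
from fnmatch import fnmatchcase

names = ["John", "Jo?n", "Mary"]

# Unescaped pattern: '?' is a single-character wildcard, so every
# non-empty name matches, not just the ones containing '?'.
unescaped = [n for n in names if fnmatchcase(n, "*?*")]

# fnmatch happens to allow a literal '?' via a character class; this is
# the behaviour the original query was hoping to get from an escape.
literal = [n for n in names if fnmatchcase(n, "*[?]*")]

print(unescaped)
print(literal)
```

The first list contains all three names, while the second contains only the
name with an actual question mark in it.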