I gather that the standard Solr query parser uses the same syntax for
proximity searches as Lucene, and that Lucene syntax is described at
http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches
This syntax lets me look for terms that are within x words of each
other. Their example is that
    "jakarta apache"~10
will find documents where jakarta and apache occur within 10 words
of one another.
What I would like to do is find documents where *phrases*, not just
terms, are within x words of each other. I want to be able to say
things like
    Find the documents where the phrases "apache jakarta" and
    "sun microsystems" occur within ten words of one another.
If I gave such a search, I would *not* want it to count as a match if,
for instance, apache appeared near microsystems but apache
wasn't followed immediately by jakarta, or microsystems wasn't
preceded immediately by sun. I would also not want it to match if
apache jakarta appeared, but sun microsystems did not appear.
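
To make the intended behavior concrete, here is a rough Python sketch of the matching rule I have in mind. It is only an illustration of the semantics, not anything Solr or Lucene provides: the function names are made up, it tokenizes naively on whitespace, and it assumes "within x words" means at most x words between the end of one phrase and the start of the other, in either order.

```python
def phrase_positions(tokens, phrase):
    """Return the start indexes where the phrase's words occur consecutively."""
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def phrases_near(text, phrase_a, phrase_b, max_gap):
    """True if both phrases occur as exact word sequences with at most
    max_gap words between them (assumed definition of "near")."""
    tokens = text.lower().split()  # naive whitespace tokenization
    len_a = len(phrase_a.split())
    len_b = len(phrase_b.split())
    for a in phrase_positions(tokens, phrase_a):
        for b in phrase_positions(tokens, phrase_b):
            if a < b:
                gap = b - (a + len_a)  # words between end of A and start of B
            else:
                gap = a - (b + len_b)  # words between end of B and start of A
            if 0 <= gap <= max_gap:    # a negative gap means the phrases overlap
                return True
    return False
```

With this rule, a document containing "apache tomcat" close to "sun microsystems" would not match, and neither would a document that contains only one of the two phrases.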
Is there any way to do such a search currently? I suppose it might work to say
    "apache jakarta sun microsystems"~10 +"apache jakarta" +"sun microsystems"
but that seems like an unfortunate hack. In any case it's not really
something I can expect my users to be able to type in by themselves.
In our current query language (which I'm hoping to wean our users off
of), they can type
apache jakarta near/10 sun microsystems
which I believe is more intuitive.
Any ideas?
Chris