Phrase-based (vs. Word-Based) Proximity Search

Chris Harris Mon, 12 Nov 2007 10:19:02 -0800

I gather that the standard Solr query parser uses the same syntax for
proximity searches as Lucene, and that the Lucene syntax is described at

  http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches

This syntax lets me look for terms that are within x words of each
other. Their example is that

  "jakarta apache"~10

will find documents where "jakarta" and "apache" occur within 10 words
of one another.

What I would like to do is find documents where *phrases*, not just
terms, are within x words of each other. I want to be able to say
things like

  Find the documents where the phrases "apache jakarta" and
  "sun microsystems" occur within ten words of one another.

If I gave such a search, I would *not* want it to count as a match if,
for instance, "apache" appeared near "microsystems" but "apache"
wasn't followed immediately by "jakarta", or "microsystems" wasn't
preceded immediately by "sun". I would also not want it to match if
"apache jakarta" appeared, but "sun microsystems" did not appear.
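
To make the intended matching rule concrete, here is a minimal sketch in
Python. It is only a model of the semantics described above, not anything
Solr or Lucene provides. The function names, the whitespace tokenization,
and the reading of "within ten words" as "at most ten words between the two
phrases, in either order" are all assumptions made for illustration.

```python
def phrase_positions(tokens, phrase):
    """Return the start index of every exact, contiguous occurrence of phrase."""
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def phrases_near(text, phrase_a, phrase_b, max_gap):
    """True if some occurrence of phrase_a and some occurrence of phrase_b
    have at most max_gap words between them, in either order."""
    # Assumption: naive lowercase/whitespace tokenization, no punctuation handling.
    tokens = text.lower().split()
    len_a = len(phrase_a.split())
    len_b = len(phrase_b.split())
    for a in phrase_positions(tokens, phrase_a):
        for b in phrase_positions(tokens, phrase_b):
            if a + len_a <= b:        # phrase_a ends before phrase_b starts
                gap = b - (a + len_a)
            elif b + len_b <= a:      # phrase_b ends before phrase_a starts
                gap = a - (b + len_b)
            else:                     # overlapping occurrences are ignored
                continue
            if gap <= max_gap:
                return True
    return False
```

Under this model, a document containing "apache" and "microsystems" close
together, but without the full phrases, is rejected because
phrase_positions finds no occurrence of either phrase. A document with only
one of the two phrases is rejected for the same reason.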

Is there any way to do such a search currently? I suppose it might work to say

  "apache jakarta sun microsystems"~10 +"apache jakarta" +"sun microsystems"

but that seems like an unfortunate hack. In any case it's not really
something I can expect my users to be able to type in by themselves.
In our current query language (which I'm hoping to wean our users
off), they can type

  "apache jakarta" <near/10> "sun microsystems"

which I believe is more intuitive.
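
If users keep typing the <near/N> form, one option is to rewrite it
mechanically into the workaround query shown earlier before handing it to
the query parser. The sketch below assumes exactly one <near/N> operator
between two double-quoted phrases; the pattern and the function name are
hypothetical. It inherits whatever imprecision the workaround itself has.

```python
import re

# Hypothetical pattern for the legacy syntax: "phrase one" <near/N> "phrase two"
NEAR_PATTERN = re.compile(r'^\s*"([^"]+)"\s*<near/(\d+)>\s*"([^"]+)"\s*$')


def near_to_lucene(query):
    """Rewrite a <near/N> query as a sloppy phrase plus two required phrases."""
    match = NEAR_PATTERN.match(query)
    if match is None:
        raise ValueError("not a <near/N> query: %r" % query)
    left = match.group(1)
    distance = int(match.group(2))
    right = match.group(3)
    # Sloppy phrase over all terms, then each original phrase as a required clause.
    return '"%s %s"~%d +"%s" +"%s"' % (left, right, distance, left, right)
```

Calling near_to_lucene('"apache jakarta" <near/10> "sun microsystems"')
produces the same string as the hand-written workaround above.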

Any ideas?

Chris
