: Agreed, but best match is not ONLY about keywords. Here is where the
: system developer can provide extra intelligence by doing query
: re-writing.

I finally got a chance to read through the URL (disclaimer: I do not have "a basic working knowledge of Oracle Text, such as the operators used in query expressions.")

At its core, what is being described here can easily be done with a custom request handler that takes in a multivalued "q" param and executes the queries in order until it finds some matches ... careful math when dealing with start/rows and the number of results from each query makes it easy to ensure that you can seamlessly return results from any/all queries in the order described (although you'd have to do something funky with the raw score values if you actually wanted to return them to the client).

In general though, I agree with Walter ... this seems like a very naive approach. At a very low conceptual level, the DisMaxRequestHandler does what the early counterexample in the link talks about...

>> select book_id from books
>> where contains (author, '(michel crichton) OR (?michel ?crichton)
>> OR (michel OR crichton) OR (?michel OR ?crichton)

...the problem is that the two criticisms of this approach (which may be valid in Oracle Text matching) don't really apply in Solr/Lucene...

>> 1. From the user's point of view, hits which are a poor match will be
>> mixed in with hits which are a good match. The user wants to see good
>> matches displayed first.

"Poor" hits won't score as high as "good" hits -- boost values can be assigned for the various pieces of the DisMax query so that exact phrase matches are weighted better than individual word matches, coordFactors will ensure that docs only matching a few words don't score as well as docs matching all of the words, etc...

>> 2. From the system's point of view, the search is inefficient. Even if
>> there were plenty of hits for exactly "Michel Crichton", it would still
>> have to do all the work of the fuzzy expansions and fetch data for all the
>> rows which satisfy the query.

My problem with this claim is the assumption that once you find lots of hits for "Michel Crichton" you don't need to keep looking for "Michel" or "Crichton" ... by this logic, many docs that contain the exact phrase "Michel Crichton" (and are roughly the same length) will get the same score, and the query will stop there. The benefit of looking for *everything* as a single query is that the scores can become more fine-grained -- docs with 1 exact match that *also* contain things like "Mr Crichton" several dozen times will score higher than docs with just that one exact match (consider an article about "Michel Crichton" in which his full name appears only once vs an article listing popular authors, in which "Michel Crichton" appears exactly once).

: Why do you say this? The rank is still provided by the search engine
: BASED ON THE QUERY submitted and it does consider natural language
: text. It's just leaving the order of execution in the hands of the
: developer who knows better what the system should return for some
: specific cases.

Evaluating each of the query parts in isolation and then aggregating the results doesn't take into account the *cumulative* value of the parts ... it's like averaging the ages of people in each city, then averaging those averages for each state and calling that the average age per state -- it's a much less accurate representation of reality than averaging the ages of everyone in the state all at once.

-Hoss
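
PS: to make the start/rows math for the "run the queries in order" idea a bit more concrete, here is a rough sketch. It is plain Python, not Solr code: cascade_page, make_toy_search and the toy index are all made-up names for illustration, and it assumes each later query has already been written to exclude docs matched by the earlier ones (otherwise you'd need de-duplication on top of this).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical search interface (not a Solr API):
# (query, start, rows) -> (total_hits, doc ids in that window)
SearchFn = Callable[[str, int, int], Tuple[int, List[str]]]


def cascade_page(queries: Sequence[str], search: SearchFn,
                 start: int, rows: int) -> List[str]:
    """Return docs [start, start+rows) of the concatenated per-query results."""
    page: List[str] = []
    offset = start  # hits still to skip before the requested window begins
    for q in queries:
        if len(page) >= rows:
            break
        total, docs = search(q, offset, rows - len(page))
        if offset >= total:
            # The window starts beyond this query's hits: skip all of them.
            offset -= total
            continue
        page.extend(docs)
        offset = 0  # any later query is read from its first hit
    return page


def make_toy_search(index: Dict[str, List[str]]) -> SearchFn:
    """Build an in-memory stand-in for a real search call."""
    def search(q: str, start: int, rows: int) -> Tuple[int, List[str]]:
        hits = index.get(q, [])
        return len(hits), hits[start:start + rows]
    return search


# Toy data: strictest query first, result sets assumed disjoint.
queries = ['"michel crichton"', "michel AND crichton", "michel OR crichton"]
index = {
    queries[0]: ["d1", "d2", "d3"],
    queries[1]: ["d4", "d5"],
    queries[2]: ["d6", "d7", "d8", "d9"],
}
search = make_toy_search(index)

print(cascade_page(queries, search, 0, 4))  # window inside queries 1 and 2
print(cascade_page(queries, search, 5, 3))  # window entirely inside query 3
```

Note that this only gets the paging right: the docs come back grouped by which query matched them, so the raw scores from different queries still aren't comparable, which is the point made above.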
