On 24-Feb-08, at 11:36 PM, Chris Hostetter wrote:


: Which leads me to the next question: in the extendedResults, shouldn't it
: use the Query analyzer for the spellcheck field to tokenize the terms
: instead of splitting on the space character?

this question came up a little while back, and i made the same suggestion
... but Mike disagreed.  I'm not a spellcheck user, but it still seems
like the right choice to me...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200711.mbox/[EMAIL PROTECTED]
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200712.mbox/[EMAIL PROTECTED]

Hi Hoss,

I've mostly come around to your view. Essentially, I don't think this is a problem that can be solved solely with analyzers, but I do think that applying the field's query analyzer is better than the current whitespace-splitting behaviour.

Our input is a single string, and the spellchecker's output is essentially a map of tokens -> suggestions. The problem is that the client needs to know what the tokenization was in order to reconstruct the corrected query it displays to the user.

Query string: "ain't input/output paradigmic-oriented ad-hoc FastNetworks"
Output:
"aint" -> ...
"fast" -> ...
"oriented" -> ...
...
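
To make the divergence concrete, here is a sketch against the current Lucene TokenStream API (not what the spellchecker does internally; StandardAnalyzer and the field name "spell" are stand-ins for whatever analyzer the spellcheck field actually uses):

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizeDemo {
  public static void main(String[] args) throws IOException {
    String query = "ain't input/output paradigmic-oriented ad-hoc FastNetworks";
    // StandardAnalyzer stands in for the field's query analyzer; a chain
    // with WordDelimiterFilter would diverge from the raw string even more.
    try (Analyzer analyzer = new StandardAnalyzer()) {
      TokenStream ts = analyzer.tokenStream("spell", query);
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        String t = term.toString();
        // e.g. the lowercased "fastnetworks" is emitted but never occurs
        // verbatim in the query, so the client can't find it by string search.
        System.out.printf("%-15s verbatim-in-query: %b%n", t, query.contains(t));
      }
      ts.end();
      ts.close();
    }
  }
}

Even this mild chain emits tokens that never occur verbatim in the input; a WordDelimiterFilter-style chain would emit things like "fast" and "networks" and diverge further.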

Now, the client has to embed the suggestions in the original query string for presentation. Without knowing the original offsets (or re-running the analysis itself), I'm not sure how it could do that robustly.

(Perhaps returning offsets would be helpful.)
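
For example, if each suggestion carried the start/end offsets of the token it replaces, splicing becomes mechanical. A sketch using Lucene's OffsetAttribute (again the modern API, not what existed when this was written; suggestFor() is a hypothetical stand-in for the spellchecker's token -> suggestion map):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class OffsetSpliceDemo {
  // One analyzed token plus its character offsets into the original string.
  record Span(String term, int start, int end) {}

  static List<Span> analyze(Analyzer analyzer, String text) throws IOException {
    List<Span> spans = new ArrayList<>();
    // "spell" is a placeholder field name; use the real spellcheck field.
    TokenStream ts = analyzer.tokenStream("spell", text);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    OffsetAttribute off = ts.addAttribute(OffsetAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      spans.add(new Span(term.toString(), off.startOffset(), off.endOffset()));
    }
    ts.end();
    ts.close();
    return spans;
  }

  // Hypothetical stand-in for the spellchecker's token -> suggestion map.
  static String suggestFor(String term) {
    return term.equals("paradigmic") ? "paradigmatic" : null;
  }

  public static void main(String[] args) throws IOException {
    String query = "ain't input/output paradigmic-oriented ad-hoc FastNetworks";
    try (Analyzer analyzer = new StandardAnalyzer()) {
      StringBuilder corrected = new StringBuilder(query);
      List<Span> spans = analyze(analyzer, query);
      // Splice right-to-left so earlier offsets stay valid as lengths change.
      for (int i = spans.size() - 1; i >= 0; i--) {
        Span s = spans.get(i);
        String suggestion = suggestFor(s.term());
        if (suggestion != null) {
          corrected.replace(s.start(), s.end(), suggestion);
        }
      }
      // -> ain't input/output paradigmatic-oriented ad-hoc FastNetworks
      System.out.println(corrected);
    }
  }
}

With offsets in hand the client never has to guess how the analyzer carved up the string; it only performs positional replacements.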

But, as Hoss mentioned in his reply, users have to think carefully about analyzers anyway. (And whitespace tokenization is broken in other ways.)

-Mike
