On 24-Feb-08, at 11:36 PM, Chris Hostetter wrote:
: Which leads me to the next question, in the extendedResults, shouldn't
: it use the Query analyzer for the spellcheck field to tokenize the
: terms instead of splitting on the space character?
this question came up a little while back, and i made the same
suggestion ... but Mike disagreed. I'm not a spellcheck user, but it
still seems like the right choice to me...
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200711.mbox/[EMAIL PROTECTED]
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200712.mbox/[EMAIL PROTECTED]
Hi Hoss,
I've mostly come around to your view. Essentially, I don't think this
is a problem that can be solved by analyzers alone, but applying the
field's analyzer is still better than the current behaviour.
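For concreteness, here's a rough sketch of tokenizing the query with
the field's analyzer instead of splitting on spaces. (Treat it as a
sketch only: it uses the current Lucene TokenStream attribute API, and
SpellToken, the field name, and the analyzer are placeholders, not the
actual Solr code.)

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Placeholder token class: the analyzed text plus its character
// offsets into the original query string.
final class SpellToken {
    final String text;
    final int start, end;
    SpellToken(String text, int start, int end) {
        this.text = text; this.start = start; this.end = end;
    }
}

final class SpellInput {
    // Run the field's analyzer over the raw query, keeping each token's
    // text and its start/end offsets in the original input.
    static List<SpellToken> tokenize(Analyzer analyzer, String field,
                                     String query) throws IOException {
        List<SpellToken> tokens = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, query)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            OffsetAttribute off = ts.addAttribute(OffsetAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                tokens.add(new SpellToken(term.toString(),
                                          off.startOffset(),
                                          off.endOffset()));
            }
            ts.end();
        }
        return tokens;
    }
}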
Our input is a (single) string, and the spellchecker's output is
essentially a map of tokens -> suggestions. The problem is that the
client needs to know how the input was tokenized in order to
reconstruct the corrected query for display to the user.
Query string: "ain't input/output paradigmic-oriented ad-hoc FastNetworks"
output:
"aint" -> ...
"fast" -> ...
"oriented" -> ...
...
Now, the client has to embed the suggestions in the original query
string for presentation. Without knowledge of the original offsets
(or reconstructing the analysis itself), I'm not sure how it could do
so robustly.
(Perhaps returning offsets would be helpful.)
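For example, if the response carried each token's start/end offsets
(hypothetical fields, not in the current response format), the client
could splice the suggestions back in mechanically. A minimal sketch,
reusing the SpellToken placeholder from above:

import java.util.List;
import java.util.Map;

final class Respell {
    // Rebuild a corrected query for display, assuming tokens arrive in
    // ascending offset order and offsets index into the original string.
    static String apply(String original, List<SpellToken> tokens,
                        Map<String, String> suggestions) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (SpellToken t : tokens) {
            String fix = suggestions.get(t.text);
            if (fix == null) continue;          // no correction needed
            out.append(original, pos, t.start); // copy untouched text
            out.append(fix);                    // drop in the suggestion
            pos = t.end;
        }
        out.append(original, pos, original.length());
        return out.toString();
    }
}

Even then, analyzers that emit stacked or overlapping tokens (synonyms,
or word-delimiter splits of "input/output") would defeat this simple
left-to-right splice, which is why I don't think analyzers alone solve
the whole problem.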
But, as Hoss mentioned in his reply, users have to think carefully
about analyzers anyway. (And whitespace tokenization is broken in
other ways.)
-Mike