On 24-Feb-08, at 11:36 PM, Chris Hostetter wrote:
: Which leads me to the next question, in the extendedResults, shouldn't
: it use the Query analyzer for the spellcheck field to tokenize the
: terms instead of splitting on the space character?
this question came up a little while back, and i made the same
suggestion ... but Mike disagreed. I'm not a spellcheck user, but it
still seems like the right choice to me...
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200711.mbox/[EMAIL PROTECTED]
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200712.mbox/[EMAIL PROTECTED]
Hi Hoss,
I've mostly come around to your view. Essentially, I don't think this
is a problem that can be solved by analyzers alone, but applying the
field's analyzer is still better than the current behaviour.
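For concreteness, here's a rough sketch of tokenizing the query with
the field's analyzer instead of splitting on spaces. (Treat it as a
sketch only: it uses the current Lucene TokenStream attribute API, and
SpellToken, the field name, and the analyzer are placeholders, not the
actual Solr code.)

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Placeholder token class: the analyzed text plus its character
// offsets into the original query string.
final class SpellToken {
    final String text;
    final int start, end;
    SpellToken(String text, int start, int end) {
        this.text = text; this.start = start; this.end = end;
    }
}

final class SpellInput {
    // Run the field's analyzer over the raw query, keeping each token's
    // text and its start/end offsets in the original input.
    static List<SpellToken> tokenize(Analyzer analyzer, String field,
                                     String query) throws IOException {
        List<SpellToken> tokens = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, query)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            OffsetAttribute off = ts.addAttribute(OffsetAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                tokens.add(new SpellToken(term.toString(),
                                          off.startOffset(),
                                          off.endOffset()));
            }
            ts.end();
        }
        return tokens;
    }
}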
Our input is a (single) string, and the spellchecker's output is
essentially a map of tokens -> suggestions. The problem is that the
client needs to know how the input was tokenized in order to
reconstruct the corrected query for display to the user.
Query string: "ain't input/output paradigmic-oriented ad-hoc FastNetworks"
output:
"aint" -> ...
"fast" -> ...
"oriented" -> ...
...
Now, the client has to embed the suggestions in the original query
string for presentation. Without knowledge of the original offsets
(or reconstructing the analysis itself), I'm not sure how it could do
so robustly.
(Perhaps returning offsets would be helpful.)
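For example, if the response carried each token's start/end offsets
(hypothetical fields, not in the current response format), the client
could splice the suggestions back in mechanically. A minimal sketch,
reusing the SpellToken placeholder from above:

import java.util.List;
import java.util.Map;

final class Respell {
    // Rebuild a corrected query for display, assuming tokens arrive in
    // ascending offset order and offsets index into the original string.
    static String apply(String original, List<SpellToken> tokens,
                        Map<String, String> suggestions) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (SpellToken t : tokens) {
            String fix = suggestions.get(t.text);
            if (fix == null) continue;          // no correction needed
            out.append(original, pos, t.start); // copy untouched text
            out.append(fix);                    // drop in the suggestion
            pos = t.end;
        }
        out.append(original, pos, original.length());
        return out.toString();
    }
}

Even then, analyzers that emit stacked or overlapping tokens (synonyms,
or word-delimiter splits of "input/output") would defeat this simple
left-to-right splice, which is why I don't think analyzers alone solve
the whole problem.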
But, as Hoss mentioned in his reply, users have to think carefully
about analyzers anyway. (And whitespace tokenization is broken in
other ways.)
-Mike