: Yeah, I've used the Lucene-based spellchecker before, I just never had
: to hook it up with Solr.  At this point I'm not interested in the fancy
: stuff (cache, RAMDir...), I just want to figure out how to configure it
: via schema.xml...

But the crux of the issue is that if you are maintaining a second index
inside your base Solr installation for the purposes of the SpellChecker
class, then you don't want or need to configure it in schema.xml -- it
lives outside the schema space.

I pointed this out the last time spellchecking came up: there are two
extremely different approaches involved when you talk about "implementing
a spelling/suggestion service with Solr"...

In the first approach, the main Solr index *is* the suggestion index ...
each Document represents a suggested word, with one stored field telling
you what the word is, and indexed fields containing the ngrams.  You could
populate this index from any initial source: a dictionary, logs of popular
query terms, or a dump of all terms in your corpus.  At query time, your
application would query this index separately from querying your "main"
Solr index containing your domain specific data.
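
To make "indexed fields containing the ngrams" concrete, here is a minimal
sketch in plain Java (no Lucene or Solr calls) of what one such suggestion
Document might hold.  The class name, the field names ("word", "gram2",
"gram3") and the gram sizes are all assumptions for illustration; use
whatever names and sizes suit your own schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NgramSketch {

    /** Returns every contiguous substring of length n, in left-to-right order. */
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /**
     * Builds the field values for one suggestion document: the word itself
     * (the stored field) plus one multi-valued field per gram size (the
     * indexed fields).  Field names here are hypothetical.
     */
    static Map<String, List<String>> suggestionDoc(String word, int minGram, int maxGram) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("word", List.of(word));
        for (int n = minGram; n <= maxGram; n++) {
            doc.put("gram" + n, ngrams(word, n));
        }
        return doc;
    }

    public static void main(String[] args) {
        // Print the fields that would be sent to Solr for the word "lucene".
        System.out.println(suggestionDoc("lucene", 2, 3));
    }
}
```

At query time the application would break the user's (possibly misspelled)
word into grams the same way and search those gram fields, reading the
stored "word" field back from the top hits.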

The second approach is to have the spelling/suggestion index live inside
of your Solr index side by side with your main domain specific index, so
your Request Handler can talk to it directly, and it can be populated
directly using the terms in your corpus -- this sounds like the
approach you are taking, but in this approach there is no need for your
schema.xml to know anything about the index .. just use the SpellChecker
class as is: construct it with an empty RAMDirectory and call
indexDictionary on a LuceneDictionary pointed at your main Solr index.
The only code you really need to write is something to run clearIndex and
indexDictionary as a newSearcher hook  (the easiest way probably being to
hang your SpellChecker instance off of a single element Solr cache and
write a Regenerator)

But like I said: you don't need to worry about making the schema know
about your ngrams -- you only do that if you're going for the first approach.



-Hoss
