James,

Thanks, the spellcheck.q parameter was exactly what I needed to be using!
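
For anyone finding this thread later, a request along these lines might be built as in the sketch below. It is only an illustration: the base URL is the /spell handler from the original message, the helper name spellcheck_url is made up, and Python is used just to show the URL encoding. As far as I understand it, when spellcheck.q is present the component analyzes that value with the queryAnalyzerFieldType analyzer rather than the QueryConverter, so with a KeywordTokenizer query analyzer the phrase should reach the spell checker as a single token.

```python
from urllib.parse import urlencode

# Assumption: the /spell request handler from the original message.
BASE_URL = "http://localhost:8080/solr/spell"


def spellcheck_url(phrase: str) -> str:
    """Build a request that searches for the phrase and spell checks it whole.

    Hypothetical helper, for illustration only.
    """
    params = {
        # Main query: the phrase in quotes, parsed by the normal query parser.
        "q": f'"{phrase}"',
        "spellcheck": "true",
        # Raw phrase handed to the spell checker via spellcheck.q.
        "spellcheck.q": phrase,
    }
    # urlencode escapes the quotes and encodes spaces as '+'.
    return f"{BASE_URL}?{urlencode(params)}"


print(spellcheck_url("untied stats"))
```

The point of keeping q and spellcheck.q separate is that the search itself can still use whatever query syntax it needs, while the spell checker only sees the plain phrase.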
-Camden

On Mon, Jan 17, 2011 at 3:54 PM, Dyer, James <james.d...@ingrambook.com> wrote:

> Camden,
>
> Have you seen Smiley & Pugh's Solr book? They describe something very
> similar to what you're trying to do on p180ff. The difference seems to be
> they use a field that only has a couple of terms so they don't bother with
> shingles. The book makes a big point about using "spellcheck.q" in this
> case in order to get the analysis right. I'm not sure if this is the
> solution but I thought I'd mention it. I never tried spell checking this
> way because it seemed very limited and possibly quite expensive.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Camden Daily [mailto:cam...@jaunter.com]
> Sent: Monday, January 17, 2011 1:41 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Spell Checking a multi word phrase
>
> James,
>
> Thank you, but I'm not sure that will work for my needs. I'm very
> interested in contextual spell checking. Take for example the author
> "stephenie meyer". "stephenie" is a far less popular spelling than
> "stephanie", but in this context it's the correct option. I feel like
> shingles with an un tokenized query string would be able to catch this, but
> I can't find too many examples of people attempting this.
>
> On Mon, Jan 17, 2011 at 2:19 PM, Dyer, James <james.d...@ingrambook.com> wrote:
>
> > Camden,
> >
> > You may also want to be aware that there is a new feature added to Spell
> > Check's "collate" functionality that will guarantee the collations will
> > return hits. It also is able to return more than one collation and tell you
> > how many hits each one would result in if re-queried. This might do the
> > same thing you're trying to do using shingles, but with more accuracy and
> > less work.
> >
> > For info, look at "spellcheck.collate", "spellcheck.maxCollations",
> > "spellcheck.maxCollationTries" & "spellcheck.collateExtendedResults" on the
> > component's wiki page:
> > http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate
> >
> > This feature is committed to 3.x and 4.x and is available as a patch for
> > 1.4.1 (here: https://issues.apache.org/jira/browse/SOLR-2010).
> >
> > James Dyer
> > E-Commerce Systems
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -----Original Message-----
> > From: Camden Daily [mailto:cam...@jaunter.com]
> > Sent: Monday, January 17, 2011 1:01 PM
> > To: solr-user@lucene.apache.org
> > Subject: Spell Checking a multi word phrase
> >
> > Hello all,
> >
> > I'm pretty new to Solr, and trying to set up a spell checker that can handle
> > entire phrases. My goal would be to have something that could offer a
> > suggestion of "united states" for a query of "untied stats".
> >
> > I have a very large index, and I've worked a bit with creating shingles for
> > the spelling index. The problem I'm running into now is that the
> > SpellCheckComponent is always tokenizing the query that I pass to it.
> >
> > For example, a query like this
> >
> > http://localhost:8080/solr/spell?q=untied\stats&spellcheck=true&debugQuery=on
> >
> > The debug information shows me that the parsed query is:
> > PhraseQuery(text:"untied stats")
> >
> > But I receive the spelling suggestions for "untied" and "stats" separately.
> > From what I understand, this is not a case where I would want to collate; I
> > simply want the entire phrase treated as one token.
> >
> > I found the following post after much searching that suggests setting up a
> > custom QueryConverter:
> >
> > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200810.mbox/%3c1224516331.3820.119.ca...@localhost.localdomain.tld%3E
> >
> > Does anyone know if that would be required? I had hoped to avoid Java code
> > entirely with Solr (I haven't used Java in a very long time), but if I do
> > need to set up the 'MultiWordSpellingQueryConvert' class, would anyone be
> > able to give me some tips of exactly how I would add that functionality to
> > Solr?
> >
> > Relevant configs below:
> >
> > solrconfig.xml:
> >
> > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> >   <lst name="spellchecker">
> >     <str name="name">default</str>
> >     <str name="field">spellShingle</str>
> >     <str name="spellcheckIndexDir">./spellShingle</str>
> >     <str name="queryAnalyzerFieldType">textSpellShingle</str>
> >     <str name="buildOnOptimize">true</str>
> >   </lst>
> > </searchComponent>
> >
> > schema.xml:
> >
> > <fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > (I had thought setting the KeywordTokenizer for the query analyzer would
> > keep it from being tokenized, but it doesn't seem to make any difference)
> >
> > -Camden Daily