Re: [CODE4LIB] stemming in author search?

2011-06-14 Thread Bill Dueber
We had stemming on for authors at first (maybe was the VUFind default way back when?) and turned it off as soon as we noticed. The initial complaint was that searching on Rowles gave records for Rowling, and of course it's not hard to find other examples, esp. with the -ing suffix. On Mon, Jun
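
To make the Rowles/Rowling complaint concrete, here is a minimal sketch of how a stemmed author field conflates the two names. The `toy_stem` function is a deliberately crude suffix stripper standing in for a real English stemmer (it is not the stemmer Solr uses), and the two author records are hypothetical.

```python
def toy_stem(token):
    """Crude suffix stripper; a stand-in for a real English stemmer."""
    word = token.lower()
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(authors, normalize):
    """Map each normalized token to the set of record ids containing it."""
    index = {}
    for doc_id, name in authors.items():
        for token in name.replace(",", " ").split():
            index.setdefault(normalize(token), set()).add(doc_id)
    return index


# Hypothetical author headings, for illustration only.
authors = {1: "Rowling, J. K.", 2: "Rowles, A."}

stemmed = build_index(authors, toy_stem)   # author field with stemming on
exact = build_index(authors, str.lower)    # author field with stemming off

# The query is normalized the same way as the indexed tokens.
print(sorted(stemmed[toy_stem("Rowles")]))  # [1, 2]
print(sorted(exact["rowles"]))              # [2]
```

With stemming on, both names reduce to the same token, so a search on Rowles also returns the Rowling record; with it off, only the exact surname matches.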

Re: [CODE4LIB] stemming in author search?

2011-06-14 Thread Keith Jenkins
Does Solr support Soundex? (Soundex was originally developed to assist with alternate spellings of names) Keith On Mon, Jun 13, 2011 at 8:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote: In a Solr-based search, stemming is done at indexing time, into fields with stemmed tokens. It seems

Re: [CODE4LIB] stemming in author search?

2011-06-14 Thread Erik Hatcher
On Jun 14, 2011, at 08:10 , Keith Jenkins wrote: Does Solr support Soundex? (Soundex was originally developed to assist with alternate spellings of names) Indeed. And several other phonetic algorithms:

Re: [CODE4LIB] stemming in author search?

2011-06-14 Thread Jonathan Rochkind
That's an interesting idea, I might try creating author fields with Soundex normalization rather than the standard English language 'stemming' normalization. Still curious to get more feedback on what others have done, even if you didn't consider it carefully, if you're doing it in production
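
For readers unfamiliar with Soundex, the following is a rough sketch of the classic American Soundex rules in plain Python, shown only to illustrate what kind of normalization it performs on surnames. It is an illustrative reimplementation, not the encoder Solr ships, and edge cases may differ from any particular library.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a four-character Soundex code, or '' for empty input."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (result + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))      # S530 S530
print(soundex("Rowles"), soundex("Rowling"))   # R420 R452
```

In this sketch the spelling variants Smith and Smyth share a code, while Rowles and Rowling do not, which is the behaviour one would hope for in an author field.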

Re: [CODE4LIB] stemming in author search?

2011-06-14 Thread Jonathan Rochkind
Hey Erik, in that wiki documentation the example it gives is: <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/> Do you know what that 'inject' argument is about, and where (if anywhere) I'd find it (and other available arguments for PhoneticFilterFactory, which

Re: [CODE4LIB] stemming in author search?

2011-06-14 Thread Erik Hatcher
It's documented in that wiki page link below as true/false -- true will add tokens to the stream, false will replace the existing token. So if you index "cat" and the phonetic filter turns it into KT, it can either index "cat" and KT or just KT. Erik On Jun 14, 2011, at 10:45 ,
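
The add-versus-replace behaviour described above can be sketched in a few lines of Python. This is only a model of the idea: `toy_encode` is a made-up encoder chosen so that "cat" becomes KT as in the example (it is not DoubleMetaphone), `phonetic_filter` is a hypothetical name, and token positions are not modelled.

```python
def toy_encode(token):
    """Made-up phonetic encoder: drop vowels, map c to K, uppercase the rest."""
    out = []
    for ch in token.lower():
        if ch in "aeiou":
            continue
        out.append("K" if ch == "c" else ch.upper())
    return "".join(out)


def phonetic_filter(tokens, inject):
    """Emit encoded tokens, keeping the originals only when inject is true."""
    result = []
    for token in tokens:
        if inject:
            result.append(token)       # keep the original token in the stream
        result.append(toy_encode(token))
    return result


print(phonetic_filter(["cat"], inject=True))   # ['cat', 'KT']
print(phonetic_filter(["cat"], inject=False))  # ['KT']
```

With inject set to true, an exact search on the original spelling can still match in the same field; with false, only the phonetic form is indexed.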

[CODE4LIB] stemming in author search?

2011-06-13 Thread Jonathan Rochkind
In a Solr-based search, stemming is done at indexing time, into fields with stemmed tokens. It seems typical in library-catalog type applications based on Solr to have the default (or even only) searches be over these stemmed fields, thus 'auto-stemming' to the user. (Search for 'monkey', find