Thanks Robert, exactly what I was looking for.

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Wednesday, October 19, 2011 1:15 PM
To: solr-user@lucene.apache.org
Subject: Re: stemEnglishPossessive and contractions

The word delimiter filter also does other things: it treats ' as punctuation by default. So it normally splits on ', except when it's 's (in this case it removes the 's completely if you use stemEnglishPossessive).

There are a couple of approaches you can use:

1. You can keep WordDelimiterFilter with this option on, but disable splitting on ' by customizing its type table. In this case specify types=mycustomtypes.txt, and in that file specify ' to be treated as ALPHANUM or similar. See https://issues.apache.org/jira/browse/SOLR-2059 for some examples of this. I would only do this if you want WordDelimiterFilter for other purposes; if you just want to remove possessives and don't need WordDelimiterFilter's other features, look below.

2. You can instead use EnglishPossessiveFilterFactory, which only does this exact thing (remove 's) and nothing else.

On Wed, Oct 19, 2011 at 5:30 PM, Herman Kiefus <herm...@angieslist.com> wrote:
> We utilize a comprehensive dictionary of English words, place names,
> surnames, male and female first names, ... you get the point. As such, the
> possessive plural forms of these words are recognized as 'misspelled'.
>
> I simply thought that 'turning on' this option for the WordDelimiterFactory
> would address my concerns; however, I also got an unintended consequence:
> Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be
> affected. Is this intended behavior? When I read 'English possessive' I
> hear 'apostrophe s' and not 'apostrophe anything'. Is there something I'm
> missing here?
>

--
lucidimagination.com
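
For reference, the two approaches Robert describes might look roughly like the schema.xml fragment below. This is only a sketch: the field type names and the choice of tokenizers are assumptions made for illustration, and the one-line mapping shown for mycustomtypes.txt follows the `char => TYPE` style of the SOLR-2059 examples, so check it against that issue before relying on it.

```xml
<!-- Approach 1: keep WordDelimiterFilter with stemEnglishPossessive on,
     but stop it from splitting on ' by overriding its type table.
     mycustomtypes.txt (placed in the same conf directory) would contain
     a single mapping line such as:
       ' => ALPHANUM
     The field type name here is hypothetical. -->
<fieldType name="text_wdf_possessive" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="1"
            types="mycustomtypes.txt"/>
  </analyzer>
</fieldType>

<!-- Approach 2: skip WordDelimiterFilter and only strip a trailing 's.
     The field type name here is hypothetical. -->
<fieldType name="text_possessive_only" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>
```

With either field type, the analysis page in the Solr admin UI is a quick way to confirm that contractions such as isn't and we'll come through intact while a trailing 's is removed.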