How about something like this?

{
  "add-field-type": [
    {
      "name": "norepeat",
      "class": "solr.TextField",
      "analyzer": {
        "tokenizer": {
          "class": "solr.StandardTokenizerFactory"
        },
        "filters": [
          {
            "class": "solr.LowerCaseFilterFactory"
          },
          {
            "class": "solr.PatternReplaceFilterFactory",
            "pattern": "(.)\\1+",
            "replacement": "$1"
          }
        ]
      }
    }
  ]
}

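As a rough check outside Solr, the lowercase-then-collapse behaviour of those two filters can be sketched in a few lines of Python. This is only an approximation: the function name norepeat is made up for illustration, the tokenizer is not simulated (each input is assumed to be a single token already), and Python's re module stands in for the Java regex engine Solr uses, so the "$1" replacement is written "\1" here.

```python
import re

# Same pattern as in the field type: any character followed by one or more
# copies of itself. Java's "$1" replacement is spelled r"\1" in Python.
REPEAT = re.compile(r"(.)\1+")


def norepeat(token: str) -> str:
    """Lowercase a token, then collapse each run of a repeated character."""
    return REPEAT.sub(r"\1", token.lower())


if __name__ == "__main__":
    for term in ["YES", "YESSS", "YYYEEESSS", "yyyyYyyyyyyeeEssSsssss"]:
        print(term, "->", norepeat(term))
```

One thing the sketch makes easy to see: the pattern collapses every doubled letter, so a word like "good" comes out as "god". Since the same analyzer runs at index and query time the terms still match each other, but words that differ only by a doubled letter will be conflated.
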
This finds a match...

http://localhost:8983/solr/#/norepeat/analysis?analysis.fieldvalue=Yes&analysis.query=yyyyYyyyyyyeeEssSsssss&analysis.fieldtype=norepeat

Andy

On Thu, 8 Oct 2020 at 23:02, Mike Drob <md...@mdrob.com> wrote:

> I'm looking for a way to transform words with repeated letters into the
> same token - does something like this exist out of the box? Do our stemmers
> support it?
>
> For example, say I would want all of these terms to return the same search
> results:
>
> YES
> YESSS
> YYYEEESSS
> YYEESSSS[...]S
>
> I don't know how long a user would hold down the S key at the end to
> capture their level of excitement, and I don't want to manually define
> synonyms for every length.
>
> I'm pretty sure that I don't want PhoneticFilter here, maybe
> PatternReplace? Not a huge fan of how that one is configured, and I think
> I'd have to set up a bunch of patterns inline for it?
>
> Mike
>