How about something like this?

{
    "add-field-type": [
        {
            "name": "norepeat",
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {
                    "class": "solr.StandardTokenizerFactory"
                },
                "filters": [
                    {
                        "class": "solr.LowerCaseFilterFactory"
                    },
                    {
                        "class": "solr.PatternReplaceFilterFactory",
                        "pattern": "(.)\\1+",
                        "replacement": "$1"
                    }
                ]
            }
        }
    ]
}

This finds a match...
http://localhost:8983/solr/#/norepeat/analysis?analysis.fieldvalue=Yes&analysis.query=yyyyYyyyyyyeeEssSsssss&analysis.fieldtype=norepeat

Andy



On Thu, 8 Oct 2020 at 23:02, Mike Drob <md...@mdrob.com> wrote:

> I'm looking for a way to transform words with repeated letters into the
> same token - does something like this exist out of the box? Do our stemmers
> support it?
>
> For example, say I would want all of these terms to return the same search
> results:
>
> YES
> YESSS
> YYYEEESSS
> YYEESSSS[...]S
>
> I don't know how long a user would hold down the S key at the end to
> capture their level of excitement, and I don't want to manually define
> synonyms for every length.
>
> I'm pretty sure that I don't want PhoneticFilter here, maybe
> PatternReplace? Not a huge fan of how that one is configured, and I think
> I'd have to set up a bunch of patterns inline for it?
>
> Mike
>

Reply via email to