Hi,
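Apache Solr sorts in lexicographic (byte) order, so upper/lower case matters: all uppercase ASCII letters sort before all lowercase ones, so "Zed" comes before "adam". Lowercasing the indexed value gives you a case-insensitive sort.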
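A quick illustration in plain Python (not Solr code). It assumes Solr's term sort behaves like a comparison of the UTF-8 bytes of each value, which is how I understand Lucene compares terms:

```python
# Rough approximation of Solr's term sort: compare the UTF-8 bytes of each value.
# Assumption: this mirrors Lucene's byte-order term comparison; it is not Solr code.
names = ["alice", "Bob", "carol", "Dave"]

# Sorting on the raw values: 'B' (66) and 'D' (68) come before 'a' (97) and 'c' (99).
raw = sorted(names, key=lambda s: s.encode("utf-8"))

# Sorting on a lowercased key, as the LowerCaseFilter would produce in the index,
# while still returning the original (stored) values.
lowered = sorted(names, key=lambda s: s.lower().encode("utf-8"))

print(raw)      # ['Bob', 'Dave', 'alice', 'carol']
print(lowered)  # ['alice', 'Bob', 'carol', 'Dave']
```

Solr still returns the stored value in its original case; only the indexed term used for sorting is lowercased.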

Cheers

On Sat, 17 Dec 2022, 02:51 Christopher Schultz, <
[email protected]> wrote:

> All,
>
> I'm trying to determine why a change was made to my internal project
> some years ago. The commit comment is unhelpful, but this field type was
> added and then we changed our "username" field in the Solr index to use
> this field-type:
>
> "add-field-type" : {
>    "name":"sortMe",
>    "class":"solr.TextField",
>    "analyzer":{
>      "tokenizer":{
>        "class":"solr.KeywordTokenizerFactory"
>      }
>      "filters":[{
>        "class":"solr.LowercaseFilterFactory"
>      }]
>    }
> }
>
> The "username" field contains (wait for it) the username for a user,
> where each document in the index represents a user. We want to be able
> to search for users given their usernames and also be able to sort based
> upon the value.
>
> I *think* the reason for the change was sorting. If you have a username
> like "foo-bar-baz", a typical text field's tokenizer will split the value
> into separate terms ("foo", "bar", "baz"), but we want to treat the whole
> thing as one continuous string.
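To make the difference concrete, here is a small sketch of the two analysis chains. Both functions are hypothetical stand-ins: `standard_like` is a crude regex split, not the real StandardTokenizer rules, and `keyword_lowercase` only mimics KeywordTokenizerFactory followed by LowerCaseFilterFactory:

```python
import re

def standard_like(value):
    # Crude word-splitting chain: break on non-alphanumerics, then lowercase.
    # Hypothetical approximation, not the actual StandardTokenizer behavior.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]

def keyword_lowercase(value):
    # KeywordTokenizerFactory emits the whole input as a single token;
    # LowerCaseFilterFactory then lowercases that one token.
    return [value.lower()]

print(standard_like("Foo-Bar-Baz"))      # ['foo', 'bar', 'baz']
print(keyword_lowercase("Foo-Bar-Baz"))  # ['foo-bar-baz']
```

With one token per document, there is exactly one value to sort on, which is what you want for a sortable username.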
>
> We want to do the same thing with email addresses, and we used this same
> field type for that purpose. For example, it's never useful to search
> for "gmail" in email addresses because a huge percentage of users would
> come back. If you really want to search for all gmail users, we want you
> to search for "*gmail*".
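A rough simulation of why that works with a single-token field. The addresses are made up, `fnmatchcase` stands in for Solr's wildcard matching, and lowercasing the pattern is an assumption that it is analyzed the same way as the indexed terms:

```python
from fnmatch import fnmatchcase

# Each document contributes exactly one lowercased term (keyword tokenizer + lowercase).
indexed_terms = {"alice@gmail.com", "bob@example.org", "carol@gmail.com"}

def term_query(q):
    # A plain query term must equal a whole indexed term to match.
    return sorted(t for t in indexed_terms if t == q.lower())

def wildcard_query(pattern):
    # '*' matches any run of characters inside the single indexed term.
    # Assumption: the pattern is lowercased like the indexed terms.
    return sorted(t for t in indexed_terms if fnmatchcase(t, pattern.lower()))

print(term_query("gmail"))        # []
print(wildcard_query("*gmail*"))  # ['alice@gmail.com', 'carol@gmail.com']
```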
>
> Are we likely to achieve our goals with the field type specified above?
>
> Is there a reason to lowercase everything? Does that affect sorting? It
> does not seem to affect searching.
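It doesn't seem to affect searching because your field type defines a single "analyzer", which Solr applies at both index and query time, so the query gets lowercased too. A toy sketch of that symmetry (the `analyze`, `index`, and `search` names are hypothetical):

```python
def analyze(value):
    # One analyzer shared by indexing and querying, as in the field type above:
    # whole value as one token, lowercased.
    return [value.lower()]

# Hypothetical documents: id -> username.
docs = {1: "Foo-Bar-Baz", 2: "qux"}
index = {doc_id: analyze(name) for doc_id, name in docs.items()}

def search(q):
    # The query runs through the same analyzer before matching.
    q_terms = analyze(q)
    return [d for d, terms in index.items() if any(t in terms for t in q_terms)]

print(search("FOO-BAR-BAZ"))  # [1]
print(search("foo-bar-baz"))  # [1]
```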
>
> Thanks,
> -chris
>
