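Hi,

Apache Solr sorts string values in lexicographic order, so case matters: uppercase letters sort before lowercase ones. Lowercasing the value at index time, as the filter in your field type does, is what makes the sort case-insensitive.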
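A quick way to see the effect outside Solr is the Python sketch below. It is only an analogy: Python compares strings by code point, which I believe matches how Solr orders indexed terms for plain ASCII. The usernames are made up for illustration.

```python
# Hypothetical usernames, for illustration only.
usernames = ["bob", "Alice", "carol", "Zed"]

# Plain lexicographic (code point) order: every uppercase ASCII letter
# sorts before every lowercase one.
case_sensitive = sorted(usernames)

# Sorting on a lowercased key, roughly what lowercasing at index time buys you.
case_insensitive = sorted(usernames, key=str.lower)

print(case_sensitive)    # ['Alice', 'Zed', 'bob', 'carol']
print(case_insensitive)  # ['Alice', 'bob', 'carol', 'Zed']
```

Without the lowercase step, "Zed" lands ahead of "bob", which is rarely what people expect from a sorted user list.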
Cheers

On Sat, 17 Dec 2022, 02:51 Christopher Schultz, < [email protected]> wrote:

> All,
>
> I'm trying to determine why a change was made to my internal project
> some years ago. The commit comment is unhelpful, but this field type was
> added and then we changed our "username" field in the Solr index to use
> this field-type:
>
> "add-field-type" : {
>   "name":"sortMe",
>   "class":"solr.TextField",
>   "analyzer":{
>     "tokenizer":{
>       "class":"solr.KeywordTokenizerFactory"
>     }
>     "filters":[{
>       "class":"solr.LowercaseFilterFactory"
>     }]
>   }
> }
>
> The "username" field contains (wait for it) the username for a user,
> where each document in the index represents a user. We want to be able
> to search for users given their usernames and also be able to sort based
> upon the value.
>
> I *think* the reason we changed this was because of the sorting. If you
> have a username like "foo-bar-baz" then Solr will tokenize the value
> into separate terms but we want to use the whole thing together as one
> continuous string.
>
> We want to do the same thing with email addresses, and we used this same
> field-type for that purpose. For example, it's never useful to search
> for "gmail" in email addresses because some huge percentage of users
> come back. If you really want to search for all gmail users, we want you
> to search for "*gmail*".
>
> Will we likely achieve our goals with the field-type specified above?
>
> Is there a reason to lowercase everything? Does that affect sorting? It
> does not seem to affect searching.
>
> Thanks,
> -chris
