Retconn'ing Solr index schema

Christopher Schultz Fri, 16 Dec 2022 09:51:09 -0800

All,

I'm trying to determine why a change was made to my internal project some years ago. The commit comment is unhelpful, but this field type was added and then we changed our "username" field in the Solr index to use this field-type:


"add-field-type" : {
  "name":"sortMe",
  "class":"solr.TextField",
  "analyzer":{
    "tokenizer":{
      "class":"solr.KeywordTokenizerFactory"
    },
    "filters":[{
      "class":"solr.LowerCaseFilterFactory"
    }]
  }
}

The "username" field contains (wait for it) the username for a user, where each document in the index represents a user. We want to be able to search for users given their usernames and also be able to sort based upon the value.

I *think* the reason we changed this was because of the sorting. If you have a username like "foo-bar-baz" then Solr will tokenize the value into separate terms but we want to use the whole thing together as one continuous string.
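To make the difference concrete, here is a rough Python sketch of the two behaviours. It does not call Solr: standard_like_tokens is a hypothetical stand-in that splits on non-word characters (only an approximation of a word-splitting tokenizer), and keyword_lowercase_tokens imitates what a keyword tokenizer followed by a lowercase filter would emit.

```python
import re


def standard_like_tokens(value: str) -> list[str]:
    """Approximate a word-splitting tokenizer (an assumption, not Solr's real one)."""
    return re.findall(r"\w+", value)


def keyword_lowercase_tokens(value: str) -> list[str]:
    """Imitate keyword tokenizer + lowercase filter: one lowercased term per value."""
    return [value.lower()]


print(standard_like_tokens("foo-bar-baz"))      # ['foo', 'bar', 'baz']
print(keyword_lowercase_tokens("Foo-Bar-Baz"))  # ['foo-bar-baz']
```

With a single term per document, there is exactly one value to sort on, which is presumably what the "sortMe" name was getting at.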

We want to do the same thing with email addresses, and we used this same field-type for that purpose. For example, it's never useful to search for "gmail" in email addresses because some huge percentage of users come back. If you really want to search for all gmail users, we want you to search for "*gmail*".
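The intended search behaviour can be sketched in Python as follows. This is only a model of whole-term matching, not Solr's query parser: the addresses are made up, and term_query and wildcard_query are hypothetical helpers that compare against one indexed term per document.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed terms: one lowercased term per user document.
indexed_terms = ["alice@gmail.com", "bob@example.org", "carol@gmail.com"]


def term_query(term: str, terms: list[str]) -> list[str]:
    """A plain term matches only when it equals the whole indexed term."""
    return [t for t in terms if t == term]


def wildcard_query(pattern: str, terms: list[str]) -> list[str]:
    """A wildcard pattern is matched against the whole indexed term."""
    return [t for t in terms if fnmatchcase(t, pattern)]


print(term_query("gmail", indexed_terms))        # []
print(wildcard_query("*gmail*", indexed_terms))  # ['alice@gmail.com', 'carol@gmail.com']
```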


Will we likely achieve our goals with the field-type specified above?

Is there a reason to lowercase everything? Does that affect sorting? It does not seem to affect searching.
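One way to see how lowercasing could matter for sorting is the small Python sketch below. It assumes the sort compares the stored terms in code-point order, where uppercase ASCII letters come before lowercase ones; the usernames are made up for illustration.

```python
# Hypothetical usernames with mixed case.
usernames = ["alice", "Bob", "carol"]

# Code-point ordering: 'B' (66) sorts before 'a' (97).
case_sensitive = sorted(usernames)

# Ordering on the lowercased value, as a lowercase filter would produce.
case_insensitive = sorted(usernames, key=str.lower)

print(case_sensitive)    # ['Bob', 'alice', 'carol']
print(case_insensitive)  # ['alice', 'Bob', 'carol']
```

Under that assumption, lowercasing at index time gives a case-insensitive sort order.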


Thanks,
-chris
