All,

I'm trying to determine why a change was made to my internal project some years ago. The commit comment is unhelpful, but the change added this field type and then switched our "username" field in the Solr index to use it:

"add-field-type" : {
  "name":"sortMe",
  "class":"solr.TextField",
  "analyzer":{
    "tokenizer":{
      "class":"solr.KeywordTokenizerFactory"
    },
    "filters":[{
      "class":"solr.LowerCaseFilterFactory"
    }]
  }
}

The "username" field contains (wait for it) the username for a user, where each document in the index represents a user. We want to be able to search for users given their usernames and also be able to sort based upon the value.

I *think* the reason we changed this was because of the sorting. If you have a username like "foo-bar-baz", then a field type with a word-splitting tokenizer will break the value into separate terms, but we want to treat the whole thing as one continuous string.
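
To illustrate what I mean, here is a rough Python sketch (not Solr code). The function names are made up for this example, and the regex split is only a crude stand-in for a word-splitting tokenizer, which I am assuming breaks on punctuation such as hyphens:

```python
import re

def word_tokens(value):
    # Crude stand-in for a word-splitting tokenizer (an assumption, not
    # Solr's actual algorithm): break on any run of non-alphanumeric
    # characters and lowercase each piece.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]

def keyword_tokens(value):
    # Mimics a keyword tokenizer followed by a lowercase filter:
    # the whole value becomes exactly one lowercased term.
    return [value.lower()]

print(word_tokens("foo-bar-baz"))     # several separate terms
print(keyword_tokens("foo-bar-baz"))  # one term holding the full value
```

With the second behavior there is a single term per document, which is what I would expect sorting to need.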

We want to do the same thing with email addresses, and we used this same field-type for that purpose. For example, it's never useful to search for "gmail" in email addresses because some huge percentage of users come back. If you really want to search for all gmail users, we want you to search for "*gmail*".
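
As a simplified model of the behavior I am hoping for (again Python, not Solr; `matches` is a hypothetical helper, and real query parsing is more involved than this), a plain query has to equal the whole indexed term, while a wildcard query is treated as a glob pattern over it:

```python
from fnmatch import fnmatchcase

def matches(indexed_term, query):
    # Wildcard queries are compared as glob patterns against the whole term.
    if "*" in query or "?" in query:
        return fnmatchcase(indexed_term, query)
    # Plain queries must equal the single indexed term exactly.
    return indexed_term == query

email = "chris@gmail.com"        # made-up address for illustration
print(matches(email, "gmail"))    # plain term query
print(matches(email, "*gmail*"))  # explicit wildcard query
```

Under this model, only the explicit wildcard form finds the gmail users, which is the trade-off we want.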

Will we likely achieve our goals with the field-type specified above?

Is there a reason to lowercase everything? Does that affect sorting? It does not seem to affect searching.
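
Here is the kind of difference I am wondering about, sketched in Python rather than Solr. I am assuming (and it is only an assumption) that sorting compares the indexed terms in code-point order, so lowercasing at index time would make the order case-insensitive. The sample usernames and function names are made up:

```python
usernames = ["bob", "Alice", "carol", "Bob", "alice"]

def sort_raw(values):
    # Plain code-point order: every uppercase ASCII letter sorts
    # before every lowercase one.
    return sorted(values)

def sort_lowercased(values):
    # Order by the lowercased form, as if the sort key were the
    # lowercased indexed term; ties keep their original order.
    return sorted(values, key=str.lower)

print(sort_raw(usernames))
print(sort_lowercased(usernames))
```

In the first ordering "Bob" lands ahead of "alice"; in the second, names that differ only by case sit next to each other.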

Thanks,
-chris
