All,
I'm trying to determine why a change was made to my internal project
some years ago. The commit message is unhelpful, but the change added
this field type and then switched our "username" field in the Solr index
to use it:
"add-field-type" : {
  "name":"sortMe",
  "class":"solr.TextField",
  "analyzer":{
    "tokenizer":{
      "class":"solr.KeywordTokenizerFactory"
    },
    "filters":[{
      "class":"solr.LowerCaseFilterFactory"
    }]
  }
}
The "username" field contains (wait for it) the username for a user,
where each document in the index represents a user. We want to be able
to search for users given their usernames and also be able to sort based
upon the value.
I *think* the reason we changed this was sorting. If the field uses a
tokenizing analyzer, a username like "foo-bar-baz" is split into
separate terms ("foo", "bar", "baz"), but we want to treat the whole
value as one continuous string.
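To make the difference concrete, here is a simplified Python model (not Solr's actual analysis code). The regex split is only a rough approximation of what a tokenizing field type would produce; the second function mimics KeywordTokenizer plus LowerCaseFilter, which emit exactly one term per value. Both function names are hypothetical. As far as I understand, Solr can only sort on a field that yields at most one term per document, which is what the keyword version guarantees.

```python
import re


def approx_tokenizing_analyzer(value: str) -> list[str]:
    # Rough stand-in for a tokenizing analyzer: split on any run of
    # characters that are not ASCII letters or digits, then lowercase.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]


def keyword_lowercase_analyzer(value: str) -> list[str]:
    # Mimics KeywordTokenizer + LowerCaseFilter: the entire input
    # becomes exactly one lowercased term.
    return [value.lower()]


print(approx_tokenizing_analyzer("foo-bar-baz"))  # ['foo', 'bar', 'baz']
print(keyword_lowercase_analyzer("Foo-Bar-Baz"))  # ['foo-bar-baz']
```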
We want the same behavior for email addresses, so we used this field
type for them too. It's never useful for a search for "gmail" to match
addresses just because they contain "gmail" as a token, because some
huge percentage of users would come back. If you really want to find all
gmail users, we want you to search for "*gmail*" explicitly.
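A small Python sketch of how that plays out when each address is indexed as a single lowercased term. The term list and the helper names `term_query` and `wildcard_query` are hypothetical, and `fnmatchcase` is only a stand-in for Solr's wildcard matching (both treat `*` and `?` as wildcards, though `fnmatch` also interprets `[...]`).

```python
from fnmatch import fnmatchcase

# Hypothetical index contents: one lowercased term per email address,
# as KeywordTokenizer + LowerCaseFilter would produce.
terms = ["alice@gmail.com", "bob@example.org", "carol@gmail.com"]


def term_query(q: str) -> list[str]:
    # A plain term query matches only if it equals the entire stored term.
    return [t for t in terms if t == q.lower()]


def wildcard_query(pattern: str) -> list[str]:
    # A wildcard query can match anywhere inside the single stored term.
    return [t for t in terms if fnmatchcase(t, pattern.lower())]


print(term_query("gmail"))        # []
print(wildcard_query("*gmail*"))  # ['alice@gmail.com', 'carol@gmail.com']
```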
Will we likely achieve our goals with the field type specified above?
Is there a reason to lowercase everything? Does that affect sorting? It
does not seem to affect searching.
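On the sorting question: as I understand it, Solr compares string sort values by their UTF-8 bytes, which for ASCII is the same as Python's default code-point ordering. Under that assumption, every uppercase letter sorts before every lowercase one, so lowercasing at index time makes the sort effectively case-insensitive. The sample usernames below are made up.

```python
usernames = ["alice", "Bob", "carol", "Dave"]

# Code-point order: 'A'-'Z' (65-90) all precede 'a'-'z' (97-122),
# so capitalized names jump ahead of lowercase ones.
print(sorted(usernames))  # ['Bob', 'Dave', 'alice', 'carol']

# Lowercasing first (as LowerCaseFilter does at index time) gives the
# alphabetical order most people expect.
print(sorted(u.lower() for u in usernames))  # ['alice', 'bob', 'carol', 'dave']
```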
Thanks,
-chris