[
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498492
]
Ryan McKinley commented on SOLR-248:
------------------------------------
>
> 1) would it make sense for the keep option to refer to a file, using the same
> format as StopFilter ... that way it's easy to reuse the same file (which
> seems like it would be a common case)?
>
Probably. That is a good idea.
> 2) what is the point of forceFirstLetter="true" ? ... if you want to force
> capitalization, what's the point of making the keep list?
>
This one came out of necessity!
With keep="the ..." and input:
"Grand army of the Republic", "the arts"
I want: "Grand Army of the Republic" and "The Arts"
"forceFirstLetter" only applies to the first character in the token, not to
each word.
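
To make the interaction between the keep list and forceFirstLetter concrete, here is a minimal sketch. It is not the code in the attached patch: the class name, method name, and the keep set {"of", "the"} are assumptions made for illustration, and it assumes keep words are matched case-insensitively and left untouched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the SOLR-248 patch itself.
final class ForceFirstLetterSketch {

    // Capitalizes each space-separated word of a token, leaving words in the
    // keep set untouched. If forceFirstLetter is true, the first character of
    // the whole token is upper-cased afterwards, even if it is a keep word.
    static String capitalize(String token, Set<String> keep, boolean forceFirstLetter) {
        String[] words = token.split(" ");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            String lower = words[i].toLowerCase(Locale.ROOT);
            if (keep.contains(lower)) {
                out.append(words[i]); // keep word: leave as written (assumption)
            } else if (!lower.isEmpty()) {
                out.append(Character.toUpperCase(lower.charAt(0)));
                out.append(lower.substring(1));
            }
        }
        // Applies once per token, not once per word.
        if (forceFirstLetter && out.length() > 0) {
            out.setCharAt(0, Character.toUpperCase(out.charAt(0)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> keep = new HashSet<>(Arrays.asList("of", "the"));
        System.out.println(capitalize("Grand army of the Republic", keep, true));
        System.out.println(capitalize("the arts", keep, true));
        System.out.println(capitalize("the arts", keep, false));
    }
}
```

With forceFirstLetter turned off, the last call leaves the leading "the" in lower case, which is the case the option exists to handle.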
> 3) is okPrefix going to force the case for things that have that prefix in an
> alternate case, or only allow that casing to remain (ie: if i index McKeen,
> Mckeen, mckeen and MCKEEN what tokens do i wind up with?)
>
As written, if the prefix matches, it assumes the word capitalization is
correct. For my input data, this is sufficient -- but it should probably do
something smarter.
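
The prefix rule, and one possible "smarter" variant, can be sketched roughly as follows. This is an illustration and not the code in the patch: the class and method names are hypothetical, a single okPrefix of "McK" is assumed, and words that do not match are simply capitalized on their first letter and lower-cased otherwise. The variant treats the prefix as the canonical casing whenever the lower-cased word starts with the lower-cased prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the SOLR-248 patch itself.
final class OkPrefixSketch {

    // Upper-cases the first character and lower-cases the rest.
    static String capitalizeFirst(String word) {
        if (word.isEmpty()) {
            return word;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        return Character.toUpperCase(lower.charAt(0)) + lower.substring(1);
    }

    // canonical == false: a word that starts with okPrefix (exact case) is
    // assumed to be correctly capitalized and is returned unchanged.
    // canonical == true: a case-insensitive prefix match rewrites the word as
    // okPrefix followed by the lower-cased remainder.
    static String applyToWord(String word, String okPrefix, boolean canonical) {
        if (canonical) {
            String lower = word.toLowerCase(Locale.ROOT);
            if (lower.startsWith(okPrefix.toLowerCase(Locale.ROOT))) {
                return okPrefix + lower.substring(okPrefix.length());
            }
        } else if (word.startsWith(okPrefix)) {
            return word;
        }
        return capitalizeFirst(word);
    }

    // Applies the rule to every space-separated word of the input.
    static String apply(String text, String okPrefix, boolean canonical) {
        List<String> out = new ArrayList<>();
        for (String word : text.split(" ")) {
            out.add(applyToWord(word, okPrefix, canonical));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String input = "McKeen, Mckeen, mckeen, MCKEEN and McKEEN";
        System.out.println(apply(input, "McK", false));
        System.out.println(apply(input, "McK", true));
    }
}
```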
So, if you index "McKeen, Mckeen, mckeen, MCKEEN and McKEEN", you would get:
"McKeen, Mckeen, Mckeen, Mckeen And McKEEN"
If "okPrefix" was treated as *the* capitalization for input where the lowercase
prefix matches "mck", it would give:
"McKeen, McKeen, McKeen, McKeen And McKeen"
> Capitalization Filter Factory
> -----------------------------
>
> Key: SOLR-248
> URL: https://issues.apache.org/jira/browse/SOLR-248
> Project: Solr
> Issue Type: New Feature
> Reporter: Ryan McKinley
> Priority: Minor
> Attachments: SOLR-248-CapitalizationFilter.patch
>
>
> For tokens that are used in faceting, it is nice to have standard
> capitalization.
> I want "Aerial views" and "Aerial Views" to both be: "Aerial Views"