unicode collation support
-------------------------
Key: SOLR-1571
URL: https://issues.apache.org/jira/browse/SOLR-1571
Project: Solr
Issue Type: New Feature
Components: Analysis
Reporter: Robert Muir
Priority: Minor
Attachments: SOLR-1571.patch
This patch adds support for unicode collation (searching and sorting).
Unicode collation is helpful in a search engine, for many languages you want
things to match or sort differently.
You might even want to use copyfield and support different sort orders/matching
schemes if you need to support multiple languages.
This is simply a factory for lucene's CollationKeyFilter, which indexes binary
collation keys in a special format that preserves binary sort order.
I've added support for creating a Collator in two ways:
* system collator from a Locale spec (language + country + variant)
* tailored collator from custom rules in a text file
in no way is there an option to use the "default" locale of the jvm, (I
consider this a bit dangerous)
in this patch, it is mandatory to define the locale explicitly for a system
collator.
The required lucene-collation-2.9.1.jar is only 12KB.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.