Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The "UnicodeCollation" page has been changed by OtisGospodnetic.
The comment on this change is: Clarification FIXME for.... Robert Muir?.
http://wiki.apache.org/solr/UnicodeCollation?action=diff&rev1=1&rev2=2

--------------------------------------------------

  == Sorting text for multiple languages ==
  There are two approaches to supporting multiple languages:
  
-  * If there is a small list, consider defining collated fields for each 
language and using copyField.
+  * If there is a small list (FIXME: small list of Languages? Fields?), 
consider defining collated fields for each language and using copyField.
   * If there is a very large list, an alternative is to use the "Unicode 
default" collator.
  
  The Unicode default, or "ROOT" Locale, has rules that are designed to work 
well in general for most languages. To use it, simply define the language as 
the empty string.
@@ -70, +70 @@

  The example code below shows how to create a custom ruleset and dump it to a 
file.
  
  {{{
-     // get the default rules for germany
+     // get the default rules for Germany
      // these are called DIN 5007-1 sorting
      RuleBasedCollator baseCollator = (RuleBasedCollator) 
Collator.getInstance(new Locale("de", "DE"));
  
@@ -116, +116 @@

    </analyzer>
  </fieldType>
  }}}
- 
  Below is an example of what this would look like for two words that should 
match with this collator: Töne and toene.
  
  '''org.apache.solr.analysis.StandardTokenizerFactory'''
@@ -127, +126 @@

  ||<style="text-align: center;" |1>payload ||<class="debugdata"> 
||<class="debugdata"> ||
  
  
+ 
+ 
  '''org.apache.solr.analysis.CollationKeyFilterFactory   {strength=primary, 
custom=customRules.dat}'''
  ||<tablewidth="" tableclass="analysis"style="text-align: center;" |1>term 
position ||<class="debugdata">1 ||<class="debugdata">2 ||
  ||<style="text-align: center;" |1>term text 
||<class="debugdata">3䀘䀋#6;ࠂ怀#0;#0;#0; ||<class="debugdata">3䀘䀋#6;ࠂ怀#0;#0;#0; ||
@@ -134, +135 @@

  ||<style="text-align: center;" |1>source start,end ||<class="debugdata">0,4 
||<class="debugdata">5,10 ||
  ||<style="text-align: center;" |1>payload ||<class="debugdata"> 
||<class="debugdata"> ||
  
- Please note that the strange output you see from the filter is really a 
binary collation key encoded in a special form.
- What is important is that it is the same value for equivalent tokens as 
defined by that collator.
  
+ 
+ 
+ Please note that the strange output you see from the filter is really a 
binary collation key encoded in a special form. What is important is that it is 
the same value for equivalent tokens as defined by that collator.
+ 
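The point about equivalent tokens sharing one collation key can be checked outside Solr. The following is a minimal sketch, assuming the JDK's `java.text` collator and a DIN 5007-2 style tailoring in which each umlaut is treated like its "ae"/"oe"/"ue" spelling; the rule string and the class name `CollationKeyDemo` are illustrative assumptions, not the rules elided from the page above.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class CollationKeyDemo {

    // Assumed tailoring (hypothetical, DIN 5007-2 style): each umlaut differs
    // from its two-letter spelling only at the tertiary level.
    static final String UMLAUT_TAILORING =
        "& ae , a\u0308 & AE , A\u0308"
      + "& oe , o\u0308 & OE , O\u0308"
      + "& ue , u\u0308 & UE , U\u0308";

    // Builds a collator from the default German rules plus the tailoring above.
    static RuleBasedCollator buildTailoredCollator() {
        RuleBasedCollator base =
            (RuleBasedCollator) Collator.getInstance(new Locale("de", "DE"));
        try {
            RuleBasedCollator tailored =
                new RuleBasedCollator(base.getRules() + UMLAUT_TAILORING);
            // Primary strength ignores tertiary differences such as the one
            // the tailoring introduces between an umlaut and its spelling.
            tailored.setStrength(Collator.PRIMARY);
            // Decompose precomposed umlauts so the combining-mark rules apply.
            tailored.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
            return tailored;
        } catch (ParseException e) {
            throw new IllegalStateException("invalid collation rules", e);
        }
    }

    // True when both strings produce equal collation keys under the collator.
    static boolean sameKey(Collator collator, String a, String b) {
        CollationKey keyA = collator.getCollationKey(a);
        CollationKey keyB = collator.getCollationKey(b);
        return keyA.compareTo(keyB) == 0;
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = buildTailoredCollator();
        System.out.println(sameKey(collator, "T\u00f6ne", "toene"));
        System.out.println(sameKey(collator, "Tone", "toene"));
    }
}
```

The raw bytes from `CollationKey.toByteArray()` are not meant to be readable, which is presumably why the filter output in the analysis table looks strange; only equality and ordering of keys carry meaning.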
