On 11/21/2010 06:09 PM, Robert Haas wrote:
> I think that's fair.  It actually doesn't seem like it should be that
> hard if we knew that the server encoding were UTF8 - it's just a big
> translation table somewhere, no?

No, it's far more complex. See for example <http://unicode.org/reports/tr21/tr21-3.html>, which says:

   There are a number of complications to case mappings that occur once
   the repertoire of characters is expanded beyond ASCII.

       * Because of the inclusion of certain composite characters for
         compatibility, such as 01F1 "DZ" /capital dz/, there is a
         third case, called /titlecase/, which is used where the first
         letter of a word is to be capitalized (e.g. Titlecase, vs.
         UPPERCASE, or lowercase).
             o For example, the title case of the example character is
               01F2 "Dz" /capital d with small z/.
       * Case mappings may produce strings of different length than the
         original.
             o For example, the German character 00DF "ß" /small letter
               sharp s/ expands when uppercased to the sequence of two
               characters "SS". This also occurs where there is no
               precomposed character corresponding to a case mapping,
               such as with 0149 "'n" /latin small letter n preceded by
               apostrophe./
       * Characters may also have different case mappings, depending on
         the context.
             o For example, 03A3 "Σ" /capital sigma/ lowercases to 03C3
               "σ" /small sigma/ if it is followed by another letter,
               but lowercases to 03C2 "ς" /small final sigma/ if it is not.
       * Characters may have case mappings that depend on the locale.
             o For example, in Turkish the letter 0049 "I" /capital
               letter i/ lowercases to 0131 "ı" /small dotless i/.
       * Case mappings are not, in general, reversible.
             o For example, once the string "McGowan" has been
               uppercased, lowercased or titlecased, the original
               cannot be recovered by applying another uppercase,
               lowercase, or titlecase operation.
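
To make these points concrete, here is a rough sketch (mine, not from the
report) that exercises several of them. It assumes Python 3, whose built-in
str methods apply Unicode's default, locale-independent case mappings; the
variable names are made up for illustration, and nothing here reflects how
the server does or should implement it.

```python
# Illustration only: Python 3 str methods, no locale handling involved.

# Titlecase is a third case: U+01F1 has its own titlecase form, U+01F2.
dz_title = "\u01f1".title()

# Mappings can change the length: U+00DF (sharp s) uppercases to "SS",
# and U+0149 has no precomposed uppercase, so it expands as well.
sharp_s_upper = "\u00df".upper()
n_apostrophe_upper = "\u0149".upper()

# Context matters: capital sigma lowercases to final sigma (U+03C2) at the
# end of a word, but to ordinary small sigma (U+03C3) before another letter.
sigma_final = "\u039f\u0394\u039f\u03a3".lower()
sigma_medial = "\u03a3\u0391".lower()

# Locale matters: the default mapping turns "I" into "i", which is not the
# Turkish result (U+0131, dotless i). A single table cannot give both.
default_i_lower = "I".lower()

# Mappings are not reversible: the inner capital in "McGowan" is lost.
round_trip = "McGowan".upper().title()

print(repr(sharp_s_upper), len(n_apostrophe_upper), repr(round_trip))
```

A plain one-code-point-to-one-code-point table cannot express the length
changes or the sigma rule, which is the main reason this is not "just a big
translation table".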


cheers

andrew


