Hello, in UAX #44 I read:

  Simple_Titlecase_Mapping ...
    Note: If this field is null, then the Simple_Titlecase_Mapping
    is the same as the Simple_Uppercase_Mapping for this character.

So a parser has to be aware of this and fall back automatically to
the uppercase mapping (field index 12, counting from 0) when there
is no explicit titlecase mapping (field index 14).
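
The fallback can be sketched in awk roughly as follows. This is only an
illustration: the function name effective_titlecase is made up here, and it
assumes the usual 15-field, semicolon-separated UnicodeData.txt layout, where
awk's $13 and $15 correspond to indexes 12 and 14. The second sample record
is hand-modified (titlecase field emptied) purely to exercise the fallback.

```shell
# Hypothetical helper: read UnicodeData.txt records on stdin and print
# "codepoint;titlecase", using the uppercase mapping ($13) whenever the
# titlecase field ($15) is empty.
effective_titlecase() {
  awk 'BEGIN { FS = ";"; OFS = ";" }
       { print $1, (length($15) ? $15 : $13) }'
}

# Sample input: the first record carries an explicit titlecase mapping,
# the second one is a made-up variant of U+0061 with that field left empty.
effective_titlecase <<'EOF'
01C6;LATIN SMALL LETTER DZ WITH CARON;Ll;0;L;<compat> 0064 017E;;;;N;;;01C4;;01C5
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;
EOF
```

On the real file this would be run as `effective_titlecase <UnicodeData.txt`.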

Given this, the following surprised me:

  ?0[steffen@sherwood unicode]$ <UnicodeData.txt awk 'BEGIN{FS=";"}\
    {if (length($15) && $15 = $13) print}' |wc -l
      1051
  ?0[steffen@sherwood unicode]$ <UnicodeData.txt awk 'BEGIN{FS=";"}\
    {if (length($15) && $15 != $13) print}' |wc -l
        12

(I.e., the redundant mapping is spelled out explicitly 1051 times.)
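
One caveat on the first figure: in awk a single `=` is an assignment, so
that count may also include records whose two mappings are both present but
differ. A sketch that separates the two cases with an explicit string
comparison could look like this; count_titlecase is a name invented here,
and the three sample records are only there to make the snippet
self-contained.

```shell
# Hypothetical helper: tally explicit titlecase mappings read from stdin as
# "redundant" (identical to the uppercase mapping) or "distinct".
count_titlecase() {
  awk 'BEGIN { FS = ";" }
       length($15) {
         # Appending "" forces a string comparison of the hex fields.
         if (($15 "") == ($13 "")) same++; else diff++
       }
       END { printf "redundant=%d distinct=%d\n", same, diff }'
}

# Sample input: one redundant mapping, one distinct mapping, and one record
# without any titlecase mapping (which is skipped).
count_titlecase <<'EOF'
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;;;01C4;01C6;01C5
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
EOF
```

On the real file this would be run as `count_titlecase <UnicodeData.txt`.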

Those redundant entries could be dropped with something like:

  $ <UnicodeData.txt >UnicodeData.txt.new \
    awk 'BEGIN{FS=";"; OFS=";"}\
    {if (length($15) && $15 == $13) $15=""; print}'

--steffen
