In message <[EMAIL PROTECTED]>
          Asmus Freytag <[EMAIL PROTECTED]> wrote:

> On top of that, it looks like 950 maps a bogus symbol or punctuation 
> character to U+2574. (2574 is one of a set of 4, and only 1 is mapped for 
> starters.) Fonts covering CP950 give a way different image for that 
> character than you'd expect from either the charts or the names...

I recently had to sort out our systems' Big5<->Unicode mapping table, and
there seems to be great confusion in the punctuation space. The table (that
used to be) on the Unicode site was unsatisfactory, and Microsoft's CP950
mapping also doesn't seem to make sense (e.g. with that U+2574 mapping, and
CIRCLED PLUS and CIRCLED DOT OPERATOR instead of EARTH and SUN).

One point of note is that there is a whole cluster of characters in the
compatibility area of Unicode, from U+FE30 to U+FE6B, that is designed to
handle mapping CNS11643, whose punctuation area is almost identical to
Big5's. The mapping tables I've seen don't make proper use of them.

I was able to come up with a good Big5 mapping by taking the best ideas from
various Big5 and CNS11643 tables on the net, then making sure each of those
Unicode compatibility characters was used once, AND IN THE ORDER THEY APPEAR
IN UNICODE. This ends up mapping A15A to U+FE58 SMALL EM DASH, which still
might not be right, but it looks like a confused character anyway - it
appears different in Big5 and CNS11643 tables, so it could just be a glyph
variant issue.
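
For anyone wanting to run the same sanity check on their own table, here is
a rough sketch of the "used once, and in Unicode order" test in Python. It is
not the tool I used; the function name, the dict layout (Big5 code -> Unicode
code point) and the toy values are made up for illustration. Only the
A15A -> U+FE58 pair comes from the text above. The caller supplies the set of
compatibility code points the table is expected to use, since not every code
point in U+FE30..U+FE6B is assigned.

```python
from collections import Counter


def check_compat_usage(big5_to_unicode, expected):
    """Check how a Big5->Unicode table uses a set of compatibility characters.

    big5_to_unicode: dict of Big5 code -> Unicode code point (assumed layout).
    expected: the compatibility code points the table should use.
    Returns (missing, duplicated, in_order).
    """
    expected = set(expected)
    # Keep only entries that target a compatibility character, in Big5 order.
    used = [(b, u) for b, u in sorted(big5_to_unicode.items()) if u in expected]
    counts = Counter(u for _, u in used)
    # Compatibility characters that no Big5 code maps to.
    missing = sorted(expected - set(counts))
    # Compatibility characters that more than one Big5 code maps to.
    duplicated = sorted(u for u, n in counts.items() if n > 1)
    # Walking the Big5 codes upwards, the targets should strictly increase.
    targets = [u for _, u in used]
    in_order = all(a < b for a, b in zip(targets, targets[1:]))
    return missing, duplicated, in_order


# Toy data, for illustration only (not an authoritative Big5 table).
toy_table = {0xA14D: 0xFE50, 0xA14E: 0xFE51, 0xA15A: 0xFE58}
toy_expected = {0xFE50, 0xFE51, 0xFE58}
result = check_compat_usage(toy_table, toy_expected)
```

A clean table gives two empty lists and True. A non-empty "missing" or
"duplicated" list, or in_order being False, points at the rows worth a
second look.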

-- 
Kevin Bracey, Principal Software Engineer
Pace Micro Technology plc                     Tel: +44 (0) 1223 518566
645 Newmarket Road                            Fax: +44 (0) 1223 518526
Cambridge, CB5 8PB, United Kingdom            WWW: http://www.pace.co.uk/
