From: Marcel <[EMAIL PROTECTED]>
Date: Fri, 16 Jun 2006 17:20:33 +0200
Does this mean that there are several codes for the same character,
like an "é", in Unicode?
Yes, there are several "codes". I think Unicode calls them canonically
equivalent sequences of code points, but I'm also a bit loose on the
Unicode terminology.
Basically, Unicode has some history behind it. It tried to be all
things to all people, and thus we often have two codes for the same
accented letter. Some people wanted their favourite letters to remain
as just one code point, even though the more modern way that
Unicode.org suggests is to have one letter + one accent, with the
accent as a separate code point. The accent is called a "combining
character". So Unicode added both variants, the separate accents and
one code point that equals letter + accent, to try to please both
kinds of users.
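
To make the two variants concrete, here is a small sketch in Python
(not REALbasic, and not my UnicodeStuff module; it just leans on
Python's standard unicodedata module). The variable names are only for
illustration.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as one code point (U+00E9)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)

# Both render as the same letter, but the code point sequences differ,
# so a plain comparison says they are not equal.
same_raw = precomposed == decomposed

# Normalising to a common form makes them comparable.
# NFC prefers the single code point, NFD prefers letter + accent.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(same_raw, same_nfc, same_nfd)   # False True True
```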
Even if they hadn't done this, we'd still need normalisation
code. What if you have a letter with two accents?
Well, the result can look the same whichever order the two accents
are stored in, for example when one sits above the letter and the
other below.
For example: A + above-accent + below-accent looks the same to the
user as: A + below-accent + above-accent.
So Unicode has specified a canonical order into which combining
characters are sorted. My UnicodeStuff module has a ReorderCombiners
method now :)
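
Here is roughly what that reordering looks like, sketched in Python
with the standard unicodedata module. This is not the code of my
ReorderCombiners method, only an illustration of the same idea; the
two accents and the helper code_points are picked just for the
example.

```python
import unicodedata

ABOVE = "\u0301"   # COMBINING ACUTE ACCENT, sits above the letter
BELOW = "\u0323"   # COMBINING DOT BELOW, sits below the letter

above_first = "A" + ABOVE + BELOW
below_first = "A" + BELOW + ABOVE

def code_points(text):
    """Return the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Each combining character has a numeric combining class; marks
# attached to the same letter are sorted by that number.
classes = (unicodedata.combining(ABOVE), unicodedata.combining(BELOW))

# NFD applies the canonical ordering, so both inputs end up identical.
nfd_a = unicodedata.normalize("NFD", above_first)
nfd_b = unicodedata.normalize("NFD", below_first)

print(classes)               # (230, 220)
print(code_points(nfd_a))    # ['U+0041', 'U+0323', 'U+0301']
print(nfd_a == nfd_b)        # True
```

Accents with the same combining class (two accents that are both
above, say) are left in their original order, because there the order
does change how the letter looks.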
But not for UTF-8 encoding, right?
The encoding has got nothing to do with this. You'll get exactly the
same problem, no more and no fewer, whether you use UTF-8, UTF-16 or
UTF-32.
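
A quick way to convince yourself, again as a Python sketch using only
the standard library (the results dictionary is just for
illustration): encode both forms of "é" in each encoding and compare
the bytes, before and after normalising.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as one code point
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

results = {}
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    # Without normalisation the bytes differ in every encoding.
    raw_differs = precomposed.encode(encoding) != decomposed.encode(encoding)
    # After normalising both to NFC the bytes match in every encoding.
    normalised_same = (
        unicodedata.normalize("NFC", precomposed).encode(encoding)
        == unicodedata.normalize("NFC", decomposed).encode(encoding)
    )
    results[encoding] = (raw_differs, normalised_same)

for encoding, outcome in results.items():
    print(encoding, outcome)   # (True, True) for all three encodings
```

So normalisation is something you do to the code points, before the
text is ever encoded.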
This FAQ http://www.unicode.org/faq/normalization.html tries to
explain a bit more, but unfortunately it's all written in
gobbledygook, so I don't think it'll help ;)
--
http://elfdata.com/plugin/