Re: Cross-platform Unicode normalisation, NFD

Theodore H. Smith Fri, 16 Jun 2006 10:22:05 -0700

From: Marcel <[EMAIL PROTECTED]>
Date: Fri, 16 Jun 2006 17:20:33 +0200
Does this mean that there are several codes for the same character, like an "é" in Unicode?

Yes, there are several "codes". I think the proper name is "canonically equivalent sequences" of code points, but I'm also a bit loose on the Unicode terminology.

Basically, Unicode has some history behind it. It tried to be all things to all people, and thus we often have two codes for the same accented letter. Some people wanted their favourite letters to remain a single code point, even though the more modern way that Unicode.org suggests is to store the base letter followed by the accent as a separate code point. The accent is called a "combining character".

So Unicode added both variants, the combining accents and a single code point that equals letter plus accent, to try to please both kinds of users.
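To make the two variants concrete, here is a minimal sketch in Python (used purely for illustration, it is not what the list is about), relying on the standard unicodedata module. The variable names are my own.

```python
import unicodedata

composed = "\u00e9"     # "é" as one precomposed code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# The two strings display the same but do not compare equal.
equal_raw = composed == decomposed

# NFD splits the precomposed letter into base letter + combining accent;
# NFC goes the other way and recombines them.
nfd = unicodedata.normalize("NFD", composed)
nfc = unicodedata.normalize("NFC", decomposed)

print(equal_raw, nfd == decomposed, nfc == composed)
print([hex(ord(c)) for c in nfd])
```

Comparing strings only after normalising both to the same form (NFD or NFC) is what avoids the mismatch.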

Even if they hadn't done this, we'd still need normalisation code. What if you have a letter with two accents?

Well, if one accent sits above the letter and the other below, the result looks like the same letter whichever accent is stored first.

For example: A + above-accent + below-accent looks the same to the user as: A + below-accent + above-accent.

So, Unicode has specified a canonical order into which combining characters are to be sorted. My UnicodeStuff module has a ReorderCombiners method now :)
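The reordering can be sketched like this, again in Python with the standard unicodedata module rather than the UnicodeStuff module (whose ReorderCombiners internals I'm not showing here). The two accents are just examples I picked: an acute accent above and a dot below.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, drawn above the letter
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, drawn below the letter

above_first = "A" + ACUTE + DOT_BELOW
below_first = "A" + DOT_BELOW + ACUTE

# Different code point sequences, although they look identical.
same_before = above_first == below_first

# NFD puts adjacent combining marks into canonical order,
# sorted by their combining class (lower class first).
norm_a = unicodedata.normalize("NFD", above_first)
norm_b = unicodedata.normalize("NFD", below_first)
same_after = norm_a == norm_b

# Combining classes of the normalised sequence (0 means a base character).
classes = [unicodedata.combining(c) for c in norm_a]

print(same_before, same_after, classes)
```

Marks that share the same combining class are left in their original order, because swapping those could change how the text looks.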

But this doesn't apply to the UTF-8 encoding, does it?

The encoding has got nothing to do with this. You'll get exactly the same problem, no more and no less, whether you use UTF-8, UTF-16 or UTF-32.
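A quick way to convince yourself of this, sketched in Python (the names are my own): encode the two spellings of "é" in each encoding, and check that the bytes always differ until the decoded text is normalised.

```python
import unicodedata

composed = "\u00e9"     # precomposed "é"
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT

results = {}
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    a = composed.encode(enc)
    b = decomposed.encode(enc)
    # The encoding only changes how code points become bytes,
    # so the two spellings stay different in every encoding...
    bytes_differ = a != b
    # ...and become equal only once the decoded text is normalised.
    equal_after_nfd = (
        unicodedata.normalize("NFD", a.decode(enc))
        == unicodedata.normalize("NFD", b.decode(enc))
    )
    results[enc] = (bytes_differ, equal_after_nfd)

for enc, outcome in results.items():
    print(enc, outcome)
```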

This FAQ http://www.unicode.org/faq/normalization.html tries to explain a bit more, but unfortunately it's all written in gobbledygook, so I don't think it'll help ;)


--
http://elfdata.com/plugin/



