Re: Normalization forms

John Cowan Mon, 13 May 2002 14:48:44 -0700

Lars Marius Garshol scripsit:

>  - will string comparison methods based on NFC and NFD always give the
>    same results?


By intention, yes.
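
As a rough illustration, here is a minimal Python sketch using the standard unicodedata module. The helper name equal_under and the sample strings are only illustrative assumptions, not part of any specification.

```python
import unicodedata

def equal_under(form: str, a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# A raw code point comparison sees two different strings.
raw_equal = precomposed == decomposed

# Comparing after NFC or after NFD gives the same verdict for this pair.
nfc_equal = equal_under("NFC", precomposed, decomposed)
nfd_equal = equal_under("NFD", precomposed, decomposed)

# Strings that are not canonically equivalent stay unequal under both forms.
nfc_plain = equal_under("NFC", "e", precomposed)
nfd_plain = equal_under("NFD", "e", precomposed)
```

The point is only that the two forms agree with each other on equality, whichever one a comparison routine picks.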

>  - is it correct that methods based on NFKC and NFKD will give
>    different results from ones based on NFC/NFD?

Yes.
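
A small sketch of the difference, again assuming Python's unicodedata module; the choice of the "fi" ligature as the sample character is mine.

```python
import unicodedata

ligature = "\ufb01"   # LATIN SMALL LIGATURE FI, a compatibility character
plain = "fi"          # the two ordinary letters

# Canonical forms leave the ligature alone, so the strings stay distinct.
nfc_same = (unicodedata.normalize("NFC", ligature)
            == unicodedata.normalize("NFC", plain))
nfd_same = (unicodedata.normalize("NFD", ligature)
            == unicodedata.normalize("NFD", plain))

# Compatibility forms replace the ligature with "f" + "i", so they match.
nfkc_same = (unicodedata.normalize("NFKC", ligature)
             == unicodedata.normalize("NFKC", plain))
nfkd_same = (unicodedata.normalize("NFKD", ligature)
             == unicodedata.normalize("NFKD", plain))
```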

>  - if NFC and NFD give the same results, why are both specified? Why
>    would an implementation choose one over the other?

Originally, only NFD was given, as it is sufficient.  However, text
converted from non-Unicode encodings is generally already in NFC,
so specifying NFC (which is conceptually NFD with a post-processing
pass to re-create certain precomposed characters) has certain practical
advantages.  In particular, if you are doing "early normalization",
near the point of creation, then NFC allows easy step-down to
non-Unicode encodings.
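
To make the step-down point concrete, here is a hedged sketch in Python. Latin-1 stands in for "a non-Unicode encoding", and try_encode is a hypothetical helper introduced only for this example.

```python
import unicodedata

decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

nfd = unicodedata.normalize("NFD", decomposed)   # stays as two code points
nfc = unicodedata.normalize("NFC", decomposed)   # recomposed to U+00E9

# NFC applied to NFD text yields the same result as NFC applied directly.
recomposed = unicodedata.normalize("NFC", nfd)

def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# Latin-1 has a slot for the precomposed letter but none for the
# combining accent, so only the NFC text converts directly.
nfc_latin1 = try_encode(nfc, "latin-1")
nfd_latin1 = try_encode(nfd, "latin-1")
```

With NFD text, the converter would first have to recompose the sequence itself before it could find a matching legacy character.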

>  - NFKC/NFKD seem to lose significant information; in what contexts
>    are they intended to be used?

Compatibility distinctions may or may not be important in particular
cases: often they represent distinctions that are merely historical.
One context where compatibility distinctions are typically unimportant
is in identifiers.
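
One possible way to apply this to identifiers, sketched in Python: key a symbol table on the NFKC form of each name. The names identifier_key, define, and lookup are hypothetical, and a real language would also need rules about which characters may appear in identifiers at all.

```python
import unicodedata

def identifier_key(name: str) -> str:
    """Fold compatibility variants so they name the same identifier."""
    return unicodedata.normalize("NFKC", name)

symbols = {}

def define(name: str, value) -> None:
    symbols[identifier_key(name)] = value

def lookup(name: str):
    return symbols[identifier_key(name)]

define("x1", 42)

# U+FF58 is FULLWIDTH LATIN SMALL LETTER X, a compatibility variant of "x".
found = lookup("\uff581")
```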

-- 
John Cowan <[EMAIL PROTECTED]>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_
