On Sun, 29 Mar 2009 15:01:51 +0200, Giovanni Campagna <[email protected]> wrote:
2009/3/29 Anne van Kesteren <[email protected]>:
I'm not sure if you're correct about those differences, but even if you are they are not the only differences. E.g. LEIRIs perform normalization if the input encoding is non-Unicode. URLs do not. URLs can encode their query component per the input encoding (and do so for HTML and some APIs). LEIRIs cannot.

What is the problem with normalization? Is there a standard for
conversion from non-Unicode to Unicode?
I guess not, so normalization (which should always be done) is perfectly legal.

It's about Unicode Normalization. (And it should not always be done.)
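
As a rough illustration of why normalization is not a harmless no-op, here is a minimal sketch using Python's standard unicodedata and urllib.parse modules. The sample string and variable names are assumptions chosen for the example, not something taken from either specification. It shows that normalizing to NFC changes the code points, and therefore the percent-encoded result:

```python
import unicodedata
from urllib.parse import quote

# Hypothetical input: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# quote() percent-encodes as UTF-8 by default.
pct_decomposed = quote(decomposed)  # "e%CC%81"
pct_composed = quote(composed)      # "%C3%A9"

print(pct_decomposed, pct_composed)
```

The two strings render identically but identify different resources once percent-encoded, which is why it matters whether a processor normalizes or not.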


In addition, IRIs are defined as a sequence of Unicode codepoints. It
does not matter how those codepoints are stored (ASCII, ISO-8859-1,
UTF-8), only the Unicode version of them.

Please read the IRI specification again. Specifically section 3.1.


This is the same as URL5s, by the way, because neither of them is defined
on octets and both use the RFC 3986 method for percent-encoding (using
UTF-8).

No, it's not always using UTF-8.
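
To make the difference concrete, a small sketch with Python's urllib.parse.quote (the sample character and the choice of ISO-8859-1 as the "input encoding" are assumptions for illustration only). The same character percent-encodes differently depending on which encoding is used, as happens for the query component of a URL in a non-UTF-8 HTML document:

```python
from urllib.parse import quote

# Hypothetical query value: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
value = "\u00e9"

# Percent-encoding via UTF-8, as in the RFC 3987 mapping.
utf8_form = quote(value, encoding="utf-8")          # "%C3%A9"

# Percent-encoding via the document's encoding, here assumed ISO-8859-1.
latin1_form = quote(value, encoding="iso-8859-1")   # "%E9"

print(utf8_form, latin1_form)
```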


--
Anne van Kesteren
http://annevankesteren.nl/
