Re: [whatwg] Web Addresses vs Legacy Extended IRI (again)

Anne van Kesteren Sun, 29 Mar 2009 06:07:01 -0700

On Sun, 29 Mar 2009 15:01:51 +0200, Giovanni Campagna <[email protected]> wrote:

2009/3/29 Anne van Kesteren <[email protected]>:
I'm not sure if you're correct about those differences, but even if you are, they are not the only differences. E.g. LEIRIs perform normalization if the input encoding is non-Unicode. URLs do not. URLs can encode their query component per the input encoding (and do so for HTML and some APIs). LEIRIs cannot.
What is the problem with normalization? Is there a standard for
conversion from non-Unicode to Unicode?
I guess not, so normalization (which should always be done) is perfectly legal.


It's about Unicode Normalization. (And it should not always be done.)
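
To illustrate why normalization is not a neutral step, here is a minimal Python sketch. It assumes NFC is the normalization form in question, and the sample string is invented for illustration. Two strings that render identically are different codepoint sequences until one of them is normalized:

```python
import unicodedata

# Hypothetical sample: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# NFC composes the pair into the single codepoint U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# The codepoint sequence changed, so any later percent-encoding changes too.
changed = decomposed != composed
```

A processor that normalizes and one that does not will therefore produce different addresses from the same input.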

In addition, IRIs are defined as a sequence of Unicode codepoints. It
does not matter how those codepoints are stored (ASCII, ISO-8859-1,
UTF-8), only the Unicode version of them.


Please read the IRI specification again. Specifically section 3.1.

This is the same as URL5s, by the way, because neither of them is defined
on octets and both use the RFC3986 method for percent-encoding (using
UTF-8).


No, it's not always using UTF-8.
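
A minimal Python sketch of the difference, assuming a query value containing U+00E9 and a document encoded as ISO-8859-1 (both chosen only for illustration). The same character percent-encodes to different octets depending on which encoding is applied first:

```python
from urllib.parse import quote

# Hypothetical query value: the single character U+00E9.
query_value = "\u00e9"

# Percent-encoding the UTF-8 octets, as the RFC3986-style method would.
utf8_form = quote(query_value, encoding="utf-8")

# Percent-encoding the octets of an assumed ISO-8859-1 input encoding.
latin1_form = quote(query_value, encoding="iso-8859-1")
```

Here `utf8_form` is `%C3%A9` and `latin1_form` is `%E9`, so a server expecting one will misread the other.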


--
Anne van Kesteren
http://annevankesteren.nl/
