In section 10.1.6 "Comments", the current HTML spec
http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#comments
says:

> Following this sequence, the comment may have text, with the additional
> restriction that the text must not [...] contain two consecutive U+002D
> HYPHEN-MINUS characters (--) [...]

Section 5 of RFC 3490 http://tools.ietf.org/html/rfc3490#section-5 defines the
ACE prefix in Internationalized Domain Names to be "xn--", so every
ACE-encoded label contains two consecutive hyphen-minus characters.

This leads to the odd situation that correctly ASCII-compatible encoded IDNs
cannot be used in HTML comments. For example, the widespread habit of
commenting out parts of the HTML code in web pages fails when that code
contains those otherwise valid URLs. This really happens in practice when
working with IDNs (my personal experience). I assume this incompatibility
will cause a growing number of pages to be invalid in the future as the
number of IDNs in use grows, which seems certain now that ICANN has approved
internationalized top-level domain names this year.
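
To illustrate, here is a minimal sketch in Python. It checks only the one
restriction quoted above (no two consecutive hyphen-minus characters), not
the rest of the comment syntax. The function name and the domain
"bücher.example" are made up for this example; Python's built-in "idna"
codec follows RFC 3490.

```python
def comment_text_has_no_double_hyphen(text: str) -> bool:
    """Check only the quoted restriction: the text must not contain '--'."""
    return "--" not in text


# Hypothetical IDN, converted to its ASCII-compatible encoding (ACE).
ace_domain = "bücher.example".encode("idna").decode("ascii")

# A piece of markup someone might want to comment out.
commented_out = f'<a href="http://{ace_domain}/">Books</a>'

# The ACE form starts with the "xn--" prefix, so the check fails.
print(ace_domain.startswith("xn--"))
print(comment_text_has_no_double_hyphen(commented_out))
```

The same link written with the non-encoded domain name would pass this
check, so it is only the ACE form that triggers the problem.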

Can this problem be prevented, e.g. by making "xn--" and "XN--" valid in
comments?

Might it even be justified to make "--" valid in comments again? As I understand
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-May/006337.html and
the replies that follow it, "--" used to be valid in an earlier version of the
spec and was then disallowed to make HTML more compatible with SGML, although
HTML(5) is explicitly not SGML anymore. Making "--" valid won't affect any
previously valid or invalid HTML page in any negative way, will it?

Regards,
Martin Janecke
