Josh Hoyt wrote:
> There has been a little discussion in the past about the restriction
> on allowed character entity references. I don't think there has been
> any about numeric character references, except in lumping them in with
> character entity references.
>
> These restrictions live on from the OpenID 1 specification, and were
> preserved primarily to ease backwards compatibility (IIRC).

It seems that it has been taken from the pingback specification:

http://www.hixie.ch/specs/pingback/pingback#TOC2.2

The rationale given is that it should not be necessary to implement a
full HTML parser.

Unfortunately, this claim is completely bogus: as HTML has a
context-sensitive grammar, you just can't parse it with regular
expressions. If you try, you will inevitably write a parser that fails
on some HTML constructs users might expect to work. (For example:
comments. It is nearly unimaginable, but users might even try to put
comment markers around an OpenID link, add another OpenID link, and
expect RPs to use the one that is not within a comment.)

Others will also try, and will inevitably write a parser that fails on
some _different_ HTML constructs. The result is that one RP might work
with a URL (because it can handle comments within the HTML) and another
one does not. Without looking at the code of the RP's HTML parser, it is
nearly impossible for the user to tell why some RPs fail. If that isn't
an extremely bad user experience, what is? (As a side note: there's no
telling whether there's a security risk with some RPs, either.)

The only way around this is to use a real HTML parser. If you do,
there's no reason not to parse and handle all character references.

Claus
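
To make the comment example concrete, here is a minimal sketch comparing a regular-expression scan with a parser-based one. Everything in it is an assumption made for illustration: the page content, the example.com URLs, the helper names (naive_server, parsed_server, LinkFinder), and the use of Python's standard html.parser module as a stand-in for a real HTML parser. It is not taken from any actual RP implementation.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: the first link is commented out, the second one
# uses a numeric character reference (&#47; is "/") inside its href.
PAGE = """<html><head>
<!-- <link rel="openid.server" href="http://old.example.com/server"> -->
<link rel="openid.server" href="http://new.example.com&#47;server">
</head><body></body></html>"""

# Naive approach: a regular expression that knows nothing about
# comments or character references.
NAIVE = re.compile(r'<link\s+rel="openid\.server"\s+href="([^"]*)"')


def naive_server(html):
    """Return the first href the regular expression happens to match."""
    match = NAIVE.search(html)
    return match.group(1) if match else None


class LinkFinder(HTMLParser):
    """Collect openid.server hrefs from real <link> start tags only."""

    def __init__(self):
        super().__init__()
        self.servers = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()
        if tag == "link" and "openid.server" in rel:
            # html.parser has already decoded character references
            # in the attribute value at this point.
            self.servers.append(attrs.get("href"))


def parsed_server(html):
    """Return the first openid.server href seen by the parser."""
    finder = LinkFinder()
    finder.feed(html)
    finder.close()
    return finder.servers[0] if finder.servers else None


print(naive_server(PAGE))
print(parsed_server(PAGE))
```

Under these assumptions the regular expression picks up the commented-out link and returns http://old.example.com/server, while the parser skips the comment, decodes the numeric character reference, and returns http://new.example.com/server. Two RPs built these two ways would disagree about the same page, which is exactly the inconsistency described above.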
