Yeah, the next time we update the spec, we'll probably need to reference http://url.spec.whatwg.org/, which has the needed definitions.
Adam

On Wed, Feb 13, 2013 at 6:09 AM, Bjoern Hoehrmann <[email protected]> wrote:
> Hi,
>
> https://tools.ietf.org/html/rfc6454 fails to properly account for a
> number of cases where URIs and URI schemes are slightly unusual, e.g.
>
>   The origin of a URI is the value computed by the following algorithm:
>
>   1.  If the URI does not use a hierarchical element as a naming
>       authority (see [RFC3986], Section 3.2) or if the URI is not an
>       absolute URI, then generate a fresh globally unique identifier
>       and return that value.
>   ...
>   2.  Let uri-scheme be the scheme component of the URI, converted to
>       lowercase.
>
>   3.  If the implementation doesn't support the protocol given by
>       uri-scheme, then generate a fresh globally unique identifier
>       and return that value.
>
> Consider `javascript://example.org`. In order to make the determination
> whether "the URI" uses "a hierarchical element as a naming authority"
> you have to know the scheme, but the scheme is not mentioned until after
> the first step, which may lead one to believe that you can make this
> determination without knowing the scheme.
>
> For 'javascript' in particular there is no "over the wire" "protocol",
> so it's not clear what to do in the third step. Consider this from the
> perspective of someone making a generic URI library and giving URI
> objects some `.origin` property: how would that work? A browser might
> support "ftp" but a user might disable loading resources over FTP in the
> browser; or it might phase out FTP support but keep supporting 'ftp'
> URIs (like by still knowing the default port). What is the "Origin of a
> URI" then? Does it matter if you do not actually load content from such
> a URI, or don't do it in a web-browser-like fashion? I am not sure...
>
> Further down there is
>
>   5.  Let uri-host be the host component of the URI, converted to lower
>       case (using the i;ascii-casemap collation defined in [RFC4790]).
>
> What if there is no `host` component? `news:de.comp.text.xml` does not
> have one, even though the scheme does use "a hierarchical element as a
> naming authority" and the URI is valid? For that matter, what if there
> is such a component but it's the empty string (like in `file:///`, if
> you ignore the specific provision for 'file')? It seems the empty string
> would pass through the "algorithm", but it's unclear if that is
> intentional and what the security considerations are in this regard.
>
>   6.  If there is no port component of the URI:
>
>       1.  Let uri-port be the default port for the protocol given by
>           uri-scheme.
>
>       Otherwise:
>
>       2.  Let uri-port be the port component of the URI.
>
> Per RFC 3986 schemes may define a default port but do not have to. What
> if a scheme does not define a default port? Also, what if the component
> is present, but is the empty string? In section 6.1 I'm told
>
>   1.  Append a U+003A COLON code point (":") and the given port, in
>       base ten, to result.
>
> I can't give the empty string in base ten. Per RFC 3986 the port
> component should be omitted when it is the empty string, which would
> lead to use of the default port if any, but there is no provision in
> RFC 6454 for normalising URIs and it's valid to use the empty string
> as value so that is valid input into the "Origin of a URI"
> "algorithm".
>
> regards,
> --
> Björn Höhrmann · mailto:[email protected] · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
> _______________________________________________
> websec mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/websec
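
To make the edge cases in the quoted message concrete, here is a rough Python sketch of the "Origin of a URI" steps as a generic URI library might implement them. It is only an illustration, not what RFC 6454 or any browser specifies. The names `origin_of`, `fresh_origin` and `DEFAULT_PORTS` are made up for this example. Three choices are assumptions of the sketch, picked precisely because the RFC text leaves them open: "supporting a protocol" is reduced to membership in a table of default ports, an empty port is treated as absent, and an empty host yields a fresh unique identifier. The optional 'file' provision (step 4) is not modelled.

```python
import re
import uuid

# Regular expression from RFC 3986, Appendix B, for splitting a URI reference.
# Group 2 is the scheme, group 4 the authority (None if there is no "//").
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Assumption: the schemes this hypothetical implementation "supports",
# together with their default ports. RFC 6454 does not define this table.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def fresh_origin():
    """Return a fresh globally unique identifier (steps 1 and 3)."""
    return ("unique", uuid.uuid4().hex)


def origin_of(uri):
    """Sketch of the RFC 6454 origin computation; returns a tuple."""
    match = URI_RE.match(uri)
    scheme, authority = match.group(2), match.group(4)

    # Step 1: not absolute, or no authority component at all
    # (e.g. "news:de.comp.text.xml").
    if scheme is None or authority is None:
        return fresh_origin()

    # Step 2: lowercase the scheme.
    scheme = scheme.lower()

    # Step 3: unsupported "protocol" (e.g. "javascript://example.org").
    if scheme not in DEFAULT_PORTS:
        return fresh_origin()

    # Split the authority into host and port, ignoring any userinfo.
    hostport = authority.rpartition("@")[2]
    if hostport.endswith("]") or ":" not in hostport:
        host, port = hostport, ""  # bare host or bracketed IPv6 literal
    else:
        host, _, port = hostport.rpartition(":")

    # Assumption: an empty host (e.g. "http:///x") gets a unique origin.
    if not host:
        return fresh_origin()

    # Step 5: lowercase the host (str.lower matches i;ascii-casemap
    # for ASCII input only).
    host = host.lower()

    # Step 6, with the assumption that an empty port ("http://host:")
    # means "use the default". A non-numeric port raises ValueError.
    uri_port = int(port) if port else DEFAULT_PORTS[scheme]
    return (scheme, host, uri_port)
```

With these assumptions, `origin_of("http://example.com:")` and `origin_of("HTTP://Example.COM/")` compare equal, while `javascript://example.org`, `news:de.comp.text.xml` and `file:///` each get a fresh unique identifier. Changing any of the three assumptions changes those answers, which is the ambiguity the message is pointing at.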
