Re: URI Comparisons: RFC 2616 vs. RDF

Nathan Thu, 20 Jan 2011 06:34:36 -0800

Hi Dave,

Generally I agree, will address a few specific points inline (just to address them) then summarize my intended goals at the end (being the substance of the mail).


Dave Reynolds wrote:

The URI spec (rfc3986[1]) does allow this usage. In particular Section 6
Normalization and Comparison says:

"""URI comparison is performed for some particular purpose.  Protocols
   or implementations that compare URIs for different purposes will
   often be subject to differing design trade-offs in regards to how
   much effort should be spent in reducing aliased identifiers.  This
   section describes various methods that may be used to compare URIs,
   the trade-offs between them, and the types of applications that might
   use them."""

and

"""We use the terms "different" and
   "equivalent" to describe the possible outcomes of such comparisons,
   but there are many application-dependent versions of equivalence."""

While RDF predates this spec it seems to me that the RDF usage remains
consistent with it. The purpose of comparison in RDF is different from
that of cache retrieval of web pages or message delivery of email.


Indeed, I also read though:

   For all URIs, the hexadecimal digits within a percent-encoding
   triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore
   should be normalized to use uppercase letters for the digits A-F.

   When a URI uses components of the generic syntax, the component
   syntax equivalence rules always apply; namely, that the scheme and
   host are case-insensitive and therefore should be normalized to
   lowercase...
   - http://tools.ietf.org/html/rfc3986#section-6.2.2.1

And took the "For all" and "always" to literally mean "for all" and "always".
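To make the case rules quoted above concrete, here is a minimal Python sketch of Case Normalization as I read section 6.2.2.1: lowercase the scheme and host, uppercase the hex digits in percent-encoded triplets, and leave everything else alone. The function name `case_normalize` and the regular expressions are my own assumptions for illustration, not anything taken from the RFC or from an RDF toolkit.

```python
import re

def case_normalize(uri: str) -> str:
    """Sketch of RFC 3986 case normalization (hypothetical helper)."""
    # Match the scheme and, if present, the authority that follows "//".
    m = re.match(r"^([A-Za-z][A-Za-z0-9+.-]*):(//([^/?#]*))?", uri)
    if m:
        scheme = m.group(1).lower()
        if m.group(3) is not None:
            # Lowercase the host (and port) but keep userinfo untouched.
            userinfo, at, hostport = m.group(3).rpartition("@")
            uri = scheme + "://" + userinfo + at + hostport.lower() + uri[m.end():]
        else:
            uri = scheme + uri[m.end(1):]
    # Uppercase the hex digits of every percent-encoded triplet,
    # done last so that lowercasing the host cannot undo it.
    return re.sub(r"%[0-9a-fA-F]{2}", lambda t: t.group(0).upper(), uri)

print(case_normalize("hTTp://EXAMPLE.com/Foo%3a"))
```

Note that the path keeps its case ("Foo" stays "Foo"); only the scheme, the host and the triplet digits are touched.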


Unsure where this leaves things, and which takes precedence.

This quote also makes clear that there is no single definitive
normalization. There are different levels of normalization possible
depending on your needs.


agree

So I claim that in terms of formal published specifications:
(1) RDF, OWL and RIF do not require any normalization of URIs (beyond
the character encoding level) and compare URIs by simple string
comparison.


One potential issue on the % encoding, clarified further down.

(2) This usage is *not* precluded by the URI specs, at least by 3986
which sets the current framework for the application of scheme-specific
specs.

Not 100% sure, but tempted to agree with you; it would make sense not to preclude it.

As we've already mentioned :) there are no specs for linked data so we
move onto more subjective grounds.


Would be nice to get some specs at some point...

The linked data convention is that dereferencing some URI U in your RDF
document should return information about U, including further onward
links. So if data set A spells a URI hTTp://example.com/foo but the data
you get from dereferencing that URI talks only about
http://example.com/foo then someone has a problem somewhere. The
question is who, where and how to fix it.


agree, good way of putting it.

against both the RDF Specification [1] and the URI specification when they say /not/ to encode permitted US-ASCII characters (like ~ %7E)?
Where did that example come from?


   The encoding consists of... %-escaping octets that do not correspond
   to permitted US-ASCII characters.
   - http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref

   For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.
   - http://tools.ietf.org/html/rfc3986#section-2.3

I read those quotes as saying do not encode permitted US-ASCII characters in RDF URI References.
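As a rough illustration of the section 2.3 text quoted above, a Percent-Encoding Normalization pass might look like the Python sketch below: triplets that encode an unreserved character (ALPHA, DIGIT, "-", ".", "_", "~") are decoded, and every other triplet is kept with uppercase hex digits. The name `decode_unreserved` is hypothetical, and this is only my reading of the RFC, not a reference implementation.

```python
import re
import string

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def decode_unreserved(uri: str) -> str:
    """Decode percent-encoded unreserved characters (hypothetical helper)."""
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # Decode only unreserved characters; keep all other triplets
        # encoded, with their hex digits uppercased.
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9a-fA-F]{2})", replace, uri)

print(decode_unreserved("http://example.com/%7Euser/%41%2fb"))
```

Here "%7E" and "%41" are decoded to "~" and "A", while "%2f" stays encoded because "/" is a reserved character.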

At what point have we suggested doing that?


As above

why force case-sensitive matching on the scheme and domain on URIs matching the generic syntax when the specs say must be compared case insensitively?
No, the specs do not say that, see above.


See "for all" and "always" quote earlier on.

So use normalized URIs in the first place.

...

RDF/OWL/RIF aren't designed the way they are because someone thought it
would be a good idea to allow such things to be used side by side or
because they *want* people to use denormalized URIs.

...

The point is that there is no single, simple, universal (i.e. across all
schemes) normalization algorithm that could be used.
The current approach gives stable, well-defined behaviour which doesn't
change as people invent new URI schemes. The RDF serializations give you
enough control to enable you to be certain about what URI you are
talking about. Job done.

Okay, I agree, and I'm really not looking to create a lot of work here; the general gist of what I'm hoping for is along the lines of:

RDF Publishers MUST perform Case Normalization and Percent-Encoding Normalization on all URIs prior to publishing. When using relative URIs publishers SHOULD include a well defined base using a serialization specific mechanism. Publishers are advised to perform additional normalization steps as specified by URI (RFC 3986) where possible.

RDF Consumers MAY normalize URIs they encounter and SHOULD perform Case Normalization and Percent-Encoding Normalization.

Two RDF URIs are equal if and only if they compare as equal, character by character, as Unicode strings.

For many reasons it would be good to solve this at the publishing phase, allow normalization at the consuming phase (can't be precluded as intermediary components may normalize), and keep simple case sensitive string comparison throughout the stack and specs (so implementations remain simple and fast.)
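To show what the proposed equality rule means in practice, here is a small Python sketch. The function name `rdf_uri_equal` is made up for illustration; the only assumption is that Python's `==` on `str` compares code point by code point, which matches "character by character, as Unicode strings".

```python
def rdf_uri_equal(a: str, b: str) -> bool:
    """Plain string comparison, no normalization (hypothetical helper)."""
    return a == b

published = "http://example.com/foo"

# Identical spellings are equal.
print(rdf_uri_equal(published, "http://example.com/foo"))
# A differently cased scheme is a different RDF URI under this rule.
print(rdf_uri_equal(published, "hTTp://example.com/foo"))
# So is a percent-encoded tilde versus a literal one.
print(rdf_uri_equal("http://example.com/%7Euser", "http://example.com/~user"))
```

Only the first comparison holds, which is why normalizing at the publishing phase matters: the comparison itself never papers over a denormalized spelling.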


Does anybody find the above disagreeable?

Best, and cheers for the reply Dave,

Nathan
