Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-22 Thread Toby Inkster

On Tue, 18 Jan 2011 21:43:08 -0600
Peter DeVries pete.devr...@gmail.com wrote:

 I have URI's where case is important only at the terminal identifier.
 (HTML URI's in this example)
 http://lod.taxonconcept.org/ses/v6n7p.html
 should be different than
 http://lod.taxonconcept.org/ses/v6N7p.html
 Am I correct in thinking that this is OK?

Yes, HTTP URIs are case-sensitive apart from the scheme (http), host
(lod.taxonconcept.org) and percent-escaped characters (e.g. %7e vs %7E).

Any URI canonicalisation tool that treats the above two URIs as the
same is plain broken.
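
As a rough illustration of that rule, here is a minimal Python sketch. The
helper name `normalize_case` is hypothetical (it is not from any library),
and it assumes the URI carries no case-sensitive userinfo in its authority.
It lowercases only the scheme and host and uppercases the hex digits of
percent-escapes, so the two URIs above remain distinct:

```python
import re
from urllib.parse import urlsplit, urlunsplit

_PCT = re.compile(r"%[0-9a-fA-F]{2}")


def _upper_escapes(text: str) -> str:
    """Uppercase the hex digits of every percent-escape in text."""
    return _PCT.sub(lambda m: m.group(0).upper(), text)


def normalize_case(uri: str) -> str:
    """Hypothetical helper: case-normalize only the parts that are
    case-insensitive (scheme, host, percent-escape hex digits)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    # Assumption: no userinfo, so the whole authority can be lowercased.
    netloc = _upper_escapes(parts.netloc.lower())
    # Path, query and fragment keep their case; only escapes are touched.
    path = _upper_escapes(parts.path)
    query = _upper_escapes(parts.query)
    fragment = _upper_escapes(parts.fragment)
    return urlunsplit((scheme, netloc, path, query, fragment))


a = normalize_case("HTTP://LOD.TaxonConcept.org/ses/v6n7p.html")
b = normalize_case("http://lod.taxonconcept.org/ses/v6N7p.html")
print(a == b)  # the path case differs, so these stay different
```

A tool that also lowercased the path would be making exactly the mistake
described above.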

-- 
Toby A Inkster
mailto:m...@tobyinkster.co.uk
http://tobyinkster.co.uk

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-22 Thread Kingsley Idehen


On 1/22/11 8:27 AM, Toby Inkster wrote:

On Tue, 18 Jan 2011 21:43:08 -0600
Peter DeVries pete.devr...@gmail.com wrote:


I have URI's where case is important only at the terminal identifier.
(HTML URI's in this example)
http://lod.taxonconcept.org/ses/v6n7p.html
should be different than
http://lod.taxonconcept.org/ses/v6N7p.html
Am I correct in thinking that this is OK?

Yes, HTTP URIs are case-sensitive apart from the scheme (http), host
(lod.taxonconcept.org) and percent-escaped characters (e.g. %7e vs %7E).

Any URI canonicalisation tool that treats the above two URIs as the
same is plain broken.



Amen!

A URI is an Identifier. The fact that it can be used to Identify a Data
Source, i.e., an Address via the HTTP scheme that provides actual access
to Data, doesn't negate the fact that it's fundamentally an Identifier.
The fact that the Web has manifested back to front (URL usage before URI
grokking) doesn't mean everything has to follow this warped pattern.


The Web is part of a technology continuum. Computing did exist before 
the WWW became ubiquitous.


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-21 Thread Nathan


Harry Halpin wrote:

On Thu, Jan 20, 2011 at 11:15 AM, Nathan nat...@webr3.org wrote:

Out of interest, where is that process defined? I was looking for it the
other day - for instance in the quoted specification we have the example:

<edi:price xmlns:edi='http://ecommerce.example.org/schema'
units='Euro'>32.18</edi:price>

Where's the bit of the XML specification which says you join them up by
concatenating 'http://ecommerce.example.org/schema' with #(?assumed?) and
'Euro' to get 'http://ecommerce.example.org/schema#Euro'?



Actually you don't. A namespace is just that - a tuple (namespace,
localname) in XML. That's why namespaces in XML are for all intents
and purposes broken and why, to a large extent, Web browser developers
in HTML stopped using them and hate implementing them in the DOM, and
so refuse to have them in HTML5. And that's one reason RDF(A) will
probably continue getting a sort of bad rap in the HTML world, as
prefixes are not associated with just making URIs, but with this
terrible namespace tuple.
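
A minimal sketch of that tuple model, using Python's standard
`xml.etree.ElementTree` on the example from the Namespaces spec. The
variable names are only for illustration. ElementTree spells the
(namespace, localname) pair as `{namespace}localname`, and nothing in the
parse concatenates the namespace with `Euro`, which is just an attribute
value:

```python
import xml.etree.ElementTree as ET

doc = (
    "<edi:price xmlns:edi='http://ecommerce.example.org/schema' "
    "units='Euro'>32.18</edi:price>"
)
elem = ET.fromstring(doc)

# The element name is a pair, written by ElementTree as "{namespace}localname".
namespace, _, localname = elem.tag[1:].partition("}")
print(namespace)  # the namespace name from the xmlns:edi declaration
print(localname)  # the local part of edi:price

# The unprefixed attribute is in no namespace; its value is plain text.
print(elem.attrib)
```

No URI such as `http://ecommerce.example.org/schema#Euro` is produced at any
point; building one is a convention layered on top by RDF-style tools, not
something XML itself defines.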

For an archeology of the relevant standards, check out the section "What
Namespaces Do" of this paper. While the paper is focussed on why
namespace documents are a mess, the relevant information is in that
section and extensively referenced, with examples:

http://xml.coverpages.org/HHalpinXMLVS-Extreme.html


Ahh, thanks for explaining that one Harry, most helpful :)

Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Nathan


Alan Ruttenberg wrote:

On Wed, Jan 19, 2011 at 4:45 PM, Nathan nat...@webr3.org wrote:

David Wood wrote:

On Jan 19, 2011, at 10:59, Nathan wrote:


ps: as an illustration of how engrained URI normalization is, I've
capitalized the domain names in the to: and cc: fields, I do hope the mail
still comes through, and hope that you'll accept this email as being sent to
you. Hopefully we'll also find this mail in the archives shortly at
htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally I'd
hope that any statements made using these URIs (asserted by man or machine)
would remain valid regardless of the (incorrect?-)casing.


Heh.  OK, I'll bite.  Domain names in email addressing are defined in IETF
RFC 2822 (and its predecessor RFC 822), which defers the interpretation to
RFC 1035 (Domain names - implementation and specification).  RFC 1035
section 2.3.3 states that domain names in DNS, and therefore in (E)SMTP, are
to be compared in a case-insensitive manner.

As far as I know, the W3C specs do not so refer to RFC 1035.


And I'll bite in the other direction, why not treat URIs as URIs? why go
against both the RDF Specification [1] and the URI specification when they
say /not/ to encode permitted US-ASCII characters (like ~ %7E)? why force
case-sensitive matching on the scheme and domain on URIs matching the
generic syntax when the specs say must be compared case insensitively? and
so on and so forth.


[AR]
Which specs?


The various URI/IRI specs and their previous revisions.


http://www.w3.org/TR/REC-xml-names/#NSNameComparison

URI references identifying namespaces

..

In a namespace declaration, the URI reference is

..

The URI references below are all different for the purposes of identifying
namespaces

..

The URI references below are also all different for the purposes of
identifying namespaces

..

So here is another spec that *explicitly* disagrees with the idea that URI
normalization should be built-in processing.


As far as I can see, that's only for a URI reference used within a 
namespace, and does not govern usage or normalization when you join the 
URI reference up with the local name to make the full URI.


Out of interest, where is that process defined? I was looking for it the 
other day - for instance in the quoted specification we have the example:


<edi:price xmlns:edi='http://ecommerce.example.org/schema'
units='Euro'>32.18</edi:price>


Where's the bit of the XML specification which says you join them up by 
concatenating 'http://ecommerce.example.org/schema' with #(?assumed?) 
and 'Euro' to get 'http://ecommerce.example.org/schema#Euro'?


And finally, this is why I specifically asked if the non-normalization 
of RDF URI References had XML Namespace heritage, which had then 
filtered down through OWL, SPARQL and RIF.



[AR] More to document, please: Which data is being junked and scrapped?


Will document, but essentially every statement made using a
non-normalized URI when other statements are also being made about the
same resource using normalized URIs. The two most common cases for
this will be when people are using CMSs and enter their domain name
in uppercase in some admin, only to have that filter through to
URIs in serialized RDF/RDFa, and where bugs in software have led to
inconsistent URIs over time (for instance where % encoding has been
fixed, or a :80 has been removed from a URI).



[AR] Hmm. Are you suggesting that the behavior of libraries and clients
should have precedence over specification? My view is that one first looks
to specifications, and then only if specifications are poor or do not speak
to the issue do we look at existing behavior.


Yes I am, that specification should standardize the behaviour of 
libraries and clients - the level of normalization in URIs published, 
consumed or used by these tools is often determined by non sem web stack 
components, and the sem web components are blocked from normalizing 
these should-not-be-differing-URIs by the sem web specifications.



[AR] I think there are many ways to lose in this scenario. For instance, if
the server redirects then the base is the last in the chain of redirects.
http://tools.ietf.org/html/rfc3986#page-29, 5.1.3. Base URI from the
Retrieval URI. My conclusion - don't engineer this way.


That would be my conclusion too, but as RDF(a) moves into the realm of
CMSs and out of the hands of the sem web community, it will be
increasingly engineered this way. It's a very common pattern when
working with (X)HTML (it allows people to test locally or on dev servers
without changing the content).



Further, essentially all RDFa ever encountered by a browser has the casing
on all URIs in href and src, and all these which are resolved, automatically
normalized - so even if you set the base to htTp://EXAMPLE.org/ or use it
in a URI, browser tools, extensions, and js based libraries will only ever
see the normalized URIs (and thus be incompatible with the rest

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Kingsley Idehen


On 1/19/11 11:27 PM, Alan Ruttenberg wrote:



On Wed, Jan 19, 2011 at 11:11 AM, Kingsley Idehen
kide...@openlinksw.com wrote:


On 1/19/11 10:59 AM, Nathan wrote:

htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ -
Personally I'd hope that any statements made using these URIs
(asserted by man or machine) would remain valid regardless of the
(incorrect?-)casing. 

Okay for Data Source Address Ref. (URL), no good for Entity (Data
Item or Data Object) Name Ref., bar system specific handling via
IFP property or owl:sameAs :-)


Kingsley, same for you as Nathan. To what specification do you refer
for the definitions and behavior of:

 - Data source address ref
 - Entity
 - Statement.

-Alan


Alan,

My response is purely about managing Identifiers that are used as 
functional unambiguous Name or Address References. Not quoting a W3C 
spec. Basically, expressing a view based on my understanding of what's 
practical.


A system (e.g. a database or client app.) can (should) make a decision 
about how it handles resolvable Identifiers when used as Name or Address 
references.


Kingsley





-- 


Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Dave Reynolds

On Wed, 2011-01-19 at 21:45 +, Nathan wrote: 
 David Wood wrote:
  On Jan 19, 2011, at 10:59, Nathan wrote:
  ps: as an illustration of how engrained URI normalization is, I've 
  capitalized the domain names in the to: and cc: fields, I do hope the mail 
  still comes through, and hope that you'll accept this email as being sent 
  to you. Hopefully we'll also find this mail in the archives shortly at 
  htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally I'd 
  hope that any statements made using these URIs (asserted by man or 
  machine) would remain valid regardless of the (incorrect?-)casing.
  
  Heh.  OK, I'll bite.  Domain names in email addressing are defined in IETF 
  RFC 2822 (and its predecessor RFC 822), which defers the interpretation to 
  RFC 1035 (Domain names - implementation and specification).  RFC 1035 
  section 2.3.3 states that domain names in DNS, and therefore in (E)SMTP, 
  are to be compared in a case-insensitive manner.
  
  As far as I know, the W3C specs do not so refer to RFC 1035.
 
 And I'll bite in the other direction, why not treat URIs as URIs? 

It seems to me the underlying question here is whether aliasing of URIs
(whether they dereference to the same resource) should imply semantic
equality (i.e. use as an identifier in a web logic language like RDF or
OWL).

The position so far in RDF, OWL and RIF has been "no".

As far as the specifications for those languages are concerned a URI is
just a convenient spelling for an identifier and they require
comparison of identifiers to be stable and context-independent. 
Those specs don't constrain what you get back from dereferencing some
URI U to include statements about U.

The URI spec (rfc3986[1]) does allow this usage. In particular Section 6,
"Normalization and Comparison", says:

   URI comparison is performed for some particular purpose.  Protocols
   or implementations that compare URIs for different purposes will
   often be subject to differing design trade-offs in regards to how
   much effort should be spent in reducing aliased identifiers.  This
   section describes various methods that may be used to compare URIs,
   the trade-offs between them, and the types of applications that might
   use them.

and

   We use the terms "different" and
   "equivalent" to describe the possible outcomes of such comparisons,
   but there are many application-dependent versions of equivalence.

While RDF predates this spec it seems to me that the RDF usage remains
consistent with it. The purpose of comparison in RDF is different from
that of cache retrieval of web pages or message delivery of email.

This quote also makes clear that there is no single definitive
normalization. There are different levels of normalization possible
depending on your needs. 

Earlier you pointed out that the place where the URI specs and RDF do
collide is in resolving relative URIs into absolute URIs. Again rfc3986
does not preclude the RDF usage. Section 5.2.1 says:

   Normalization of the base URI, as described in Sections 6.2.2 and
   6.2.3, is optional.

So I claim that in terms of formal published specifications:
(1) RDF, OWL and RIF do not require any normalization of URIs (beyond
the character encoding level) and compare URIs by simple string
comparison.
(2) This usage is *not* precluded by the URI specs, at least by 3986
which sets the current framework for the application of scheme-specific
specs.
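
Claim (1) can be sketched in a few lines of Python. The triples and the
helper name `same_identifier` are made up for illustration; the point is
only that comparison is character by character, so two spellings that would
dereference to the same place are still two identifiers:

```python
# Triples as plain tuples of strings; the data is hypothetical.
triples = [
    ("http://example.com/foo", "dc:title", "Foo"),
    ("hTTp://example.com/foo", "dc:creator", "Alice"),
]


def same_identifier(a: str, b: str) -> bool:
    """RDF-style comparison: plain string equality, no normalization."""
    return a == b


# The two subjects differ only in scheme case, yet they do not merge.
subjects = {s for s, _, _ in triples}
print(len(subjects))
print(same_identifier("http://example.com/foo", "hTTp://example.com/foo"))
```

Whether that outcome is a bug or a feature is exactly what the rest of this
thread is about.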

** Now we turn to linked data ...

As we've already mentioned :) there are no specs for linked data so we
move onto more subjective grounds.

The linked data convention is that dereferencing some URI U in your RDF
document should return information about U, including further onward
links. So if data set A spells a URI hTTp://example.com/foo but the data
you get from dereferencing that URI talks only about
http://example.com/foo then someone has a problem somewhere. The
question is who, where and how to fix it.

It seems to me that this is primarily an issue with publishing, and a
little about being sensible about how you pass on links. If I'm going to
put up some linked data I should mint normalized URIs; I should use the
same spelling of the URIs throughout my data; I'll make sure those URIs
dereference and that the data that comes back is stable and useful. If
someone else refers to my resources using an aliased URI (such as a
different case for the protocol) and makes statements about those
aliases then they have simply made a mistake.

To make sure that dereference returns what I expect, independent of
aliasing, then I should publish data with explicit base URIs (or just
absolute URIs). Publishing with relative URIs and no base is a recipe
for having your data look different from different places. Just don't do
it. No surprise there.
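
A small sketch of why that is, using Python's `urllib.parse.urljoin`. The
two retrieval URIs and the `resolve` wrapper are hypothetical; they stand in
for the same document being fetched from a live site and from a dev server
with no explicit base declared:

```python
from urllib.parse import urljoin


def resolve(retrieval_uri: str, reference: str) -> str:
    """Resolve a relative reference against the URI the document came from."""
    return urljoin(retrieval_uri, reference)


# Same document, same relative reference, two hypothetical locations.
live = resolve("http://example.com/people/doc", "#me")
dev = resolve("http://dev.example.com/people/doc", "#me")

print(live)
print(dev)
print(live == dev)  # the "same" resource ends up with two identifiers
```

With an explicit base (or absolute URIs) in the document, both fetches would
yield the same identifier regardless of where the bytes were retrieved from.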

None of this requires us to force URI normalization into the heart of
identifier comparison in RDF itself. It is not a necessary solution and
it is not a sufficient one because there is no universal

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Nathan


Hi Dave,

Generally I agree, will address a few specific points in line (just to 
address them) then summarize my intended goals at the end (being the 
substance of the mail).


Dave Reynolds wrote:

The URI spec (rfc3986[1]) does allow this usage. In particular Section 6,
"Normalization and Comparison", says:

   URI comparison is performed for some particular purpose.  Protocols
   or implementations that compare URIs for different purposes will
   often be subject to differing design trade-offs in regards to how
   much effort should be spent in reducing aliased identifiers.  This
   section describes various methods that may be used to compare URIs,
   the trade-offs between them, and the types of applications that might
   use them.

and

   We use the terms "different" and
   "equivalent" to describe the possible outcomes of such comparisons,
   but there are many application-dependent versions of equivalence.

While RDF predates this spec it seems to me that the RDF usage remains
consistent with it. The purpose of comparison in RDF is different from
that of cache retrieval of web pages or message delivery of email.


Indeed, I also read though:

   For all URIs, the hexadecimal digits within a percent-encoding
   triplet (e.g., %3a versus %3A) are case-insensitive and therefore
   should be normalized to use uppercase letters for the digits A-F.

   When a URI uses components of the generic syntax, the component
   syntax equivalence rules always apply; namely, that the scheme and
   host are case-insensitive and therefore should be normalized to
   lowercase...
   - http://tools.ietf.org/html/rfc3986#section-6.2.2.1

And took the "For all" and "always" to literally mean "for all" and
"always".


Unsure where this leaves things, and which takes precedence.


This quote also makes clear that there is no single definitive
normalization. There are different levels of normalization possible
depending on your needs. 


agree


So I claim that in terms of formal published specifications:
(1) RDF, OWL and RIF do not require any normalization of URIs (beyond
the character encoding level) and compare URIs by simple string
comparison.


One potential issue on the % encoding, clarified further down.


(2) This usage is *not* precluded by the URI specs, at least by 3986
which sets the current framework for the application of scheme-specific
specs.


Not 100% sure but tempted to agree with you, would make sense not to 
preclude it.



As we've already mentioned :) there are no specs for linked data so we
move onto more subjective grounds.


Would be nice to get some specs at some point...


The linked data convention is that dereferencing some URI U in your RDF
document should return information about U, including further onward
links. So if data set A spells a URI hTTp://example.com/foo but the data
you get from dereferencing that URI talks only about
http://example.com/foo then someone has a problem somewhere. The
question is who, where and how to fix it.


agree, good way of putting it.

against both the RDF Specification [1] and the URI specification when 
they say /not/ to encode permitted US-ASCII characters (like ~ %7E)? 


Where did that example come from? 


   The encoding consists of... %-escaping octets that do not correspond
   to permitted US-ASCII characters.
   - http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref

   For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.
   - http://tools.ietf.org/html/rfc3986#section-2.3

I read those quotes as saying do not encode permitted US-ASCII 
characters in RDF URI References.
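
For what it's worth, the RFC 3986 section 2.3 rule quoted above can be
sketched like this in Python. The function name `decode_unreserved` is
hypothetical; it decodes only percent-escapes that stand for unreserved
characters and leaves every other escape in place (with uppercase hex):

```python
import re
import string

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def decode_unreserved(uri: str) -> str:
    """Hypothetical helper: decode %XX only where XX is an unreserved char."""

    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Reserved or non-ASCII octet: keep it escaped, uppercase the hex.
        return match.group(0).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, uri)


print(decode_unreserved("http://www.openlinksw.com/blog/%7Ekidehen"))
print(decode_unreserved("http://example.org/a%2fb"))
```

The first call turns `%7E` back into `~`; the second keeps the escaped slash,
since decoding a reserved character would change what the URI means.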



At what point have we suggested doing that?


As above

why 
force case-sensitive matching on the scheme and domain on URIs matching 
the generic syntax when the specs say must be compared case 
insensitively?


No, the specs do not say that, see above.


See the "For all" and "always" quote earlier on.

So use normalized URIs in the first place. 

...

RDF/OWL/RIF aren't designed the way they are because someone thought it
would be a good idea to allow such things to be used side by side or
because they *want* people to use denormalized URIs.

...

The point is that there is no single, simple, universal (i.e. across all
schemes) normalization algorithm that could be used.
The current approach gives stable, well-defined behaviour which doesn't
change as people invent new URI schemes. The RDF serializations give you
enough control to enable you to be certain about what URI you are
talking about. Job done.


Okay, I agree, and I'm really not looking to create a lot of work here, 
the general gist of what I'm hoping for is along the lines of:


  RDF Publishers MUST perform Case Normalization and Percent-Encoding 
Normalization on all

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread David Booth

On Thu, 2011-01-20 at 13:08 +, Dave Reynolds wrote:
[ . . . ]
 It seems to me that this is primarily an issue with publishing, and a
 little about being sensible about how you pass on links. If I'm going to
 put up some linked data I should mint normalized URIs; I should use the
 same spelling of the URIs throughout my data; I'll make sure those URIs
 dereference and that the data that comes back is stable and useful. If
 someone else refers to my resources using an aliased URI (such as a
 different case for the protocol) and makes statements about those
 aliases then they have simply made a mistake.
 
 To make sure that dereference returns what I expect, independent of
 aliasing, then I should publish data with explicit base URIs (or just
 absolute URIs). Publishing with relative URIs and no base is a recipe
 for having your data look different from different places. Just don't do
 it. 

This advice sounds like an excellent candidate for publication in a best
practices document.  And if it is merely best practice guidance, perhaps
that *is* something that the new RDF working group could address.



-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Nathan


David Booth wrote:

On Thu, 2011-01-20 at 13:08 +, Dave Reynolds wrote:
[ . . . ]

It seems to me that this is primarily an issue with publishing, and a
little about being sensible about how you pass on links. If I'm going to
put up some linked data I should mint normalized URIs; I should use the
same spelling of the URIs throughout my data; I'll make sure those URIs
dereference and that the data that comes back is stable and useful. If
someone else refers to my resources using an aliased URI (such as a
different case for the protocol) and makes statements about those
aliases then they have simply made a mistake.

To make sure that dereference returns what I expect, independent of
aliasing, then I should publish data with explicit base URIs (or just
absolute URIs). Publishing with relative URIs and no base is a recipe
for having your data look different from different places. Just don't do
it. 


This advice sounds like an excellent candidate for publication in a best
practices document.  And if it is merely best practice guidance, perhaps
that *is* something that the new RDF working group could address.


+1 from me, address at the publishing phase, allow at the consuming 
phase, keep comparison simple.

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread William Waites

* [2011-01-20 14:29:35 +] Nathan nat...@webr3.org wrote:

]   RDF Publishers MUST perform Case Normalization and Percent-Encoding 
] Normalization on all URIs prior to publishing. When using relative URIs 
] publishers SHOULD include a well defined base using a serialization 
] specific mechanism. Publishers are advised to perform additional 
] normalization steps as specified by URI (RFC 3986) where possible.
] 
]   RDF Consumers MAY normalize URIs they encounter and SHOULD perform 
] Case Normalization and Percent-Encoding Normalization.
] 
]   Two RDF URIs are equal if and only if they compare as equal, 
] character by character, as Unicode strings.
] 
] For many reasons it would be good to solve this at the publishing phase, 
] allow normalization at the consuming phase (can't be precluded as 
] intermediary components may normalize), and keep simple case sensitive 
] string comparison throughout the stack and specs (so implementations 
] remain simple and fast.)
] 
] Does anybody find the above disagreeable?


Sounds about right to me, but what about port numbers,
http://example.org/ vs http://example.org:80/?
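
To make the port question concrete, here is a minimal Python sketch of the
scheme-based step it would need. The `DEFAULT_PORTS` table and the function
name `strip_default_port` are assumptions for illustration, covering only
http and https:

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed table of default ports; a real tool would need one per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(uri: str) -> str:
    """Hypothetical helper: drop an explicit port equal to the scheme default."""
    parts = urlsplit(uri)
    netloc = parts.netloc
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        # Remove the trailing ":port" and keep the rest of the authority.
        netloc = netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(netloc=netloc))


print(strip_default_port("http://example.org:80/"))
print(strip_default_port("http://example.org:8080/"))
```

Unlike case normalization, this step depends on knowing each scheme's default
port, so it cannot be done generically for schemes the tool has never heard of.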

-w

-- 
William Waites  mailto:w...@styx.org
http://eris.okfn.org/ww/ sip:w...@styx.org
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Martin Hepp


Hi:

On 20.01.2011, at 15:40, Nathan wrote:


David Booth wrote:

On Thu, 2011-01-20 at 13:08 +, Dave Reynolds wrote:
[ . . . ]


To make sure that dereference returns what I expect, independent of
aliasing, then I should publish data with explicit base URIs (or just
absolute URIs). Publishing with relative URIs and no base is a recipe
for having your data look different from different places. Just don't do
it.

This advice sounds like an excellent candidate for publication in a best
practices document.  And if it is merely best practice guidance, perhaps
that *is* something that the new RDF working group could address.


+1 from me, address at the publishing phase, allow at the consuming  
phase, keep comparison simple.





I am not sure whether you are also talking of RDFa, but in case you
do, I would like to add the following:

Our experience helping about 2,000 sites add GoodRelations via our
form-based tools shows that

1. RDFa is in many cases the only viable way for people to publish RDF
2. They often cannot control or even predict the exact URI of the page
   that will contain the markup (imagine uncool URIs loaded with
   parameters etc.)

In those scenarios, relative URIs are essential.

We even recommend that people include an empty

   <div rel="foaf:page" resource=""></div>

at the proper position in the nesting so that there will be a link
between the data entity and the page that contains it.


Martin

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Nathan


Martin Hepp wrote:

On 20.01.2011, at 15:40, Nathan wrote:

David Booth wrote:

On Thu, 2011-01-20 at 13:08 +, Dave Reynolds wrote:
[ . . . ]


To make sure that dereference returns what I expect, independent of
aliasing, then I should publish data with explicit base URIs (or just
absolute URIs). Publishing with relative URIs and no base is a recipe
for having your data look different from different places. Just don't do
it.

This advice sounds like an excellent candidate for publication in a best
practices document.  And if it is merely best practice guidance, perhaps
that *is* something that the new RDF working group could address.


+1 from me, address at the publishing phase, allow at the consuming 
phase, keep comparison simple.


I am not sure whether you are also talking of RDFa, but in case you do, 
I would like to add the following:


Hi Martin,

Yes (re RDFa), see: http://webr3.org/urinorm/2 - all the browsers do the 
normalization so you can't even get to the non-normalized URI.


in a browser you'll note that all the URIs get normalized automatically, 
in that it's impossible to programmatically access the correct casing. 
That's a problem.


if you run it through the RDFa distiller at w3.org [2] you'll find:

  <htTp://WEBR3.org/urinorm/2> dc:creator <http://WEBR3.org/nathan#me> .

  <http://WEBR3.org/urinorm/2#example> dc:title "URI Normalization Example 2" .


note one of the URIs (the one which required relative path resolution) 
has the scheme normalised.
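
One plausible explanation, and this is an assumption rather than something
checked against the distiller's source, is that relative resolution in a
Python tool goes through `urllib.parse.urljoin`, which lowercases the scheme
when it rebuilds a URI but leaves the host's case alone. The relative
references below are hypothetical, since the page's markup is not shown here:

```python
from urllib.parse import urljoin

base = "htTp://WEBR3.org/urinorm/2"  # mixed-case base from the example page

# An empty reference returns the base untouched, odd casing included.
unchanged = urljoin(base, "")

# Anything that needs real resolution is rebuilt with a lowercased scheme,
# while the host keeps whatever case the base had.
fragment = urljoin(base, "#example")
rooted = urljoin(base, "/nathan#me")

print(unchanged)
print(fragment)
print(rooted)
```

So a single library call can yield both a normalized and a non-normalized
scheme in the same output, depending only on whether resolution was needed.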


if you run it through check.rdfa.info you'll find that all the URIs are 
normalized. [3]


if you run it through sigma [4] you'll find everything has been 
normalized. You can also see an RDF view of this [5]


if you run it through URI Burner [6], you'll find that /some/ URIs have 
been normalized. It's also worth noting that this caused all kinds of 
problems - I ended up having to create a new resource at this point w/ 
some RDF  N3 to test URI Burner:


  http://webr3.org/urinorm/3

which led to the empty [7], then I figured I'd try [8], and if you click
the creator ( htTp://WEBR3.org/nathan#me ), since in this case there's no
normalization (note it was normalized in [6]), you get a 400 Bad Request [9].


and so on and so forth - far from ideal.

Best,

Nathan

[1] http://www.rdfabout.com/demo/validator/ (normalizes all RDF URIs)
[2] http://www.w3.org/2007/08/pyRdfa/
[3] http://check.rdfa.info/check?url=http://webr3.org/urinorm/2&version=1.0
[4] http://sig.ma/search?q=http://webr3.org/urinorm/2
[5] http://sig.ma/entity/e6a2c8319bb3bf21f4b4639216f114a4.rdf#this
[6] http://linkeddata.uriburner.com/about/html/http/webr3.org/urinorm/2%01this
[7] http://linkeddata.uriburner.com/about/html/http/webr3.org/urinorm/3
[8] http://linkeddata.uriburner.com/about/html/htTp://WEBR3.org/urinorm/3
[9] http://linkeddata.uriburner.com/about/html/htTp/WEBR3.org/nathan%01me

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Dave Reynolds


Hi Nathan,

I largely agree but have a few quibbles :)

On 20/01/2011 2:29 PM, Nathan wrote:

Dave Reynolds wrote:

The URI spec (rfc3986[1]) does allow this usage. In particular Section 6
Normalization and Comparison says:

URI comparison is performed for some particular purpose. Protocols
or implementations that compare URIs for different purposes will
often be subject to differing design trade-offs in regards to how
much effort should be spent in reducing aliased identifiers. This
section describes various methods that may be used to compare URIs,
the trade-offs between them, and the types of applications that might
use them.

and

We use the terms "different" and
"equivalent" to describe the possible outcomes of such comparisons,
but there are many application-dependent versions of equivalence.

While RDF predates this spec it seems to me that the RDF usage remains
consistent with it. The purpose of comparison in RDF is different from
that of cache retrieval of web pages or message delivery of email.


Indeed, I also read though:

For all URIs, the hexadecimal digits within a percent-encoding
triplet (e.g., %3a versus %3A) are case-insensitive and therefore
should be normalized to use uppercase letters for the digits A-F.

When a URI uses components of the generic syntax, the component
syntax equivalence rules always apply; namely, that the scheme and
host are case-insensitive and therefore should be normalized to
lowercase...
- http://tools.ietf.org/html/rfc3986#section-6.2.2.1

And took the "For all" and "always" to literally mean "for all" and
"always".


Those quotes come from section (6.2.2) describing normalization but the 
earlier quote is from the start of section 6 saying that choice of 
normalization is application dependent. I interpret the two together as 
*if* you are normalizing then always ...blah 


That was certainly the RIF position where we explicitly said that 
sections 6.2.2 and 6.2.3 of rfc3986 were not applicable.



against both the RDF Specification [1] and the URI specification when
they say /not/ to encode permitted US-ASCII characters (like ~ %7E)?


Where did that example come from?


The encoding consists of... %-escaping octets that do not correspond
to permitted US-ASCII characters.
- http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref

For consistency, percent-encoded octets in the ranges of ALPHA
(%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
underscore (%5F), or tilde (%7E) should not be created by URI
producers and, when found in a URI, should be decoded to their
corresponding unreserved characters by URI normalizers.
- http://tools.ietf.org/html/rfc3986#section-2.3

I read those quotes as saying do not encode permitted US-ASCII
characters in RDF URI References.


At what point have we suggested doing that?


As above


Sorry, I didn't mean to dispute that you shouldn't %-encode ~, I was 
wondering where the suggestion that you should do so came from.


I believe there are some corner cases, such as the handling of spaces, 
which differ between the RDF spec and the IRI spec. This was down to 
timing. The RDF Core WG was doing its best to anticipate what the IRI 
spec would look like but couldn't wait until that was finalized. 
Resolving any such small discrepancies between that anticipation and the 
actual IRI specs is something I believe to be in scope for the proposed 
new RDF WG.



So use normalized URIs in the first place.

...

RDF/OWL/RIF aren't designed the way they are because someone thought it
would be a good idea to allow such things to be used side by side or
because they *want* people to use denormalized URIs.

...

The point is that there is no single, simple, universal (i.e. across all
schemes) normalization algorithm that could be used.
The current approach gives stable, well-defined behaviour which doesn't
change as people invent new URI schemes. The RDF serializations give you
enough control to enable you to be certain about what URI you are
talking about. Job done.


Okay, I agree, and I'm really not looking to create a lot of work here,
the general gist of what I'm hoping for is along the lines of:

RDF Publishers MUST perform Case Normalization and Percent-Encoding
Normalization on all URIs prior to publishing. When using relative URIs
publishers SHOULD include a well defined base using a serialization
specific mechanism. Publishers are advised to perform additional
normalization steps as specified by URI (RFC 3986) where possible.

RDF Consumers MAY normalize URIs they encounter and SHOULD perform Case
Normalization and Percent-Encoding Normalization.

Two RDF URIs are equal if and only if they compare as equal, character
by character, as Unicode strings.


I'm sort of OK with that but ...

Terms like RDF Publisher and RDF Consumer need to be defined in 
order to make formal statements like these. The RDF/OWL/RIF specs are 
careful to define what sort of processors are subject to conformance 
statements and I don't think RDF

Standardizing linked data - was Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Nathan


Dave Reynolds wrote:

Okay, I agree, and I'm really not looking to create a lot of work here,
the general gist of what I'm hoping for is along the lines of:

RDF Publishers MUST perform Case Normalization and Percent-Encoding
Normalization on all URIs prior to publishing. When using relative URIs
publishers SHOULD include a well defined base using a serialization
specific mechanism. Publishers are advised to perform additional
normalization steps as specified by URI (RFC 3986) where possible.

RDF Consumers MAY normalize URIs they encounter and SHOULD perform Case
Normalization and Percent-Encoding Normalization.

Two RDF URIs are equal if and only if they compare as equal, character
by character, as Unicode strings.


I'm sort of OK with that but ...

Terms like RDF Publisher and RDF Consumer need to be defined in 
order to make formal statements like these. The RDF/OWL/RIF specs are 
careful to define what sort of processors are subject to conformance 
statements and I don't think RDF Publisher is a conformance point for 
the existing specs.


This may sound like nit-picking, but that's life with specifications. You 
need to be clear how the last para about RDF URIs relates to notions 
like RDF Consumer.


I wonder whether you might want to instead define notions of Linked Data 
Publisher and Linked Data Consumer to which these MUST/MAY/SHOULD 
conformance statements apply. That way it is clear that a component such 
as an RDF store or RDF parser is correct in following the existing RDF 
specs and not doing any of these transformations but that in order to 
construct a Linked Data Consumer/Publisher some other component can be 
introduced to perform the normalizations. Linked Data as a set of 
constraints and conventions layered on top of the RDF/OWL specs.
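The layering described above might look roughly like the following Python sketch: a store that only ever compares terms as plain strings, with a separate consumer component in front of it that applies whatever normalization policy is chosen. `PlainStore`, `LinkedDataConsumer` and `toy_normalize` are all hypothetical names, every term is treated as a URI for simplicity, and the toy policy (uppercasing percent triplets) merely stands in for a real, agreed rung on the normalization ladder.

```python
import re
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]


class PlainStore:
    """Stand-in for an RDF store: terms are equal only if their strings are equal."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)


class LinkedDataConsumer:
    """Hypothetical layer on top of the store; it owns the normalization policy."""

    def __init__(self, store: PlainStore, normalize: Callable[[str], str]) -> None:
        self.store = store
        self.normalize = normalize

    def ingest(self, triple: Triple) -> None:
        # Normalize each term, then hand over to the unchanged store.
        s, p, o = (self.normalize(term) for term in triple)
        self.store.add((s, p, o))


def toy_normalize(uri: str) -> str:
    # Placeholder policy for the sketch: uppercase percent-encoded triplets only.
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), uri)


if __name__ == "__main__":
    store = PlainStore()
    consumer = LinkedDataConsumer(store, toy_normalize)
    consumer.ingest(("http://example.org/%7efoo", "http://example.org/p", "http://example.org/o"))
    consumer.ingest(("http://example.org/%7Efoo", "http://example.org/p", "http://example.org/o"))
    print(len(store.triples))
```

The store itself never changes behaviour, so an RDF parser or store that follows the existing specs stays conformant while the consumer layer carries the extra conventions.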


Fully agree, had the same conversation with DanC this afternoon and he 
too immediately suggested changing RDF Publisher/Consumer to Linked Data 
Publisher/Consumer. It also ties in with earlier comments about 
standardizing Linked Data. However it's done, or worded, my only care 
here is that it positively impacts the current situation, and doesn't 
negatively impact anybody else.


The specific point on the normalization ladder would have to be defined, of 
course, and you would need to define how to handle schemes unknown to 
the consumer.


All this presupposes some work to formalize and specify linked data. Is 
there anything like that planned?  In some ways Linked Data is an 
engineering experiment and benefits from that freedom to experiment. On 
the other hand interoperability eventually needs clear specifications.


Unsure, but I'll also ask the question, is there anything planned? I'd 
certainly +1 standardization and do anything I could to help the process 
along.



For many reasons it would be good to solve this at the publishing phase,
allow normalization at the consuming phase (can't be precluded as
intermediary components may normalize), and keep simple case sensitive
string comparison throughout the stack and specs (so implementations
remain simple and fast.)


Agreed.


cool, thanks again Dave,

Nathan

Re: Standardizing linked data - was Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Nathan


Nathan wrote:

Dave Reynolds wrote:
All this presupposes some work to formalize and specify linked data. 
Is there anything like that planned?  In some ways Linked Data is an 
engineering experiment and benefits from that freedom to experiment. 
On the other hand interoperability eventually needs clear specifications.


Unsure, but I'll also ask the question, is there anything planned? I'd 
certainly +1 standardization and do anything I could to help the process 
along.


or perhaps an IG/XG follow up to the SWEO, taking into account Read 
Write Web of Data, hopefully with some protocol or best practice 
report giving a migration path to standardization?


There are certainly plenty of other groups to take into account and 
consider in all of this, like the WebID XG.


Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Alan Ruttenberg

On Thu, Jan 20, 2011 at 5:15 AM, Nathan nat...@webr3.org wrote:

 As far as I can see, that's only for a URI reference used within a
 namespace, and does not govern usage or normalization when you join the URI
 reference up with the local name to make the full URI.

 Out of interest, where is that process defined? I was looking for it the
 other day - for instance in the quoted specification we have the example:

 <edi:price xmlns:edi='http://ecommerce.example.org/schema'
 units='Euro'>32.18</edi:price>

 Where's the bit of the XML specification which says you join them up by
 concatenating 'http://ecommerce.example.org/schema' with #(?assumed?) and
 'Euro' to get 'http://ecommerce.example.org/schema#Euro'?


My understanding is that this is governed by the definition of qnames. As I
understand things, the concatenation you write would happen only if the
attribute was defined in the schema to be an xsi:type
http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#xsi_type,
and without the #. The only case where a # would be added is when rdf:id
or xml:id is used.

And finally, this is why I specifically asked if the non-normalization of
 RDF URI References had XML Namespace heritage, which had then filtered down
 through OWL, SPARQL and RIF.


I don't believe so. I believe the genesis are the reasons that I discussed
earlier - the difficulty of actually implementing it combined with the
indeterminacy. But I would be glad if someone else has better information
and can either confirm or deny this.

-Alan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread Harry Halpin

On Thu, Jan 20, 2011 at 11:15 AM, Nathan nat...@webr3.org wrote:
 Alan Ruttenberg wrote:

 On Wed, Jan 19, 2011 at 4:45 PM, Nathan nat...@webr3.org wrote:

 David Wood wrote:

 On Jan 19, 2011, at 10:59, Nathan wrote:

 ps: as an illustration of how engrained URI normalization is, I've
 capitalized the domain names in the to: and cc: fields, I do hope the
 mail
 still comes through, and hope that you'll accept this email as being
 sent to
 you. Hopefully we'll also find this mail in the archives shortly at
 htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally
 I'd
 hope that any statements made using these URIs (asserted by man or
 machine)
 would remain valid regardless of the (incorrect?-)casing.

 Heh.  OK, I'll bite.  Domain names in email addressing are defined in
 IETF
 RFC 2822 (and its predecessor RFC 822), which defers the interpretation
 to
 RFC 1035 (Domain names - implementation and specification).  RFC 1035
 section 2.3.3 states that domain names in DNS, and therefore in (E)SMTP,
 are
 to be compared in a case-insensitive manner.

 As far as I know, the W3C specs do not so refer to RFC 1035.

 And I'll bite in the other direction, why not treat URIs as URIs? why go
 against both the RDF Specification [1] and the URI specification when
 they
 say /not/ to encode permitted US-ASCII characters (like ~ %7E)? why force
 case-sensitive matching on the scheme and domain on URIs matching the
 generic syntax when the specs say must be compared case insensitively?
 and
 so on and so forth.

 [AR]
 Which specs?

 The various URI/IRI specs and previous revisions of.

 http://www.w3.org/TR/REC-xml-names/#NSNameComparison

 URI references identifying namespaces

 ..

 In a namespace declaration, the URI reference is

 ..

 The URI references below are all different for the purposes of identifying
 namespaces

 ..

 The URI references below are also all different for the purposes of
 identifying namespaces

 ..

 So here is another spec that *explicitly* disagrees with the idea that URI
 normalization should be a built-in processing.

 As far as I can see, that's only for a URI reference used within a
 namespace, and does not govern usage or normalization when you join the URI
 reference up with the local name to make the full URI.

 Out of interest, where is that process defined? I was looking for it the
 other day - for instance in the quoted specification we have the example:

 <edi:price xmlns:edi='http://ecommerce.example.org/schema'
 units='Euro'>32.18</edi:price>

 Where's the bit of the XML specification which says you join them up by
 concatenating 'http://ecommerce.example.org/schema' with #(?assumed?) and
 'Euro' to get 'http://ecommerce.example.org/schema#Euro'?


Actually you don't. A namespace is just that - a tuple (namespace,
localname) in XML. That's why namespaces in XML are for all intents
and purposes broken and why, to a large extent, Web browser developers
in HTML stopped using them and hate implementing them in the DOM, and
so refuse to have them in HTML5. And that's one reason RDF(A) will
probably continue getting a sort of bad rap in the HTML world, as
prefixes are not associated with just making URIs, but with this
terrible namespace tuple.
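To see the (namespace, localname) tuple in practice, the example element quoted above can be run through Python's standard xml.etree.ElementTree parser. This is just a sketch; `expanded_name` is a helper invented for the illustration, and ElementTree's `{namespace}local` notation is that library's own convention rather than part of the XML Namespaces spec.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<edi:price xmlns:edi='http://ecommerce.example.org/schema' "
    "units='Euro'>32.18</edi:price>"
)


def expanded_name(tag):
    """Split ElementTree's '{namespace}local' notation into a (namespace, local) pair."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    # Unprefixed names are in no namespace at all.
    return None, tag


elem = ET.fromstring(DOC)
element_name = expanded_name(elem.tag)
attribute_names = [expanded_name(name) for name in elem.attrib]

if __name__ == "__main__":
    print(element_name)
    print(attribute_names, elem.attrib["units"])
```

The element name comes back as a pair, the unprefixed `units` attribute has no namespace, and its value `Euro` is just a string: at no point does the parser build a URI such as http://ecommerce.example.org/schema#Euro.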

For an archeology of the relevant standards, check out the section "What
Namespaces Do" of this paper. While the paper is focussed on why
namespace documents are a mess, the relevant information is in that
section and extensively referenced, with examples:

http://xml.coverpages.org/HHalpinXMLVS-Extreme.html

 And finally, this is why I specifically asked if the non-normalization of
 RDF URI References had XML Namespace heritage, which had then filtered down
 through OWL, SPARQL and RIF.

Indeed, they should be normalized in a sane manner across all Semantic
Web specs, and dependencies on XML Namespaces should obviously be
dropped IMHO.


 [AR] More to document, please: Which data is being junked and scrapped?

 will document, but essentially every statement made using a non normalized
 URI when other statements are also being made about the same resource
 using normalized URIs - the two most common cases for this will be when
 people are using CMS systems and enter their domain name as uppercase in
 some admin, only to have that filter through to URIs in serialized RDF/RDFa,
 and where bugs in software have led to inconsistent URIs over time (for
 instance where % encoding has been fixed, or a :80 has been removed from a
 URI).

 [AR] Hmm. Are you suggesting that the behavior of libraries and clients
 should have precedence over specification? My view is that one first looks
 to specifications, and then only if specifications are poor or do not
 speak
 to the issue do we look at existing behavior.

Which is the case with namespaces and URI normalization :)


 Yes I am, that specification should standardize the behaviour of libraries
 and clients - the level of normalization in URIs published, consumed or used
 by these tools is often

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Dave Reynolds


On 19/01/2011 3:55 AM, Alan Ruttenberg wrote:


The information on how to fully determine equivalence according to the
URI spec is distributed across a wide and growing number of different
specifications (because it is scheme dependent) and could, in
principle, change over time. Because of the distributed nature of the
information it is not feasible to fully implement these rules.
Optionally implementing these rules (each implementor choosing where
on the ladder they want to be) would mean that documents written in
RDF (and derivative languages) would be interpreted differently by
different implementations, which is an unacceptable feature of
languages designed for unambiguous communication. The fact that the
set of rules is growing and possibly changing would lead to a similar
situation - documents that meant one thing at one time could mean
different things later, which is also unacceptable, for the same
reason.


Well put, I meant to point out the implications of scheme-dependence and 
you've covered it very clearly.



David (Wood) clarifies (surprisingly to me as well) that the issue of
normalization could be addressed by the working group. I expect,
however, that any proposed change would quickly be determined to be
counter to the instructions given in the charter on Compatibility and
Deployment Expectation, and if not, would be rejected after justified
objections on this basis from reviewers outside the working group.


+1

Dave

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Nathan


Dave Reynolds wrote:

On 19/01/2011 3:55 AM, Alan Ruttenberg wrote:


The information on how to fully determine equivalence according to the
URI spec is distributed across a wide and growing number of different
specifications (because it is scheme dependent) and could, in
principle, change over time. Because of the distributed nature of the
information it is not feasible to fully implement these rules.
Optionally implementing these rules (each implementor choosing where
on the ladder they want to be) would mean that documents written in
RDF (and derivative languages) would be interpreted differently by
different implementations, which is an unacceptable feature of
languages designed for unambiguous communication. The fact that the
set of rules is growing and possibly changing would lead to a similar
situation - documents that meant one thing at one time could mean
different things later, which is also unacceptable, for the same
reason.


Well put, I meant to point out the implications of scheme-dependence and 
you've covered it very clearly.


Whilst I share the same end goal, I have to stress that *several 
important factors have been omitted*.


The semantic web specifications are not the only ones which affect 
interoperability and compatibility with regard to URIs. Many (most) RDF 
serializations include the use of relative URIs, are affected by base 
mechanisms which are defined by the URIs RFC, dependent on the protocol, 
and by base mechanisms provided by host serialization languages, and 
each of the respective implementations thereof. This covers everything 
from implementations of the http protocol on clients, servers and 
intermediaries, through to implementations of the DOM in XML tooling, 
HTML tooling and the major browsers. It also covers every potential 
component which provides URI support, from open source libraries and 
classes through embedded support in black box applications.


Every single one of the aforementioned is free to (silently) implement 
any of the URI normalization techniques in the URI/IRI RFCs. Each 
implementer of these specifications chooses where on the ladder they 
want to be, and that decision affects, and often determines, the URIs seen 
by implementations of the semantic web specifications.


These factors cannot be ignored, and they are the factors which the RDF 
specification and semantic web specifications must strive to be 
compatible with, and to normalize the actions of.


Every additional step on the ladder added as a requirement to the RDF 
specification is a step closer to interoperability and compatibility.



David (Wood) clarifies (surprisingly to me as well) that the issue of
normalization could be addressed by the working group. I expect,
however, that any proposed change would quickly be determined to be
counter to the instructions given in the charter on Compatibility and
Deployment Expectation, and if not, would be rejected after justified
objections on this basis from reviewers outside the working group.


+1


As per the above, I'd expect the polar opposite.

+1 to compatibility (with the real, deployed, web - the one we all use)

Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Alan Ruttenberg

Nathan,

If you are going to make claims about the effect of other
specifications on RDF, could you please include pointers to the parts
of specifications that you are referring to, ideally with illustrative
examples of the problems you are? Absent that it is too difficult to
evaluate your claims.

The conversations on such topics too often devolve into serial opinion
dumping. If this is to be at all productive we need to be as precise
as possible.

-Alan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Nathan


Hi Alan,

Alan Ruttenberg wrote:

Nathan,

If you are going to make claims about the effect of other
specifications on RDF, could you please include pointers to the parts
of specifications that you are referring to, ideally with illustrative
examples of the problems you are? Absent that it is too difficult to
evaluate your claims.

The conversations on such topics too often devolve into serial opinion
dumping. If this is to be at all productive we need to be as precise
as possible.


Good idea :)

I'll create a new page on the wiki and add some examples over the next 
few days, then reply with a pointer later in the week.


ps: as an illustration of how engrained URI normalization is, I've 
capitalized the domain names in the to: and cc: fields, I do hope the 
mail still comes through, and hope that you'll accept this email as being 
sent to you. Hopefully we'll also find this mail in the archives shortly 
at htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally 
I'd hope that any statements made using these URIs (asserted by man or 
machine) would remain valid regardless of the (incorrect?-)casing.


Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Kingsley Idehen


On 1/19/11 10:59 AM, Nathan wrote:
htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally 
I'd hope that any statements made using these URIs (asserted by man or 
machine) would remain valid regardless of the (incorrect?-)casing. 
Okay for Data Source Address Ref. (URL), no good for Entity (Data Item 
or Data Object) Name Ref., bar system specific handling via IFP property 
or owl:sameAs :-)



--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Yrjana Rankka


On 1/19/11 16:59 , Nathan wrote:

Hi Alan,

Alan Ruttenberg wrote:

Nathan,

If you are going to make claims about the effect of other
specifications on RDF, could you please include pointers to the parts
of specifications that you are referring to, ideally with illustrative
examples of the problems you are? Absent that it is too difficult to
evaluate your claims.

The conversations on such topics too often devolve into serial opinion
dumping. If this is to be at all productive we need to be as precise
as possible.


Good idea :)

I'll create a new page on the wiki and add some examples over the next 
few days, then reply with a pointer later in the week.



+1!

ps: as an illustration of how engrained URI normalization is, I've 
capitalized the domain names in the to: and cc: fields, I do hope the 
mail still comes through, and hope that you'll accept this email as 
being sent to you. Hopefully we'll also find this mail in the archives 
shortly at htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - 
Personally I'd hope that any statements made using these URIs 
(asserted by man or machine) would remain valid regardless of the 
(incorrect?-)casing.


Best,

Nathan


Yrjänä

--
Mr. Yrjana Rankka| gh...@openlinksw.com
Developer, Virtuoso Team | http://www.openlinksw.com
 | Making Technology Work For You

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread William Waites

* [2011-01-19 11:11:20 -0500] Kingsley Idehen kide...@openlinksw.com wrote:

] On 1/19/11 10:59 AM, Nathan wrote:
] htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally 
] I'd hope that any statements made using these URIs (asserted by man or 
] machine) would remain valid regardless of the (incorrect?-)casing. 
]
] Okay for Data Source Address Ref. (URL), no good for Entity (Data Item 
] or Data Object) Name Ref., bar system specific handling via IFP property 
] or owl:sameAs :-)

FWIW I've just added a FuXi builtin for the curate tool [1]
that does URI comparisons using ll.uri [2] (deliberately
pushing the choice of place on the ladder into a library).
It is used like this:

@prefix curate: <http://eris.okfn.org/ww/2010/12/curate#> .

{ ?s1 ?p1 ?o1 .
  ?s2 ?p2 ?o2 .
  ?s1 curate:cmpURI ?s2 } =>
{ ?s1 = ?s2 }.

And results in statements like this:

<HTTP://example.org:80/> = <HTTP://example.org:80/>,
    <http://EXAMPLE.ORG/>,
    <http://example.org/> .

<http://EXAMPLE.ORG/> = <HTTP://example.org:80/>,
    <http://EXAMPLE.ORG/>,
    <http://example.org/> .

<http://example.org/> = <HTTP://example.org:80/>,
    <http://EXAMPLE.ORG/>,
    <http://example.org/> .
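For readers without FuXi or ll.uri to hand, the grouping shown above can be approximated with the Python standard library. This is a sketch only and does not reproduce what the curate builtin or ll.uri actually do: the chosen rung on the ladder (lowercase scheme and host, drop a default port) and the names `comparison_key`, `equivalence_classes` and `DEFAULT_PORTS` are assumptions for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed scheme-specific knowledge; schemes not listed keep any explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def comparison_key(uri):
    """Build a key so that URIs differing only in the chosen ways compare equal."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    userinfo = parts.netloc.rpartition("@")[0]
    port = parts.port
    if port is not None and port == DEFAULT_PORTS.get(scheme):
        port = None
    # Path, query and fragment stay case-sensitive.
    return (scheme, userinfo, host, port, parts.path, parts.query, parts.fragment)


def equivalence_classes(uris):
    """Group URIs by key, preserving first-seen order."""
    groups = defaultdict(list)
    for uri in uris:
        groups[comparison_key(uri)].append(uri)
    return list(groups.values())


if __name__ == "__main__":
    for group in equivalence_classes([
        "HTTP://example.org:80/",
        "http://EXAMPLE.ORG/",
        "http://example.org/",
        "http://example.org/Foo",
    ]):
        print(group)
```

Keeping the policy in one key function mirrors the point made above about pushing the choice of place on the ladder into a library: changing the rung means changing `comparison_key`, not the code that consumes the groups.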

Cheers,
-w

[1] https://bitbucket.org/okfn/curate/src/1f6ba3c360c3/curate/builtins.py#cl-9
[2] http://www.livinglogic.de/Python/url/Howto.html
-- 
William Waites            mailto:w...@styx.org
http://eris.okfn.org/ww/  sip:w...@styx.org
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread David Wood

On Jan 19, 2011, at 10:59, Nathan wrote:

 Hi Alan,
 
 Alan Ruttenberg wrote:
 Nathan,
 If you are going to make claims about the effect of other
 specifications on RDF, could you please include pointers to the parts
 of specifications that you are referring to, ideally with illustrative
 examples of the problems you are? Absent that it is too difficult to
 evaluate your claims.
 The conversations on such topics too often devolve into serial opinion
 dumping. If this is to be at all productive we need to be as precise
 as possible.
 
 Good idea :)
 
 I'll create a new page on the wiki and add some examples over the next few 
 days, then reply with a pointer later in the week.
 
 ps: as an illustration of how engrained URI normalization is, I've 
 capitalized the domain names in the to: and cc: fields, I do hope the mail 
 still comes through, and hope that you'll accept this email as being sent to 
 you. Hopefully we'll also find this mail in the archives shortly at 
 htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally I'd hope 
 that any statements made using these URIs (asserted by man or machine) would 
 remain valid regardless of the (incorrect?-)casing.


Heh.  OK, I'll bite.  Domain names in email addressing are defined in IETF RFC 
2822 (and its predecessor RFC 822), which defers the interpretation to RFC 1035 
(Domain names - implementation and specification).  RFC 1035 section 2.3.3 
states that domain names in DNS, and therefore in (E)SMTP, are to be compared 
in a case-insensitive manner.

As far as I know, the W3C specs do not so refer to RFC 1035.

:)

Regards,
Dave



 
 Best,
 
 Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Nathan


David Wood wrote:

On Jan 19, 2011, at 10:59, Nathan wrote:

ps: as an illustration of how engrained URI normalization is, I've capitalized 
the domain names in the to: and cc: fields, I do hope the mail still comes 
through, and hope that you'll accept this email as being sent to you. Hopefully 
we'll also find this mail in the archives shortly at 
htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally I'd hope 
that any statements made using these URIs (asserted by man or machine) would 
remain valid regardless of the (incorrect?-)casing.


Heh.  OK, I'll bite.  Domain names in email addressing are defined in IETF RFC 2822 
(and its predecessor RFC 822), which defers the interpretation to RFC 1035 
(Domain names - implementation and specification).  RFC 1035 section 2.3.3 
states that domain names in DNS, and therefore in (E)SMTP, are to be compared in a 
case-insensitive manner.

As far as I know, the W3C specs do not so refer to RFC 1035.


And I'll bite in the other direction, why not treat URIs as URIs? why go 
against both the RDF Specification [1] and the URI specification when 
they say /not/ to encode permitted US-ASCII characters (like ~ %7E)? why 
force case-sensitive matching on the scheme and domain on URIs matching 
the generic syntax when the specs say must be compared case 
insensitively? and so on and so forth.


I have to be honest, I can't see what good this is doing anybody, in 
fact it's the complete opposite scenario, where data is being junked and 
scrapped because we are ignoring the specifications which are designed 
to enable interoperability and limit unexpected behaviour.


I'm currently preparing a list of errors I'm finding in RDF, RDFa and 
linked data tooling to do with this, and I have to admit even I'm 
surprised at the sheer number of tools which are affected.


Additionally there's a very nasty, and common, use case which I can't 
test fully, so would appreciate people taking the time to check their 
own libraries/clients, as follows:


If you find some data with the following setup (example):

  @base <htTp://EXAMPLE.org/foo/bar> .
  <#t> x:rel <../baz> .

and then you follow your nose to htTp://EXAMPLE.org/baz, will you 
find any triples about it? (problem 1) and if there's no base on the 
second resource, and it uses relative URIs, then the base you'll be 
using is htTp://EXAMPLE.org/baz, and thus, you'll effectively create a 
new set of statements which the author never wrote, or intended (problem 2).


In other words, in this scenario, no matter what you do you're either 
going to get no data (even though it's there) or get a set of statements 
which were never said by the author (because the casing is different).
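Anyone wanting to check their own stack against this scenario could start from something like the Python sketch below, which resolves the relative reference against the mixed-case base using the standard library and compares the result with the literal string a strict string-matching client would expect. The variable names are made up for the example, and the outcome is specific to whichever resolver is used, so treat it as a probe rather than a statement about RDF tooling in general.

```python
from urllib.parse import urljoin

# Base and relative reference taken from the example above.
BASE = "htTp://EXAMPLE.org/foo/bar"
REFERENCE = "../baz"

# What a client doing pure string handling would expect to look up.
EXPECTED_LITERAL = "htTp://EXAMPLE.org/baz"

resolved = urljoin(BASE, REFERENCE)

# True only if the resolver preserved the author's casing exactly.
same_literal = resolved == EXPECTED_LITERAL
# True if the two differ at most in letter case.
same_ignoring_case = resolved.lower() == EXPECTED_LITERAL.lower()

if __name__ == "__main__":
    print(resolved, same_literal, same_ignoring_case)
```

If `same_literal` comes out false while `same_ignoring_case` is true, the resolver has silently changed the casing during resolution, which is exactly the situation in which a store comparing character by character would miss the triples. As far as I know, CPython's urllib.parse lowercases the scheme when it parses the base, so it is one such resolver.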


Further, essentially all RDFa ever encountered by a browser has the 
casing on all URIs in href and src, and on all those which are resolved, 
automatically normalized - so even if you set the base to 
htTp://EXAMPLE.org/ or use it in a URI, browser tools, extensions, and 
js based libraries will only ever see the normalized URIs (and thus be 
incompatible with the rest of the RDF world).


I'll continue on getting the specific examples for current RDF tooling 
and resources and get it on the wiki, but I'll say now that almost every 
tool I've encountered so far does it wrong in inconsistent 
non-compatible ways.


Finally, I'll ask again, if anybody has any use case which benefits from 
htTp://EXAMPLE.org/%7efoo and http://example.org/~foo being classed 
as different RDF URIs, I'd love to hear it.


[1] The encoding consists of: ... 2. %-escaping octets that do not 
correspond to permitted US-ASCII characters.

 - http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref

Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Alan Ruttenberg

On Wed, Jan 19, 2011 at 11:11 AM, Kingsley Idehen kide...@openlinksw.com wrote:

  On 1/19/11 10:59 AM, Nathan wrote:

 htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally I'd
 hope that any statements made using these URIs (asserted by man or machine)
 would remain valid regardless of the (incorrect?-)casing.

 Okay for Data Source Address Ref. (URL), no good for Entity (Data Item or
 Data Object) Name Ref., bar system specific handling via IFP property or
 owl:sameAs :-)


Kingsley, same for you as Nathan. To what specification do you refer for
the definitions and behavior of:
 - Data source address ref
 - Entity
 - Statement.

-Alan




 --

 Regards,

 Kingsley Idehen   
 President & CEO
 OpenLink Software
 Web: http://www.openlinksw.com
 Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca: kidehen

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread Alan Ruttenberg

[for some reason my client isn't quoting previous mail properly, so my
comments are prefixed with [AR]]

On Wed, Jan 19, 2011 at 4:45 PM, Nathan nat...@webr3.org wrote:

 David Wood wrote:

 On Jan 19, 2011, at 10:59, Nathan wrote:

 ps: as an illustration of how engrained URI normalization is, I've
 capitalized the domain names in the to: and cc: fields, I do hope the mail
 still comes through, and hope that you'll accept this email as being sent to
 you. Hopefully we'll also find this mail in the archives shortly at
 htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally I'd
 hope that any statements made using these URIs (asserted by man or machine)
 would remain valid regardless of the (incorrect?-)casing.


 Heh.  OK, I'll bite.  Domain names in email addressing are defined in IETF
 RFC 2822 (and its predecessor RFC 822), which defers the interpretation to
 RFC 1035 (Domain names - implementation and specification).  RFC 1035
 section 2.3.3 states that domain names in DNS, and therefore in (E)SMTP, are
 to be compared in a case-insensitive manner.

 As far as I know, the W3C specs do not so refer to RFC 1035.


 And I'll bite in the other direction, why not treat URIs as URIs? why go
 against both the RDF Specification [1] and the URI specification when they
 say /not/ to encode permitted US-ASCII characters (like ~ %7E)? why force
 case-sensitive matching on the scheme and domain on URIs matching the
 generic syntax when the specs say must be compared case insensitively? and
 so on and so forth.


[AR]
Which specs? (or is it singular spec) I just had a look at the XML
namespace spec, for instance, which partially governs the RDF/XML
serialization specification.
http://www.w3.org/TR/REC-xml-names/#NSNameComparison

URI references identifying namespaces are compared when determining whether
a name belongs to a given namespace, and whether two names belong to the
same namespace. [Definition: The two URIs are treated as strings, and they
are *identical* if and only if the strings are identical, that is, if they
are the same sequence of characters. ] The comparison is case-sensitive, and
no %-escaping is done or undone.

A consequence of this is that URI references which are not identical in this
sense may resolve to the same resource. Examples include URI references
which differ only in case or %-escaping, or which are in external entities
which have different base URIs (but note that relative URIs are deprecated
as namespace names).

In a namespace declaration, the URI reference is the normalized value
(http://www.w3.org/TR/REC-xml/#AVNormalize) of the attribute, so
replacement of XML character and entity references has already been done
before any comparison.

Examples:

The URI references below are all different for the purposes of identifying
namespaces, since they differ in case:

http://www.example.org/wine

http://www.Example.org/wine

http://www.example.org/Wine

The URI references below are also all different for the purposes of
identifying namespaces:

http://www.example.org/~wilbur

http://www.example.org/%7ewilbur

http://www.example.org/%7Ewilbur

So here is another spec that *explicitly* disagrees with the idea that URI
normalization should be built-in processing.


 I have to be honest, I can't see what good this is doing anybody, in fact
 it's the complete opposite scenario, where data is being junked and scrapped
 because we are ignoring the specifications which are designed to enable
 interoperability and limit unexpected behaviour.


[AR] More to document, please: Which data is being junked and scrapped?

 I'm currently preparing a list of errors I'm finding in RDF, RDFa and
linked data tooling to do with this, and I have to admit even I'm surprised
at the sheer number of tools which are affected.

 Additionally there's a very nasty, and common, use case which I can't test
fully, so would appreciate people taking the time to check their own
libraries/clients, as follows:

[AR] Hmm. Are you suggesting that the behavior of libraries and clients
should have precedence over specification? My view is that one first looks
to specifications, and then only if specifications are poor or do not speak
to the issue do we look at existing behavior.

 If you find some data with the following setup (example):

 @base <htTp://EXAMPLE.org/foo/bar> .
 <#t> x:rel <../baz> .

and then you follow your nose to htTp://EXAMPLE.org/baz, will you find
any triples about it? (problem 1) and if there's no base on the second
resource, and it uses relative URIs, then the base you'll be using is 
htTp://EXAMPLE.org/baz, and thus, you'll effectively create a new set of
statements which the author never wrote, or intended (problem 2).

In other words, in this scenario, no matter what you do you're either going
to get no data (even though it's there) or get a set of statements which
were never said by the author (because the casing is different).
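The scenario above can be tried out with a few lines of Python. This is only a sketch: it uses the standard library's urljoin as a stand-in for whatever resolver an RDF toolkit actually uses, and the helper name resolve is made up for illustration. Whether the mixed case of the base survives resolution depends on the library, which is exactly why results differ between tools.

```python
from urllib.parse import urljoin

# The @base from the example, with its unusual casing kept as written.
BASE = "htTp://EXAMPLE.org/foo/bar"


def resolve(base, reference):
    """Resolve a relative reference against a base URI (illustrative helper)."""
    return urljoin(base, reference)


target = resolve(BASE, "../baz")

# A store that compares names as plain strings sees the resolved name and
# the all-lowercase spelling as two different resources.
same_name = target == "http://example.org/baz"
```

Under simple string comparison same_name is False, although both spellings would be fetched from the same place over HTTP.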

[AR] I think there are many ways to lose in this scenario. For

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-18 Thread Dave Reynolds

On Mon, 2011-01-17 at 18:16 +, Nathan wrote: 
 Dave Reynolds wrote:
  On Mon, 2011-01-17 at 16:52 +, Nathan wrote: 
  I'd suggest that it's a little more complex than that, and that this may 
  be an issue to clear up in the next RDF WG (it's on the charter I believe).
  
  I beg to differ.
  
  The charter does state: 
  
  Clarify the usage of IRI references for RDF resources, e.g., per SPARQL
  Query §1.2.4.
  
  However, I was under the impression that was simply removing the small
  difference between RDF URI References and the IRI spec (that they had
  anticipated). Specifically I thought the only substantive issue there
  was the treatment of space and many RDF processors already take the
  conservative position on that anyway.
 
 Likewise, apologies as I should have picked my choice of words more 
 appropriately, I intended to say that the usage of IRI references was up 
 for clarification, and if normalization were deemed an issue then the 
 RDF WG may be the place to raise such an issue, and address if needed.

OK, that makes sense.

 As for RIF and GRDDL, can anybody point me to the reasons why 
 normalization is not performed, does this have xmlns heritage?

Not as far as I know. At least in RIF we were just trying to be
compatible with the RDF specs which (cwm notwithstanding) do not
specify normalization other than the IRI-compatible character encoding. 

Dave

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-18 Thread David Wood

On Jan 17, 2011, at 13:16, Nathan wrote:

 Dave Reynolds wrote:
 On Mon, 2011-01-17 at 16:52 +, Nathan wrote: 
 I'd suggest that it's a little more complex than that, and that this may be 
 an issue to clear up in the next RDF WG (it's on the charter I believe).
 I beg to differ.
 The charter does state: Clarify the usage of IRI references for RDF 
 resources, e.g., per SPARQL
 Query §1.2.4.
 However, I was under the impression that was simply removing the small
 difference between RDF URI References and the IRI spec (that they had
 anticipated). Specifically I thought the only substantive issue there
 was the treatment of space and many RDF processors already take the
 conservative position on that anyway.
 
 Likewise, apologies as I should have picked my choice of words more 
 appropriately, I intended to say that the usage of IRI references was up for 
 clarification, and if normalization were deemed an issue then the RDF WG may 
 be the place to raise such an issue, and address if needed.


I agree with that.  The treatment of spaces is an example in the charter, not a 
constraint.  Clarification may also occur in the updated RDF Primer if the 
community deems it necessary.

Regards,
Dave


 
 As for RIF and GRDDL, can anybody point me to the reasons why normalization 
 is not performed, does this have xmlns heritage?
 
 Best,
 
 Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-18 Thread Peter DeVries

Hi Martin,

I have URI's where case is important only at the terminal identifier. (HTML
URI's in this example)

http://lod.taxonconcept.org/ses/v6n7p.html

should be different than

http://lod.taxonconcept.org/ses/v6N7p.html

Am I correct in thinking that this is OK?

I went with this structure so I could have short bit.ly like identifiers for
potentially millions of species.

Thanks,

- Pete

On Mon, Jan 17, 2011 at 9:51 AM, Martin Hepp 
martin.h...@ebusiness-unibw.org wrote:

 Dear all:

 RFC 2616 [1, section 3.2.3] says that

 When comparing two URIs to decide if they match or not, a client  SHOULD
 use a case-sensitive octet-by-octet comparison of the entire
   URIs, with these exceptions:

  - A port that is empty or not given is equivalent to the default
port for that URI-reference;
  - Comparisons of host names MUST be case-insensitive;
  - Comparisons of scheme names MUST be case-insensitive;
  - An empty abs_path is equivalent to an abs_path of /.

   Characters other than those in the reserved and unsafe sets (see
   RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

   For example, the following three URIs are equivalent:

  http://abc.com:80/~smith/home.html
  http://ABC.com/%7Esmith/home.html
  http://ABC.com:/%7esmith/home.html
 

 Does this also hold for identifying RDF resources

 a) in theory and
 b) in practice (e.g. in popular triplestores)?

 I did not test it yet, but I assume that not all implementations would
 treat

   http://purl.org/NET/c4dm/event.owl#Event
   HTTP://purl.org/NET/c4dm/event.owl#Event
   http://PURL.org/NET/c4dm/event.owl#Event
   http://purl.org:80/NET/c4dm/event.owl#Event

 as the same class.

 Any facts or opinions?

 Best

 Martin


 [1] http://www.ietf.org/rfc/rfc2616.txt

 
 martin hepp
 e-business & web science research group
 universitaet der bundeswehr muenchen

 e-mail:  h...@ebusiness-unibw.org
 phone:   +49-(0)89-6004-4217
 fax: +49-(0)89-6004-4620
 www: http://www.unibw.de/ebusiness/ (group)
 http://www.heppnetz.de/ (personal)
 skype:   mfhepp
 twitter: mfhepp





-- 
---
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies
Knowledge Base http://lod.geospecies.org/
About the GeoSpecies Knowledge Base http://about.geospecies.org/

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-18 Thread Alan Ruttenberg

On Tue, Jan 18, 2011 at 3:47 AM, Dave Reynolds
dave.e.reyno...@gmail.com wrote:
 As for RIF and GRDDL, can anybody point me to the reasons why
 normalization is not performed, does this have xmlns heritage?

 Not as far as I know. At least in RIF we were just trying to be
 compatible with the RDF specs which (cwm notwithstanding) do not
 specify normalization other than the IRI-compatible character encoding.


Similarly OWL. OWL says, following the sense of the anticipation of
the IRI spec: Two IRIs are structurally equivalent if and only if
their string representations are identical.

As far as I can tell, you (Dave) are the only person in this
conversation who cites the specification relevant to answering the
question posed.  That specification makes clear, as you have cited,
exactly how RDF interpreters are to compare URI references.

The information on how to fully determine equivalence according to the
URI spec is distributed across a wide and growing number of different
specifications (because it is scheme dependent) and could, in
principle, change over time. Because of the distributed nature of the
information it is not feasible to fully implement these rules.
Optionally implementing these rules (each implementor choosing where
on the ladder they want to be) would mean that documents written in
RDF (and derivative languages) would be interpreted differently by
different implementations, which is an unacceptable feature of
languages designed for unambiguous communication. The fact that the
set of rules is growing and possibly changing would lead to a similar
situation - documents that meant one thing at one time could mean
different things later, which is also unacceptable, for the same
reason.

David (Wood) clarifies (surprisingly to me as well) that the issue of
normalization could be addressed by the working group. I expect,
however, that any proposed change would quickly be determined to be
counter to the instructions given in the charter on Compatibility and
Deployment Expectation, and if not, would be rejected after justified
objections on this basis from reviewers outside the working group.

-Alan

URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Martin Hepp


Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client   
SHOULD use a case-sensitive octet-by-octet comparison of the entire

   URIs, with these exceptions:

  - A port that is empty or not given is equivalent to the default
port for that URI-reference;
  - Comparisons of host names MUST be case-insensitive;
  - Comparisons of scheme names MUST be case-insensitive;
  - An empty abs_path is equivalent to an abs_path of /.

   Characters other than those in the reserved and unsafe sets (see
   RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

   For example, the following three URIs are equivalent:

  http://abc.com:80/~smith/home.html
  http://ABC.com/%7Esmith/home.html
  http://ABC.com:/%7esmith/home.html
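The comparison rules quoted above can be sketched in Python roughly as follows. This is an illustration, not a reference implementation: the function names, the table of default ports, and the use of the RFC 3986 "unreserved" set as a stand-in for "characters other than those in the reserved and unsafe sets" are all assumptions, and IPv6 host literals are not handled.

```python
import re
from urllib.parse import urlsplit

# Assumption: only these schemes, with their usual default ports.
DEFAULT_PORTS = {"http": "80", "https": "443"}
# Assumption: RFC 3986 "unreserved" approximates "not reserved or unsafe".
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(text):
    """Decode %XX only when it encodes an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        # Any other escape is left exactly as written.
        return char if char in UNRESERVED else match.group(0)
    return PCT.sub(repl, text)


def comparison_key(uri):
    """Build a tuple that is equal for URIs the quoted rules treat as matching."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()                 # scheme: case-insensitive
    hostport = parts.netloc.rpartition("@")[2]    # ignore any userinfo
    host, _, port = hostport.partition(":")       # no IPv6 literal support
    return (
        scheme,
        host.lower(),                             # host: case-insensitive
        port or DEFAULT_PORTS.get(scheme, ""),    # empty port = default port
        decode_unreserved(parts.path) or "/",     # empty abs_path = "/"
        decode_unreserved(parts.query),
    )


def http_uris_match(a, b):
    """True when two HTTP URIs match under the sketched rules."""
    return comparison_key(a) == comparison_key(b)
```

With this sketch the three example URIs above all produce the same key, while a difference in path case (for example ~smith versus ~Smith) still makes two URIs compare as different.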


Does this also hold for identifying RDF resources

a) in theory and
b) in practice (e.g. in popular triplestores)?

I did not test it yet, but I assume that not all implementations would  
treat


   http://purl.org/NET/c4dm/event.owl#Event
   HTTP://purl.org/NET/c4dm/event.owl#Event
   http://PURL.org/NET/c4dm/event.owl#Event
   http://purl.org:80/NET/c4dm/event.owl#Event

as the same class.

Any facts or opinions?

Best

Martin


[1] http://www.ietf.org/rfc/rfc2616.txt


martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  h...@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
 http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Kingsley Idehen


On 1/17/11 10:51 AM, Martin Hepp wrote:

Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client  
SHOULD use a case-sensitive octet-by-octet comparison of the entire

   URIs, with these exceptions:

  - A port that is empty or not given is equivalent to the default
port for that URI-reference;
  - Comparisons of host names MUST be case-insensitive;
  - Comparisons of scheme names MUST be case-insensitive;
  - An empty abs_path is equivalent to an abs_path of /.

   Characters other than those in the reserved and unsafe sets (see
   RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

   For example, the following three URIs are equivalent:

  http://abc.com:80/~smith/home.html
  http://ABC.com/%7Esmith/home.html
  http://ABC.com:/%7esmith/home.html


Does this also hold for identifying RDF resources


Yes, where an RDF resource is a Data Container at an Address (URL). 
Thus, equivalent results for de-referencing a URL en route to accessing 
data.


No, when resource also implies an Entity (Data Item or Data Object) 
that is assigned a Name via URI.


The examples above strike me as URLs. Of course,  cURL could indicate 
otherwise, but for now (via my visual senses) they appear to be URLs 
(resource addresses).




a) in theory and
b) in practice (e.g. in popular triplestores)?

I did not test it yet, but I assume that not all implementations would 
treat


   http://purl.org/NET/c4dm/event.owl#Event
   HTTP://purl.org/NET/c4dm/event.owl#Event
   http://PURL.org/NET/c4dm/event.owl#Event
   http://purl.org:80/NET/c4dm/event.owl#Event

as the same class.

Any facts or opinions?


See my comments above.

Kingsley



Best

Martin


[1] http://www.ietf.org/rfc/rfc2616.txt


martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  h...@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
 http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp






--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Dave Reynolds

On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote: 
 Dear all:
 
 RFC 2616 [1, section 3.2.3] says that
 
 When comparing two URIs to decide if they match or not, a client   
 SHOULD use a case-sensitive octet-by-octet comparison of the entire
 URIs, with these exceptions:
 
- A port that is empty or not given is equivalent to the default
  port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of /.
 
 Characters other than those in the reserved and unsafe sets (see
 RFC 2396 [42]) are equivalent to their % HEX HEX encoding.
 
 For example, the following three URIs are equivalent:
 
http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html
 
 
 Does this also hold for identifying RDF resources
 
 a) in theory and

No. RDF Concepts defines equality of RDF URI References [1] as simply
character-by-character equality of the %-encoded UTF-8 Unicode strings.

Note the final Note in that section:


Note: Because of the risk of confusion between RDF URI references that
would be equivalent if dereferenced, the use of %-escaped characters in
RDF URI references is strongly discouraged. 


which explicitly calls out the difference between URI equivalence
(dereference to the same resource) and RDF URI Reference equality.

BTW the more up to date RFC for looking at equivalence (as opposed to
equality) issues is probably the IRI spec [2] which defines a comparison
ladder for testing equivalence.
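To make the difference concrete, here is a small Python sketch of the bottom rung of that ladder, simple string comparison, applied to the four spellings from the original question. The function name rdf_uriref_equal is invented for illustration; it is just ordinary string equality.

```python
# The four spellings from the original question.
names = [
    "http://purl.org/NET/c4dm/event.owl#Event",
    "HTTP://purl.org/NET/c4dm/event.owl#Event",
    "http://PURL.org/NET/c4dm/event.owl#Event",
    "http://purl.org:80/NET/c4dm/event.owl#Event",
]


def rdf_uriref_equal(a, b):
    """Character-by-character equality: no case folding, no unescaping."""
    return a == b


# Under plain string equality every spelling is its own name.
distinct = set(names)
pairs_equal = [
    rdf_uriref_equal(a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
]
```

Here distinct has four members and no pair compares equal, so a store following RDF Concepts literally would hold four different classes.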

Dave

[1]
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref

[2] http://www.ietf.org/rfc/rfc3987.txt

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Nathan


Dave Reynolds wrote:
On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote: 

Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client   
SHOULD use a case-sensitive octet-by-octet comparison of the entire

URIs, with these exceptions:

   - A port that is empty or not given is equivalent to the default
 port for that URI-reference;
   - Comparisons of host names MUST be case-insensitive;
   - Comparisons of scheme names MUST be case-insensitive;
   - An empty abs_path is equivalent to an abs_path of /.

Characters other than those in the reserved and unsafe sets (see
RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

For example, the following three URIs are equivalent:

   http://abc.com:80/~smith/home.html
   http://ABC.com/%7Esmith/home.html
   http://ABC.com:/%7esmith/home.html


Does this also hold for identifying RDF resources

a) in theory and


No. RDF Concepts defines equality of RDF URI References [1] as simply
character-by-character equality of the %-encoded UTF-8 Unicode strings.

Note the final Note in that section:


Note: Because of the risk of confusion between RDF URI references that
would be equivalent if dereferenced, the use of %-escaped characters in
RDF URI references is strongly discouraged. 



which explicitly calls out the difference between URI equivalence
(dereference to the same resource) and RDF URI Reference equality.


I'd suggest that it's a little more complex than that, and that this may 
be an issue to clear up in the next RDF WG (it's on the charter I believe).


For example:

   When a URI uses components of the generic syntax, the component
   syntax equivalence rules always apply; namely, that the scheme and
   host are case-insensitive and therefore should be normalized to
   lowercase.  For example, the URI HTTP://www.EXAMPLE.com/ is
   equivalent to http://www.example.com/.

- http://tools.ietf.org/html/rfc3986#section-6.2.2.1

However, that's only for URIs which use the generic syntax (which most 
URIs we ever touch do use).


It would be great if a normalized-IRI with specific normalization rules 
could be drafted up as part of the next WG charter - after all they are 
a pretty pivotal part of the sem web setup, and it would be relatively 
easy to clear up these issues.


Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Kingsley Idehen


On 1/17/11 11:37 AM, Dave Reynolds wrote:

On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote:

Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
 URIs, with these exceptions:

- A port that is empty or not given is equivalent to the default
  port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of /.

 Characters other than those in the reserved and unsafe sets (see
 RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

 For example, the following three URIs are equivalent:

http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html


Does this also hold for identifying RDF resources

a) in theory and

No. RDF Concepts defines equality of RDF URI References [1] as simply
character-by-character equality of the %-encoded UTF-8 Unicode strings.

Note the final Note in that section:


Note: Because of the risk of confusion between RDF URI references that
would be equivalent if dereferenced, the use of %-escaped characters in
RDF URI references is strongly discouraged.


which explicitly calls out the difference between URI equivalence
(dereference to the same resource) and RDF URI Reference equality.

BTW the more up to date RFC for looking at equivalence (as opposed to
equality) issues is probably the IRI spec [2] which defines a comparison
ladder for testing equivalence.

Dave

[1]
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref

[2] http://www.ietf.org/rfc/rfc3987.txt




Dave,

Important RFC excerpt:

A mapping from IRIs to URIs is defined, which means that IRIs can be 
used instead of URIs, where appropriate, to identify resources.


The context for resources is not equivalent or identical to the notion 
of an Identifier used as a Data Object (Item or Entity) Name. This 
context is all about good old machine addressable resources.


In Linked Data context (aka. Distributed Data Object context) an 
Identifier used as a Name Reference can de-reference to a resource that 
bears (or carries) a Representation of its Description ( a graph 
pictorial where Attribute=Value pairs coalesce around a Name Reference).


Names are Names, if they are Unique, they should be Unique. Of course, 
not so when dealing with Addresses of data, which is what the RFC 
context applies to as I understand it.


Until we clarify Resource, confusion will reign.


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Nathan


Kingsley Idehen wrote:

On 1/17/11 10:51 AM, Martin Hepp wrote:

Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client  
SHOULD use a case-sensitive octet-by-octet comparison of the entire

   URIs, with these exceptions:

  - A port that is empty or not given is equivalent to the default
port for that URI-reference;
  - Comparisons of host names MUST be case-insensitive;
  - Comparisons of scheme names MUST be case-insensitive;
  - An empty abs_path is equivalent to an abs_path of /.

   Characters other than those in the reserved and unsafe sets (see
   RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

   For example, the following three URIs are equivalent:

  http://abc.com:80/~smith/home.html
  http://ABC.com/%7Esmith/home.html
  http://ABC.com:/%7esmith/home.html


Does this also hold for identifying RDF resources


Yes, where an RDF resource is a Data Container at an Address (URL). 
Thus, equivalent results for de-referencing a URL en route to accessing 
data.


No, when resource also implies an Entity (Data Item or Data Object) 
that is assigned a Name via URI.


Logically, yes on both counts, we should/could be normalizing these URIs 
as we consume and publish using the syntax based normalization rules [1] 
which apply to all URI/IRIs with the generic syntax (such as the 
examples above)


Any client consuming data, or server publishing data, can use the 
normalization rules, so it stands to reason that it's pretty important 
that we all do it to avoid false negatives.


[1] http://tools.ietf.org/html/rfc3986#section-6.2.2

Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Renaud Delbru


Hi,

I am particularly interested in this issue, because I am currently 
struggling with such a problem within the Sindice project.
Given also the answer of Dave, what would be the best practices within an 
(RDF) system to correctly handle URIs?


Should the system implement URI normalisation based on the RFC 2616 
exceptions:


  - A port that is empty or not given is equivalent to the default
port for that URI-reference;
  - Comparisons of host names MUST be case-insensitive;
  - Comparisons of scheme names MUST be case-insensitive;
  - An empty abs_path is equivalent to an abs_path of /.

and should it take care of decoding all percent-encoded characters?

However, when dealing with percent-encoded characters, some cases become 
tricky to handle. For example, some URIs [1] have a space encoded at the 
end of the string. After decoding it, certain systems/applications could 
automatically trim it. Also, some URIs [2] are 'recursively' encoded, 
and need multiple decoding passes before getting the right one.


[1] http://geo.linkeddata.es/resource/Pozo/Moro%2C%20Pou%2047%20o%20del%20
[2] http://sioc-project.org/sioc/user/1%2523user
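Both tricky cases are easy to reproduce. The following Python sketch uses only urllib.parse.unquote from the standard library; the variable names are illustrative. It suggests why blindly decoding every percent-encoded octet is risky: each decoding pass yields a different string, and therefore a different RDF name.

```python
from urllib.parse import unquote

# Case [1]: an encoded space at the very end of the URI.
trailing = "http://geo.linkeddata.es/resource/Pozo/Moro%2C%20Pou%2047%20o%20del%20"
decoded = unquote(trailing)
# The decoded form ends with a real space, which any later strip()/trim
# step would silently remove, changing the identifier.
trimmed = decoded.strip()

# Case [2]: a 'recursively' encoded URI. %25 is the encoding of "%".
nested = "http://sioc-project.org/sioc/user/1%2523user"
once = unquote(nested)    # ends with "1%23user"
twice = unquote(once)     # ends with "1#user"; "#" now starts a fragment
```

After the second pass the "#" turns the tail of the path into a fragment, so the decoded string no longer even has the same structure as the original.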

Any opinions on how to correctly handle URIs are welcome. It would be 
useful to have a document of best practices for correctly handling 
URIs in an RDF system.


Best,
--
Renaud Delbru

On 17/01/11 15:51, Martin Hepp wrote:

Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client  
SHOULD use a case-sensitive octet-by-octet comparison of the entire

   URIs, with these exceptions:

  - A port that is empty or not given is equivalent to the default
port for that URI-reference;
  - Comparisons of host names MUST be case-insensitive;
  - Comparisons of scheme names MUST be case-insensitive;
  - An empty abs_path is equivalent to an abs_path of /.

   Characters other than those in the reserved and unsafe sets (see
   RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

   For example, the following three URIs are equivalent:

  http://abc.com:80/~smith/home.html
  http://ABC.com/%7Esmith/home.html
  http://ABC.com:/%7esmith/home.html


Does this also hold for identifying RDF resources

a) in theory and
b) in practice (e.g. in popular triplestores)?

I did not test it yet, but I assume that not all implementations would 
treat


   http://purl.org/NET/c4dm/event.owl#Event
   HTTP://purl.org/NET/c4dm/event.owl#Event
   http://PURL.org/NET/c4dm/event.owl#Event
   http://purl.org:80/NET/c4dm/event.owl#Event

as the same class.

Any facts or opinions?

Best

Martin


[1] http://www.ietf.org/rfc/rfc2616.txt


martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  h...@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
 http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Nathan


Better be a bit more specific.. in-line..

Nathan wrote:

Kingsley Idehen wrote:

On 1/17/11 10:51 AM, Martin Hepp wrote:

Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client  
SHOULD use a case-sensitive octet-by-octet comparison of the entire

   URIs, with these exceptions:

  - A port that is empty or not given is equivalent to the default
port for that URI-reference;
  - Comparisons of host names MUST be case-insensitive;
  - Comparisons of scheme names MUST be case-insensitive;
  - An empty abs_path is equivalent to an abs_path of /.

   Characters other than those in the reserved and unsafe sets (see
   RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

   For example, the following three URIs are equivalent:

  http://abc.com:80/~smith/home.html
  http://ABC.com/%7Esmith/home.html
  http://ABC.com:/%7esmith/home.html


As per the percent encoding rules and the set of unreserved characters 
[1], percent encoded octets in certain ranges (see [1]) should not be 
created by URI producers, and when found in a URI should be decoded 
correctly, this includes %7E - also percent encoding is case insensitive 
so %7e and %7E are equivalent, thus you should not produce URIs like 
this, and when found you should fix the error, to produce:


   http://abc.com:80/~smith/home.html
   http://ABC.com/~smith/home.html
   http://ABC.com:/~smith/home.html

The above URIs all use the generic syntax, so the generic component 
syntax equivalence rules always apply [2], so normalization after these 
rules would produce:


   http://abc.com:80/~smith/home.html
   http://abc.com/~smith/home.html
   http://abc.com:/~smith/home.html

Then finally, scheme specific normalization rules can be applied which 
treat all the port values as being equivalent (for the purpose of naming 
and dereferencing, it's the specification for URIs with that scheme), 
which allows you to normalize to:


   http://abc.com/~smith/home.html
   http://abc.com/~smith/home.html
   http://abc.com/~smith/home.html

[1] http://tools.ietf.org/html/rfc3986#section-2.3
[2] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
[3] http://tools.ietf.org/html/rfc3986#section-6.2.3
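The three stages walked through above can be sketched in Python as separate functions, so that each intermediate result can be inspected. This is a minimal sketch under stated assumptions: the function names and the default-port table are invented for illustration, only URIs of the form scheme://authority/... are handled, and regular expressions are used instead of a URI library so that nothing else in the string is touched.

```python
import re

UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT = re.compile(r"%([0-9A-Fa-f]{2})")
# scheme "://" authority, then everything else untouched.
AUTHORITY = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$", re.DOTALL)
DEFAULT_PORTS = {"http": "80", "https": "443"}  # assumption: just these two


def decode_unreserved(uri):
    """Stage 1: decode escapes of unreserved characters, uppercase the rest."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return PCT.sub(repl, uri)


def lowercase_scheme_and_host(uri):
    """Stage 2: scheme and host are case-insensitive, so lowercase them."""
    match = AUTHORITY.match(uri)
    if not match:
        return uri
    scheme, authority, rest = match.groups()
    userinfo, at, hostport = authority.rpartition("@")  # userinfo keeps its case
    return scheme.lower() + "://" + userinfo + at + hostport.lower() + rest


def drop_default_port(uri):
    """Stage 3: scheme-specific; remove an empty or default port."""
    match = AUTHORITY.match(uri)
    if not match:
        return uri
    scheme, authority, rest = match.groups()
    host, colon, port = authority.rpartition(":")
    if colon and port in ("", DEFAULT_PORTS.get(scheme.lower())):
        authority = host
    return scheme + "://" + authority + rest


def normalise(uri):
    """Apply the three stages in the order described above."""
    return drop_default_port(lowercase_scheme_and_host(decode_unreserved(uri)))
```

Applied to the three example URIs, the stages reproduce the three intermediate listings above and end with http://abc.com/~smith/home.html in every case.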

Hope that helps refine my previous comments,


Does this also hold for identifying RDF resources


Yes, where an RDF resource is a Data Container at an Address (URL). 
Thus, equivalent results for de-referencing a URL en route to 
accessing data.


No, when resource also implies an Entity (Data Item or Data Object) 
that is assigned a Name via URI.


Logically, yes on both counts, we should/could be normalizing these URIs 
as we consume and publish using the syntax based normalization rules [1] 
which apply to all URI/IRIs with the generic syntax (such as the 
examples above)


Any client consuming data, or server publishing data, can use the 
normalization rules, so it stands to reason that it's pretty important 
that we all do it to avoid false negatives.


[1] http://tools.ietf.org/html/rfc3986#section-6.2.2

Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Nathan


Nuno Bettencourt wrote:
Hi, 


Even though I'll be deviating the point just a bit, since we're discussing URI 
comparison in terms of RDF, I would like to request some help.

I have a doubt about URLs when it comes to RDF URI comparison. Is there any RFC that establishes if 


http://abc.com:80/~smith/home.html
https://abc.com:80/~smith/home.html
or even
ftp://abc.com:80/~smith/home.html
 
should or not be considered the same resource?


No, and no such rules can be written (as they are case specific, and all 
the above URIs could easily, and often do, point to differing resources) 
- if all URIs point to the same resource then it should be stated as 
such by some other means, which in RDF would mean owl:sameAs.


Best,

Nathan

RE: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Nuno Bettencourt

Hi, 

Even though I'll be deviating the point just a bit, since we're discussing URI 
comparison in terms of RDF, I would like to request some help.

I have a doubt about URLs when it comes to RDF URI comparison. Is there any RFC 
that establishes if 

http://abc.com:80/~smith/home.html
https://abc.com:80/~smith/home.html
or even
ftp://abc.com:80/~smith/home.html
 
should or not be considered the same resource?

Best regards,

Nuno Bettencourt

 -Original Message-
 From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On
 Behalf Of Nathan
 Sent: segunda-feira, 17 de Janeiro de 2011 16:53
 To: Dave Reynolds; Sandro Hawke
 Cc: Martin Hepp; public-lod@w3.org
 Subject: Re: URI Comparisons: RFC 2616 vs. RDF
 
 Dave Reynolds wrote:
  On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote:
  Dear all:
 
  RFC 2616 [1, section 3.2.3] says that
 
  When comparing two URIs to decide if they match or not, a client
  SHOULD use a case-sensitive octet-by-octet comparison of the entire
  URIs, with these exceptions:
 
 - A port that is empty or not given is equivalent to the default
   port for that URI-reference;
 - Comparisons of host names MUST be case-insensitive;
 - Comparisons of scheme names MUST be case-insensitive;
 - An empty abs_path is equivalent to an abs_path of /.
 
  Characters other than those in the reserved and unsafe sets (see
  RFC 2396 [42]) are equivalent to their % HEX HEX encoding.
 
  For example, the following three URIs are equivalent:
 
 http://abc.com:80/~smith/home.html
 http://ABC.com/%7Esmith/home.html
 http://ABC.com:/%7esmith/home.html
  
 
  Does this also hold for identifying RDF resources
 
  a) in theory and
 
  No. RDF Concepts defines equality of RDF URI References [1] as simply
  character-by-character equality of the %-encoded UTF-8 Unicode strings.
 
  Note the final Note in that section:
 
  
  Note: Because of the risk of confusion between RDF URI references that
  would be equivalent if dereferenced, the use of %-escaped characters in
  RDF URI references is strongly discouraged.
  
 
  which explicitly calls out the difference between URI equivalence
  (dereference to the same resource) and RDF URI Reference equality.
 
 I'd suggest that it's a little more complex than that, and that this may be an
 issue to clear up in the next RDF WG (it's on the charter I believe).
 
 For example:
 
 When a URI uses components of the generic syntax, the component
 syntax equivalence rules always apply; namely, that the scheme and
 host are case-insensitive and therefore should be normalized to
 lowercase.  For example, the URI HTTP://www.EXAMPLE.com/ is
 equivalent to http://www.example.com/.
 
 - http://tools.ietf.org/html/rfc3986#section-6.2.2.1
 
 However, that's only for URIs which use the generic syntax (which most URIs
 we ever touch do use).
 
 It would be great if a normalized-IRI with specific normalization rules could
 be drafted up as part of the next WG charter - after all they are a pretty
 pivotal part of the sem web setup, and it would be relatively easy to clear up
 these issues.
 
 Best,
 
 Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Christopher Gutteridge

In the short term, it sounds like there's a gap in the code-ecosystem 
for a really lightweight tool which took a stream of N-Triples and just 
output a normalised stream of N-Triples ready for import. The examples 
below would make a good initial test set for it. I'd write it if I 
didn't have a bunch of code-bunnies biting my ankles and demanding to be 
created.



As for triple stores; I know that the number of triples-per-second on 
import can be important, so if you already know your data is clean 
you'd want to at least make normalise-on-input optional to improve 
performance.
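
A minimal sketch of such a filter, in Python, might look like the following. It is not an existing tool: the names normalize_uri and normalize_ntriples_line are made up for illustration, and it applies only a subset of the RFC 3986 normalizations (lowercase scheme and host, uppercase percent-escape hex digits, decode escaped unreserved characters, drop an empty or default port, turn an empty path into /). Dot-segment removal is left out, the default-port table covers http and https only, and the N-Triples handling is a regular expression rather than a real parser.

```python
import re

# Generic URI syntax, adapted from the regular expression in RFC 3986 Appendix B.
URI_RE = re.compile(r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(\?[^#]*)?(#.*)?$")
HOSTPORT_RE = re.compile(r"^(.*?)(?::(\d*))?$")
PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")
# Either a quoted literal (left untouched) or an IRI in angle brackets.
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|<([^<>"\s]*)>')

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
DEFAULT_PORTS = {"http": "80", "https": "443"}  # assumption: only these two


def _fix_escape(match):
    """Decode escaped unreserved characters; uppercase all other escapes."""
    char = chr(int(match.group(1), 16))
    return char if char in UNRESERVED else "%" + match.group(1).upper()


def normalize_uri(uri):
    scheme, authority, path, query, fragment = URI_RE.match(uri).groups()
    out = ""
    if scheme is not None:
        scheme = scheme.lower()
        out += scheme + ":"
    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        host, port = HOSTPORT_RE.match(hostport).groups()
        host = host.lower()
        # Keep the port only if it is non-empty and not the scheme default.
        if port and port != DEFAULT_PORTS.get(scheme):
            host += ":" + port
        out += "//" + userinfo + at + host
        if path == "":
            path = "/"
    for part in (path, query or "", fragment or ""):
        out += PCT_RE.sub(_fix_escape, part)
    return out


def normalize_ntriples_line(line):
    """Normalize every <IRI> on one N-Triples line, skipping string literals."""
    def replace(match):
        if match.group(1) is None:  # matched a quoted literal
            return match.group(0)
        return "<" + normalize_uri(match.group(1)) + ">"
    return TOKEN_RE.sub(replace, line)
```

Run over the three example URIs from RFC 2616 quoted below, normalize_uri gives the same string for each, while the path case is left alone, so v6n7p.html and v6N7p.html stay distinct. Applied line by line to standard input, normalize_ntriples_line would act as the stream filter described above.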


On 17/01/11 16:57, Nathan wrote:

Kingsley Idehen wrote:

On 1/17/11 10:51 AM, Martin Hepp wrote:

Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client  
SHOULD use a case-sensitive octet-by-octet comparison of the entire

   URIs, with these exceptions:

  - A port that is empty or not given is equivalent to the default
port for that URI-reference;
  - Comparisons of host names MUST be case-insensitive;
  - Comparisons of scheme names MUST be case-insensitive;
  - An empty abs_path is equivalent to an abs_path of /.

   Characters other than those in the reserved and unsafe sets (see
   RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

   For example, the following three URIs are equivalent:

  http://abc.com:80/~smith/home.html
  http://ABC.com/%7Esmith/home.html
  http://ABC.com:/%7esmith/home.html


Does this also hold for identifying RDF resources


Yes, where an RDF resource is a Data Container at an Address (URL). 
Thus, equivalent results for de-referencing a URL en route to 
accessing data.


No, when resource also implies an Entity (Data Item or Data Object) 
that is assigned a Name via URI.


Logically, yes on both counts, we should/could be normalizing these 
URIs as we consume and publish using the syntax based normalization 
rules [1] which apply to all URI/IRIs with the generic syntax (such as 
the examples above)


Any client consuming data, or server publishing data, can use the 
normalization rules, so it stands to reason that it's pretty important 
that we all do it to avoid false negatives.


[1] http://tools.ietf.org/html/rfc3986#section-6.2.2

Best,

Nathan



--
Christopher Gutteridge -- http://id.ecs.soton.ac.uk/person/1248

/ Lead Developer, EPrints Project, http://eprints.org/
/ Web Projects Manager, ECS, University of Southampton, 
http://www.ecs.soton.ac.uk/
/ Webmaster, Web Science Trust, http://www.webscience.org/

RE: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Nuno Bettencourt

Hi,

The doubt just kept on because in all protocols we were still referring to the 
same URN.

Thank you for your explanation, and we've been using the owl:sameAs property 
for this. 

Nuno Bettencourt

 -Original Message-
 From: Nathan [mailto:nat...@webr3.org]
 Sent: segunda-feira, 17 de Janeiro de 2011 17:34
 To: Nuno Bettencourt
 Cc: 'Dave Reynolds'; 'Martin Hepp'; public-lod@w3.org
 Subject: Re: URI Comparisons: RFC 2616 vs. RDF
 
 Nuno Bettencourt wrote:
  Hi,
 
  Even though I'll be deviating from the point just a bit, since we're
  discussing URI comparison in terms of RDF, I would like to request some help.
 
  I have a doubt about URLs when it comes to RDF URI comparison. Is
  there any RFC that establishes if
 
  http://abc.com:80/~smith/home.html
  https://abc.com:80/~smith/home.html
  or even
  ftp://abc.com:80/~smith/home.html
 
  should or not be considered the same resource?
 
 No, and no such rules can be written (as they are case specific, and all the
 above URIs could easily, and often do, point to differing resources)
 - if all URIs point to the same resource then it should be stated as such by
 some other means, which in RDF would mean owl:sameAs.
 
 Best,
 
 Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Nathan


Nuno Bettencourt wrote:

Hi,

The doubt just kept on because in all protocols we were still referring to the 
same URN.


do you mean that there were RDF statements which linked each of the 
protocol specific URIs to a single URN via the same property? eg:


  http://... x:foo urn:here
  https://... x:foo urn:here
  ftp://... x:foo urn:here

If so, then you could define the property (x:foo above) as an Inverse 
Functional Property which would take care of the sameness for you.
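
As a rough illustration of what declaring x:foo inverse functional buys you: two subjects that share a value for such a property are taken to denote the same thing. The Python sketch below does that pairing by hand. infer_same_as is a made-up helper, the subject URIs are the three from Nuno's earlier question, and a real system would leave this to an OWL reasoner.

```python
from collections import defaultdict
from itertools import combinations

# Triples as plain (subject, predicate, object) string tuples.
TRIPLES = [
    ("http://abc.com:80/~smith/home.html", "x:foo", "urn:here"),
    ("https://abc.com:80/~smith/home.html", "x:foo", "urn:here"),
    ("ftp://abc.com:80/~smith/home.html", "x:foo", "urn:here"),
]


def infer_same_as(triples, inverse_functional_property):
    """Return owl:sameAs triples implied by one inverse functional property."""
    subjects_by_value = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == inverse_functional_property:
            if subject not in subjects_by_value[obj]:
                subjects_by_value[obj].append(subject)
    inferred = []
    for subjects in subjects_by_value.values():
        # Every pair of subjects sharing the value denotes the same resource.
        for left, right in combinations(subjects, 2):
            inferred.append((left, "owl:sameAs", right))
    return inferred


for triple in infer_same_as(TRIPLES, "x:foo"):
    print(" ".join(triple))
```

With the three statements above this yields three owl:sameAs pairs, one for each pair of scheme-specific URIs.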


Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Dave Reynolds

On Mon, 2011-01-17 at 16:52 +, Nathan wrote: 
 Dave Reynolds wrote:
  On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote: 
  Dear all:
 
  RFC 2616 [1, section 3.2.3] says that
 
  When comparing two URIs to decide if they match or not, a client   
  SHOULD use a case-sensitive octet-by-octet comparison of the entire
  URIs, with these exceptions:
 
 - A port that is empty or not given is equivalent to the default
   port for that URI-reference;
 - Comparisons of host names MUST be case-insensitive;
 - Comparisons of scheme names MUST be case-insensitive;
 - An empty abs_path is equivalent to an abs_path of /.
 
  Characters other than those in the reserved and unsafe sets (see
  RFC 2396 [42]) are equivalent to their % HEX HEX encoding.
 
  For example, the following three URIs are equivalent:
 
 http://abc.com:80/~smith/home.html
 http://ABC.com/%7Esmith/home.html
 http://ABC.com:/%7esmith/home.html
  
 
  Does this also hold for identifying RDF resources
 
  a) in theory and
  
  No. RDF Concepts defines equality of RDF URI References [1] as simply
  character-by-character equality of the %-encoded UTF-8 Unicode strings.
  
  Note the final Note in that section:
  
  
  Note: Because of the risk of confusion between RDF URI references that
  would be equivalent if derefenced, the use of %-escaped characters in
  RDF URI references is strongly discouraged. 
  
  
  which explicitly calls out the difference between URI equivalence
  (dereference to the same resource) and RDF URI Reference equality.
 
 I'd suggest that it's a little more complex than that, and that this may 
 be an issue to clear up in the next RDF WG (it's on the charter I believe).

I beg to differ.

The charter does state: 

Clarify the usage of IRI references for RDF resources, e.g., per SPARQL
Query §1.2.4.

However, I was under the impression that was simply removing the small
difference between RDF URI References and the IRI spec (that they had
anticipated). Specifically I thought the only substantive issue there
was the treatment of space and many RDF processors already take the
conservative position on that anyway.

Replacing encoded string equality by dereference-equivalence would be a
pretty big change to RDF and I hadn't realized that was being
considered.

Could one of the nominated chairs or a W3C rep clarify this?

 For example:
 
 When a URI uses components of the generic syntax, the component
 syntax equivalence rules always apply; namely, that the scheme and
 host are case-insensitive and therefore should be normalized to
 lowercase.  For example, the URI HTTP://www.EXAMPLE.com/ is
 equivalent to http://www.example.com/.
 
 - http://tools.ietf.org/html/rfc3986#section-6.2.2.1

Sure but the later RDF-related specs such as GRDDL and RIF clarify the
application of that in RDF. For example in RIF [1] we said:

Neither Syntax-Based Normalization nor Scheme-Based Normalization
(described in Sections 6.2.2 and 6.2.3 of RFC-3986) are performed.

A form of words that, I think, we lifted verbatim from GRDDL which in
turn had chosen them to clarify how the original RDF URI References spec
should be interpreted in the light of the updated URI/IRI RFCs.

Changing RDF to require syntax or scheme based normalization would
require changing at least RIF and GRDDL as well. If that was really on
the cards I would have expected it to have been more broadly publicized.

Dave

[1] http://www.w3.org/TR/2010/PR-rif-dtb-20100511/#Relative_IRIs

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Nathan


Dave Reynolds wrote:
On Mon, 2011-01-17 at 16:52 +, Nathan wrote: 
I'd suggest that it's a little more complex than that, and that this may 
be an issue to clear up in the next RDF WG (it's on the charter I believe).


I beg to differ.

The charter does state: 


Clarify the usage of IRI references for RDF resources, e.g., per SPARQL
Query §1.2.4.

However, I was under the impression that was simply removing the small
difference between RDF URI References and the IRI spec (that they had
anticipated). Specifically I thought the only substantive issue there
was the treatment of space and many RDF processors already take the
conservative position on that anyway.


Likewise, apologies; I should have chosen my words more carefully. I 
intended to say that the usage of IRI references was up for clarification, 
and that if normalization were deemed an issue then the RDF WG may be the 
place to raise it, and address it if needed.


As for RIF and GRDDL, can anybody point me to the reasons why 
normalization is not performed? Does this have xmlns heritage?


Best,

Nathan

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Kingsley Idehen


On 1/17/11 12:27 PM, Nuno Bettencourt wrote:

Hi,

Even though I'll be deviating from the point just a bit, since we're discussing URI 
comparison in terms of RDF, I would like to request some help.

I have a doubt about URLs when it comes to RDF URI comparison. Is there any RFC 
that establishes if

http://abc.com:80/~smith/home.html
https://abc.com:80/~smith/home.html
or even
ftp://abc.com:80/~smith/home.html

should or not be considered the same resource?


All of the above are Addresses (based on what I can infer via my visual 
senses). The URI abstraction enables multiple scheme data access. ftp: 
and http: are schemes. None of them isA resource. They simply provide 
access to data which may be serialized in a variety of formats to a user 
agent that de-references any of these Addresses. Basically, network 
aware pointers with data representation dexterity courtesy of URI 
abstraction and HTTP's content negotiation.


Kingsley



Best regards,

Nuno Bettencourt


-Original Message-
From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On
Behalf Of Nathan
Sent: segunda-feira, 17 de Janeiro de 2011 16:53
To: Dave Reynolds; Sandro Hawke
Cc: Martin Hepp; public-lod@w3.org
Subject: Re: URI Comparisons: RFC 2616 vs. RDF

Dave Reynolds wrote:

On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote:

Dear all:

RFC 2616 [1, section 3.2.3] says that

When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
 URIs, with these exceptions:

- A port that is empty or not given is equivalent to the default
  port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of /.

 Characters other than those in the reserved and unsafe sets (see
 RFC 2396 [42]) are equivalent to their % HEX HEX encoding.

 For example, the following three URIs are equivalent:

http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html


Does this also hold for identifying RDF resources

a) in theory and

No. RDF Concepts defines equality of RDF URI References [1] as simply
character-by-character equality of the %-encoded UTF-8 Unicode strings.

Note the final Note in that section:


Note: Because of the risk of confusion between RDF URI references that
would be equivalent if derefenced, the use of %-escaped characters in
RDF URI references is strongly discouraged.


which explicitly calls out the difference between URI equivalence
(dereference to the same resource) and RDF URI Reference equality.

I'd suggest that it's a little more complex than that, and that this may be an
issue to clear up in the next RDF WG (it's on the charter I believe).

For example:

 When a URI uses components of the generic syntax, the component
 syntax equivalence rules always apply; namely, that the scheme and
 host are case-insensitive and therefore should be normalized to
 lowercase.  For example, the URI HTTP://www.EXAMPLE.com/ is
 equivalent to http://www.example.com/.

- http://tools.ietf.org/html/rfc3986#section-6.2.2.1

However, that's only for URIs which use the generic syntax (which most URIs
we ever touch do use).

It would be great if a normalized-IRI with specific normalization rules could be
drafted up as part of the next WG charter - after all they are a pretty pivotal
part of the sem web setup, and it would be relatively easy to clear up these
issues.

Best,

Nathan







--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Tim Berners-Lee


On 2011-01-17, at 16:37, Dave Reynolds wrote:

 On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote: 
 Dear all:
 
 RFC 2616 [1, section 3.2.3] says that
 
 When comparing two URIs to decide if they match or not, a client   
 SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:
 
   - A port that is empty or not given is equivalent to the default
 port for that URI-reference;
   - Comparisons of host names MUST be case-insensitive;
   - Comparisons of scheme names MUST be case-insensitive;
   - An empty abs_path is equivalent to an abs_path of /.
 
Characters other than those in the reserved and unsafe sets (see
RFC 2396 [42]) are equivalent to their % HEX HEX encoding.
 
For example, the following three URIs are equivalent:
 
   http://abc.com:80/~smith/home.html
   http://ABC.com/%7Esmith/home.html
   http://ABC.com:/%7esmith/home.html
 
 
 Does this also hold for identifying RDF resources
 
 a) in theory and

Yes this does hold for RDF systems.
You can't guarantee that all RDF systems will do it, so
RDF systems should in general exchange canonicalized URIs.
There is a ladder of levels at  which smarter and smarter systems 
are aware of more and more equivalences. 
Good to make your system smart and not end up
with widow graphs about http://WWW.w3.org/foo.

cwm for example canonicalizes URIs when it loads them into the store.
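
A small Python sketch of the idea follows. It is not how cwm is implemented: Store and canonicalize are invented names, the ex: predicates are placeholders, only the lowest rung of the ladder is covered (lowercasing scheme and authority), and only subjects are canonicalized to keep it short.

```python
from urllib.parse import urlsplit, urlunsplit

# Two statements about what is meant to be one resource (placeholder predicates).
TRIPLES = [
    ("http://WWW.w3.org/foo", "ex:title", "Foo"),
    ("http://www.w3.org/foo", "ex:creator", "Bar"),
]


def canonicalize(uri):
    """Lowercase scheme and authority; leave path, query and fragment alone."""
    parts = urlsplit(uri)
    # urlsplit already lowercases the scheme. Lowercasing the whole netloc
    # also lowercases any userinfo, which a careful tool would leave alone.
    return urlunsplit(parts._replace(netloc=parts.netloc.lower()))


class Store:
    """Toy triple store: subject -> set of (predicate, object) pairs."""

    def __init__(self, canonicalize_on_load=True):
        self.canonicalize_on_load = canonicalize_on_load
        self.by_subject = {}

    def load(self, triples):
        for subject, predicate, obj in triples:
            if self.canonicalize_on_load:
                subject = canonicalize(subject)
            self.by_subject.setdefault(subject, set()).add((predicate, obj))


naive = Store(canonicalize_on_load=False)
naive.load(TRIPLES)
smart = Store()
smart.load(TRIPLES)
print(len(naive.by_subject), len(smart.by_subject))
```

Without canonicalization the store ends up with two separate subjects, one of them the widow graph about http://WWW.w3.org/foo. With it, both statements attach to the same node.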



 
 No. RDF Concepts defines equality of RDF URI References [1] as simply
 character-by-character equality of the %-encoded UTF-8 Unicode strings.
 
 Note the final Note in that section:
 
 
 Note: Because of the risk of confusion between RDF URI references that
 would be equivalent if derefenced, the use of %-escaped characters in
 RDF URI references is strongly discouraged. 
 
 
 which explicitly calls out the difference between URI equivalence
 (dereference to the same resource) and RDF URI Reference equality.
 
 BTW the more up to date RFC for looking at equivalence (as opposed to
 equality) issues is probably the IRI spec [2] which defines a comparison
 ladder for testing equivalence.

Exactly.

 
 Dave
 
 [1]
 http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref
 
 [2] http://www.ietf.org/rfc/rfc3987.txt

Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-17 Thread Kingsley Idehen


On 1/17/11 4:54 PM, Nuno Bettencourt wrote:

Hi,

thank you for the suggestion. This had been a problem before, which in fact 
becomes easier to solve like that.

In my current situation, we were dealing with public/private/protected 
resources (files), secured by https.

So, if a person/agent has a private/protected resource (file) (that only shares 
with some specific individuals and is only accessible using https protocol) it 
would be hosted under https://server/abc.html.

For this, I would have for example, the following triple:

1) http://server/#me dc:publisher https://server/abc.html

Nevertheless, if afterwards I publicly publish that resource (file), for 
technical reasons that same resource (file) would be given a new URI 
http://server/abc.html so that it would not require authentication and a new 
triple would be created (for terms of simplicity I'm omitting other triples 
that are generated):

2) http://server/#me dc:publisher http://server/abc.html

In fact, both those resources (files) are the same, mapped to the same 
physical file, but while the first required SSL credentials, the second does not.

So that users who previously had access to the private resource can keep 
accessing it now that it is public (and has been moved from its protected 
location), I would add a triple that lets the semantic system retrieve the 
same resource, since it is no longer available under its original location.

3) https://server/abc.html owl:sameAs http://server/abc.html


But at this point your context has changed: you are now making an 
assertion in a deductive data space. Basically, a record that is also a 
proposition re. RDF (or any other) deductive system. Again, the moment 
you make a triple, you are making a propositional statement. And the 
moment you do that, in the context of HTTP based Linked Data, it has to 
be something like this:


https://server/abc.html#this owl:sameAs http://server/abc.html#this .

If you don't care about Linked Data via HTTP user agents following links etc. ; 
meaning you're happy with a local graph of propositions that is SPARQL 
queryable, for instance, then this works too:

https://server/abc.html owl:sameAs http://server/abc.html .
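
To make the local-graph case concrete, here is a small Python sketch of how a consumer holding the old https URI could find the public one by following owl:sameAs in both directions. same_as_closure is a hypothetical helper, not part of any RDF library, and it treats owl:sameAs purely as a symmetric, transitive link between strings.

```python
# The local graph: triples 1) and 3) from the message quoted in this reply.
GRAPH = [
    ("http://server/#me", "dc:publisher", "https://server/abc.html"),
    ("https://server/abc.html", "owl:sameAs", "http://server/abc.html"),
]


def same_as_closure(triples, start):
    """Return every name reachable from `start` via owl:sameAs, start included."""
    neighbours = {}
    for subject, predicate, obj in triples:
        if predicate == "owl:sameAs":
            # owl:sameAs is symmetric, so record the link in both directions.
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


print(sorted(same_as_closure(GRAPH, "https://server/abc.html")))
```

A consumer that only knows https://server/abc.html from triple 1) gets back both names and can try the http one when the https location no longer answers.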


This unfortunately leads to a minimal and probably unrealistic problem like an 
open URI https://server/abc.html that might not have any content, since there's 
no need for it as it has become public and no authentication is needed for 
accessing it - but it is necessary to keep that triple 1) alive  as others 
might be consuming that information. Triple 3) helps those in finding the 
resource again.

Another, richer solution might be implementing time reasoning 
mechanisms over this, in order to eliminate those 'fake' URIs, but that would 
grow the triple store and make reasoning even more time consuming (for now).



No need for fake URIs (I guess you might think the #this above == fake), 
it just comes down to Name References and the need for them to resolve 
to something useful, which may or may not be useful (e.g. navigable) to 
an HTTP agent, or deliver factual basis for inference by a deduction 
oriented engine (logic reasoner).


I hope this helps.


Kingsley


Nuno


-Original Message-
From: Nathan [mailto:nat...@webr3.org]
Sent: segunda-feira, 17 de Janeiro de 2011 18:06
To: Nuno Bettencourt
Cc: public-lod@w3.org
Subject: Re: URI Comparisons: RFC 2616 vs. RDF

Nuno Bettencourt wrote:

Hi,

The doubt just kept on because in all protocols we were still referring to

the same URN.

do you mean that there were RDF statements which linked each of the
protocol specific URIs to a single URN via the same property? eg:

http://...  x:foo urn:here
https://...  x:foo urn:here
ftp://...  x:foo urn:here

If so, then you could define the property (x:foo above) as an Inverse
Functional Property which would take care of the sameness for you.

Best,

Nathan






--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
