On 07/01/2025 08:45, zPlus wrote:
Thank you for the answer. I think I can see the problem, but I'm still convinced
that it should be left to the data producers to standardize on one form, if they
need to.
It looks to me like a common problem that multiple URIs exist for representing
the same entity.
Worse - the URI may not mean what you think it means.
In RDF, by definition, all URIs are resolved against the base URI, and
there is always a base URI of some kind.
RDF Concepts says:
https://www.w3.org/TR/rdf12-concepts/#section-IRIs
"""
IRIs in the RDF abstract syntax MUST be resolved per [RFC3986], and MAY
contain a fragment identifier.
"""
For example when combining different sources or different
ontologies. I think data producers get around this either by agreeing on one
particular URI,
This is a web technology - data producers don't know each other, and
combining may be done by another party, which is why we have standards.
or by creating a new URI, or resorting to reasoning
(owl:sameAs). But I would not expect the database to automatically change any
URI, that's why this was surprising to me.
I think a very similar example is
These are not similar (RFC3986, the definition of URIs/IRIs, applies here).
(Nowadays URI and IRI are synonymous, and URIs using UTF-8 are taking over.)
http:example.org/path/to/file
Input: <http:example.org/path/to/file>
Scheme |http|
Authority |null|
Host |null|
Path |example.org/path/to/file|
Query |null|
Fragment |null|
Scheme specific warnings:
<http:example.org/path/to/file> http and https URI schemes require
//host/
<http:example.org/path/to/file> http and https URI schemes do not
allow the host to be empty
----
It has no authority component (user, host, port); it has a relative path
and is not a resolved URI.
If the base is http://base/maybe/
that resolves to:
http://base/maybe/example.org/path/to/file
http:/example.org/path/to/file
Similarly, but a rooted path.
Resolved:
http://base/example.org/path/to/file
http://example.org/path/to/file
OK!
HTTP URI and resolves to itself.
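
As a rough cross-check of the three cases above, here is a minimal sketch
using Python's standard urllib.parse. It is only a stand-in, not Jena's
resolver. Note that urljoin is lenient about a reference that repeats the
base scheme (it treats it as relative), which is what yields the resolved
forms listed above; a strict RFC 3986 resolver may behave differently. BASE
is the base from the example above; describe() is an illustrative helper.

```python
from urllib.parse import urljoin, urlsplit

BASE = "http://base/maybe/"  # the base URI used in the example above

CASES = [
    "http:example.org/path/to/file",    # no "//": relative path
    "http:/example.org/path/to/file",   # no "//": rooted path
    "http://example.org/path/to/file",  # authority present
]


def describe(ref, base=BASE):
    """Return the parsed path and the reference resolved against base."""
    parts = urlsplit(ref)
    return {
        # urlsplit reports "" both for an absent and for an empty authority
        "authority": parts.netloc,
        "path": parts.path,
        "resolved": urljoin(base, ref),
    }


if __name__ == "__main__":
    for ref in CASES:
        print(ref, "->", describe(ref)["resolved"])
```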
http:///example.org/path//to///file
Empty host (the authority is present and is the empty string); the path
starts at the third "/": /example.org/....
Scheme |http|
Authority ||
Host ||
Path |/example.org/path/to/file|
Query |null|
Fragment |null|
Scheme specific warnings:
<http:///example.org/path/to/file> http and https URI schemes do not
allow the host to be empty
It resolves to http:///example.org/path/to/file - empty authority - and
is still violating the HTTP scheme rule.
When importing these, Jena does not change them, and treats them as different
URIs instead. I would expect this behaviour for every URI, unless "file:" needs
to be treated differently.
Every URI is resolved against the base -- this is a URI rule, not a
scheme-specific rule. "file:" happens to allow an empty authority, "http:"
does not -- those rules are scheme-specific.
It also means "file:///" is a more logical choice than "file:/" because
the "file:/" has no authority (there is no // -- RFC3986 grammar),
whereas "file:///" has an empty authority.
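
The difference between "no authority" (shown as null in the tables above)
and "empty authority" (shown as ||) can be seen with the generic splitting
regular expression from RFC 3986, Appendix B. The following is a minimal
Python sketch; split_uri is an illustrative name, and this is not how Jena
parses URIs.

```python
import re

# Generic URI-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI reference into its five generic components."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        # None: there is no "//" at all; "": "//" is present but empty
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    for uri in ("file:/path", "file:///path", "http:///example.org/path/to/file"):
        print(uri, split_uri(uri))
```

With this split, "file:/path" has authority None while "file:///path" has
authority "", and both have the path "/path".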
Andy
On Mon, 2025-01-06 at 21:05 +0000, Andy Seaborne wrote:
On 06/01/2025 19:14, zPlus wrote:
But we need one form for URI matching otherwise "file:/path" does not
match "file:///path"
Why does Jena need to match "file:/path" and "file:///path"? Shouldn't it be
left to the user to choose one form or the other in their data?
There is no "right" answer for file: URLs.
Having one normalized form means the same name is used by the data producer
(loading the database) and the data consumer (SPARQL query), whether they
write it file:/ or file:/// or a mixture, or when multiple sources of data
are combined. And across operating systems.
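
To illustrate the idea of a single normalized form, here is a minimal Python
sketch that rewrites the "file:/path" spelling to "file:///path" so both
compare equal. It is an assumption-laden illustration, not Jena's
implementation: normalize_file_uri is a hypothetical helper, and it only
handles the absent-authority case with a rooted path.

```python
def normalize_file_uri(uri):
    """Rewrite file:/path as file:///path; return anything else unchanged."""
    is_file = uri[:5].lower() == "file:"
    has_rooted_path = uri.startswith("/", 5)
    has_authority = uri.startswith("//", 5)  # "//" right after the scheme
    if is_file and has_rooted_path and not has_authority:
        # Insert an empty authority between the scheme and the path.
        return uri[:5] + "//" + uri[5:]
    return uri


if __name__ == "__main__":
    for uri in ("file:/path", "file:///path", "file:path"):
        print(uri, "->", normalize_file_uri(uri))
```

Relative forms such as "file:path" are left alone here, because they would
first need resolving against a base.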
There isn't "the user".
https://datatracker.ietf.org/doc/html/rfc8089.html#appendix-B
"""
o A traditional file URI for a local file with an empty authority.
This is the most common format in use today. For example:
* "file:///path/to/file"
"""
And on Windows ...
C:/path is read as a URI with the "C:" URI scheme.
file:C:/path is going to be interpreted differently on Windows and Linux/Mac.
The whole thing is messy.
Andy