Hi John,
Our 'RDF Schema' approach, based on many years of multilingual
vocabulary development and exemplified by the RDA Vocabularies:
http://www.rdaregistry.info/
…might be helpful. Some more inline...
On 23 Jul 2014, at 7:22, john.walker wrote:
Hi There,
There is plenty of advice/help out there regarding URI schemes for
instance
data, for example the EC study on persistent URIs [1].
I was wondering if there are any similar studies or guidelines about
URI schemes
for RDF schema (using this as catch all term for vocabulary, data
dictionary,
schema, ontology).
The particular use case I have is an ISO 13584 compliant data
dictionary with a
few hundred classes and over 1000 properties which I'd like to convert
to RDF.
Everything in the dictionary (including the dictionary itself) is
identified
with an IRDI [2].
Points to consider:
1. (I'll get this one out of the way first :) ) Hash vs. slash URIs:
What's the
latest advice/pros/cons? Currently I am leaning towards slash URIs so
the user
is not forced to download the entire schema in one file (of course we
can always
provide a dump for those who want it). Any best practices here?
I can't say that it's a best practice, but we strongly prefer slash URIs
even though they present some management challenges wrt content
negotiation.
2. URN or HTTP URI: A URN scheme for IRDIs has previously been mooted,
but there seems to be
a distinct lack of progress. Following linked data principles I was
planning to
use HTTP URIs instead. Would there be any advantage to use URNs
instead?
The main disadvantage of a (non-HTTP) URN is the need to maintain some
form of URN resolution service over time, assuming that you want/need
the URNs to resolve. It also limits public/global reuse and mapping of
your vocabularies, which hopefully isn't desirable.
3. Human-readable URIs: Many widely used schemas (e.g. Schema.org,
FOAF) have a
human-readable component in the URI, typically a URI-friendly version
of the
label. I can see this makes things a lot easier for human consumers
when reading
raw Turtle or writing a SPARQL query. However the labels are subject
to change
over time, are in multiple languages and are not unique. It is simple
to define
a mapping from IRDI to URI, but this does not give a meaningful URI
(e.g.
http://example.com/myDictionary/c_abc123), but would guarantee
uniqueness and
persistence. Given the opacity axiom [3] does this really matter? I
could
imagine that one could allow the editor of the dictionary to define
slugs that
would be used to build the URI rather than generating it from the IRDI. These
could be
optional and you might only define such a slug for the most commonly
used terms.
Alternatively one could define these as aliases with additional
statements
defining some equivalence links (perhaps using owl:sameAs,
owl:equivalentClass
and owl:equivalentProperty).
<http://example.com/myDictionary/c_abc123> owl:equivalentClass
<http://example.com/myDictionary/Person> .
Has anyone ever tried such an approach?
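For illustration, the opaque mapping and the optional alias statement described above could be sketched roughly as follows. This is only a sketch under stated assumptions: it assumes an IRDI has three '#'-separated parts (registration authority, data identifier, version), the sample IRDI is made up, and the function names and the "c_" prefix handling are hypothetical. The version part is deliberately left out of the URI, in line with point 4 below.

```python
import re

# Base URI taken from the example.com illustration above.
BASE = "http://example.com/myDictionary/"


def irdi_to_uri(irdi: str, prefix: str = "c_") -> str:
    """Build an opaque, version-independent URI from an IRDI.

    Assumes the IRDI looks like '<authority>#<data id>#<version>'.
    """
    parts = irdi.split("#")
    if len(parts) != 3:
        raise ValueError(f"unexpected IRDI syntax: {irdi!r}")
    authority, data_id, _version = parts  # the version is not used in the URI
    # Replace anything that is not URI-friendly with a hyphen.
    local = re.sub(r"[^A-Za-z0-9]+", "-", f"{authority}-{data_id}").strip("-")
    return f"{BASE}{prefix}{local}"


def alias_triple(canonical_uri: str, slug: str) -> str:
    """Return a Turtle statement linking the opaque URI to an editor-defined slug."""
    return f"<{canonical_uri}> owl:equivalentClass <{BASE}{slug}> ."


if __name__ == "__main__":
    uri = irdi_to_uri("0173-1#01-AAA123#001")  # illustrative IRDI only
    print(uri)
    print(alias_triple(uri, "Person"))
```

Because the version is dropped, two versions of the same dictionary entry map to the same URI.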
The RDA developers are using this approach:
http://www.rdaregistry.info/rgFAQ.html
and
http://www.slideshare.net/jonphipps1/ala-presentation-36888593, slides
11-12
We've coined a reg:lexicalAlias (intended to be a more semantically
specific subproperty of owl:sameAs) attribute to describe the
relationship between a mutable, language-specific, label-based URI and a
canonical, language-independent, 'opaque' URI. We're returning an HTTP
308 (Permanent Redirect) status, newly defined, when a lexical URI is
resolved to a canonical URI.
See http://tools.ietf.org/html/rfc7238
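As a rough sketch of the lexical-to-canonical step (not the registry's actual code), a server could keep a lookup table of aliases and answer an alias request with a 308 pointing at the canonical URI. The table contents reuse the example.com paths from the question, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical alias table: mutable, label-based path -> canonical opaque path.
LEXICAL_ALIASES: Dict[str, str] = {
    "/myDictionary/Person": "/myDictionary/c_abc123",
}


def resolve_alias(path: str) -> Optional[Tuple[int, str]]:
    """Return (308, canonical path) for a lexical alias, or None otherwise.

    None means the path is not an alias and should be handled by the
    normal resolution logic for canonical URIs.
    """
    canonical = LEXICAL_ALIASES.get(path)
    if canonical is None:
        return None
    return 308, canonical


if __name__ == "__main__":
    print(resolve_alias("/myDictionary/Person"))
    print(resolve_alias("/myDictionary/c_abc123"))
```

If a label changes, only the alias table needs a new entry; the canonical URI, and any data that uses it, stays untouched.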
4. Versioning: The IRDI includes a version identifier where there are
clearly
defined rules about what type of change can be done within a version
(e.g.
editorial changes), what can be done as a version change (e.g.
upward-compatible
change) and what requires a new identifier (breaking change). I was
thinking to
exclude this version identifier from the URI, but perhaps (if needed)
expose the
different versions/states of the resource using Memento [4]. Any
experiences
with using such an approach?
We prefer to have URI resolution always be to the most current version
and aren't planning to offer versioned resolution anytime soon. That
said, we recognize that public linked data that absolutely depends on
stable semantics defined by a specific version of the vocabularies will
need to be able to dynamically reference that specific version, and
probably as part of the URI -- it's unlikely (although possible) that
linked-data-based systems will be able to effectively utilize any of the
other non-URI-based versioning methods. When we do implement support for
specific version declarations it may be something like Memento, but it's
more likely to be something like:
https://www.npmjs.org/doc/package.json.html#version
or
https://getcomposer.org/doc/01-basic-usage.md#package-versions
or
http://guides.rubygems.org/patterns/#declaring-dependencies
As an interim alternative, we make each version of the vocabularies
available as a download:
http://www.rdaregistry.info/rgAbout/versions.html
…and this can be loaded into a triple store along with its dependent
linked data, eliminating the need for dynamic resolution, although
there's currently no broadly accepted best practice around defining the
requirement for a specific vocabulary version, and its download
location, that I'm aware of.
5. Serving representations: Maybe this is a moot point, but I would
consider the
'things' described in the dictionary to be abstract entities and, as
such, to
give a 303 response if used with slash URIs. The response would then
include a
redirect to the information resource that would use conneg to serve
the
different representations/states of that resource. However I do not
see this
practice widely used for other RDF schemas. Any reason why?
Not that I'm personally aware of. It's the practice we generally follow:
$ curl -I http://rdaregistry.info/Elements/a/P50026
HTTP/1.1 303 See Other
Location: http://rdaregistry.info/Elements/a/P50026.n3
HTTP/1.1 303 See Other
Location: http://rdaregistry.info/Elements/a.n3
N3 is the default, but the element-level .n3 URI above redirects again
to the full vocabulary because, at the moment, only the JSON-LD
representation is served per individual element (server issues).
$ curl -I -H "Accept:text/html" http://rdaregistry.info/Elements/a/P50026
HTTP/1.1 303 See Other
Location: http://www.rdaregistry.info/Elements/a/#P50026
(note that the HTML document is a single resource with IDs for each
vocabulary element)
$ curl -I -H "Accept:application/ld+json" http://rdaregistry.info/Elements/a/P50026
HTTP/1.1 303 See Other
Location: http://rdaregistry.info/Elements/a/P50026.jsonld
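The 303-plus-conneg behaviour shown in the curl output could be sketched along these lines. This is an illustration rather than our server configuration: the function name and the target table are assumptions, the Accept parsing is deliberately simplified (no wildcard handling), and the HTML target stays on the request host instead of switching to the www host as the real server does.

```python
from typing import List, Tuple


def see_other(element_uri: str, accept: str = "") -> Tuple[int, str]:
    """Pick the 303 Location for an element URI from the Accept header."""
    vocab_uri, _, local_name = element_uri.rpartition("/")
    targets = {
        "application/ld+json": f"{element_uri}.jsonld",
        # The HTML document is one page with an ID per vocabulary element.
        "text/html": f"{vocab_uri}/#{local_name}",
    }

    # Parse "type/subtype;q=0.8, ..." into (q, media type) pairs.
    ranges: List[Tuple[float, str]] = []
    for item in accept.split(","):
        fields = item.strip().split(";")
        media = fields[0].strip().lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((q, media))

    # Highest q first; the sort is stable, so header order breaks ties.
    for q, media in sorted(ranges, key=lambda r: -r[0]):
        if q > 0 and media in targets:
            return 303, targets[media]
    return 303, f"{element_uri}.n3"  # N3 is the default representation


if __name__ == "__main__":
    uri = "http://rdaregistry.info/Elements/a/P50026"
    for header in ("", "text/html", "application/ld+json"):
        print(repr(header), see_other(uri, header))
```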
Hope this helps some,
Jon Phipps
http://metadataregistry.org/
http://managemetadata.com/
[1] http://philarcher.org/diary/2013/uripersistence/
[2] http://wiki.eclass.eu/wiki/IRDI
[3] http://www.w3.org/DesignIssues/Axioms.html#opaque
[4] http://mementoweb.org/
Regards,
John Walker