Re: [whatwg] Link rot is not dangerous

Laurens Holst Sat, 16 May 2009 16:59:17 -0700

Tab Atkins Jr. wrote:

Ho, ho, you’re making a big leap there! By me explaining that dereferencible
URIs are not needed to make RDF work on a core level, which makes RDF
robust, do not jump to the conclusion that it is of no benefit! URIs are
there for the benefit of linking, and help discoverability a lot (just like
HTML hyperlinks do). Spidering the semantic web in a follow-your-nose style
is effective. Incidentally, if an ontology disappears from its original
address, this kind of spidering will likely lead you to a copy thereof
stored elsewhere. For example on a different spider which has the triples
cached.


You had just stated in the previous email, however, that few (if any)
major consumers of RDFa *use* what is located on the far end of the
URI.  If they're not even paying attention to it, where is the value
in it?

I said that the ontologies were not used by many RDF consumers. This is because they can be computationally expensive, especially for large datasets, not because they are useless.


I think the most clear way I can put this is by comparison:

Your argument is like arguing against XML or JSON Schemas, concluding that because they are externally referenced and not used by most XML or JSON applications, they are useless, and in fact that XML and JSON themselves are useless. This is clearly false; removing a reference to a schema from a document, or a document not having a schema, does not make the document itself useless, nor the document format it is expressed in.

Although RDF Schema and OWL are definitely part of the ‘RDF ecosystem’, they are built on top of the base RDF framework and they are not in themselves required for RDF to function. However, the schema does provide a useful description of the document structures and has the ability to express certain semantics, and is thus a worthy technology in its own right.

I don't really understand the 'discoverability' argument here, at
least in the context of it being similar to HTML hyperlinks.
Hyperlinks are useful for people because they make it simple to
navigate to a new page.  You just click and it works, no need to
copypasta the address into a new browser window.

By what means the user dereferences the link is not relevant. What matters is that a URI is there, identifying a unique location on the world wide web, and thus contributing to the web of linked documents that we call the World Wide Web. Without links and URIs, there would be no ‘web’. There would be a big set of networked yet isolated computers that all live in their own walled garden.

Links provide discoverability of data provided elsewhere, by indicating a location. Users can find other documents because of this. Search engines like Google can spider the web based on this.


The Web of Linked Data is Tim Berners-Lee’s vision of a WWW for data.

I'm also not sure how a rotted link helps you compare vocabularies
with other spiders, which in a hypothetical world you are
communicating with (at this point we're *far* into theory, not
practice).  Any uniquifier would allow you to compare things in the
same way, no?

Just a simple rdfs:seeAlso statement referencing it in one single place will allow a spider to ‘follow its own nose’ and find the triples of the ontology in the republished location. This republication can be anywhere, a new ontology location, or a copy cached by another spider that republishes the triples it harvests on the web (such as archive.org [1]).
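
A minimal sketch of that follow-your-nose behaviour, in Python. Everything here is an assumption for illustration: the WEB dictionary stands in for HTTP fetches, the example.org and example.net URLs are made up, and a real spider would parse RDF rather than read tuples. It only shows that one rdfs:seeAlso triple is enough to reach a republished copy of a vocabulary.

```python
from collections import deque

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical in-memory "web": document URL -> triples found in that document.
# A URL that is absent from this dict plays the role of a rotted link.
WEB = {
    "http://example.org/page": [
        ("http://example.org/page#me", RDF_TYPE, "http://example.org/vocab#Person"),
        # The original vocabulary location is gone; this points at a copy.
        ("http://example.org/vocab", RDFS_SEE_ALSO, "http://mirror.example.net/vocab-copy"),
    ],
    "http://mirror.example.net/vocab-copy": [
        ("http://example.org/vocab#Person", RDFS_LABEL, "Person"),
    ],
}


def crawl(start):
    """Breadth-first crawl that follows rdfs:seeAlso links between documents."""
    seen = set()
    queue = deque([start])
    harvested = []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        # A missing document simply yields no triples instead of failing.
        for s, p, o in WEB.get(url, []):
            harvested.append((s, p, o))
            if p == RDFS_SEE_ALSO:
                queue.append(o)
    return harvested


triples = crawl("http://example.org/page")
```

The label for the Person type ends up in the harvested triples even though nothing was ever fetched from the vocabulary’s own address.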

I agree we’re getting far into the theory-not-practice realm, which is why Shelley is right in saying that in practice vocabularies are served from a location that is well cared for, e.g. using services like purl to provide permanent URLs, or having a solid organisational backing, and Philip Taylor’s list [2] does not do much to discredit this.

[Side note: To point out some flaws in Philip’s list, many of the sites in his ‘404’ and ‘not responding’ list are experimental URLs. Additionally, the list fails to list usage frequency. Finally, it does not (and cannot, obviously) list whether there was any RDF Schema at those locations in the first place. Because, as I explained before, I can make up the following RDF triple right here on the spot, and there would be nothing wrong with it:


_:a rdf:type <http://grauw.nl/rdf#Game>

The type referenced as this triple’s object has no ontology at this location. The fact that it is a type is inferred by it being referenced through rdf:type, and that is enough. There is no requirement that this type resolves into a document containing RDF Schema triples. A creative example of this on the list is “java:java.util.Date”.]
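
To make that concrete, here is a small Python sketch, assuming triples are held as plain (subject, predicate, object) tuples; the helper names are mine and not taken from any RDF library. It shows that a consumer can work out which identifiers are types purely from how they are used, without dereferencing anything.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Two triples: the one from the example above, plus one using the
# "java:java.util.Date" identifier mentioned in the list.
triples = [
    ("_:a", RDF_TYPE, "http://grauw.nl/rdf#Game"),
    ("_:b", RDF_TYPE, "java:java.util.Date"),
]


def inferred_types(triples):
    """Everything that appears as the object of rdf:type is treated as a type."""
    return {o for (_, p, o) in triples if p == RDF_TYPE}


def instances_of(triples, type_uri):
    """Subjects declared to be of the given type; no network access involved."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == type_uri}


types = inferred_types(triples)
games = instances_of(triples, "http://grauw.nl/rdf#Game")
```

Neither identifier has to resolve to a schema document for these queries to work.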

You are now only considering the ontologies, that is, types and properties.
You’re forgetting (or ignoring) that in RDF, objects are also named with
URIs so that data at other locations can refer to it. You know, that ‘web of
linked data’ people refer to, core principle of RDF. No ‘simple’ scheme
based on what Ian proposed can provide a sufficient level of uniqueness for
that. URIs are the best and most natural fit for use as web-scale
identifiers.


Define 'sufficient', as used here.  I believe that this is an area
where absolute uniqueness is not a requirement.  Worst case, you get a
little bit of data pollution with weird triples being produced by
badly-written pages.  Perhaps your browser offers to add an event to
your calendar when no event shows up on the page, or a fraction of a
search engine's microdata collection is spurious.  Neither of these
are big deals.

That being said, I agree that URIs provide a very convenient source of
uniqueness.  Ian's microdata allows them to be used either in normal
form or in reverse-domain form; either way provides the necessary
uniqueness.

I am talking about individual triples for MANY pieces of data here. Take for example the identifier of the band Coldplay on Zitgist:


 <http://zitgist.com/music/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234>

A reverse domain version of such an identifier would look like this:

 com.zitgist.music.artist.cc197bad-dc9c-440d-a5b5-d52ba2e14234

How exactly is this really shorter, or different other than ‘for the sake of doing it differently’ and failing to build upon the well-known concept of URIs? Note that you can browse to the above URL in your browser of choice and view the data.

Also, creating a framework to configure DNS servers to resolve to useful documents for these domains will be pretty tedious. If you ask me, Hixie using the ‘reverse DNS’ notation in his Microdata proposal is just a trick to pretend he is using something that is different from what RDF uses. If the domain were not ‘reversed’, people would see the similarity with URIs too easily.
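
As a rough illustration in Python: the mapping below (reverse the host labels, then append the path segments with dots) is my own assumption about how such an identifier would be derived, since the proposal does not spell out a conversion from URLs. It reproduces the reverse-domain form shown above and measures what is actually saved.

```python
from urllib.parse import urlsplit


def to_reverse_domain(url):
    """Hypothetical mapping from an http URL to a reverse-domain identifier."""
    parts = urlsplit(url)
    # "zitgist.com" -> "com.zitgist"
    host = ".".join(reversed(parts.hostname.split(".")))
    # Path segments are appended with dots; empty segments are skipped.
    segments = [seg for seg in parts.path.split("/") if seg]
    return ".".join([host] + segments)


url = "http://zitgist.com/music/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234"
rev = to_reverse_domain(url)
saved = len(url) - len(rev)
```

For this identifier the only characters saved are those of the "http://" prefix, and the result can no longer be pasted into a browser.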

Note that in RDF, if you do not need this global identifying, you can easily create anonymous nodes called blank nodes (‘bnodes’). Also, URIs can be written in relative form, making a triple statement often as simple as about="#laurens".

Example of some completely anonymous statements using bnodes (aside from using basic RDF building blocks):


_:a rdf:type _:Game
_:Game rdf:type rdfs:Class
_:Game rdfs:label "Game"

So as you can see, RDF also caters for the use cases you mentioned above where uniqueness is not required. In RDFa, you achieve this by using a ‘typeof’ attribute without corresponding ‘about’ attribute.

If you reuse properties from widely-used vocabularies though (such as FOAF, or Dublin Core), it seems obvious that they need to be identified globally to avoid namespace conflicts. Instead of long ‘org.foaf-project.Person’ identifiers as Hixie proposes, RDF uses URIs and most RDF serialisations go for a (shorter) prefix-based ‘foaf:Person’ solution, which IMO is pretty user-friendly.
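
A minimal Python sketch of how such a prefixed name maps back to a full URI. The prefix table and the expand helper are illustrative assumptions, not any particular serialisation’s API; the two namespace URIs are the published FOAF and Dublin Core element namespaces.

```python
# Prefix table as a document would declare it.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("unknown or missing prefix in %r" % name)
    return prefixes[prefix] + local


person = expand("foaf:Person")
title = expand("dc:title")
```

The short form is what authors write; the globally unique URI is what consumers compare.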


~Laurens

[1] http://web.archive.org/web/*/http://www.grauw.nl/foaf.rdf
[2] http://philip.html5.org/data/rdf-namespace-status.txt

--
Note: New email address! Please update your address book.

~~ Ushiko-san! Kimi wa doushite, Ushiko-san nan da!! ~~
Laurens Holst, student, Utrecht University, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com

