Re: [whatwg] Link rot is not dangerous

2009-05-20 Thread Toby A Inkster

On 20 May 2009, at 05:23, Tab Atkins Jr. wrote:


Specifically, people can use a search engine to find information about
"foaf".  I know that typing "foaf" into my browser's address bar and
clicking on the first likely link is *way* faster than digging into a
document with a foaf namespace declared, finding the URL, and
copy/pasting that into the location bar.



FOAF is a very famous vocabulary, so this happens to work quite well  
for FOAF.


Consider Dublin Core though. Typing "dc" into Google brings up
results for DC Comics, DC Shoes, Washington DC and a file sharing
application called Direct Connect, all ahead of Dublin Core, which is
the ninth result. Even if I spot that result, clicking through takes
me to the Dublin Core Metadata Initiative's homepage, which is mostly
full of conference and event information - not the definitions I'm
looking for.


On the other hand, typing http://purl.org/dc/terms/issued into my  
browser's address bar gives me an RDFS definition of the term  
immediately.
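
A minimal sketch of that lookup, assuming Python with the requests
library and a server that honours the Accept header (purl.org answers
with a redirect to the schema):

    import requests

    def dereference_term(uri):
        # Ask for an RDF serialization rather than an HTML landing page.
        response = requests.get(uri,
                                headers={"Accept": "application/rdf+xml"},
                                timeout=10)
        response.raise_for_status()
        return response.text

    # The Dublin Core term discussed above.
    print(dereference_term("http://purl.org/dc/terms/issued")[:200])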


Your suggestion also makes the assumption that there is a single
correct answer that Google/Yahoo/whatever could give to such a query
- that any given string used as a prefix will only ever be
legitimately bound to one vocabulary. That is simply not the case:
"dc", for example, is most often used with Dublin Core Elements 1.1,
but is still occasionally seen as a prefix for the older 1.0 version,
and is increasingly being used with the new Dublin Core Terms
collection. While Elements 1.0 and 1.1 are largely compatible (the
latter introduces two extra terms, IIRC), Dublin Core Terms has
significant differences. "bio" is another string commonly bound to
different vocabularies - both the biographical vocab often used in
conjunction with FOAF, plus various life-science-related vocabularies.


--
Toby A Inkster
mailto:m...@tobyinkster.co.uk
http://tobyinkster.co.uk





Re: [whatwg] Link rot is not dangerous

2009-05-20 Thread Julian Reschke

Kristof Zelechovski wrote:

Following the URL to discover the semantics of properties is not only useful
but can also be necessary for CURIEs, e.g. when the author uses a paradoxical
prefix just for the fun of it.  A language without CURIEs would not expose
the users to this necessity.
If you have to CURIE, you have to FYN.
Just my POV,
Chris


CURIEs vs. URIs is only a syntactic difference; you don't need to FYN as 
long as you are happy with a URI as an identifier.


BR, Julian


Re: [whatwg] Link rot is not dangerous

2009-05-20 Thread Tab Atkins Jr.
On Wed, May 20, 2009 at 2:35 AM, Toby A Inkster m...@tobyinkster.co.uk wrote:
 On 20 May 2009, at 05:23, Tab Atkins Jr. wrote:

 Specifically, people can use a search engine to find information about
 foaf.  I know that typing foaf into my browser's address bar and
 clicking on the first likely link is *way* faster than digging into a
 document with a foaf namespace declared, finding the url, and
 copy/pasting that into the location bar.


 FOAF is a very famous vocabulary, so this happens to work quite well for
 FOAF.

 Consider Dublin Core though. Typing "dc" into Google brings up results for
 DC Comics, DC Shoes, Washington DC and a file sharing application called
 Direct Connect, all ahead of Dublin Core, which is the ninth result. Even
 if I spot that result, clicking through takes me to the Dublin Core Metadata
 Initiative's homepage, which is mostly full of conference and event
 information - not the definitions I'm looking for.

 On the other hand, typing http://purl.org/dc/terms/issued into my
 browser's address bar gives me an RDFS definition of the term immediately.

As Kristof said, while typing "dc" isn't very helpful, typing pretty
much any relevant property works great: dc:title, dc:creator,
whatever. They all bring up decent results right at the top of a
Google search.

 Your suggestion also makes the assumption that there is a single correct
 answer that Google/Yahoo/whatever could give to such a query - that any
 given string used as a prefix will only ever be legitimately bound to one
 vocabulary. That is simply not the case: dc for example is most often used
 with Dublin Core Elements 1.1, but still occasionally seen as a prefix for
 the older 1.0 version, and increasingly being used with the new Dublin Core
 Terms collection. While Elements 1.0 and 1.1 are largely compatible (the
 latter introduces two extra terms IIRC), Dublin Core Terms has significant
 differences. bio is another string commonly bound to different
 vocabularies - both the biographical vocab often used in conjunction with
 FOAF, plus various life-science-related vocabularies.

And yet, given an example use of the vocabulary, I'm quite certain I
can easily find the page I want describing the vocab, even when there
are overlaps in prefixes such as with bio.

FYN is nearly never necessary for humans.  We have the intelligence to
craft search queries and decide which returned result is correct.

~TJ


Re: [whatwg] Link rot is not dangerous

2009-05-20 Thread Dan Brickley

On 20/5/09 22:54, Tab Atkins Jr. wrote:

On Wed, May 20, 2009 at 2:35 AM, Toby A Inkster m...@tobyinkster.co.uk wrote:



And yet, given an example use of the vocabulary, I'm quite certain I
can easily find the page I want describing the vocab, even when there
are overlaps in prefixes such as with bio.

FYN is nearly never necessary for humans.  We have the intelligence to
craft search queries and decide which returned result is correct.


What happens in practice is that many of these perfectly intelligent 
humans ask in email or IRC questions that are clearly answered directly 
in the relevant documentation. You can lead humans to the documentation, 
but you can't make 'em read...


cheers,

Dan


Re: [whatwg] Link rot is not dangerous

2009-05-20 Thread Tab Atkins Jr.
On Wed, May 20, 2009 at 4:02 PM, Dan Brickley dan...@danbri.org wrote:
 On 20/5/09 22:54, Tab Atkins Jr. wrote:

 On Wed, May 20, 2009 at 2:35 AM, Toby A Inkster m...@tobyinkster.co.uk wrote:

 And yet, given an example use of the vocabulary, I'm quite certain I
 can easily find the page I want describing the vocab, even when there
 are overlaps in prefixes such as with bio.

 FYN is nearly never necessary for humans.  We have the intelligence to
 craft search queries and decide which returned result is correct.

 What happens in practice is that many of these perfectly intelligent humans
 ask in email or IRC questions that are clearly answered directly in the
 relevant documentation. You can lead humans to the documentation, but you
 can't make 'em read...

This is an unfortunate reality, and one which cannot be cured simply
by embedding a URL reasonably close to the location.  I humbly suggest
using www.lmgtfy.com to chastise the lazy bums while remaining
helpful.  ^_^

~TJ


Re: [whatwg] Link rot is not dangerous

2009-05-19 Thread Tab Atkins Jr.
On Mon, May 18, 2009 at 7:26 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On May 18, 2009, at 14:45, Dan Brickley wrote:
 Since there is useful information to know about FOAF properties and terms
 from its schema and human-oriented docs, it would be a shame if people
 ignored that. Since domain names can be lost, it would also be a shame if
 directly de-referencing URIs to the schema was the only way people could
 find that info. Fortunately, neither is the case.

 I wasn't talking about people but about apps dereferencing NS URIs to enable
 their functionality.

Specifically, people can use a search engine to find information about
"foaf".  I know that typing "foaf" into my browser's address bar and
clicking on the first likely link is *way* faster than digging into a
document with a foaf namespace declared, finding the URL, and
copy/pasting that into the location bar.

There are always decent search terms around to help people find the
information at least as easily as, and certainly more reliably than, an
embedded URL.

The "just use a search engine" position has been brought up by Ian
with respect to multiple cases in the overall discussion as well.  For
humans, search engines are just more reliable and easier to use than a
URI (at least, a URI in a non-clickable context).

~TJ


Re: [whatwg] Link rot is not dangerous

2009-05-18 Thread Henri Sivonen

On May 15, 2009, at 19:20, Manu Sporny wrote:


There have been a number of people now that have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that there
must be a single vocabulary document in existence, for all time, at one
location.


The flawed conclusion flows out of Follow Your Nose advocacy, and  
is not flawed if one takes Follow Your Nose seriously.


It seems to me that the positions that RDF applications should Follow  
Their Nose and that link rot is not dangerous (to RDF) are  
contradictory positions.


That link rot hasn't been a practical problem to the Semantic Web  
community suggests that applications don't really Follow Their Nose in  
practice. Can anyone point me to a deployed end user application that  
uses RDF internally and Follows Its Nose?


(For clarity: I'm not saying that link rot is dangerous to RDF apps.  
I'm saying that taking the position that it is not dangerous  
contradicts Follow Your Nose advocacy. I think Follow Your Nose is  
impractical on the Web scale and is alien to naming schemes used in  
technologies that have been successfully deployed on the Web scale  
[e.g. HTML, CSS, JavaScript, DOM and Unicode].)


- RDFa parsers can be given an override list of legacy vocabularies that
will be loaded from disk (from a cached copy).


"Cache" means that you can still go find the original and the cache is
just nearer.


If a cached copy of the vocabulary cannot be found, it can be
re-created from scratch if necessary.


Do any end user applications that use RDF internally provide a UI for  
installing local re-creations?


On May 15, 2009, at 20:25, Shelley Powers wrote:

Also don't lose sight that this is really no more serious an issue
than, say, a company originating com.sun.* being purchased by
another company, named com.oracle.*.  And you can't say, "Well,
that's not the same," because it is.



It's not the same. A Java classloader doesn't Follow Its Nose. A  
classloader will find classes in my classpath even if there weren't a  
server at sun.com. Likewise, http://sun.com/foo RDF predicates would  
continue to work in applications that don't Follow Their Nose even  
if the server at sun.com disappeared.


However, if the com.sun.* classes were renamed to com.oracle.* and the
com.sun.* copies withdrawn in a new release of a library, other
classes that have been compiled against com.sun.* classes would cease
to load. This is analogous to applications programmed to recognize
http://web.resource.org/cc/* predicates not recognizing
http://creativecommons.org/ns#* predicates. (You can't Follow Your
Nose from the former to the latter, BTW.)


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/




Re: [whatwg] Link rot is not dangerous

2009-05-18 Thread Dan Brickley

On 18/5/09 10:34, Henri Sivonen wrote:

On May 15, 2009, at 19:20, Manu Sporny wrote:


There have been a number of people now that have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that there
must be a single vocabulary document in existence, for all time, at one
location.


The flawed conclusion flows out of Follow Your Nose advocacy, and is
not flawed if one takes Follow Your Nose seriously.

It seems to me that the positions that RDF applications should Follow
Their Nose and that link rot is not dangerous (to RDF) are
contradictory positions.


That's a strong claim. There is certainly a balance to be found between 
taking advantage of de-referencable URIs and relying on their 
de-referencability. De-referencing is a privilege not a right, after all.


If I lost control of xmlns.com tomorrow, and it became unrescuably
owned by offshore spam-virus-malware pirates, that doesn't change
history. For nine years, the FOAF documentation has lived there, and we
can use URIs to ask other services about what they saw during that
period: http://web.archive.org/web/*/http://xmlns.com/foaf/0.1/


Since there is useful information to know about FOAF properties and 
terms from its schema and human-oriented docs, it would be a shame if 
people ignored that. Since domain names can be lost, it would also be a 
shame if directly de-referencing URIs to the schema was the only way 
people could find that info. Fortunately, neither is the case.



That link rot hasn't been a practical problem to the Semantic Web
community suggests that applications don't really Follow Their Nose in
practice. Can anyone point me to a deployed end user application that
uses RDF internally and Follows Its Nose?


The search site sindice.com does this:

"Yes, Sindice dereferences URIs it finds in RDF instance data, including
class and property URIs. It performs OWL reasoning using the retrieved
information, mostly to infer additional triples based on subclass and
subproperty relationships. Doing this helps us to increase recall in
queries." (from Richard Cyganiak, whom I asked off-list for confirmation)
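
A rough sketch of that kind of subproperty inference, assuming Python
with rdflib and a made-up vocabulary (Sindice's actual pipeline is of
course more involved):

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDFS

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    schema = Graph().parse(data="""
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        <http://example.org/vocab#nickname>
            rdfs:subPropertyOf <http://xmlns.com/foaf/0.1/name> .
    """, format="turtle")

    data = Graph().parse(data="""
        <http://example.org/me> <http://example.org/vocab#nickname> "Dan" .
    """, format="turtle")

    # Materialize the extra triples implied by rdfs:subPropertyOf;
    # this is what increases recall in queries.
    for sub, _, sup in schema.triples((None, RDFS.subPropertyOf, None)):
        for s, _, o in data.triples((None, sub, None)):
            data.add((s, sup, o))

    print(list(data.triples((None, FOAF.name, None))))  # inferred triple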


Whether you consider sindice.com end-user facing or not, I don't know. I 
put it in roughly the same category as Google's Social Graph API. But it's 
a non-trivial implementation that aggregates and integrates a lot of data.


BTW here's another use case for identifying properties and classes by 
URI: we can decentralise the translation of their labels into other 
languages. Here are some Korean descriptions of FOAF, for example: 
http://svn.foaf-project.org/foaftown/foaf18n/foaf-kr.rdf


cheers,

Dan


Re: [whatwg] Link rot is not dangerous

2009-05-18 Thread Henri Sivonen

On May 18, 2009, at 14:45, Dan Brickley wrote:


On 18/5/09 10:34, Henri Sivonen wrote:
It seems to me that the positions that RDF applications should Follow
Their Nose and that link rot is not dangerous (to RDF) are
contradictory positions.


That's a strong claim. There is certainly a balance to be found  
between taking advantage of de-referencable URIs and relying on  
their de-referencability. De-referencing is a privilege not a right,  
after all.


If there's value in apps dereferencing namespace URIs, those URIs
going undereferenceable leads to loss of value. Hence, link rot would
cause loss of value, i.e. be 'dangerous' by breaking something.


If I lost control of xmlns.com tomorrow, and it became unrescuably
owned by offshore spam-virus-malware pirates, that doesn't change
history. For nine years, the FOAF documentation has lived there, and
we can use URIs to ask other services about what they saw during
that period: http://web.archive.org/web/*/http://xmlns.com/foaf/0.1/


Do any RDF consumer apps that dereference namespace URIs actually fall  
back on web.archive.org?


If I'm a FOAF author, what recourse do I have if URI dereferencing-based
functionality breaks in some apps because xmlns.com goes unavailable,
when other apps have hard-coded the xmlns.com URIs, so that simply
changing my predicates would break those existing apps? At least authors
who rely on Y!/AOL/Google serving JS libraries can start using a copy
of any JS library on another CDN without changing how the script runs.


Since there is useful information to know about FOAF properties and  
terms from its schema and human-oriented docs, it would be a shame  
if people ignored that. Since domain names can be lost, it would  
also be a shame if directly de-referencing URIs to the schema was  
the only way people could find that info. Fortunately, neither is  
the case.


I wasn't talking about people but about apps dereferencing NS URIs to  
enable their functionality.



That link rot hasn't been a practical problem to the Semantic Web
community suggests that applications don't really Follow Their Nose in
practice. Can anyone point me to a deployed end user application that
uses RDF internally and Follows Its Nose?


The search site, sindice.com does this:


Thanks.


Whether you consider sindice.com end-user facing or not, I don't know.


I wouldn't characterize it as an end-user app. It exposes terms like
"RDF" and "triples" and shows QNames to the user.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/




[whatwg] Link rot is not dangerous

2009-05-16 Thread Leif Halvard Silli

Geoffrey Sneddon Fri May 15 14:27:03 PDT 2009


On 15 May 2009, at 18:25, Shelley Powers wrote:

 One of the very first uses of RDF, in RSS 1.0, for feeds, is still  
 in existence, still viable. You don't have to take my word, check it  
 out yourselves:


 http://purl.org/rss/1.0/

Who actually treats RSS 1.0 as RDF? Every major feed reader just uses  
a generic XML parser for it (quite frequently a non-namespace aware  
one) and just totally ignores any RDF-ness of it.


What does it mean to "treat as RDF"? An RSS 1.0 feed is essentially a
stream of items that has been lifted from the page(s) and placed in an
RDF/XML feed. When I read e.g.
http://www.w3.org/2000/08/w3c-synd/home.rss in Safari, I can sort the
news items according to date, source, title. Which means - I think -
that Safari sees the feed as machine-readable. It is certainly possible
to do more, I guess, and Safari does the same to non-RDF feeds, but
still. And search engines should have the same opportunities w.r.t.
creating indexes based on RSS 1.0 as on RDFa. (Though perhaps the fact
that search engines prefer to help us locate HTML pages rather than
feeds gets in the way here.)

--
leif halvard silli


Re: [whatwg] Link rot is not dangerous (was: Re: Annotating structured data that HTML has no semantics for)

2009-05-16 Thread Toby A Inkster

On 15 May 2009, at 17:20, Manu Sporny wrote:


The argument that link rot would cause massive damage to the semantic
web is just not true. Even if there is minor damage caused, it is fairly
easy to recover from it, as outlined above.


I was talking about this recently somewhere (can't remember where).

The RDF model is different from {key:value} models in that it has a  
third component - a subject. This means that while a description for  
http://xmlns.com/foaf/0.1/Person (which I'll refer to as  
'foaf:Person' from now on, for brevity) can be found at the URL  
foaf:Person, it's also possible for descriptions of foaf:Person to be  
found elsewhere.


While the description for foaf:Person at foaf:Person is clearly much  
easier to find than other descriptions for foaf:Person, under the RDF  
model, they are all afforded equal weight.


If foaf:Person disappeared tomorrow, and even if I couldn't find an  
alternative source for that definition, the URI would still not be  
useless. I'd still know, say, that Toby Inkster is a foaf:Person, and  
Manu Sporny is a foaf:Person and from that I'd be able to conclude  
that they're the same sort of thing in some way.


Given enough instance data like that, I might even be able to analyse  
the instance data, looking at what all the instances of foaf:Person  
had in common and rediscover the original definition of foaf:Person.


The ability to dereference an RDF class or property to discover more  
about it is very useful. A data format without that ability is all  
the poorer for not having it. But, when that dereferencing fails, all  
is not lost.


So when, in use cases, RDF fans talk about it being 'essential' to be
able to follow their noses to definitions of terms, what is meant is
that it's essential that a mechanism exists to enable this technique
- it is not essential that the definitions are always found.


--
Toby A Inkster
mailto:m...@tobyinkster.co.uk
http://tobyinkster.co.uk



Re: [whatwg] Link rot is not dangerous

2009-05-16 Thread Maciej Stachowiak


On May 15, 2009, at 11:08 PM, Leif Halvard Silli wrote:


Geoffrey Sneddon Fri May 15 14:27:03 PDT 2009


On 15 May 2009, at 18:25, Shelley Powers wrote:

 One of the very first uses of RDF, in RSS 1.0, for feeds, is still
 in existence, still viable. You don't have to take my word, check it
 out yourselves:

 http://purl.org/rss/1.0/

Who actually treats RSS 1.0 as RDF? Every major feed reader just uses
a generic XML parser for it (quite frequently a non-namespace-aware
one) and just totally ignores any RDF-ness of it.


What does it mean to "treat as RDF"? An RSS 1.0 feed is essentially a
stream of items that has been lifted from the page(s) and placed in an
RDF/XML feed. When I read e.g.
http://www.w3.org/2000/08/w3c-synd/home.rss in Safari, I can sort the
news items according to date, source, title. Which means - I think -
that Safari sees the feed as machine-readable. It is certainly possible
to do more, I guess, and Safari does the same to non-RDF feeds, but
still. And search engines should have the same opportunities w.r.t.
creating indexes based on RSS 1.0 as on RDFa. (Though perhaps the fact
that search engines prefer to help us locate HTML pages rather than
feeds gets in the way here.)


Safari's underlying feed parsing code completely ignores the RDF
nature of RSS 1.0. It parses it the same way as an RSS 2.0 or Atom
feed, which is to say parsing as XML (possibly broken XML in the case
of RSS variants) and then examining the parsed XML in a completely
ad-hoc fashion.


Regards,
Maciej



Re: [whatwg] Link rot is not dangerous

2009-05-16 Thread Toby Inkster
Philip Taylor wrote:

 The source data is the list of common RDF namespace URIs at
 http://ebiquity.umbc.edu/resource/html/id/196/Most-common-RDF-namespaces
 from three years ago. Out of those 284:
  * 56 are 404s. (Of those, 37 end with '#', so that URI itself really
 ought to exist. In the other cases, it'd be possible that only the
 prefix+suffix URIs are meant to exist. Some of the cases are just
 typos, but I'm not sure how many.)
  * 2 are Forbidden. (Of those, 1 looks like a typo.)
  * 2 are Bad Gateway.
  * 22 could not connect to the server. (Of those, 2 weren't http://
 URIs, and 1 was a typo. The others represent 13 different domains.)

While this analysis is interesting, looking at the 56 which 404, it
doesn't seem like a massive loss to me. Some of them are clearly typos
(e.g. DOAP and RSS syndication, which both appear on the HTTP 200 and
HTTP 3xx lists in their correct forms). In many cases I think you'll
find that it's not that the link has rotted with time, but that there
was *never* a file at the other end.

Even the ones which are genuinely lost are probably only used by a
handful of people. The *really* commonly used URIs - RDF, RDFS, OWL,
FOAF, Dublin Core (1.1 and Terms), RSS (1.0, plus commonly used
modules), SKOS, SIOC, dbpedia, geo, Geonames, vCard and iCalendar - all
seem to have been pretty stable so far.

Judging the stability of RDF URIs by looking at the 284 most common
namespace URIs is akin to judging the provision of light rail in British
cities by looking at the UK's 284 most populated areas - the results
would actually be more helpful if you restricted yourself to a smaller
sample.

Lastly, the RDF model tends to be very resilient against loss of
information anyway. Generally, data tends to be structured such that if
a collection of triples is true, any subset is also true. So if the
meaning of certain triples within a document is lost because of link
rot, the document as a whole will probably still be useful.

-- 
Toby Inkster m...@tobyinkster.co.uk


Re: [whatwg] Link rot is not dangerous

2009-05-16 Thread Tab Atkins Jr.
2009/5/16 Laurens Holst laurens.nos...@grauw.nl:
 Tab Atkins Jr. wrote:
 Once you remove discovery as a strong requirement, then you remove the
 need for large urls, and that removes the need for CURIEs, or any
 other form of prefixing.  You still want to uniquify your identifiers
 to avoid accidental clashes, but that's not that hard, nor is it
 absolutely necessary.  The system can be robust and usable even with a
 bit of potential ambiguity if small authors design their private
 vocabs badly.  As a bonus, everything gets simpler.  Essentially it
 devolves into something relatively close to Ian's microdata proposal,
 perhaps with datatype added in (though I do question how necessary
 that is, given a half-intelligent parser can recognize things as
 numbers or dates).

 Ho, ho, you’re making a big leap there! By me explaining that dereferenceable
 URIs are not needed to make RDF work on a core level, which makes RDF
 robust, do not jump to the conclusion that it is of no benefit! URIs are
 there for the benefit of linking, and help discoverability a lot (just like
 HTML hyperlinks do). Spidering the semantic web in a follow-your-nose style
 is effective. Incidentally, if an ontology disappears from its original
 address, this kind of spidering will likely lead you to a copy thereof
 stored elsewhere. For example on a different spider which has the triples
 cached.

You had just stated in the previous email, however, that few (if any)
major consumers of RDFa *use* what is located on the far end of the
URI.  If they're not even paying attention to it, where is the value
in it?

I don't really understand the 'discoverability' argument here, at
least in the context of it being similar to HTML hyperlinks.
Hyperlinks are useful for people because they make it simple to
navigate to a new page.  You just click and it works, no need to
copypasta the address into a new browser window.

I'm also not sure how a rotted link helps you compare vocabularies
with other spiders, which in a hypothetical world you are
communicating with (at this point we're *far* into theory, not
practice).  Any uniquifier would allow you to compare things in the
same way, no?

 You are now only considering the ontologies, that is, types and properties.
 You’re forgetting (or ignoring) that in RDF, objects are also named with
 URIs so that data at other locations can refer to it. You know, that ‘web of
 linked data’ people refer to, core principle of RDF. No ‘simple’ scheme
 based on what Ian proposed can provide a sufficient level of uniqueness for
 that. URIs are the best and most natural fit for use as web-scale
 identifiers.

Define 'sufficient', as used here.  I believe that this is an area
where absolute uniqueness is not a requirement.  Worst case, you get a
little bit of data pollution with weird triples being produced by
badly-written pages.  Perhaps your browser offers to add an event to
your calendar when no event shows up on the page, or a fraction of a
search engine's microdata collection is spurious.  Neither of these
are big deals.

That being said, I agree that URIs provide a very convenient source of
uniqueness.  Ian's microdata allows them to be used either in normal
form or in reverse-domain form; either way provides the necessary
uniqueness.

 And then there is of course also the thing that there is already an existing
 framework, which has already been here for a long time, has had a lot of
 clever people work on it and is gaining in popularity, and here we have
 ‘HTML5’ wanting to reinvent the wheel and making an entirely new framework
 ‘just for them’. You’d think that of all places, in a standards body people
 would be compelled to adopt existing standards :).

There are compelling reasons to make any proposal *compatible* with
RDF at the least.  Ian's microdata does this, though not
perfectly/completely.  I've said in another thread that I dislike
*all* of the inline microdata proposals.  RDFa sucks, Ian's microdata
sucks, they all suck.  They force structure completely inline, which
solves what I feel is a minority issue (carrying microdata while
copypasting sourcecode) while introducing several larger downsides
(carrying possibly *incorrect* microdata while copypasting source,
duplication of meta structure when there is a regular page structure
that can obviate this, etc.).  It's the exact same problems that
inline event handlers or inline @style attributes have.  I think Ian
is trying to limit the suckiness by at least making it as simple as
possible to write.  It's probably half as difficult or less to write
properly, while solving 90% or more of the cases that RDFa does.  This
is an effort that I'm in favor of.

I won't be using RDF in my pages at all unless I know that I can use
something like RDF-EASE or CRDF; they allow me to just write my page
as normal, then specify what the page's data means in a separate file.
 Plus, honestly, CRDF's inline syntax seems just as expressive as
microdata and RDFa, while ...

Re: [whatwg] Link rot is not dangerous

2009-05-16 Thread Geoffrey Sneddon


On 16 May 2009, at 07:08, Leif Halvard Silli wrote:


Geoffrey Sneddon Fri May 15 14:27:03 PDT 2009


On 15 May 2009, at 18:25, Shelley Powers wrote:

 One of the very first uses of RDF, in RSS 1.0, for feeds, is still
 in existence, still viable. You don't have to take my word, check it
 out yourselves:


 http://purl.org/rss/1.0/

Who actually treats RSS 1.0 as RDF? Every major feed reader just uses
a generic XML parser for it (quite frequently a non-namespace-aware
one) and just totally ignores any RDF-ness of it.


What does it mean to "treat as RDF"? An RSS 1.0 feed is essentially a
stream of items that has been lifted from the page(s) and placed in an
RDF/XML feed. When I read e.g.
http://www.w3.org/2000/08/w3c-synd/home.rss in Safari, I can sort the
news items according to date, source, title. Which means - I think -
that Safari sees the feed as machine-readable. It is certainly possible
to do more, I guess, and Safari does the same to non-RDF feeds, but
still. And search engines should have the same opportunities w.r.t.
creating indexes based on RSS 1.0 as on RDFa. (Though perhaps the fact
that search engines prefer to help us locate HTML pages rather than
feeds gets in the way here.)


I mean using an RDF processor, and treating it as an RDF graph.
Everything just creates from an XML stream (or object model) a bunch
of items with a certain title, date, and description, and acts on that
(and parses it out in a format-specific manner, so it creates the same
sort of item for, e.g., Atom) - it doesn't actually use an RDF graph
for it. If you can find any widely used software that actually treats
it as an RDF graph, I'd be interested to know.



--
Geoffrey Sneddon
http://gsnedders.com/
http://simplepie.org/



Re: [whatwg] Link rot is not dangerous

2009-05-16 Thread Laurens Holst

Tab Atkins Jr. wrote:

Ho, ho, you’re making a big leap there! By me explaining that dereferenceable
URIs are not needed to make RDF work on a core level, which makes RDF
robust, do not jump to the conclusion that it is of no benefit! URIs are
there for the benefit of linking, and help discoverability a lot (just like
HTML hyperlinks do). Spidering the semantic web in a follow-your-nose style
is effective. Incidentally, if an ontology disappears from its original
address, this kind of spidering will likely lead you to a copy thereof
stored elsewhere. For example on a different spider which has the triples
cached.



You had just stated in the previous email, however, that few (if any)
major consumers of RDFa *use* what is located on the far end of the
URI.  If they're not even paying attention to it, where is the value
in it?
  


I said that the ontologies were not used by many RDF consumers. This is 
because they can be computationally expensive, especially for large data 
sets, not because they are useless.


I think the clearest way I can put this is by comparison:

Your argument is like arguing against XML or JSON Schemas, concluding 
that because they are externally referenced and not used by most XML or 
JSON applications, they are useless, and in fact that XML and JSON 
themselves are useless. This is clearly false; removing a reference to a 
schema from a document, or a document not having a schema, does not make 
the document itself useless, nor the document format it is expressed in.


Although RDF Schema and OWL are definitely part of the ‘RDF ecosystem’, 
they are built on top of the base RDF framework and they are not in 
themselves required for RDF to function. However the schema does provide 
a useful description about the document structures and has the ability 
to express certain semantics, and is thus a worthy technology in its own 
right.



I don't really understand the 'discoverability' argument here, at
least in the context of it being similar to HTML hyperlinks.
Hyperlinks are useful for people because they make it simple to
navigate to a new page.  You just click and it works, no need to
copypasta the address into a new browser window.
  


By what means the user dereferences the link is not relevant. What
matters is that a URI is there, identifying a unique location on the
world wide web, and thus contributing to the web of linked documents
that we call the World Wide Web. Without links and URIs, there would be
no ‘web’. There would be a big set of networked yet isolated computers
that all live in their own walled garden.


Links provide discoverability of data provided elsewhere, by indicating 
a location. Users can find other documents because of this. Search 
engines like Google can spider the web based on this.


The Web of Linked Data is Tim Berners-Lee’s vision of a WWW for data.


I'm also not sure how a rotted link helps you compare vocabularies
with other spiders, which in a hypothetical world you are
communicating with (at this point we're *far* into theory, not
practice).  Any uniquifier would allow you to compare things in the
same way, no?
  


Just a simple rdfs:seeAlso statement referencing it in one single place 
will allow a spider to ‘follow its own nose’ and find the triples of the 
ontology in the republished location. This republication can be 
anywhere, a new ontology location, or a copy cached by another spider 
that republishes the triples it harvests on the web (such as archive.org 
[1]).
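
A small sketch of that hop, assuming Python with rdflib; the term and
mirror URIs here are made up:

    from rdflib import Graph, URIRef
    from rdflib.namespace import RDFS

    graph = Graph().parse(data="""
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        <http://example.org/vocab#Game>
            rdfs:seeAlso <http://mirror.example.net/vocab.rdf> .
    """, format="turtle")

    # A spider that cannot dereference the term itself can still
    # follow rdfs:seeAlso to a republished copy of the ontology.
    term = URIRef("http://example.org/vocab#Game")
    for mirror in graph.objects(term, RDFS.seeAlso):
        try:
            graph.parse(mirror)   # fetch and merge the mirrored triples
        except Exception:
            continue              # that copy may itself have rotted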


I agree we’re getting far into the theory-not-practice realm, which is 
why Shelley is right in saying that in practice vocabularies are served 
from a location that is well cared for, e.g. using services like purl to 
provide permanent URLs, or having a solid organisational backing, and 
Philip Taylor’s list [2] does not do much to discredit this.


[Side note: To point out some flaws in Philip’s list, many of the sites 
in his ‘404’ and ‘not responding’ list are experimental URLs. 
Additionally, the list fails to list usage frequency. Finally, it does 
not (and can not, obviously) list whether there was any RDF Schema at 
those locations in the first place. Because, as I explained before, I 
can make up the following RDF triple right here on the spot, and there 
would be nothing wrong with it:


_:a rdf:type <http://grauw.nl/rdf#Game>

The type referenced in this triple’s object has no ontology at this
location. The fact that it is a type is inferred by it being referenced
through rdf:type, and that is enough. There is no requirement that this
type resolves into a document containing RDF Schema triples. A creative
example of this on the list is “java:java.util.Date”.]
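
A quick way to check that point, assuming Python with rdflib: the
parser accepts the triple without ever dereferencing the type URI.

    from rdflib import Graph

    g = Graph().parse(data="""
        @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
        _:a rdf:type <http://grauw.nl/rdf#Game> .
    """, format="turtle")

    print(len(g))  # 1 -- usable data, schema or no schema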



You are now only considering the ontologies, that is, types and properties.
You’re forgetting (or ignoring) that in RDF, objects are also named with
URIs so that data at other locations can refer to it. You know, that ‘web of
linked data’ people refer to, core principle of RDF. No ‘simple’ ...

[whatwg] Link rot is not dangerous (was: Re: Annotating structured data that HTML has no semantics for)

2009-05-15 Thread Manu Sporny
Kristof Zelechovski wrote:
 Therefore, link rot is a bigger problem for CURIE
 prefixes than for links.

There have been a number of people now that have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that there
must be a single vocabulary document in existence, for all time, at one
location. This has also led to a false requirement that all
vocabularies should be centralized.

Here's the fear:

If a vocabulary document disappears for any reason, then the meaning of
the vocabulary is lost and all triples depending on the lost vocabulary
become useless.

That fear ignores the fact that we have a highly available document
store available to us (the Web). Not only that, but these vocabularies
will be cached (at Google, at Yahoo, at The Wayback Machine, etc.).

IF a vocabulary document disappears - which is highly unlikely for
popular vocabularies; imagine FOAF disappearing overnight - then there
are alternative mechanisms to extract meaning from the triples that will
be left on the web.

Here are just two of the possible solutions to the problem outlined:

- The vocabulary is restored at another URL using a cached copy of the
vocabulary. The site owner of the original vocabulary either re-uses the
vocabulary, or re-directs the vocabulary page to another domain
(somebody that will ensure the vocabulary continues to be provided -
somebody like the W3C).
- RDFa parsers can be given an override list of legacy vocabularies that
will be loaded from disk (from a cached copy). If a cached copy of the
vocabulary cannot be found, it can be re-created from scratch if necessary.
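
A minimal sketch of that second option, assuming Python with rdflib and
hypothetical local file paths:

    from rdflib import Graph

    # Override list: legacy vocabulary URIs mapped to cached copies.
    VOCAB_OVERRIDES = {
        "http://xmlns.com/foaf/0.1/": "cache/foaf.rdf",
        "http://purl.org/dc/terms/": "cache/dcterms.rdf",
    }

    def load_vocabulary(uri):
        graph = Graph()
        if uri in VOCAB_OVERRIDES:
            graph.parse(VOCAB_OVERRIDES[uri])  # from disk, no network
        else:
            graph.parse(uri)                   # fall back to dereferencing
        return graph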

The argument that link rot would cause massive damage to the semantic
web is just not true. Even if there is minor damage caused, it is fairly
easy to recover from it, as outlined above.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/



Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Dan Brickley

On 15/5/09 18:20, Manu Sporny wrote:

Kristof Zelechovski wrote:

Therefore, link rot is a bigger problem for CURIE
prefixes than for links.


There have been a number of people now that have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that there
must be a single vocabulary document in existence, for all time, at one
location. This has also lead to a false requirement that all
vocabularies should be centralized.

Here's the fear:

If a vocabulary document disappears for any reason, then the meaning of
the vocabulary is lost and all triples depending on the lost vocabulary
become useless.

That fear ignores the fact that we have a highly available document
store available to us (the Web). Not only that, but these vocabularies
will be cached (at Google, at Yahoo, at The Wayback Machine, etc.).

IF a vocabulary document disappears - which is highly unlikely for
popular vocabularies; imagine FOAF disappearing overnight - then there
are alternative mechanisms to extract meaning from the triples that will
be left on the web.

Here are just two of the possible solutions to the problem outlined:

- The vocabulary is restored at another URL using a cached copy of the
vocabulary. The site owner of the original vocabulary either re-uses the
vocabulary, or re-directs the vocabulary page to another domain
(somebody that will ensure the vocabulary continues to be provided -
somebody like the W3C).
- RDFa parsers can be given an override list of legacy vocabularies that
will be loaded from disk (from a cached copy). If a cached copy of the
vocabulary cannot be found, it can be re-created from scratch if necessary.

The argument that link rot would cause massive damage to the semantic
web is just not true. Even if there is minor damage caused, it is fairly
easy to recover from it, as outlined above.


A few other points:

1. It's for the community of vocabulary-creators to help each other out 
w.r.t. hosting/publishing these: I just nudged a friend to put another 5 
years on the DNS rental for a popular namespace. I think we should put a 
bit more structure around these kinds of habits, so that popular 
namespaces won't drop off the Web by accident.


2. Digitally signing the schemas will become part of the story, I'm 
sure. While it's a bit fiddly, there are advantages to having other 
mechanisms beyond URI de-referencing for knowing where a schema came from.


3. Parties worried about external dependencies when using namespaces can 
always indirect through their own namespace, whose schema document can 
declare subclass/subproperty relations to other URIs.
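
A sketch of that third point, assuming Python with rdflib; the my:
namespace is hypothetical:

    from rdflib import Graph

    my_schema = Graph().parse(data="""
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        @prefix my:   <http://example.org/schema#> .

        # Documents use my:fullName; if foaf: ever rots they keep their
        # local meaning, and consumers that saw this schema can still
        # map the term to FOAF.
        my:fullName rdfs:subPropertyOf foaf:name .
    """, format="turtle")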


cheers

Dan




Re: [whatwg] Link rot is not dangerous (was: Re: Annotating structured data that HTML has no semantics for)

2009-05-15 Thread Kristof Zelechovski
I understand that there are ways to recover resources that disappear from
the Web; however, the postulated advantage of RDFa, "you can go see what it
means", simply does not hold.  The recovery mechanism, Web search/cache,
would be as good for CURIE URLs as for domain prefixes.  Creating a redirect
is not always possible, and the built-in redirect dictionary (a CURIE
catalog?) smells of a central repository.  This is no better than public
entity identifiers in XML.

Serving the vocabulary from one's own domain is not always possible, e.g. in
case of reader-contributed content, and only guarantees that the vocabulary
will be alive while it is supported by the domain owner.  (WHATWG wants HTML
documents to be readable 1000 years from now.)  It is not always practical
either, as it could confuse URL-based tools that do not retrieve the
resources referenced.

All this does not imply, of course, that RDFa is no good.  It is only
intended to demonstrate that the postulated advantage of the CURIE lookup is
wishful thinking.

Best regards,
Chris



Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Shelley Powers

Dan Brickley wrote:

On 15/5/09 18:20, Manu Sporny wrote:

Kristof Zelechovski wrote:

Therefore, link rot is a bigger problem for CURIE
prefixes than for links.


There have been a number of people now that have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that there
must be a single vocabulary document in existence, for all time, at one
location. This has also led to a false requirement that all
vocabularies should be centralized.

Here's the fear:

If a vocabulary document disappears for any reason, then the meaning of
the vocabulary is lost and all triples depending on the lost vocabulary
become useless.

That fear ignores the fact that we have a highly available document
store available to us (the Web). Not only that, but these vocabularies
will be cached (at Google, at Yahoo, at The Wayback Machine, etc.).

IF a vocabulary document disappears - which is highly unlikely for
popular vocabularies; imagine FOAF disappearing overnight - then there
are alternative mechanisms to extract meaning from the triples that will
be left on the web.

Here are just two of the possible solutions to the problem outlined:

- The vocabulary is restored at another URL using a cached copy of the
vocabulary. The site owner of the original vocabulary either re-uses the
vocabulary, or re-directs the vocabulary page to another domain
(somebody that will ensure the vocabulary continues to be provided -
somebody like the W3C).
- RDFa parsers can be given an override list of legacy vocabularies that
will be loaded from disk (from a cached copy). If a cached copy of the
vocabulary cannot be found, it can be re-created from scratch if 
necessary.


The argument that link rot would cause massive damage to the semantic
web is just not true. Even if there is minor damage caused, it is fairly
easy to recover from it, as outlined above.


A few other points:

1. It's for the community of vocabulary-creators to help each other 
out w.r.t. hosting/publishing these: I just nudged a friend to put 
another 5 years on the DNS rental for a popular namespace. I think we 
should put a bit more structure around these kinds of habit, so that 
popular namespaces won't drop off the Web through accident.


2. digitally signing the schemas will become part of the story, I'm 
sure. While it's a bit fiddly, there are advantages to having other 
mechanisms beyond URI de-referencing for knowing where a schema came from


3. Parties worried about external dependencies when using namespaces 
can always indirect through their own namespace, whose schema document 
can declare subclass/subproperty relations to other URIs


cheers

Dan




The most important point to take from all of this, though, is that link 
rot within the RDF world is an extremely rare and unlikely occurrence. 
I've been working with RDF for close to a decade, and link rot has never 
been an issue.


One of the very first uses of RDF, in RSS 1.0, for feeds, is still in 
existence, still viable. You don't have to take my word, check it out 
yourselves:


http://purl.org/rss/1.0/

Even if, and I want to strongly emphasize *if*, link rot does occur, both 
Manu and Dan have demonstrated multiple ways of ensuring that no meaning 
is lost, and nothing is broken. However, I hope that people are open 
enough to take away from their discussions that they are trying to 
treat this concern respectfully, and trying to demonstrate that there's 
more than one solution. Not that this forms a proof that "Oh my god, 
if we use RDF, we're doomed!"


Also don't lose sight that this is really no more serious an issue than, 
say, a company originating com.sun.* being purchased by another 
company, named com.oracle.*.  And you can't say, "Well, that's not the 
same," because it is.


The only safe bet is to designate some central authority and give them 
power over every possible name. Then we run the massive risk of this 
system failing (and this applies to microdata's reverse DNS as well as 
RDF's URIs), or of it being taken over by an entity that sees such a data 
store as a way to make a great profit. We also defeat the very principle 
on which semantic data on the web abides, and that's true whether you 
support microdata or RDF.


Shelley






Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Kristof Zelechovski
Classes in com.sun.* are reserved for Java implementation details and should
not be used by the general public.  CURIE URLs are intended for general use.

So, I can say "Well, it is not the same," because it is not.

Cheers,
Chris



Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Manu Sporny
Kristof Zelechovski wrote:
 I understand that there are ways to recover resources that disappear from
 the Web; however, the postulated advantage of RDFa, "you can go see what it
 means", simply does not hold. 

This is a strawman argument; more below...

 All this does not imply, of course, that RDFa is no good.  It is only
 intended to demonstrate that the postulated advantage of the CURIE
 lookup is wishful thinking.

That train of logic seems to falsely conclude that if something does not
hold true 100% of the time, then it cannot be counted as an advantage.

Example:

Since the postulated advantage of RAID-5 is that a disk array is
unlikely to fail due to a single disk failure, and since it is possible
for more than one disk to fail before a recovery is complete, one cannot
call running a disk array in RAID-5 mode an advantage over not running
RAID at all (because failure is possible).

or

Since the postulated advantage of CURIEs is that "you can go see what it
means", and it is possible for a CURIE-defined URL to be unavailable, one
cannot call it an advantage because it may fail.

There are two flaws in the premises and reasoning above, for the CURIE case:

- It is assumed that for something to be called an 'advantage', it
  must hold true 100% of the time.
- It is assumed that most proponents of RDFa believe that "you can go
  see what it means" holds at all times - one would have to be very
  deluded to believe that.

 The recovery mechanism, Web search/cache,
 would be as good for CURIE URL as for domain prefixes.  Creating a redirect
 is not always possible and the built-in redirect dictionary (CURIE catalog?)
 smells of a central repository. 

Why does having a file sitting on your local machine that lists
alternate vocabulary files for CURIEs smell of a central repository?
Perhaps you're assuming that the file would be managed by a single
entity? If so, it wouldn't need to be, and that was not what I was proposing.

 Serving the vocabulary from the own domain is not always possible, e.g. in
 case of reader-contributed content, 

This isn't clear, could you please clarify what you mean by
reader-contributed content?

 and only guarantees that the vocabulary
 will be alive while it is supported by the domain owner.

This case and its solution were already covered previously. Again - if
the domain owner disappears, the domain disappears, or the domain owner
doesn't want to cooperate for any reason, one could easily set up an
alternate URL and instruct the RDFa processor to re-direct any
discovered CURIEs that match the old vocabulary to the new
(referenceable) vocabulary.

 (WHATWG wants HTML documents to be readable 1000 years from now.)  

Is that really a requirement? What about external CSS files that
disappear? External Javascript files that disappear? External SVG files
that disappear? All those have something to do with the document's
human/machine readability. Why is HTML5 not susceptible to link rot in
the same way that RDFa is susceptible to link rot?

Also, why 1000 years? That seems a bit arbitrary. =P

 It is not always practical either as it could confuse URL-based 
 tools that do not retrieve the resources referenced.

Could you give an example of this that wouldn't be a bug in the
dereferencing application? How could a non-dereferenceable URL confuse
URL-based tools?

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/



[whatwg] Link rot is not dangerous

2009-05-15 Thread Manu Sporny
Tab Atkins Jr. wrote:
 Reversed domains aren't *meant* to link to anything.  They shouldn't
 be parsed at all.  They're a uniquifier so that multiple vocabularies
 can use the same terms without clashing or ambiguity.  The Microdata
 proposal also allows normal urls, but they are similarly nothing more
 than a uniquifier.
 
 CURIEs, at least theoretically, *rely* on the prefix lookup.  After
 all, how else can you tell that a given relation is really the same
 as, say, foaf:name?  If the domain isn't available, the data will be
 parsed incorrectly.  That's why link rot is an issue.

Where in the CURIE spec does it state or imply that if a domain isn't
available, the resulting parsed data will be invalid?

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/



Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Kristof Zelechovski
Serving the RDFa vocabulary from one's own domain is not always possible, e.g.
when a reader of a Web site is encouraged to post a comment to the page she
reads and her comment contains semantic annotations.

The probability of a URL becoming unavailable is much greater than that of
both mirrored drives wearing out at the same time.  (Data mirroring does not
claim to protect from fire, water, high voltage, magnetic storms,
earthquakes and the like; it only protects you from natural wear.)  The
probability of ultimately losing data stored in one copy is 1; the
probability of a URL going down is close to 1.  So, RAID works in most
cases; CURIE URLs do not (ultimately) work in most cases.

Disappearing CSS is not a problem for HTML because CSS does not affect the
meaning of the page.

Disappearing scripts are a problem for HTML but they are not a problem for
HTML *data*.  In other words, script-generated content is not guaranteed to
survive, and there is nothing we can do about that except for a warning.
Such content cannot be HTML-validated either.  In general, scripts are best
used (and intended) for behavior, not for creating content.

External SVG files do not describe existing content, they *are* (embedded)
content.  If an HTML file disappears, it becomes unreadable as well, but that
problem obviously cannot be solved from within HTML :-)

"HTML should be readable 1000 years from now" was an attempt to visualize
the intention of persistence.  It should not be understood as a "best
before" date, of course.

If the author chooses to create a redirect to a well-known vocabulary using
a dependent vocabulary stored at his own site in order to prevent link rot,
tools that recognize vocabulary URLs without reading the corresponding
resources will be unable to recognize the author's intent, and for the tools
that do read them, the original vocabulary will still be unavailable, so this
method causes more problems than it solves.

Cheers,
Chris




Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Shelley Powers

Kristof Zelechovski wrote:

Classes in com.sun.* are reserved for Java implementation details and should
not be used by the general public.  CURIE URLs are intended for general use.

So, I can say "Well, it is not the same," because it is not.

Cheers,
Chris


  
But we're not dealing with Java anymore. We're dealing with using 
reversed DNS concatenated with some kind of default URI, to create some 
kind of bastardized URL, which actually is valid, though incredibly 
painful to see, and implies that it will actually take one to a web address.


You don't have to take my word for it -- check out Philip's testing demo 
for microdata. You get triples with the following:


http://www.w3.org/1999/xhtml/custom#com.damowmow.cat

http://philip.html5.org/demos/microdata/demo.html#output_ntriples

Not only do you face problems with link rot, you also face a significant 
amount of confusion, as people look at that and go, "What the hell is 
that?"


Oh, and you can say, "Well, but we don't _mean_ anything by it" -- but 
what does that have to do with anything? People don't go running to the 
spec every time they see something. They look at this thing and think, 
"Oh, a link. I wonder where it goes." You go ahead and try it, and 
imagine for a moment the confusion when it goes absolutely nowhere. 
Except that I imagine the W3C folks are getting a little annoyed with 
the HTML WG now, for allowing this type of thing in, generating a whole 
bunch of 404 errors for the web master(s).


But hey, you've given me another idea. I think I'll create my own 
vocabulary items, with the reversed DNS 
http://www.w3.org/1999/xhtml/custom#com.sun.*. No, maybe 
http://www.w3.org/1999/xhtml/custom#com.opera.*. Nah, how about 
http://www.w3.org/1999/xhtml/custom#com.microsoft.*. Yeah, that's cool. 
And there is no mechanism in place to prevent this, because unlike 
regular URIs, where the domain is actually controlled by a specific 
entity, you've created the world-famous W3C fudge pot. Anything goes.


I can't wait for the lawsuits on this one. You think that cybersquatting 
is an issue on the web, or facebook, or Twitter, wait until you see 
people use com.microsoft.*.


Then there's the vocabulary that was created by foobar.com, that people 
think, "Hey, cool, I'll use that...whatever it is." After all, if you 
want to play with the RDF kids, your vocabularies have to be usable by 
other people.


But Foobar takes a dive in the dot-com pool, and foobar.com gets taken 
over by a porn establishment. Yeah, I can't wait for people to explain 
that one to the boss. Just because it doesn't link doesn't mean it won't 
end up on Twitter as a big, huge joke.


If you want to find something to criticize, I think it's important to 
realize that hey, folks, you've just stepped over the line, and you're 
now in the Zone of Decentralization. Whatever impacts us, babes, impacts 
all of you. Because if you look at Philip's example, you're going to see 
the same set of vocabulary URIs we're using for RDF right now, as 
microdata uses our stuff, too. Including the links that are all 
trembling on the edge of self-implosion.


So the point of all of this is moot.

But it was fun. Really fun. Have a great weekend.

Shelley


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Philip Taylor
On Fri, May 15, 2009 at 6:25 PM, Shelley Powers
shell...@burningbird.net wrote:
 The most important point to take from all of this, though, is that link rot
 within the RDF world is an extremely rare and unlikely occurrence.

That seems to be untrue in practice - see
http://philip.html5.org/data/rdf-namespace-status.txt

The source data is the list of common RDF namespace URIs at
http://ebiquity.umbc.edu/resource/html/id/196/Most-common-RDF-namespaces
from three years ago. Out of those 284:
 * 56 are 404s. (Of those, 37 end with '#', so that URI itself really
ought to exist. In the other cases, it'd be possible that only the
prefix+suffix URIs are meant to exist. Some of the cases are just
typos, but I'm not sure how many.)
 * 2 are Forbidden. (Of those, 1 looks like a typo.)
 * 2 are Bad Gateway.
 * 22 could not connect to the server. (Of those, 2 weren't http://
URIs, and 1 was a typo. The others represent 13 different domains.)

(For the URIs which returned Redirect responses, I didn't check what
happens when you request the URI it redirected to, so there may be
more failures.)
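
A sketch of how such a survey can be run, assuming Python with the
requests library; the input list here is a stand-in for the 284 URIs:

    import requests

    namespace_uris = [
        "http://xmlns.com/foaf/0.1/",
        "http://purl.org/dc/elements/1.1/",
        # ... the remaining namespace URIs under test
    ]

    for uri in namespace_uris:
        try:
            # allow_redirects=False so Redirect responses are tallied
            # separately, as in the results above.
            status = requests.get(uri, timeout=10,
                                  allow_redirects=False).status_code
        except requests.RequestException:
            status = "could not connect"
        print(uri, status)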

Over a quarter of the most common namespace URIs don't resolve
successfully today, and most of those look like they should have
resolved when they were originally used, so link rot seems to be
common.

(Major vocabularies like RSS and FOAF are likely to exist for a long
time, but they're the easiest cases to handle - we could just
pre-define the prefixes rss: and foaf: and have a centralised
database mapping them onto schemas/documentation/etc. It seems to me
that URIs are most valuable to let any tiny group make one for their
rarely-used vocabulary, and be guaranteed no name collisions without
needing to communicate with a centralised registry to ensure
uniqueness; but it's those cases that are most vulnerable to link rot,
and in practice the links appear to fail quite often.)

(I'm not arguing that link rot is dangerous - just that the numbers
indicate it's a common situation rather than an extremely rare
exception.)

-- 
Philip Taylor
exc...@gmail.com


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Tab Atkins Jr.
On Fri, May 15, 2009 at 1:32 PM, Manu Sporny mspo...@digitalbazaar.com wrote:
 Tab Atkins Jr. wrote:
 Reversed domains aren't *meant* to link to anything.  They shouldn't
 be parsed at all.  They're a uniquifier so that multiple vocabularies
 can use the same terms without clashing or ambiguity.  The Microdata
 proposal also allows normal urls, but they are similarly nothing more
 than a uniquifier.

 CURIEs, at least theoretically, *rely* on the prefix lookup.  After
 all, how else can you tell that a given relation is really the same
 as, say, foaf:name?  If the domain isn't available, the data will be
 parsed incorrectly.  That's why link rot is an issue.

 Where in the CURIE spec does it state or imply that if a domain isn't
 available, that the resulting parsed data will be invalid?

Assume a page that uses both foaf and another vocab that subclasses
many foaf properties.  Given working lookups for both, the rdf parser
can determine that two entries with different properties are really
'the same', and hopefully act on that knowledge.

If the second vocab 404s, that information is lost.  The parser will
then treat any use of that second vocab completely separately from the
foaf, losing valuable semantic information.

(Please correct any misunderstandings I may be operating under; I'm
not sure how competent parsers currently are, and thus how much they'd
actually use a working subclassed relation.)
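
A small sketch of that failure mode, assuming Python with rdflib and a
hypothetical vocabulary URI; the 404 leaves the schema graph empty, so
no inference step can ever connect the second vocab's properties to
foaf:name:

    from rdflib import Graph

    schema = Graph()
    try:
        # Hypothetical second vocabulary that subclasses foaf properties.
        schema.parse("http://vocab.example.org/schema#")
    except Exception:
        # The rdfs:subPropertyOf triples linking it to foaf:name are
        # never seen.
        pass

    print(len(schema))  # 0 -- parsing continues, but the link is lost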

~TJ


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Shelley Powers

Philip Taylor wrote:

On Fri, May 15, 2009 at 6:25 PM, Shelley Powers
shell...@burningbird.net wrote:
  

The most important point to take from all of this, though, is that link rot
within the RDF world is an extremely rare and unlikely occurrence.



That seems to be untrue in practice - see
http://philip.html5.org/data/rdf-namespace-status.txt

The source data is the list of common RDF namespace URIs at
http://ebiquity.umbc.edu/resource/html/id/196/Most-common-RDF-namespaces
from three years ago. Out of those 284:
 * 56 are 404s. (Of those, 37 end with '#', so that URI itself really
ought to exist. In the other cases, it'd be possible that only the
prefix+suffix URIs are meant to exist. Some of the cases are just
typos, but I'm not sure how many.)
 * 2 are Forbidden. (Of those, 1 looks like a typo.)
 * 2 are Bad Gateway.
 * 22 could not connect to the server. (Of those, 2 weren't http://
URIs, and 1 was a typo. The others represent 13 different domains.)

(For the URIs which returned Redirect responses, I didn't check what
happens when you request the URI it redirected to, so there may be
more failures.)

Over a quarter of the most common namespace URIs don't resolve
successfully today, and most of those look like they should have
resolved when they were originally used, so link rot seems to be
common.

(Major vocabularies like RSS and FOAF are likely to exist for a long
time, but they're the easiest cases to handle - we could just
pre-define the prefixes rss: and foaf: and have a centralised
database mapping them onto schemas/documentation/etc. It seems to me
that URIs are most valuable to let any tiny group make one for their
rarely-used vocabulary, and be guaranteed no name collisions without
needing to communicate with a centralised registry to ensure
uniqueness; but it's those cases that are most vulnerable to link rot,
and in practice the links appear to fail quite often.)

(I'm not arguing that link rot is dangerous - just that the numbers
indicate it's a common situation rather than an extremely rare
exception.)

  
Philip, I don't think the occurrence of link rot causing problems in the 
RDF world is all that common, but thanks for looking up this data. 
Actually, I will probably quote your info in my next post on my weblog.


I'd like to be dropped from any additional emails in this thread. After 
all, I  have it on good authority I'm not open for rational discussion. 
So I'll leave this type of thing to you guys.


Thanks

Shelley


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Kristof Zelechovski
The problem of cybersquatting of oblique domains is, I believe, described
and addressed in the "tag" URI scheme definition [RFC4151], which I think is
something rather similar to the constructs used for HTML microdata.  I think
that document is relevant not only to this discussion but to the whole
concept.
IMHO,
Chris