Ian Davis wrote:
On Fri, Nov 5, 2010 at 10:05 AM, Nathan <[email protected]> wrote:
Not at all, I'm saying that if big-corp makes a /web crawler/ that describes
what documents are about and publishes RDF triples, then if you use 200 OK,
throughout the web you'll get (statements similar to) the following
asserted:
</toucan> :primaryTopic dbpedia:Toucan ; a :Document .
I don't think so. If the bigcorp is producing triples from their crawl,
then why wouldn't they use the triples they are sent (and/or
Content-Location, Link headers, etc.)? The above looks like what you'd
get from a third party translation of the crawl results without the
context of actually having fetched the data from the URI.
Wouldn't be too sure about that; even the major browser vendors get it
completely wrong. For instance, do an XHR for a URI in Chrome and, even if
there are 10 redirects in a chain, the base and the document URI are both
the one you requested. This is true all over the place, from using
file_get_contents in PHP to most HTTP clients in any language; the
pattern is simply:
requested-uri = "http://...";
doc = get(requested-uri);
The info at the end is almost always (requested-uri, doc); in fact, often
there's not even any way to get the redirected-to URI back out of the
HTTP client.
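As a rough illustration of that pattern, here's a minimal Python sketch (my own, not taken from any real crawler) that keeps both the requested URI and the URI that finally answered. The local test server, its /toucan and /toucan.rdf paths, and the names RedirectingHandler, fetch and demo are all made up for the example; the one real piece is that urllib's response object exposes the retrieved URI as `url`.

```python
import http.server
import threading
import urllib.request


class RedirectingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: /toucan answers 303 and points at /toucan.rdf."""

    def do_GET(self):
        if self.path == "/toucan":
            self.send_response(303)
            self.send_header("Location", "/toucan.rdf")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"<> a :Document ."
            self.send_response(200)
            self.send_header("Content-Type", "text/turtle")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(requested_uri):
    """Return (requested_uri, final_uri, body) instead of dropping final_uri."""
    # An empty ProxyHandler ignores any system proxy settings, so the local
    # demo server is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(requested_uri) as response:
        # response.url is the URI actually retrieved, after any redirects.
        return requested_uri, response.url, response.read()


def demo():
    """Start the throwaway server, fetch /toucan through it, then stop it."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = "http://127.0.0.1:%d" % server.server_port
        return fetch(base + "/toucan")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns a tuple whose first element ends in /toucan and whose second ends in /toucan.rdf; a client that only keeps the first one has lost exactly the information being argued about here.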
As for using the triples they are sent, all you need to do is consider
an HTML crawler running over RDFa documents.
If the bigcorp is not linked-data aware, then today they will follow
the 303 redirect as a standard HTTP redirect. RFC 2616 says that the
target URI is not a substitute for the original URI but just an
alternate location to get a response from. The bigcorp will simply
infer the statements you list above **even though there is a 303
redirect**.
Exactly, which kind of semi-damns all /slash URIs, or at least requires a
load of provenance data.
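To make the difference concrete, here's a small Python sketch of what a redirect-unaware crawler would assert versus a redirect-aware one. FetchRecord and both function names are hypothetical, and /toucan.rdf is just an assumed redirect target; the :primaryTopic and :Document terms are the ones from the statement quoted earlier.

```python
from typing import List, NamedTuple


class FetchRecord(NamedTuple):
    requested_uri: str  # URI the crawler asked for
    final_uri: str      # URI that answered 200 after any redirects
    topic: str          # what the crawler decided the document is about


def naive_statements(record: FetchRecord) -> List[str]:
    """Redirect-unaware: everything is asserted about the requested URI."""
    return [
        "<%s> :primaryTopic %s ." % (record.requested_uri, record.topic),
        "<%s> a :Document ." % record.requested_uri,
    ]


def redirect_aware_statements(record: FetchRecord) -> List[str]:
    """Redirect-aware: the document is whatever finally answered 200."""
    return [
        "<%s> :primaryTopic %s ." % (record.final_uri, record.topic),
        "<%s> a :Document ." % record.final_uri,
    ]
```

With FetchRecord("/toucan", "/toucan.rdf", "dbpedia:Toucan"), the naive version calls </toucan> a :Document whether or not a 303 happened along the way, which is the conflation under discussion.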
As RFC 2616 itself points out, many user agents treat 302 and 303
interchangeably. Only linked data aware agents will ascribe special
meaning to 303 and they're the ones that are more likely to use the
data they are sent.
God knows why linked data clients are ascribing any meaning to 303; the
pattern's there to ensure that a thing and the doc describing it have
different URIs, and to ensure that people don't say that the thing is a
document. Although it hasn't exactly worked out that way. The use of the
particular status code 303 is only relevant if you're ascribing meaning to
the response code of GETs; if you're not, then any 3xx will do the same job.
Out of interest, just who is trawling the web and going "301 that's an
IR, 303 that's maybe not an IR, 302 that's an IR"?
My personal opinion on the entire thing is as simple as: give different
things different names. If there's a good chance something will think
that a thing is a different kind of thing because of a particular URI scheme
or style (like saying mailto:[email protected] is a mailbox), then avoid it if
it conflicts with the kind of thing you're describing. IMO slash URIs
are often taken to mean documents, so I avoid them. You don't, so
regardless of what status code you use, or how you deploy data, that
conflation will be there. Thus my takeaway on the whole thing for you
(even though it goes against the TAG) is: just 200 your URIs if you want
to, but don't go around telling the rest of the world to do it and
promoting it as a good pattern, as it's not. The tdb scheme or frag URIs
address the issues, whilst introducing others, but at least the data's
somewhat cleaner.
I'll roll with the "who cares" line of thinking; I certainly don't care
how you or dbpedia or foaf or dc publish your data, so long as I can
deref it, but for God's sake don't go telling everybody that using slash
URIs and 200 is "The Right Thing TM".
Best,
Nathan