Ian Davis wrote:
> On Thu, Nov 4, 2010 at 6:08 PM, Nathan <[email protected]> wrote:
>> You see, it's not about what we say, it's about what others say. If 10
>> huge corps analyse the web and spit out billions of triples saying that
>> anything 200 OK'd is a document, then at the end, when we consider the RDF
>> graph of triples, all we're going to see is one statement saying something
>> is a "nonInformationResource" and a hundred others saying it's a document
>> and describing what it's about, together with its format and so on.
>> I honestly can't see how anything could reason over a graph that looked like
>> that.
> I honestly believe that's the least of our worries. How often do you
> need to determine whether something in the universe of discourse is an
> electronic document or not, compared with all the other questions you
> might be asking of your data? I might conceivably ask "show me all the
> documents about this toucan", but I'd much rather ask "show me all the
> data about this toucan".
I think we all would, but we'd also like to see the data about this
toucan, rather than about this toucan and the document that describes it.
To be clear, the issue is not </toucan> ex:isDescribedBy </doc>;
the issue is </toucan> ex:isDescribedBy </toucan>.
And when you 200 OK, that's what you'll get in your graph. TBH, with any
slash URI it's probably what you'll end up getting.
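A minimal sketch of the naming collapse described above, assuming a hypothetical crawler that names a description after the URI it was actually served from. The `ex:isDescribedBy` predicate and the `provenance_triple` helper are illustrative only, not part of any real vocabulary or library. With the 303 pattern the thing and its description keep separate names; with a 200 OK on the thing's own URI they become one.

```python
# Illustrative predicate; "ex:" is a placeholder prefix, not a real vocabulary.
EX_DESCRIBED_BY = "ex:isDescribedBy"


def provenance_triple(requested_uri: str, served_from_uri: str) -> tuple:
    """Hypothetical crawler step: link the requested URI to the URI
    whose representation was actually returned."""
    return (requested_uri, EX_DESCRIBED_BY, served_from_uri)


# 303 pattern: GET /toucan redirects to /doc, so the two names stay distinct.
with_303 = provenance_triple("/toucan", "/doc")

# 200 OK on /toucan: the description is served under the toucan's own name.
with_200 = provenance_triple("/toucan", "/toucan")

print(with_303)  # ('/toucan', 'ex:isDescribedBy', '/doc')
print(with_200)  # ('/toucan', 'ex:isDescribedBy', '/toucan')
```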
>> However, I'm also very aware that this may all be moot anyway, because
>> many crawlers and HTTP agents just treat HTTP like a big black box: they
>> don't know there ever was a 303 and don't know what the final URI is (even
>> major browser vendors like Chrome do this, setting the base wrong and
>> everything). So even the current 303 pattern doesn't keep different things
>> under different names for slash URIs in all cases.
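A small sketch of the "black box" behaviour using only Python's standard library: `urllib.request.urlopen` follows a 303 silently and hands back the final 200 response, so an agent that keeps using the URI it asked for will file the document under /toucan. Only the response's `url` attribute shows that the representation came from /doc. The local toy server and the /toucan and /doc paths are assumptions made for the demo.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToucanHandler(BaseHTTPRequestHandler):
    """Toy server: /toucan 303-redirects to /doc, which returns a document."""

    def do_GET(self):
        if self.path == "/toucan":
            self.send_response(303)
            self.send_header("Location", "/doc")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"A description of the toucan."
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), ToucanHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
requested = f"http://127.0.0.1:{server.server_port}/toucan"

with urllib.request.urlopen(requested) as resp:
    status = resp.status   # status of the final response: 200, not 303
    final_url = resp.url   # the URI the representation actually came from

server.shutdown()
server.server_close()

print(status, requested, final_url)
```

A client that wants to keep the thing and its description apart has to record `final_url` (or install its own redirect handler) rather than trust the URI it requested.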
> That's true. I don't suppose any of the big crawlers care about the
> semantics of 303, because none of them care about the difference
> between a thing and its description. For example, Google OpenSocial
> doesn't give a hoot about the difference and yet still seems to
> function. As I say above, this document/thing distinction is actually
> quite a small area to focus on compared with the real problems of
> analysing the web of data as a whole.
Well yeah, one could take the entire graph, stick it in a triple store,
and then strip all triples about anything that can be inferred to have the
class Document, to be left with just the data :) [ which obviously won't
include your /toucan, /doc or /anna ]
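A rough sketch of that filtering step over an in-memory list of triples, assuming prefixed names stored as plain strings and only explicit `rdf:type foaf:Document` statements (no RDFS subclass reasoning). The triples are invented for illustration: because /toucan was 200 OK'd, a crawler has typed it as a document, so the toucan's own data disappears along with /doc. Triples that merely point at a document as their object survive.

```python
RDF_TYPE = "rdf:type"
DOCUMENT = "foaf:Document"


def strip_documents(triples):
    """Remove every triple whose subject is explicitly typed as a Document."""
    documents = {s for s, p, o in triples if p == RDF_TYPE and o == DOCUMENT}
    return [t for t in triples if t[0] not in documents]


graph = [
    ("/toucan", "rdf:type", "ex:Bird"),
    ("/toucan", "ex:isDescribedBy", "/toucan"),
    ("/toucan", "rdf:type", "foaf:Document"),  # what a crawler infers from the 200 OK
    ("/doc", "rdf:type", "foaf:Document"),
    ("/forest", "ex:habitatOf", "/toucan"),    # hypothetical triple about something else
]

print(strip_documents(graph))
# [('/forest', 'ex:habitatOf', '/toucan')]
```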
Best,
Nathan