On 11/4/10 4:00 PM, Hugh Glaser wrote:
Pretty much +1.
Of course, being a Good Citizen of the LOD Community, I have always done
the 303 thing (or hash), as recommended in the relevant docs, even if not
mandated. This was despite the fact that I disagreed that it was worth the
candle, compared with many of the more pragmatic, and social issues that
are being raised here.
A community as small as ours (still) is can ill afford splintering.

But I do find myself wondering whether things have changed since that
decision was made; certainly sufficient time has passed to review it,
and there have been some changes out there since.

By the way, on my travels as a LD consumer, I do find a lot of LD that
does just do 200 - arrived just a little while ago:
curl -i -H "Accept: application/rdf+xml" http://www.uk-postcodes.com/postcode/AB101AA
HTTP/1.1 200 OK
...<rdf:RDF
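
As a rough illustration of the three publishing patterns compared in this thread (hash URIs, 303 redirects, and a plain 200 as in the response above), here is a minimal Python sketch. The function name publishing_pattern and the example.org URIs are assumptions made for illustration. It only inspects a status code and Location header the caller has already obtained, and makes no network request itself.

```python
from urllib.parse import urldefrag


def publishing_pattern(uri, status, location=None):
    """Classify how a URI for a thing is tied to the document describing it.

    uri      -- the URI that was requested (hypothetical input)
    status   -- HTTP status code of the first response
    location -- value of the Location header, if any
    """
    _, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: the client strips the fragment before the request,
        # so the thing and the document already have different names.
        return "hash"
    if status == 303 and location:
        # The server redirects from the thing's URI to a document URI.
        return "303"
    if status == 200:
        # The same URI names the thing and returns the document directly.
        return "200"
    return "other"


# The uk-postcodes response quoted above falls in the "200" case.
print(publishing_pattern("http://www.uk-postcodes.com/postcode/AB101AA", 200))
```

The "200" case is the one for which the thread asks whether a best practice should be written down.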

So it seems to me that specifying Best Practice for this sort of system,
thus making it more useful, is better than just sitting here saying it is
wrong, and possibly letting the poorer data sweep over us while we have
our heads in the sand.

Thanks for re-raising the issue, even if it has filled up my mailbox!

Hugh,

Let's pass the URL: <http://www.uk-postcodes.com/postcode/AB101AA> (Ian: I am not speaking Turtle; I want the Link to work with all kinds of email clients that support hyperlinks) through the URI Debugger [1].

1. http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http%3A%2F%2Fwww.uk-postcodes.com%2Fpostcode%2FAB101AA&useragentheader=&acceptheader= -- <head/> tells me nothing, and the HTTP response headers tell an HTTP User agent nothing. It tells me (Human) "Get this information as XML <http://www.uk-postcodes.com/postcode/AB101AA.xml>, CSV <http://www.uk-postcodes.com/postcode/AB101AA.csv>, JSON <http://www.uk-postcodes.com/postcode/AB101AA.json> or RDF <http://www.uk-postcodes.com/postcode/AB101AA.rdf>" (hopefully the hyperlinks remain intact re. XML, CSV, JSON, and RDF resource URLs)

2. http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/entity/http/www.uk-postcodes.com/postcode/AB101AA -- shows that semantically, we have a triple based on the "sioc:links_to" relation that connects the HTML page to various resource types (RDF, CSV, XML, JSON)

3. http://linkeddata.uriburner.com/about/html/http/www.uk-postcodes.com/postcode/AB101AA.rdf -- shows an empty page since there isn't a triple connecting the RDF resource to its content (no real-world sensory equivalent of distinguishing the Canvas from the Painting, the Data in the Graph Paper from the actual Graph Paper)

4. http://linkeddata.uriburner.com/ode/?uri%5B%5D=http%3A%2F%2Flinkeddata.uriburner.com%2Fabout%2Fid%2Fentity%2Fhttp%2Fwww.uk-postcodes.com%2Fpostcode%2FAB101AA.rdf&; -- I clicked the ODE link in the footer of the empty page; since ODE is an RDF data browser, it projects a pictorial view of the graph carried by the RDF document

5. http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fwww.uk-postcodes.com%2Fpostcode%2FAB101AA%23ddd9fd92559e0657db00c4e60c49f344c98c6dcc8-ge_1&sid=8829&urilookup=1 -- because of the HTTP GET in #2 data was still written to the backend RDF store/dbms, so a descriptor page can still be generated.

** I know at first blush you might assume this to be a demonstration of OpenLink technology, if so please find a way to put that aside ** Let's look at a number of things in play here:

1. The description of a Postcode was persisted to a Document
2. The Document was published to the Web
3. The Document Address became the focal point of the Web contribution
4. The Document has unambiguously Identified Subjects (using HTTP URI based Names)
5. The Document contains no metadata that semantically associates its Subjects
6. http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http%3A%2F%2Fwww.uk-postcodes.com%2Fpostcode%2FAB101AA%23ddd9fd92559e0657db00c4e60c49f344c98c6dcc8-ge_1&useragentheader=&acceptheader= -- verifying #5
7. http://bit.ly/bV1rsY -- alternative to #6 using the same tool but passing the URL of the descriptor generated by our Sponger (what's behind URIBurner); notice how it closes the loop by adding the missing metadata in a myriad of ways, including HTML+RDFa.

To conclude, is this URL: <http://www.uk-postcodes.com/postcode/AB101AA>, what you regard as the best practice? If so, how can this be, if the data doesn't effectively describe itself to Humans and Machines without the middleware heuristics demonstrated by our Sponger?

Ultimately, we come back to relative and subjective ambiguity. That will never go away, but deductively constructing context is a practical solution that's 100% palatable with the ingenuity that underlies Web Architecture. IMHO.


Links:

1. http://linkeddata.informatik.hu-berlin.de/uridbg/index.php
2. http://en.wikipedia.org/wiki/Document -- What is a Document?


--
Hugh

On 04/11/2010 19:04, "Harry Halpin"<[email protected]>  wrote:

On Thu, Nov 4, 2010 at 7:18 PM, Ian Davis<[email protected]>  wrote:
On Thursday, November 4, 2010, Nathan<[email protected]>  wrote:

Please, don't.

303 is a PITA, and it has detrimental effects across the board from
network load through to server admin. Likewise #frag URIs have their
own set of PITA features (although they are nicer on the network and
servers).

However, and very critically (if you can get more critical than
critical!), both of these patterns / constraints are here to ensure
that  different things have different names, and without that
distinction our data is junk.

I agree with this and I address it in my blog post where I say we
should link the thing to its description using a triple rather than a
network response code.

This is key. The issue with 303 is that it uses a "network response
code" to make a semantic distinction that can (and likely should) be
done in the data-format itself, i.e. distinguishing a name for the
data from the name for the thing itself. To be precise, you
can state (ex:thing isDescribedBy ex:document, ex:thing rdf:type
irw:nonInformationResource) in ex:document that is full of statements
about ex:thing, and a half-way intelligent RDF parser should be able
to sort that out.
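
To make the idea above concrete, here is a minimal Python sketch of how a consumer might sort names for things from names for documents using only the triples in the data. Triples are represented as plain (subject, predicate, object) tuples rather than with any RDF library. The ex: prefix on isDescribedBy, the ex:label property, and the function name split_things_and_documents are assumptions for illustration only.

```python
# Predicate and class names follow the statement above; the "ex:" prefix
# on isDescribedBy is an assumption made for this sketch.
DESCRIBED_BY = "ex:isDescribedBy"
RDF_TYPE = "rdf:type"
NON_INFO = "irw:nonInformationResource"


def split_things_and_documents(triples):
    """Return (things, documents) as sets of names found in the triples."""
    things, documents = set(), set()
    for subject, predicate, obj in triples:
        if predicate == DESCRIBED_BY:
            # The subject is the thing, the object is its describing document.
            things.add(subject)
            documents.add(obj)
        elif predicate == RDF_TYPE and obj == NON_INFO:
            things.add(subject)
    return things, documents


# Hypothetical content of ex:document, which is "full of statements
# about ex:thing".
triples = [
    ("ex:thing", "ex:isDescribedBy", "ex:document"),
    ("ex:thing", "rdf:type", "irw:nonInformationResource"),
    ("ex:thing", "ex:label", "AB10 1AA"),
]
things, documents = split_things_and_documents(triples)
print(things, documents)
```

No HTTP status code is consulted here; the distinction is carried entirely by the data.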

Re 303 and performance, I am *sure* for DERI's Sindice it's fine to
follow 303s. However, performance-wise for a server, it seems rather
silly to do another HTTP redirect with little gain, and I think
something Google-size would care about wasting HTTP response codes.

Lastly, the real problem is deployment. Imagine you are a regular
database admin - lots of people do not care and do not want to (and
can't) edit .htaccess and deal with Apache HTTP redirects just to put
some data on the Web, and will not use Link headers or anything else
either. They want to take a file and just put it on the Web via FTP
without messing with their server. Many developers (I've had this
conversation with David Recordon before the OGP launch in person) note
that making the average data deployer worry about the difference
between a URI for a thing and a document will naturally hurt
deployment, so *not* following this practice was considered a feature,
not a bug, by Facebook.

A database admin is at a bar. An RDF evangelist comes up and says
"Yes, I'd like for you to release your data, but you have to set up
your server to do 303 redirects and convert your data to this weird
looking RDF/XML format in addition to having a human-readable format.
Maybe you can google for D2RQ..." A Microsoft guy comes up and says
"Hey, here's this oData format, why not just have your server produce
a format we understand, Atom." Guess who will win the argument :)

Think outside the box, RDF needs to lower deployment costs. You can do
that and keep the name/thing distinction, by doing it as a triple in
the dataformat, which is a logical thing to do rather than doing
essentially semantic work as a network response code.


This goes beyond your and my personal opinions, or those of anybody
here, the constraints are there so that in X months time when
"multi-corp" trawls the web, analyses it and releases billions of
statements saying like {</foo>  :hasFormat "x"; sioc:about
dbpedia:Whatever } about each doc on the web, that all of those
statements are said about documents, and not about you or I, or
anything else real, that they are said about the right "thing", the
correct name is used.
I don't see that as a problem. It's an error because it's not what the
original publisher intended but there are many many examples where
that happens in bulk, and actually the 303 redirect doesn't prevent it
happening with naive crawlers.

If someone asserts something we don't have to assume it is
automatically true. We can get authority about what a URI denotes by
dereferencing it. We trust third party statements as much or as little
as we desire.


And this is critically important, to ensure that in X years time when
somebody downloads the RDF of 2010 in a big *TB sized archive and
considers the graph of RDF triples, in order to make sense of some
parts of it for something important, that the data they have isn't just
unreasonable junk.
Any decent reasoner at that scale will be able to reject triples that
appear to contradict one another. Seeing properties such as "format"
against a URI that everyone else claims denotes an animal is going to
stick out.
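
A very small Python sketch of the kind of check described above, assuming triples are plain (subject, predicate, object) tuples. It is far from a real reasoner: it flags any subject that carries a document-only property while most type claims say it is something other than a document. The names ex:hasFormat, ex:Document, ex:Animal, ex:rex, ex:page and the function conflicting_subjects are all hypothetical.

```python
from collections import Counter

# Hypothetical vocabulary for this sketch.
DOCUMENT_ONLY_PROPERTIES = {"ex:hasFormat"}
DOCUMENT_TYPE = "ex:Document"


def conflicting_subjects(triples):
    """List subjects that have a document-only property while the
    most common rdf:type claim about them is not the document type."""
    type_votes = {}
    doc_property_users = set()
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            type_votes.setdefault(subject, Counter())[obj] += 1
        elif predicate in DOCUMENT_ONLY_PROPERTIES:
            doc_property_users.add(subject)

    flagged = []
    for subject in sorted(doc_property_users):
        votes = type_votes.get(subject)
        # Subjects with no type claims at all are left alone.
        if votes and votes.most_common(1)[0][0] != DOCUMENT_TYPE:
            flagged.append(subject)
    return flagged


triples = [
    ("ex:rex", "rdf:type", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Animal"),
    ("ex:rex", "ex:hasFormat", "x"),
    ("ex:page", "rdf:type", "ex:Document"),
    ("ex:page", "ex:hasFormat", "html"),
]
flagged = conflicting_subjects(triples)
print(flagged)
```

Whether flagged triples are rejected, down-weighted, or kept is a policy choice left to the consumer.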

It's not about what we say something is, it's about what others say
the thing is, and if you 200 OK the URIs you currently 303, then it
will be said that you are a document, as simple as that. Saying you are
a document isn't the killer, it's the hundreds of other statements said
along side that which make things so ambiguous that the info is useless.

That's only true under the httpRange-14 finding which I am proposing
is part of the problem.



If 303s are killing you then use fragment URIs, if you refuse to use
fragments for whatever reason then use something new like tdb:'s,
support the data you've published in one pattern, or archive it and
remove it from the web.
These are publishing alternatives, but I'm focussed on the 303 issue
here.


But, for whatever reasons, we've made our choices, each has pros and
cons, and we have to live with them - different things have different
names, and the giant global graph is usable. Please, keep it that way.

Agree, different things have different names, that's why I emphasise
it in the blog post. I don't agree that the status quo is the best
state of affairs.


Best,

Nathan


Ian






--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen




