Dan Brickley wrote:
On 21 Mar 2010, at 12:47, Hugh Glaser wrote:
Hi Kingsley, I am right with you - finding stuff is hard.
But I do think we could make it easier for all of us.
Just the esw wiki alone requires me to put every set I create into a
bunch of places.
10 years ago, looking for RDF on the public Web was like looking for a
needle in a haystack. There wasn't much out there and it was poorly
linked. So a big part of the thinking that led to the foaf/rdfweb
design was to make discovery easier: if you find one rdf doc, you
should be able to find most of the rest by following seeAlso and other
kinds of links.
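To make the seeAlso idea concrete, here is a minimal sketch of that kind of discovery. It assumes a hypothetical in-memory table of documents and triples (DOCS) in place of real HTTP fetching and RDF parsing; the example.org URLs and the helper names are illustrative only, not part of foaf/rdfweb. It walks breadth-first and caps the number of documents so one large site cannot swallow the crawl.

```python
from collections import deque

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical in-memory "web": document URL -> triples found in that document.
# A real crawler would fetch each URL and parse the RDF instead.
DOCS = {
    "http://example.org/alice.rdf": [
        ("http://example.org/alice#me", RDFS_SEE_ALSO, "http://example.org/bob.rdf"),
    ],
    "http://example.org/bob.rdf": [
        ("http://example.org/bob#me", RDFS_SEE_ALSO, "http://example.org/carol.rdf"),
        ("http://example.org/bob#me", RDFS_SEE_ALSO, "http://example.org/alice.rdf"),
    ],
    "http://example.org/carol.rdf": [],
}


def fetch_from_memory(url):
    """Stand-in for 'fetch and parse': return the triples of one document."""
    return DOCS.get(url, [])


def discover(start, fetch, max_docs=100):
    """Breadth-first walk over rdfs:seeAlso links from a single starting document."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        for _subject, predicate, obj in fetch(doc):
            # Only follow seeAlso links to documents not yet visited,
            # and stop adding new ones once the cap is reached.
            if predicate == RDFS_SEE_ALSO and obj not in seen and len(order) < max_docs:
                seen.add(obj)
                order.append(obj)
                queue.append(obj)
    return order


found = discover("http://example.org/alice.rdf", fetch_from_memory)
```

Starting from one document, found ends up listing every document reachable through seeAlso links, each exactly once.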
Why isn't this enough?
Because it doesn't lead me explicitly to:
1. RDF Data Set Archives
2. SPARQL endpoints.
It will get me to Linked Data based hypermedia resources, but that's one
of three distinct items.
The bigger problem is really this, and it was the basis of the best
practices: do SPARQL endpoint publishers want people hammering their
endpoints with SPARQL CONSTRUCTs en route to constructing dumps that are
then loaded into personal or service-specific endpoints? Today, even
with DBpedia outlining the components with absolute clarity, we still
get loads of visitors attempting to empty the Quad Store via the SPARQL
endpoint (even when they could simply go load the data sets themselves
into an RDF store of their choice).
Perhaps because many of the datasets are huge db exports, crawlers are
often overwhelmed and disappear into depth-first holes? Or because we
don't publish triples about doc- and dataset-types in a
crawler-discoverable way?
I don't see this as a "Crawler Only" zone or solution. A project should
make the following crystal clear, ultimately for its own good:
1. RDF Data Set Archives
2. SPARQL Endpoints
3. URI pattern examples for Published Linked Data.
A single HTML+RDFa (or HTML5 with RDFa or Microformats) document can
express the above with clarity for visiting user agents.
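As a rough illustration of such a document, the sketch below builds an HTML+RDFa fragment covering the three items. It assumes the VoID vocabulary (void:dataDump, void:sparqlEndpoint, void:exampleResource) as the way to express them, which is one possible choice and not something this thread prescribes; the function name and all example.org URLs are hypothetical.

```python
from html import escape

VOID_NS = "http://rdfs.org/ns/void#"


def describe_dataset(dataset_uri, dump_urls, sparql_endpoint, example_uris):
    """Return an HTML+RDFa fragment listing dumps, endpoint, and example URIs.

    Assumption: VoID terms are used for the three items; swap in another
    vocabulary if a project prefers one.
    """
    lines = [
        f'<div prefix="void: {VOID_NS}" about="{escape(dataset_uri)}" typeof="void:Dataset">'
    ]
    # 1. RDF data set archives (dumps that can be loaded into a local store)
    for url in dump_urls:
        lines.append(f'  <a rel="void:dataDump" href="{escape(url)}">RDF data set archive</a>')
    # 2. SPARQL endpoint
    lines.append(
        f'  <a rel="void:sparqlEndpoint" href="{escape(sparql_endpoint)}">SPARQL endpoint</a>'
    )
    # 3. Example URIs showing the pattern used for published Linked Data
    for uri in example_uris:
        lines.append(f'  <a rel="void:exampleResource" href="{escape(uri)}">Example resource</a>')
    lines.append("</div>")
    return "\n".join(lines)


page = describe_dataset(
    "http://example.org/dataset",
    ["http://example.org/dumps/part1.nt.gz", "http://example.org/dumps/part2.nt.gz"],
    "http://example.org/sparql",
    ["http://example.org/resource/Berlin"],
)
```

A user agent that finds this fragment can tell where the dumps are before it decides to query the endpoint.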
A wiki page is ok for initial bootstrap but we ought to outgrow that
soon...
Yes, of course, hence the items 1-3 above. But for now, it's better than
what's currently in play i.e., inconsistency.
Kingsley
Dan
--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen