Daniel Schwabe wrote:
All,
the sitemap.xml solution works IF everybody (or at least most sites) have
robots.txt or sitemap.xml at the root directory. So, conceptually
speaking, it should be the way to go.
But a quick test on the LOD cloud returned 404 for many, if not most,
sites for both sitemap.xml and robots.txt...
Curiously, for many of those without a sitemap.xml, the
<c-name>/sparql URI format for accessing the SPARQL endpoint DOES work...
So something is still missing. Each dataspace maintainer who is
willing to provide a SPARQL endpoint should either publish a
sitemap.xml or voiD description (even a minimal one), or at least
follow this convention.
This would greatly enhance the accessibility of the data, and enable
tools to automatically find them as needed...
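The discovery order Daniel describes could be sketched as follows. This is only an illustration of the convention, not an implementation: it builds the candidate URLs (sitemaps.org locations first, then the /sparql convention as a fallback); a real crawler would HTTP GET each one and treat a 404 as "try the next".

```python
from urllib.parse import urljoin

def discovery_candidates(site: str) -> list:
    """Candidate URLs to probe for a site, in order of preference:
    robots.txt and sitemap.xml (per sitemaps.org), then the
    <c-name>/sparql convention as a fallback."""
    base = site if site.endswith("/") else site + "/"
    return [urljoin(base, path) for path in ("robots.txt", "sitemap.xml", "sparql")]

print(discovery_candidates("http://dbpedia.org"))
# ['http://dbpedia.org/robots.txt', 'http://dbpedia.org/sitemap.xml', 'http://dbpedia.org/sparql']
```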
Cheers
D
Daniel,
+1
Clearly we need to document the best practices somewhere :-)
Kingsley
Sergio Fernández wrote:
On Sat, 2009-03-07 at 00:36 -0300, Daniel Schwabe wrote:
I could query the site for its sitemap extension (would it always be
<home url>/sitemap.xml?)
Yes, you can do it programmatically. But that URL (/sitemap.xml),
even though it's commonly used, isn't mandatory, so you can't rely on it as a
constant. There is one way, not so direct, but at least one that is
standard:
1) From /robots.txt you can take the sitemap's URL (the "Sitemap:" field, as [1]
specifies)
2) According to the extension proposed by DERI [2], you can check whether the
sitemap points to a SPARQL endpoint by looking for the
sc:sparqlEndpointLocation element.
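A minimal sketch of those two steps, operating on already-fetched document bodies (a real crawler would retrieve both over HTTP). The sc: namespace URI is my assumption based on the extension's schema; check [2] for the authoritative value.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the DERI sitemap extension -- verify against [2].
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema#"

def sitemap_urls_from_robots(robots_txt: str) -> list:
    """Step 1: collect the 'Sitemap:' lines from a robots.txt body
    (the field name is case-insensitive per the sitemaps.org protocol)."""
    urls = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

def sparql_endpoints(sitemap_xml: str) -> list:
    """Step 2: return every sc:sparqlEndpointLocation value in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip()
            for el in root.iter("{%s}sparqlEndpointLocation" % SC_NS)
            if el.text]

# Offline demo with hypothetical documents:
robots = "User-agent: *\nSitemap: http://example.org/sitemap.xml\n"
sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema#">
  <sc:dataset>
    <sc:sparqlEndpointLocation>http://example.org/sparql</sc:sparqlEndpointLocation>
  </sc:dataset>
</urlset>"""

print(sitemap_urls_from_robots(robots))  # ['http://example.org/sitemap.xml']
print(sparql_endpoints(sitemap))         # ['http://example.org/sparql']
```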
Hope that helps.
Best,
[1] http://www.sitemaps.org/protocol.php
[2] http://sw.deri.org/2007/07/sitemapextension/
--
Daniel Schwabe
Tel:+55-21-3527 1500 r. 4356
Fax: +55-21-3527 1530
http://www.inf.puc-rio.br/~dschwabe Dept. de Informatica, PUC-Rio
R. M. de S. Vicente, 225
Rio de Janeiro, RJ 22453-900, Brasil
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com