Hugh Glaser wrote:
Hi Guys,
I am puzzled by the whole discussion, so will try to summarise to find out
if I have some misunderstanding.
It really is "just" about finding where the URIs are, and search engines are
the game in town. We need to make it really easy for people to find the
Linked Data URIs they need. Wrappers make things a bit harder.
Juan asked if "Sindice crawled the whole regular web and checked the
Spongers for each URL (sic!)".
I read this as: "Can I use Sindice to find Linked Data URIs provided by the
Spongers?" Or to put it yet another way, "Does Sindice index the part of the
Semantic Web provided by the Spongers?"
One way to do this would be to do what Juan suggests - model what the
Spongers are doing, and then infer what the Linked Data URIs would be, based
on the URLs of the underlying web pages, having crawled them.
But there seems to me a much simpler and more principled way - the Sponger
should do it.
Spongers should provide Semantic Sitemaps (and of course voiD descriptions),
so that Sindice can index (not *crawl*, which I think has led to some of
the confusion) the sites.
How might this be done?
Well, certainly where the Sponger is connected to a particular site which
has an ordinary Sitemap, it could/should process it as part of the
connection with a site, and then re-publish the Semantic Sitemap. For sites
that don't have Sitemaps, it may/will be somewhat harder.
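A rough sketch of that re-publishing step, in Python, might look like the following. It is only an illustration under stated assumptions: WRAPPER_PREFIX is a hypothetical proxy URI scheme (a real Sponger instance defines its own), and the sc: element names are used as I recall them from the Semantic Sitemaps extension, so they should be checked against that specification before relying on them.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Namespace of the Semantic Sitemaps extension (as recalled; verify against the spec).
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"

# Hypothetical wrapper prefix, for illustration only.
WRAPPER_PREFIX = "http://sponger.example.org/about/rdf/"


def page_urls(sitemap_xml):
    """Return the <loc> values found in an ordinary Sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter("{%s}loc" % SITEMAP_NS)]


def wrapper_uri(page_url):
    """Map a page URL to a wrapper URI by simple prefixing (assumed scheme)."""
    return WRAPPER_PREFIX + page_url


def semantic_sitemap(sitemap_xml, label):
    """Build a Semantic Sitemap that advertises the wrapper URIs of a site."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    dataset = ET.SubElement(urlset, "{%s}dataset" % SC_NS)
    ET.SubElement(dataset, "{%s}datasetLabel" % SC_NS).text = label
    ET.SubElement(dataset, "{%s}linkedDataPrefix" % SC_NS).text = WRAPPER_PREFIX
    # One sampleURI per page listed in the site's ordinary Sitemap.
    for url in page_urls(sitemap_xml):
        ET.SubElement(dataset, "{%s}sampleURI" % SC_NS).text = wrapper_uri(url)
    return ET.tostring(urlset, encoding="unicode")
```

Calling semantic_sitemap(xml_text, "Some site via Sponger") returns the XML as a string; an indexer would then only need the prefix and sample URIs rather than a crawl of the underlying pages.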
I may be misunderstanding Spongers as well, but it all seems pretty clean
and straightforward to me.
Great stuff, of course.
Hugh,
Quick Sponger Glossary:
Sponger -- The Data Access Manager Layer
Basic Cartridges/Drivers/Providers -- The components that perform the
extraction and transformation into RDF model based Linked Data graphs
Meta Cartridges -- Smarter Cartridges that perform Lookups and leverage
Inference Rules, etc., which are added to the basic Linked Data graphs
(*these aren't part of the Open Source Edition of Virtuoso*).
A Sponger generated Linked Data graph does optionally include VoiD
descriptions (why wouldn't it, bearing in mind our proximity to VoiD?).
We just disabled them while updating the Sponger Engine, etc. You must have
seen VoiD graphs in earlier Sponger proxy URIs, right? They will be
re-enabled very soon :-)
All sponged data ends up in the Quad Store of its host Virtuoso instance
(so <http://uriburner.com/sparql> and <http://uriburner.com/fct> are in
place as per usual re. Virtuoso instances).
A Virtuoso Sponger instance can optionally ping PTSW each time it makes
a Linked Data graph from its Web Resource RDFization activity, so
Sindice and other engines that already subscribe to PTSW also have
access to the Sponger generated Linked Data.
Sponger proxy/wrapper URIs are Data Source Names. If you look at the
graph closely you should see how we express with clarity what we are
doing i.e., note the "owl:sameAs" assertion to the original data source,
and the "owl:shameAs" pattern which is about minting a hint URI back to
the data source we've sponged. What we will be adding is some additional
Provenance Data now that we have a good shared ontology in place.
As stated above, the existence of <http://uriburner.com/fct> implies
that the faceted search and find engine is also alive and any client
application can use the REST or SOAP services it provides to perform
disambiguated search and find queries. This means Sindice can look up
Sponger instances in the same way the Sponger looks up Sindice via its
Web Services.
When you use Virtuoso, with the Sponger Middleware Enabled, Web Document
URLs basically become Named Graph IRIs re. SPARQL (the point Martin
emphasized in his post).
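To make the Named Graph point concrete, here is a minimal Python sketch that builds a SPARQL request against the endpoint mentioned above, using a Web Document URL as the graph IRI. The helper names graph_query and request_url are hypothetical, and the sketch assumes only the standard "query" parameter of the SPARQL protocol; it constructs the request URL and does not send it.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "http://uriburner.com/sparql"  # endpoint named in this message


def graph_query(document_url, limit=10):
    """Build a query that reads triples from the graph named by document_url."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(c in document_url for c in '<>" {}|\\^`'):
        raise ValueError("URL contains characters not allowed in a SPARQL IRI")
    return (
        "SELECT ?s ?p ?o WHERE { GRAPH <%s> { ?s ?p ?o } } LIMIT %d"
        % (document_url, limit)
    )


def request_url(document_url, limit=10):
    """Return the GET URL carrying the query (illustrative helper)."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": graph_query(document_url, limit)})
```

For example, request_url("http://example.com/page") yields a URL that could be fetched with any HTTP client; whether the graph is populated depends on the page having been sponged by that instance.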
When you use the OpenLink Data Explorer (ODE) [5], you can also bind the
Browser to any of the instances above, and exploit the effect of
sponging by simply invoking the View Page Metadata option (main or
context menu or via the URIBurner Bookmarklet).
Links:
1. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger
2. http://uriburner.com -- instance of Virtuoso with Sponger
Middleware enabled
3. http://bbc.openlinksw.com -- instance of Virtuoso with Sponger
Middleware enabled
4. http://lod.openlinksw.com -- instance of Virtuoso with Sponger
Middleware enabled
5. http://ode.openlinksw.com -- OpenLink Data Explorer
6. http://trdf.sourceforge.net/provenance/ns-20090825.html -- Provenance
Ontology
Kingsley
Best
Hugh
On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda wrote:
But Sindice could at least crawl Amazon.
It would be great to use sig.ma to create a "meshup" with the amazon data.
Juan Sequeda, Ph.D Student
Dept. of Computer Sciences
The University of Texas at Austin
www.juansequeda.com
www.semanticwebaustin.org
On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW) wrote:
I don't think so, because this would require that Sindice crawled the
whole regular web and checked the Spongers for each URL (sic!).
Juan Sequeda wrote:
Does Sindice crawl this (or any other semantic web search engines)?
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com