Hi there,

I've done data integration based on SPARQL in a "restricted" domain, not web-scale (see SemWIQ presentation at ESWC08 [1]). But the issues are similar. We need some descriptions about sites, owner, license, etc. In our case this is provided upon registration of data sources at the mediator which is maintaining a site catalog.

For me there are two points that count for voiD:
1. we just need those meta-data about the maintainer of endpoints
2. we need some simple "pre-compiled" statistical information, mainly for performance reasons (I think). Of course, data should be self-describing, and you could fetch all the data and collect your own stats using SPARQL, but this would put unnecessary load on the servers. Additionally, SPARQL currently does not support aggregate functions (at least not in the spec), which would allow you to retrieve already aggregated stat data.

One possible way to achieve this is to provide voiD data as part of the actual graph exposed by the SPARQL endpoint. SPARQL can then be used to retrieve meta data without the need for an additional meta-description layer. However, I think the real problem is that sometimes this is not (easily) possible. In such cases a simple file resource could add the required information. But how should the search engine, client, etc. know where to find the meta data? First try using SPARQL, then a file location by convention? I don't know...
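
As a minimal sketch of that lookup order (SPARQL first, then a file at a conventional location): the path "/void.rdf", the query text and the two callables are assumptions made up for illustration, nothing in this thread defines them.

```python
from typing import Any, Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical convention, by analogy with /robots.txt; no such path is agreed on.
CONVENTION_PATH = "/void.rdf"

# Illustrative query only; the vocabulary and its class names are still open.
META_QUERY = (
    "SELECT ?dataset WHERE { ?dataset a <http://example.org/meta#Dataset> }"
)


def find_metadata(
    site_url: str,
    ask_endpoint: Callable[[str, str], Any],
    fetch_file: Callable[[str], Any],
) -> Optional[Tuple[str, Any]]:
    """Return ("sparql", result) or ("file", body), or None if both fail.

    ask_endpoint(site_url, query) and fetch_file(url) are supplied by the
    caller, so this function does no network I/O itself.
    """
    # Step 1: try to get the meta data from the endpoint's own graph.
    try:
        result = ask_endpoint(site_url, META_QUERY)
    except Exception:
        result = None
    if result:
        return ("sparql", result)

    # Step 2: fall back to a file at the conventional location on the host.
    file_url = urljoin(site_url, CONVENTION_PATH)
    try:
        body = fetch_file(file_url)
    except Exception:
        body = None
    if body:
        return ("file", body)

    return None
```

Passing the two fetchers in as parameters keeps the order of attempts separate from how a particular client actually talks HTTP or SPARQL.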

For voiD there are two things:
1. definition of a metaLOD vocabulary
2. specifying a convention of "where to find meta data" (like "/robots.txt" or "/sitemap.xml")

1. is easier than 2.

Regarding statistics: I'm working on a statistics monitor which can be attached to a SPARQL endpoint (on the same host or at least in the same subnet). It will periodically generate stats for the data stored behind the SPARQL endpoint. Because it works via SPARQL, it can be used regardless of the implementation (which may actually be a wrapper like D2R-Server). I basically need this for query optimization in SemWIQ.
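
To illustrate what collecting stats purely via SPARQL can look like without aggregate functions, here is a rough sketch that pages through all triples and counts predicates on the client side. It is not the SemWIQ monitor; the function name, the run_select callable and the result shape (a list of dicts with a "p" key) are assumptions for the example.

```python
from collections import Counter
from typing import Callable, Dict, List

# Plain SELECT with paging; no COUNT, since the spec has no aggregates.
# ORDER BY keeps the pages stable but can be expensive on large stores.
PAGE_QUERY = (
    "SELECT ?p WHERE {{ ?s ?p ?o }} "
    "ORDER BY ?s ?p ?o LIMIT {limit} OFFSET {offset}"
)


def predicate_counts(
    run_select: Callable[[str], List[Dict[str, str]]],
    page_size: int = 1000,
) -> Dict[str, int]:
    """Count triples per predicate by paging through the whole graph.

    run_select(query) is a caller-supplied function (hypothetical here) that
    sends the query to an endpoint and returns one dict per result row.
    """
    counts: Counter = Counter()
    offset = 0
    while True:
        rows = run_select(PAGE_QUERY.format(limit=page_size, offset=offset))
        for row in rows:
            counts[row["p"]] += 1
        # A short (or empty) page means the end of the data was reached.
        if len(rows) < page_size:
            break
        offset += page_size
    return dict(counts)
```

Every triple crosses the wire once per run, which is exactly the kind of load that pre-compiled statistics published by the endpoint would avoid.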

It would be great if I could use the outcomes of the voiD approach. That's why I'd like to get involved.

Regards
AndyL

[1] http://semwiq.faw.uni-linz.ac.at


On Jun 12, 2008, at 8:49 AM, Hausenblas, Michael wrote:



Giovanni,

I think I see your argument here and I tend to agree up to a certain
point. What makes me wonder is that it is *you* stating this ;)

Seriously, I very much believe in self-descriptive documents, etc. I do
prefer simple things that work. However, voiD is just the next logical
step after semantic sitemaps (it actually is thought to extend it in
terms of using the sc:datasetURI as the entry point, see also [1]).
So, just in case you want to argue against your own proposal, please tell
me so ;)

I guess you're right that many things can be done already and I'm
positive that we should use the current layer, then advance to the next.
But what if, say, the current layer is missing something? Who is to
decide when we are done? I guess it is up to the people using it.
So, let's not judge a book by its cover, please.

voiD intends to formalise what is already used in practice. I myself
have built some applications that exploit the LOD datasets and others
certainly have done as well. As it seems, there is a certain need to do
what we have done up to now mainly in our brains, in a more automated
way. There we are: a clear demand for something, a proposal to solve it.
It is as simple as that. If it turns out that LOD dataset providers
don't use it - fine. They might use other methods, then, or nothing at
all.

I see two issues with what you propose, however - granularity &
scalability. Currently we have identified two use cases for voiD:

1. automatic creation of a map (such as http://sindice.com/map)
2. topic-based selection of LOD datasets

I guess you're kinda familiar with (1). Now, think about scalability.
Today we have a bunch of LOD data sets or other sources - tomorrow we
may have 10k and next year maybe a million. Next, when looking at (2),
I'd like to have a reliable, simple method to determine a 'good' entry
point into the LOD cloud. As soon as I'm in, I can follow my nose using
basically what you propose.

Finally, the reactions so far tell us that voiD seems to be what people
were waiting for: easy to use, yet powerful enough to have an
added value.

Concluding, it is not 'Giovanni vs. voiD', it is Giovanni + voiD for a
better, finally a *real* Semantic Web.

Cheers,
        Michael


[1] http://sw.joanneum.at/voiD/img/void_discovery.png

----------------------------------------------------------
Michael Hausenblas, MSc.
Institute of Information Systems & Information Management
JOANNEUM RESEARCH Forschungsgesellschaft mbH

http://www.joanneum.at/iis/
----------------------------------------------------------


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Giovanni Tummarello
Sent: Thursday, June 12, 2008 12:08 AM
To: Hausenblas, Michael
Cc: public-lod@w3.org; Semantic Web
Subject: The king is dressed in void

Wasn't RDF all about being self describing?

if i say "giovanni works in research" .. do i really need a
vocabulary that says "this rdf contains information that describes
what people claim to be working on"? that's suicide. If this is the
case (which i totally don't believe) then the king is seriously naked
and there is no hope whatsoever that RDF is going to have any
relevance (and there i say it)

to find one such file, instead of having to invent, agree on and mark up,
i'd say it's much easier to do something like [1] or [2].
this is not marketing. it's a plea to NOT jump on more layers of stuff
when the previous layers still have to show their value and
adoptability. Solve some simple use cases first, then jump to the
more complex ones.

Giovanni

[1] http://demo.sindice.com/search?q=*+%3Chttp%3A%2F%2Fwww.w3.org%2F2006%2Fvcard%2Fns%23title%3E+%27research%27&qt=advanced

or
http://sindice.com/search?q=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fknows&qv=http%3A%2F%2Frichard.cyganiak.de%2Ffoaf.rdf%23cygri&qt=ifp
(documents which contain statements in which someone claims to
know richard)

[2] http://forum.sindice.com/showthread.php?t=10


On Wed, Jun 11, 2008 at 8:54 AM, Hausenblas, Michael
<[EMAIL PROTECTED]> wrote:


Dear interested people in linked datasets,

As you may have gathered, we have recently initiated a discussion on how
to discover the linked dataset cloud [1]. The result of our impromptu
kick-off meeting at the ESWC08 is literally voiD - the 'vocabulary of
interlinked datasets' (see notes at [2]). This is a proposal for a
vocabulary and a mechanism how it should be deployed and used. We have
some first slides available at [3] as well.

Please consider commenting on it either by replying to this message
and/or sharing your thoughts with us at the Wiki [2].

Cheers,
     Michael

[1] http://richard.cyganiak.de/2007/10/lod/
[2] http://community.linkeddata.org/MediaWiki/index.php?MetaLOD#Kick-off_meeting_at_ESWC08
[3] http://www.slideshare.net/mediasemanticweb/full-eswc08-lightning-talk

----------------------------------------------------------
Michael Hausenblas, MSc.
Institute of Information Systems & Information Management
JOANNEUM RESEARCH Forschungsgesellschaft mbH
Steyrergasse 17, A-8010 Graz, AUSTRIA

<office>
phone: +43-316-876-1193 (fax:-1191)
mobile: +43-699-1876-1165
e-mail: [EMAIL PROTECTED]
skype: mhausenblas
  web: http://www.joanneum.at/iis/

<see also>
       http://sw-app.org/about.html
       http://riese.joanneum.at
----------------------------------------------------------






----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
Institute for Applied Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69
http://www.langegger.at


