The problem at hand is: How to get reasonably accurate and up-to-date
statistics about the LOD cloud?
I see three workable methods for this.
1. Compile the statistics from voiD descriptions published by
individual dataset maintainers. This is what Hugh proposes below.
Enabling this is one of the main reasons why we created voiD. There have
to be better tools for creating voiD before this can happen. The tools
could be, for example, manual entry forms that spit out voiD (voiD-o-
matic?), or analyzers that read a dump and spit out a skeleton voiD
file (a rough sketch of the latter appears after this list).
2. Hand-compile the statistics by watching public-lod, trawling
project home pages, emailing dataset maintainers, and fixing things
when dataset maintainers complain. This is how I created the original
LOD cloud diagram in Berlin, and after I left Berlin, Anja has done a
great job keeping it up to date despite its massive growth. We will
continue to update it on a best-effort basis for the foreseeable
future. A voiD version of the information underlying the diagram is in
the pipeline. Others can do as we did.
3. Anyone who has a copy of a big part of the cloud (e.g. OpenLink and
we at Sindice) can potentially calculate the statistics. This is non-
trivial: we just have triples, so datasets and linksets have to be
reverse-engineered from them; it involves computation over quite
serious amounts of data; and in the end you still won't have good
labels or homepages for the datasets. While this approach is possible,
it seems to me that there are better uses of engineering and research
resources.
There is a fourth process that, IMO, does NOT work:
4. Send an email to public-lod asking "Everyone please enter your
dataset in this wikipage/GoogleSpreadsheet/fancyAppOfTheWeek."
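To make the analyzer idea in method 1 concrete (and to show how crude
the reverse-engineering in method 3 would be), here is a minimal sketch
that streams an N-Triples dump, groups subjects by URI host as a
stand-in for dataset boundaries, and writes a skeleton voiD file. The
file handling and the grouping heuristic are my assumptions, not
anything voiD itself prescribes:

    # Sketch: derive a skeleton voiD description from an N-Triples dump.
    # The subject-host grouping heuristic is a crude assumption; a human
    # maintainer would still have to correct, label and split the result.
    import re
    from collections import Counter
    from urllib.parse import urlparse

    TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(.*)\s*\.$')

    def skeleton_void(dump_path, out_path):
        triples_per_host = Counter()
        with open(dump_path, encoding='utf-8') as f:
            for line in f:
                m = TRIPLE.match(line.strip())
                if not m:
                    continue  # skips comments and blank-node subjects
                host = urlparse(m.group(1)).netloc
                triples_per_host[host] += 1
        with open(out_path, 'w', encoding='utf-8') as out:
            out.write('@prefix void: <http://rdfs.org/ns/void#> .\n')
            out.write('@prefix dcterms: <http://purl.org/dc/terms/> .\n\n')
            for host, n in triples_per_host.most_common():
                # One void:Dataset per subject host: a first guess only.
                out.write('[] a void:Dataset ;\n')
                out.write(f'   dcterms:title "{host}" ;\n')
                out.write(f'   void:triples {n} .\n\n')

Note what is missing from the output: labels, homepages, linksets --
exactly the things that make option 3 unattractive.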
Best,
Richard
On 11 Aug 2009, at 22:07, Hugh Glaser wrote:
If any more work is to be put into generating this picture, it
really should be from voiD descriptions, which we already make
available for all our datasets.
And for those who want to do it by hand, a simple system that lets
them specify the linkage using voiD would get the entry into a format
the voiD processor can use (I'm happy to host the data if need be).
Or Aldo's system could generate its RDF using the voiD ontology,
thus providing the manual entry system?
I know we have been here before, and almost got to the voiD processor
thing: please can we try again?
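For concreteness, here is a rough sketch (using rdflib) of the kind of
linkset description such a manual-entry system might emit. The dataset
URIs and the triple count are invented for illustration:

    # Sketch: a voiD linkset built with rdflib (6+). All example.org
    # URIs and the count below are hypothetical placeholders.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF, XSD

    VOID = Namespace('http://rdfs.org/ns/void#')
    OWL_SAMEAS = URIRef('http://www.w3.org/2002/07/owl#sameAs')

    g = Graph()
    g.bind('void', VOID)
    g.bind('dcterms', DCTERMS)

    mine = URIRef('http://example.org/void/my-dataset')
    dbpedia = URIRef('http://example.org/void/dbpedia')
    g.add((mine, RDF.type, VOID.Dataset))
    g.add((mine, DCTERMS.title, Literal('My Dataset')))

    linkset = URIRef('http://example.org/void/my-dataset-to-dbpedia')
    g.add((linkset, RDF.type, VOID.Linkset))
    g.add((linkset, VOID.subjectsTarget, mine))
    g.add((linkset, VOID.objectsTarget, dbpedia))
    g.add((linkset, VOID.linkPredicate, OWL_SAMEAS))
    g.add((linkset, VOID.triples, Literal(12345, datatype=XSD.integer)))

    print(g.serialize(format='turtle'))  # rdflib 6+ returns a str

A form that asks for the two datasets, the link predicate, and an
approximate count would be enough to fill in such a description.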
Best
Hugh
On 11/08/2009 19:00, "Aldo Bucchi" <[email protected]> wrote:
Hi,
On Aug 11, 2009, at 13:46, Kingsley Idehen <[email protected]>
wrote:
Leigh Dodds wrote:
Hi,
I've just added several new datasets to the Statistics page that
weren't previously listed. It's not really a great user experience
editing the wiki markup and manually adding up the figures.
So, thinking out loud, I'm wondering whether it might be more
appropriate to use a Google spreadsheet and one of their submission
forms for the purpose of collecting the data. A little manual editing
to remove duplicates might make managing this data a little easier,
especially as there are also pages that separately list the available
SPARQL endpoints and RDF dumps.
I'm sure we could create something much better using voiD etc., but
for now, maybe using a slightly better tool would give us a little
more progress? It'd be a snip to dump out the Google Spreadsheet data
programmatically too, which would be another improvement on the
current situation.
What does everyone else think?
Nice idea! Especially as Google Spreadsheet to RDF is just a matter of
RDFizers for the Google Spreadsheet API :-)
Hehe. I have this on my todo list (literally): a website that exposes
a Google spreadsheet as a SPARQL endpoint. Internally we use it as a
UI to quickly create config files et al.
But it will remain on my todo list forever... ;)
Kingsley, this could be sponged. The trick is that the spreadsheet
must have an accompanying page/sheet/book with metadata (the namespace
or explicit URIs for the columns).
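Not the actual Sponger, but as a toy illustration of that trick: read
the CSV export of the data sheet plus a companion mapping sheet that
assigns each column a predicate URI, and emit N-Triples. The file
names, the 'id' column convention, and the base URI are all my
assumptions:

    # Sketch: spreadsheet (CSV export) to N-Triples, driven by a
    # companion mapping sheet with columns 'column' and 'predicate_uri'.
    import csv

    def csv_to_ntriples(data_csv, mapping_csv, base_uri, out_path):
        with open(mapping_csv, encoding='utf-8') as f:
            predicate = {row['column']: row['predicate_uri']
                         for row in csv.DictReader(f)}
        with open(data_csv, encoding='utf-8') as f, \
             open(out_path, 'w', encoding='utf-8') as out:
            for row in csv.DictReader(f):
                subject = f'<{base_uri}{row["id"]}>'  # assumes an 'id' column
                for col, value in row.items():
                    if col == 'id' or col not in predicate or not value:
                        continue
                    # minimal N-Triples literal escaping (newlines not handled)
                    lit = value.replace('\\', '\\\\').replace('"', '\\"')
                    out.write(f'{subject} <{predicate[col]}> "{lit}" .\n')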
Kingsley
Cheers,
L.
2009/8/7 Jun Zhao <[email protected]>:
Dear all,
We are planning to produce an updated data cloud diagram based on the
dataset information on the esw wiki page:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics
If you have not published your dataset there yet and you would like
your dataset to be included, can you please add it there?
If you already have an entry there for your dataset, can you please
update the information about it on the wiki?
If you cannot edit the wiki page any more because of the recent update
of the esw wiki editing policy, you can send the information to me or
Anja, who is cc'ed. We can update it for you.
If you know friends who have datasets on the wiki but are not on the
mailing list, can you please kindly forward this email to them? We
would like to get the data cloud as up-to-date as possible.
For this release, we will use the above wiki page as the information
gathering point. We do apologize if you have published information
about your dataset on other web pages and this request would mean
extra work for you.
Many thanks for your contributions!
Kindest regards,
Jun
--
Regards,
Kingsley Idehen      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software      Web: http://www.openlinksw.com