The problem at hand is: How to get reasonably accurate and up-to-date
statistics about the LOD cloud?
I see three workable methods for this.
1. Compile the statistics from voiD descriptions published by
individual dataset maintainers. This is what Hugh proposes below.
Enabling this is one of the main reasons why we created voiD. There have
to be better tools for creating voiD before this can happen. The tools
could be, for example, manual entry forms that spit out voiD (voiD-o-
matic?), or analyzers that read a dump and spit out a skeleton voiD
file (a rough sketch of the latter appears after this list).
2. Hand-compile the statistics by watching public-lod, trawling
project home pages, emailing dataset maintainers, and fixing things
when dataset maintainers complain. This is how I created the original
LOD cloud diagram in Berlin, and after I left Berlin, Anja has done a
great job keeping it up to date despite its massive growth. We will
continue to update it on a best-effort basis for the foreseeable
future. A voiD version of the information underlying the diagram is in
the pipeline. Others can do as we did.
3. Anyone who has a copy of a big part of the cloud (e.g. OpenLink and
we at Sindice) can potentially calculate the statistics. This is non-
trivial: we just have triples, so datasets and linksets have to be
reverse-engineered from them; it involves computation over quite
serious amounts of data; and in the end you still won't have good
labels or homepages for the datasets. While this approach is possible,
it seems to me that there are better uses of engineering and research
resources.
There is a fourth process that, IMO, does NOT work:
4. Send an email to public-lod asking "Everyone please enter your
dataset in this wikipage/GoogleSpreadsheet/fancyAppOfTheWeek."
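To make the analyzer idea in method 1 concrete (and to show how crude
the reverse-engineering in method 3 would be), here is a minimal sketch
that streams an N-Triples dump, groups subjects by URI host as a
stand-in for dataset boundaries, and writes a skeleton voiD file. The
file handling and the grouping heuristic are my assumptions, not
anything voiD itself prescribes:

    # Sketch: derive a skeleton voiD description from an N-Triples dump.
    # The subject-host grouping heuristic is a crude assumption; a human
    # maintainer would still have to correct, label and split the result.
    import re
    from collections import Counter
    from urllib.parse import urlparse

    TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(.*)\s*\.$')

    def skeleton_void(dump_path, out_path):
        triples_per_host = Counter()
        with open(dump_path, encoding='utf-8') as f:
            for line in f:
                m = TRIPLE.match(line.strip())
                if not m:
                    continue  # skips comments and blank-node subjects
                host = urlparse(m.group(1)).netloc
                triples_per_host[host] += 1
        with open(out_path, 'w', encoding='utf-8') as out:
            out.write('@prefix void: <http://rdfs.org/ns/void#> .\n')
            out.write('@prefix dcterms: <http://purl.org/dc/terms/> .\n\n')
            for host, n in triples_per_host.most_common():
                # One void:Dataset per subject host: a first guess only.
                out.write('[] a void:Dataset ;\n')
                out.write(f'   dcterms:title "{host}" ;\n')
                out.write(f'   void:triples {n} .\n\n')

Note what is missing from the output: labels, homepages, linksets --
exactly the things that make option 3 unattractive.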
Best,
Richard
On 11 Aug 2009, at 22:07, Hugh Glaser wrote:
If any more work is to be put into generating this picture, it
really should be from voiD descriptions, which we already make
available for all our datasets.
And for those who want to do it by hand, a simple system that lets
them specify the linkage using voiD would get the entry into a format
the voiD processor can use (I'm happy to host the data if need be).
Or Aldo's system could generate its RDF using the voiD ontology,
thus providing the manual entry system?
I know we have been here before, and almost got to the voiD processor
thing: please can we try again?
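For concreteness, here is a rough sketch (using rdflib) of the kind of
linkset description such a manual-entry system might emit. The dataset
URIs and the triple count are invented for illustration:

    # Sketch: a voiD linkset built with rdflib (6+). All example.org
    # URIs and the count below are hypothetical placeholders.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF, XSD

    VOID = Namespace('http://rdfs.org/ns/void#')
    OWL_SAMEAS = URIRef('http://www.w3.org/2002/07/owl#sameAs')

    g = Graph()
    g.bind('void', VOID)
    g.bind('dcterms', DCTERMS)

    mine = URIRef('http://example.org/void/my-dataset')
    dbpedia = URIRef('http://example.org/void/dbpedia')
    g.add((mine, RDF.type, VOID.Dataset))
    g.add((mine, DCTERMS.title, Literal('My Dataset')))

    linkset = URIRef('http://example.org/void/my-dataset-to-dbpedia')
    g.add((linkset, RDF.type, VOID.Linkset))
    g.add((linkset, VOID.subjectsTarget, mine))
    g.add((linkset, VOID.objectsTarget, dbpedia))
    g.add((linkset, VOID.linkPredicate, OWL_SAMEAS))
    g.add((linkset, VOID.triples, Literal(12345, datatype=XSD.integer)))

    print(g.serialize(format='turtle'))  # rdflib 6+ returns a str

A form that asks for the two datasets, the link predicate, and an
approximate count would be enough to fill in such a description.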
Best
Hugh
On 11/08/2009 19:00, "Aldo Bucchi" <[email protected]> wrote:
Hi,
On Aug 11, 2009, at 13:46, Kingsley Idehen <[email protected]>
wrote:
Leigh Dodds wrote:
Hi,
I've just added several new datasets to the Statistics page that
weren't previously listed. It's not really a great user experience
editing the wiki markup and manually adding up the figures.
So, thinking out loud, I'm wondering whether it might be more
appropriate to use a Google spreadsheet and one of their submission
forms for the purpose of collecting the data. A little manual editing
to remove duplicates might make managing this data a little easier,
especially as there are also pages that separately list the available
SPARQL endpoints and RDF dumps.
I'm sure we could create something much better using voiD etc., but
for now, maybe using a slightly better tool would give us a little
more progress? It'd be a snip to dump out the Google Spreadsheet data
programmatically too, which would be another improvement on the
current situation.
What does everyone else think?
Nice idea! Especially as Google Spreadsheet to RDF is just a matter of
RDFizers for the Google Spreadsheet API :-)
Hehe. I have this on my todo list (literally): a website that exposes
a Google spreadsheet as a SPARQL endpoint. Internally we use it as a
UI to quickly create config files et al.
But it will remain on my todo list forever... ;)
Kingsley, this could be sponged. The trick is that the spreadsheet
must have an accompanying page/sheet/book with metadata (the namespace
or explicit URIs for the columns).
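Not the actual Sponger, but as a toy illustration of that trick: read
the CSV export of the data sheet plus a companion mapping sheet that
assigns each column a predicate URI, and emit N-Triples. The file
names, the 'id' column convention, and the base URI are all my
assumptions:

    # Sketch: spreadsheet (CSV export) to N-Triples, driven by a
    # companion mapping sheet with columns 'column' and 'predicate_uri'.
    import csv

    def csv_to_ntriples(data_csv, mapping_csv, base_uri, out_path):
        with open(mapping_csv, encoding='utf-8') as f:
            predicate = {row['column']: row['predicate_uri']
                         for row in csv.DictReader(f)}
        with open(data_csv, encoding='utf-8') as f, \
             open(out_path, 'w', encoding='utf-8') as out:
            for row in csv.DictReader(f):
                subject = f'<{base_uri}{row["id"]}>'  # assumes an 'id' column
                for col, value in row.items():
                    if col == 'id' or col not in predicate or not value:
                        continue
                    # minimal N-Triples literal escaping (newlines not handled)
                    lit = value.replace('\\', '\\\\').replace('"', '\\"')
                    out.write(f'{subject} <{predicate[col]}> "{lit}" .\n')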
Kingsley
Cheers,
L.
2009/8/7 Jun Zhao <[email protected]>:
Dear all,
We are planning to produce an updated data cloud diagram based on the
dataset information on the esw wiki page:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics
If you have not published your dataset there yet and you would like
your dataset to be included, can you please add it there?
If you already have an entry there for your dataset, can you please
update the information about it on the wiki?
If you cannot edit the wiki page any more because of the recent update
of the esw wiki editing policy, you can send the information to me or
Anja, who is cc'ed. We can update it for you.
If you know friends who have datasets on the wiki but are not on the
mailing list, can you please kindly forward this email to them? We
would like to get the data cloud as up-to-date as possible.
For this release, we will use the above wiki page as the information
gathering point. We do apologize if you have published information
about your dataset on other web pages and this request would mean
extra work for you.
Many thanks for your contributions!
Kindest regards,
Jun
--
Regards,
Kingsley Idehen      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software      Web: http://www.openlinksw.com