Re: New LOD Cloud - Please send us links to missing data sources

Kingsley Idehen Sat, 28 Feb 2009 15:19:20 -0800

Ted Thibodeau Jr wrote:

Hi, Anja --

On Feb 27, 2009, at 05:58 PM, Anja Jentzsch wrote:
> we are currently updating the LOD cloud. Find the draft here:
> <http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-02-27.png>

It remains very pretty -- but it feels like a data silo of its own.

I can't speak for anyone else, but I find it difficult to read this
cloud -- I cannot see which data sets do well at out-linking, nor
which sets are apparently good for being out-linked *to*.

Where is the data behind the graph?

What are the triple counts for each set?

What are counts for each arrow?  (When the arrow is bidirectional,
I would expect 2 counts, 1 for each arrowhead.)
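
A minimal sketch of how such per-arrow counts could be derived, assuming the link triples are available as (subject URI, object URI) pairs and that each data set can be recognized by a single namespace prefix. The prefixes, function names, and sample pairs are illustrative assumptions, not data behind the actual cloud:

```python
from collections import Counter

# Assumed namespace prefixes, for illustration only; a real data set
# may publish URIs under several prefixes.
NAMESPACES = {
    "DBpedia": "http://dbpedia.org/resource/",
    "GeoNames": "http://sws.geonames.org/",
}

def dataset_of(uri, namespaces=NAMESPACES):
    """Return the name of the data set whose prefix the URI starts with, or None."""
    for name, prefix in namespaces.items():
        if uri.startswith(prefix):
            return name
    return None

def count_links(pairs, namespaces=NAMESPACES):
    """Count links per direction, keyed by (source data set, target data set)."""
    counts = Counter()
    for subject, obj in pairs:
        src = dataset_of(subject, namespaces)
        dst = dataset_of(obj, namespaces)
        # Links inside one data set, or to unknown namespaces, are not arrows.
        if src and dst and src != dst:
            counts[(src, dst)] += 1
    return counts

# Hypothetical sample pairs: two links one way, one the other, one internal.
sample_pairs = [
    ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159/"),
    ("http://dbpedia.org/resource/Paris", "http://sws.geonames.org/2988507/"),
    ("http://sws.geonames.org/2950159/", "http://dbpedia.org/resource/Berlin"),
    ("http://dbpedia.org/resource/Berlin", "http://dbpedia.org/resource/Germany"),
]
link_counts = count_links(sample_pairs)
```

Keeping the key as an ordered (source, target) pair is what yields two separate counts for a bidirectional arrow, one per arrowhead.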

In October, when I asked about these, such numbers weren't used in
drawing the cloud -- so the sizes of the nodes and the weights of
the arcs are just artistic, and/or based on gut feel.  This use of
commonly understood graphing techniques (larger nodes for larger
data sets, thicker arcs for more links between) without any actual
data behind it troubles me.

Because of this, and because I wanted to easily see which data sets
made many links out, and which data sets were linked *to* a lot,
I made an alternative graphic -- which doesn't look so pretty (it's
much less suggestive of an actual cloud), but which I think is rather
more readable, and has no potential misinterpretation about data set
sizes or inter-link intensity, as all nodes are the same size and all
arcs are the same weight.

   <http://virtuoso.openlinksw.com/images/dbpedia-lod-cloud.html>

This version makes plain that there are some clusters within the
"cloud" -- and that DBpedia, MusicBrainz, and GeoNames are clearly
nexuses for these clusters, being the targets of many inter-links.
The new edition reveals several new nexuses, in ECS Southampton,
GeneID, UniProt, DBLP RKB Explorer, CiteSeer, and others.

At first glance, FOAF appears to be such a nexus, but it is an
odd case.

On Feb 27, 2009, at 05:58 PM, Anja Jentzsch wrote:
> Keep in mind: A data source qualifies for the cloud, if the
> data is available via dereferencable URIs and if the data
> source is interlinked with at least one other source (meaning
> it references URIs within the namespace of the other source).
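
The interlinking half of that criterion can be sketched as a simple prefix test, assuming each source is identified by one namespace prefix. The function name and the example.org URIs are hypothetical, and the dereferenceability half would need an actual HTTP check that is not covered here:

```python
def is_interlinked(own_prefix, referenced_uris, other_prefixes):
    """Return True if any referenced URI lies within another source's namespace."""
    return any(
        uri.startswith(prefix)
        for uri in referenced_uris
        for prefix in other_prefixes
        if prefix != own_prefix  # a source linking to itself does not count
    )

# Hypothetical namespaces for two sources, A and B.
PREFIX_A = "http://example.org/a/"
PREFIX_B = "http://example.org/b/"

links_out = is_interlinked(PREFIX_A, [PREFIX_B + "item1"], [PREFIX_A, PREFIX_B])
links_self_only = is_interlinked(PREFIX_A, [PREFIX_A + "item2"], [PREFIX_A, PREFIX_B])
```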

Ted,


I don't think FOAF properly belongs on a cloud of *data sources*,
because it's more of a vocabulary than a data-set -- more
analogous to Dublin Core than to the others here.  In other words,
it's more *data dictionary* than *instance data*.

There is no FOAF SPARQL endpoint, no FOAF RDF dump, etc., in
marked contrast to the others on this graphic.  The FOAF ontology
is used by *many* data sets which aren't represented here -- which
is why I used a cloud for FOAF (and for SIOC).  *Most* uses of
FOAF are one-off pages, and I don't think those can really be
considered "data sources" never mind "data sets" for purposes of
this diagram.

This would be much clearer if they simply indicated that what is
implied is a FOAF Profile cluster or Data Space, as per:

<http://esw.w3.org/topic/FoafSites>

The same thing applies to SIOC.


My intent (as time permits) is to build separate illustrations
of the data sets that make use of these vocabularies (e.g.,
LiveJournal, Facebook, Tribe [assuming revival], etc.).

The Dictionary/Vocabulary/Schema (TBox) side is fine, which is basically
what drove the creation of the UMBEL cloud. Now the LOD cloud and the
UMBEL clouds actually come together via the Sponger Cloud since it uses
terms from the dictionaries in the UMBEL Cloud and links where
appropriate to instance data in the LOD cloud.


Examples:

1. Crunchbase - maps on the fly to DBpedia (always has)

2. Wikipedia - which does the DBpedia extraction on the fly, against
Wikipedia source data when it determines a delta between Wikipedia and
DBpedia

3. XBRL - both Joshua's stuff and WikiCompany are looked up (*note, this
cartridge needs fixing and enhancing, but it's still a zillion times
better than zilch*)

Obviously, there are many more data set inter-links in the updated
cloud than the previous edition, so there will be more intersections
in mine, too.  But if you compare mine to the edition it was based
on, you'll see that there's a lot less inter-linking in reality
than a casual glance at this original suggests --

<http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2008-09-18_blank.png>

The LOD cloud is more about marketing collateral for presentations.
Don't take it that seriously in a pure linkage sense. I am only
commenting because of the marketing aspects of this diagram.


I think exposing where the gaps are is *much* more likely to lead to
them being closed, than making them invisible.

I'd be happy to update mine -- and happier still to make the node sizes
and arc weights proportional to the data -- but I can no longer reverse
engineer the connections simply by looking at the cloud image.

Can we get the raw data, please?

Be seeing you,

Ted



--


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen

President & CEO

OpenLink Software     Web: http://www.openlinksw.com
