Yves Raimond wrote:
On Sat, Aug 2, 2008 at 5:17 PM, Kingsley Idehen <[EMAIL PROTECTED]> wrote:
Yves Raimond wrote:
Hello!


I would like to suggest that publishers of new linked data spaces that
plug into the growing LOD include the following:

1. cross-link information

I would also suggest we find a better measure for interlinkage than a
raw number of triples linking one dataset to another.
For example, http://dbtune.org/musicbrainz/ creates its own identifier
for languages (http://dbtune.org/musicbrainz/directory/language),
which are owl:sameAs'ed to the corresponding languages in Lingvoj when
applicable, whereas linkedmdb directly links to the Lingvoj
identifiers. In the latter case, the raw number of interlinks will be
higher, but could be reduced a lot by creating identifiers for
languages and using owl:sameAs.

The same applies for geographic locations, for example. Some datasets
use foaf:based_near to link to Geonames, some others create their own
identifiers, and then link to the corresponding Geonames locations
through owl:sameAs. For the same dataset, these two methodologies will
lead to completely different numbers.
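
To make the difference concrete, here is a minimal Python sketch over
made-up triples. Everything in it is an assumption for illustration:
the example.org URIs, the Geonames identifier, and the helper
raw_interlinks are hypothetical and not taken from any dataset
mentioned here.

```python
# Toy data: three artists, all based near the same (made-up) Geonames place.
GEONAMES_PLACE = "http://sws.geonames.org/0000001/"  # hypothetical identifier
ARTISTS = [f"http://example.org/artist/{i}" for i in range(1, 4)]

# Methodology A: every artist links directly to the external URI.
direct = [(a, "foaf:based_near", GEONAMES_PLACE) for a in ARTISTS]

# Methodology B: mint a local identifier, link the artists to it,
# and state a single owl:sameAs to the external URI.
LOCAL_PLACE = "http://example.org/place/1"
indirect = [(a, "foaf:based_near", LOCAL_PLACE) for a in ARTISTS]
indirect.append((LOCAL_PLACE, "owl:sameAs", GEONAMES_PLACE))


def raw_interlinks(triples, external_prefix):
    """Count triples whose object lives in the external namespace."""
    return sum(1 for _, _, o in triples if o.startswith(external_prefix))


PREFIX = "http://sws.geonames.org/"
print(raw_interlinks(direct, PREFIX))    # raw count under methodology A
print(raw_interlinks(indirect, PREFIX))  # raw count under methodology B
```

Both toy datasets say the same thing about the artists, yet the raw
count of triples pointing at Geonames differs between them.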

To boost the statistics of a dataset, we could simply link each person
or group in them to http://dbpedia.org/class/yago/Entity100001740
through rdf:type :-D

Amen!

And it also means we start to expose the fact that LOD is not an "instance
level only" linked data space (a sad misconception).

So I think we should agree on what we count as "interlinks" before
publishing such statistics, so that we can actually use these values?

We should basically express linkages across instance and schema/data
dictionary vectors. This also helps those looking to build LOD applications.

Of course there is more to come re. the injection of "data dictionary /
schema" linkage aspects of LOD, but no harm in getting our thoughts in order
re. "best practices" for the growing cloud :-)
My recommendation would be to always go for the lowest value - the one
you'd obtain by creating your own identifiers and using owl:sameAs
(which would be equivalent to the number of distinct external URIs
mentioned in your dataset).
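
A rough sketch of that measure in Python, assuming the dataset is
available as (subject, predicate, object) tuples of plain strings. The
function name distinct_external_uris, the example.org URIs, and the
Lingvoj-style identifiers are hypothetical, chosen only for
illustration.

```python
def distinct_external_uris(triples, external_prefix):
    """Return the set of distinct object URIs in the external namespace.

    The size of this set does not depend on how many local resources
    point at each external URI, nor on which predicate is used.
    """
    return {o for _, _, o in triples if o.startswith(external_prefix)}


# Hypothetical sample: four films linked directly to two language URIs.
LINGVOJ = "http://www.lingvoj.org/lang/"  # assumed namespace for the example
sample = [
    ("http://example.org/film/1", "ex:language", LINGVOJ + "en"),
    ("http://example.org/film/2", "ex:language", LINGVOJ + "en"),
    ("http://example.org/film/3", "ex:language", LINGVOJ + "fr"),
    ("http://example.org/film/4", "ex:language", LINGVOJ + "en"),
    ("http://example.org/film/4", "ex:title", "not a link"),
]

raw = sum(1 for _, _, o in sample if o.startswith(LINGVOJ))
distinct = len(distinct_external_uris(sample, LINGVOJ))
print(raw, distinct)
```

Publishing the distinct figure gives the same value whether a dataset
links directly or goes through its own identifiers and owl:sameAs.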

What do you think?

Good idea, so share your page as a nice example :-)


I just gave it a shot on Jamendo, counting the results of a SELECT
DISTINCT query, and this is indeed a bit depressing.
http://dbtune.org/jamendo/
For example, the Geonames interlinking drops from 3244 to 289 :-)
Smarts vs Size, which do you choose?  I find this elating :-)

Kingsley
Some similar statistics from Musicbrainz at
http://dbtune.org/musicbrainz/ , which I'll publish when I get some
time to figure out how to tweak d2r templates :-)

Distinct DBpedia albums - 22426
Distinct DBpedia artists - 39877
Distinct MySpace artists (on http://dbtune.org/myspace/) - 14668
Distinct DBpedia countries - 245
Distinct Lingvoj languages - 185

Cheers!
y





2. cross-link visual derived from the LOD cloud diagram.

The Linked Movies Database has nice examples of both [1].

Links:

1. http://www.linkedmdb.org:8080/Main/Interlinking

--


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software     Web: http://www.openlinksw.com



















