Thanks, Aidan - sorry for missing your analysis.

In fact, the ensuing discussions show that there is even more information out there; it's just extremely hard to find. That is another reason why we need this discussion, and probably some concerted effort.

Pascal.

On 5/21/2013 3:14 PM, Aidan Hogan wrote:
<snip>
On 18/05/2013 09:58, Leigh Dodds wrote:
You don't say in your paper how you did the analysis. Did you use the
metadata from the LOD group in datahub? At the time I had to do
mine manually, but it wouldn't be hard to automate some of this now,
perhaps to create a regularly updated set of indicators.

One criterion that agents might apply when conducting "Follow Your
Nose" consumption of Linked Data is the licensing of the target data,
e.g. ignore links to datasets that are not licensed for your
particular usage.

On a similar note, we also did a survey of some licensing issues in and
around Linked Data as part of a larger contribution looking at how
closely publishers of RDF follow various tips from the (now superseded
but still relevant) "How to Publish Linked Data on the Web" guide [1].

Our analysis is published/available at [2,3]. For the paper, we looked
at ~4 million RDF/XML documents crawled in May 2011, divided the data by
pay-level domain and looked at how well each domain followed the key
guidelines in [1] with the goal of seeing how well specific guidelines
are followed, and looking to comparatively rank the conformance of
publishers using objective measures. We ended up looking at 188 domains
that offered more than 1,000 quads.

Long story shortish, for one of the guidelines we looked specifically at
licensing information for documents embedded in the documents themselves
[p29,2]. This was tricky: we found a bunch of licensing properties in
use [Table 19,2]. Considering as many of these properties as we could
identify, we found that only 15% of the domains provided licensing
information embedded in *at least one* local document. Averaging equally
across the domains (which had different numbers of documents), about 3%
of documents contained observable licensing information about themselves.
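
As a rough illustration of the two figures above (this is a sketch, not the
code used for the paper), the per-domain measures could be computed along
the following lines. The predicate set is an assumed, non-exhaustive sample
(the properties actually observed are in [Table 19,2]), and the function
names and the (domain, document URI, triples) input shape are hypothetical.

```python
from collections import defaultdict

# Assumption: a small, non-exhaustive sample of licensing predicates.
LICENSE_PREDICATES = {
    "http://purl.org/dc/terms/license",
    "http://purl.org/dc/terms/rights",
    "http://creativecommons.org/ns#license",
}


def has_self_license(doc_uri, triples):
    """Return True if any triple states a license about the document itself."""
    return any(s == doc_uri and p in LICENSE_PREDICATES for s, p, _ in triples)


def licensing_stats(docs):
    """Compute two figures from an iterable of (domain, doc_uri, triples).

    Returns a pair:
      - share of domains with at least one self-licensed document
      - per-domain share of self-licensed documents, averaged equally
        across domains (so large domains do not dominate)
    """
    per_domain = defaultdict(list)
    for domain, doc_uri, triples in docs:
        per_domain[domain].append(has_self_license(doc_uri, triples))
    if not per_domain:
        return 0.0, 0.0
    n = len(per_domain)
    any_share = sum(any(flags) for flags in per_domain.values()) / n
    macro_avg = sum(sum(flags) / len(flags) for flags in per_domain.values()) / n
    return any_share, macro_avg
```

Averaging per domain first, and only then across domains, is what keeps a
single very large publisher from swamping the result.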

On the plus side, there was some use of the creative-commons vocabulary:

     http://creativecommons.org/ns

... though I think dct:rights/dct:license are more actively promoted.



Rather than registering the licensing information on the DataHub or similar
(which AFAIK no longer supports a public SPARQL endpoint), it would be
much better for (SemWeb) consumers if publishers directly embed
licensing meta-data in the individual RDF documents themselves. There
are already established vocabularies and (at least CC) license URIs in
place for this.
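
For illustration only, here is a minimal sketch of what embedding such a
statement amounts to: one triple whose subject is the document itself. It
assumes dct:license as the property and emits the statement in N-Triples
syntax rather than RDF/XML; the function name and the example.org document
URI are hypothetical.

```python
DCT_LICENSE = "http://purl.org/dc/terms/license"


def license_triple(doc_uri, license_uri):
    """Return one N-Triples statement: <document> dct:license <license> ."""
    return f"<{doc_uri}> <{DCT_LICENSE}> <{license_uri}> ."


# Hypothetical document URI; the license URI is a Creative Commons one.
print(license_triple(
    "http://example.org/data/doc.rdf",
    "http://creativecommons.org/licenses/by/3.0/",
))
```

A consumer that dereferences the document then gets the license along with
the data, without having to consult an external registry.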


Cheers/fwiw,
Aidan




[1] http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/LinkedDataTutorial/
[2] http://sw.deri.org/~aidanh/docs/ldstudy12.pdf
[3] Aidan Hogan, Jürgen Umbrich, Andreas Harth, Richard Cyganiak, Axel
Polleres and Stefan Decker. "An empirical survey of Linked Data
conformance". In the Journal of Web Semantics 14: pp. 14–44, 2012.






--
Prof. Dr. Pascal Hitzler
Kno.e.sis Center, Wright State University, Dayton, OH
http://pascal-hitzler.de/
Semantic Web Textbook: http://www.semantic-web-book.org/
Semantic Web Journal: http://www.semantic-web-journal.net/

