Ian Davis wrote:
On Wed, Jun 24, 2009 at 9:56 PM, Kingsley Idehen
<[email protected]> wrote:
The NYT, London Times, and others of this ilk are more likely to
contribute their quality data to the LOD cloud if they know there
is a vehicle (e.g., a license scheme) that ensures their HTTP URIs
are protected, i.e., always accessible to user agents at the data
representation (HTML, XML, N3, RDF/XML, Turtle, etc.) level,
thereby ensuring citation and attribution requirements are honored.
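[To illustrate the point being quoted here: "accessible at the data representation level" refers to HTTP content negotiation, where one URI serves each format. A sketch, with a hypothetical host and path not taken from this thread:]

```http
GET /articles/2009/linked-data HTTP/1.1
Host: data.example.org
Accept: text/turtle

HTTP/1.1 200 OK
Content-Type: text/turtle
```

[A client asking for application/rdf+xml or text/html at the same URI would receive those representations instead, so the publisher's URI remains the stable citation point regardless of format.]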
I agree with that, but it only covers a small portion of what is
needed. You fail to consider the situations where people publish data
about other people's URIs, such as reviews or annotations.
I am not; far from it.
The foaf:primaryTopic mechanism isn't strong enough if the publisher
requires full attribution for use of their data. If I use SPARQL to
extract a subset of reviews to display on my site then in all
likelihood I have lost that linkage with the publishing document.
Only if you choose to construct your result document using literal
values, i.e., a SPARQL solution that has URIs filtered out; anyway, if
that's what you end up doing, then you do have <link/> and @rel at your
disposal for identifying your data sources, worst case.
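[One way to avoid losing the linkage in the first place, sketched here as a hedged illustration (the vocabulary choices and the use of dct:source for provenance are illustrative, not something either correspondent specified): a SPARQL CONSTRUCT over named graphs can carry the source graph URI along with each extracted review.]

```sparql
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX dct: <http://purl.org/dc/terms/>

# Copy each review into the result, and record which named graph
# (i.e., which publishing document) it was extracted from.
CONSTRUCT {
  ?review a rev:Review ;
          rev:text ?text ;
          dct:source ?src .
}
WHERE {
  GRAPH ?src {
    ?review a rev:Review ;
            rev:text ?text .
  }
}
```

[The resulting document then keeps a machine-readable pointer back to each source, so attribution can survive re-publication of the subset.]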
Attribution is the kind of thing one gives as the result of a
license requirement in exchange for permission to copy. In the
academic world for journal articles this doesn't come into play at
all, since there is no copying (in the usual case). Instead people
cite articles because the norms of their community demand it.
Yes, and the HTTP URI ultimately delivers the kind of mechanism I
believe most traditional media companies seek (as stated above).
They ultimately want people to use their data with low cost
citation and attribution intrinsic to the medium of value exchange.
The BBC is a traditional media company. Its data is licensed only for
personal, non-commercial use: http://www.bbc.co.uk/terms/#3
I used the New York Times and London Times for specific reasons: their
business models differ from the BBC's; they are traditional
*commercial* media companies.
btw - how are you dealing with this matter re. the
neurocommons.org linked data space? How
do you ensure your valuable work is fully credited as it bubbles
up the value chain?
I found this linked from the RDF Distribution page on neurocommons.org:
http://svn.neurocommons.org/svn/trunk/product/bundles/frontend/nsparql/NOTICES.txt
Everyone should read it right now to appreciate the complexity of
aggregating data from many sources when they all have idiosyncratic
requirements of attribution.
Then read
http://sciencecommons.org/projects/publishing/open-access-data-protocol/
to see how we should be approaching the licensing of data. It explains
in detail the motivations for things like CC-0 and PDDL which seek to
promote open access for all by removing restrictions:
"Thus, to facilitate data integration and open access data sharing,
any implementation of this protocol MUST waive all rights necessary
for data extraction and re-use (including copyright, sui generis
database rights, claims of unfair competition, implied contracts, and
other legal rights), and MUST NOT apply any obligations on the user of
the data or database such as “copyleft” or “share alike”, or even the
legal requirement to provide attribution. Any implementation SHOULD
define a non-legally binding set of citation norms in clear,
lay-readable language."
Science Commons have spent a lot of time and resources to come to this
conclusion, and they tried all kinds of alternatives such as
attribution and share alike licences (as did Talis). The final
consensus was that the public domain was the only mechanism that could
scale for the future. Without this kind of approach, aggregating,
querying and reusing the web of data will become impossibly complex.
This is a key motivation for Talis starting the Connected Commons
programme ( http://www.talis.com/platform/cc/ ). We want to see more
data that is unambiguously reusable because it has been placed in the
public domain using CC-0 or the Open Data Commons PDDL.
So, I urge everyone publishing data onto the linked data web to
consider waiving all rights over it using one of the licenses above.
I don't think "waiving all rights" is a practical option for the likes
of the New York Times or the Times of London, nor for other traditional
commercial media companies.
As Kingsley points out, you will always be attributed via the URIs you
mint.
This part I totally agree with :-)
Ian
PS. This was the subject of my keynote at code4lib 2009 "If you love
something, set it free", which you can view here
http://www.slideshare.net/iandavis/code4lib2009-keynote-1073812
The thing about "Free" is that we'll always end up having to
disambiguate: "Free Speech" versus "Free Beer". That's the sad nature
of the overloaded "Free" moniker, an ambiguity that also dogs the
"Open Source" moniker.
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com