For the specific case of the BM's endpoint, would the ideal situation be that 
there is no formal attribution requirement (friction-free), but rather some 
encouraging (though not mandatory) words about embedding at least the URI of 
the object record in a web publication?

There is no need for every URI to be included, but including the object 
URI (a simple matter if you are querying the endpoint) would provide 
everything that anyone would need, particularly since every object record is a 
graph, and therefore only the main URI is needed to collect all the triples for 
an individual object.
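As a sketch of that point: because each object record is a graph, a single SPARQL DESCRIBE query against the endpoint returns every triple about the object, so the one URI really is all a re-publisher needs to record. (The endpoint URL and object URI below are hypothetical placeholders, not the BM's actual ones.)

```python
from urllib.parse import urlencode

# Hypothetical endpoint and object URI -- substitute the real ones.
ENDPOINT = "https://example.org/sparql"
object_uri = "https://example.org/object/Y_EA12345"

# DESCRIBE returns the triples describing the given resource, so the
# single object URI is enough to reconstruct the whole object record.
query = f"DESCRIBE <{object_uri}>"

# Build the GET request URL per the SPARQL Protocol; an Accept header
# (e.g. text/turtle) would be sent when actually fetching the result.
request_url = ENDPOINT + "?" + urlencode({"query": query})
print(request_url)
```

Fetching that URL with any HTTP client yields the full graph for the object in whatever RDF serialisation the endpoint negotiates.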

It would be good to have some best-practice guidelines that general website 
developers can reference (and that we can reproduce or link to on our sites) 
when querying triplestores.


Dominic



________________________________
 From: Hugh Glaser <[email protected]>
To: "[email protected]" <[email protected]> 
Cc: Kingsley Idehen <[email protected]> 
Sent: Monday, 1 April 2013, 12:51
Subject: Re: Why is it bad practice to consume Linked Data and publish opaque HTML pages?
 
These aims are laudable, and are a good objective when possible.
And I note, Kingsley, that your post talks about "republish the extracted 
content", and I roughly agree with you. 

But the wider discussion seems to me to have a very simplistic, if not naive, 
view of how LOD is used in practice (well, at least compared to the way I use 
it :-) ).
A typical page of something like http://apps.seme4.com/see-uk/ (sorry, hardware 
fault at the moment) or http://www.dotac.info/explorer/ uses many hundreds, or 
even thousands of RDF documents from hundreds of domains retrieved via URIs.
The contribution of some documents may be as little as lending weight to an 
inference that was calculated several years ago, and the document may have long 
been discarded, and not re-cached.
Or, of course, it may be an easily identifiable "fact" in the presentation.
The best I can do is point overall at the domains where we got data 
(http://www.rkbexplorer.com/data/), in the spirit of attribution.
A *requirement* to attribute each URI in a system that goes out and gets stuff 
from the LOD Cloud like that simply means that I have to ignore that entire 
data source, because I can't realistically satisfy it.
Actually, maybe I could - an enormous list of every URI we have ever resolved - 
but somehow I don't think a page with hundreds of millions of URIs on it is 
very helpful.
Of course, I could do quite a lot of implementation work to try to track it, 
but that would have serious computing, storage and communication costs - such 
provenance data for an rkbexplorer network panel might well have more than an 
order of magnitude more URIs than the panel itself, plus the descriptive 
overheads (and the receiver would not be very happy with perhaps 50K of 
provenance for 1K of substantive data).
Actually, in many cases, at the moment, really doing it properly would not be 
possible, as the RDF data does not in fact have a licence, even if the web 
"site" does.
Again, this is because people seem to have a simplistic view of how LOD data is 
consumed.
Remember, it is agents that are doing the retrieval, and that eyeballs never 
get to see the "site", if there is such a thing.
Even Jeff's "special cases" clause makes me nervous - the best I can manage in 
reality is to have a link to the main site.
(By the way Jeff, in answer to your question of what you might do, you could 
add licence information to the RDF you return.)
In practice I try to ensure I block sites that require attribution - if I can't 
comply with the spirit, never mind the letter, of the publisher's requirements, 
then I prefer to leave it out.

So, if a site *requires* attribution, some really interesting sites that really 
use the power of Linked Data won't use the data - is that what the publisher 
wanted when they published it?

I do like Chris Gutteridge's data.southampton.ac.uk - please attribute if you 
can, but if you really, really can't, then still feel free to use my beautiful 
data.

Good discussion.
Hugh

On 30 Mar 2013, at 14:35, Kingsley Idehen <[email protected]> wrote:

> All,
> 
> " Citing sources is useful for many reasons: (a) it shows that it isn't a 
> half-baked idea I just pulled out of thin air, (b) it provides a reference 
> for anybody who wants to dig into the subject, and (c) it shows where the 
> ideas originated and how they're likely to evolve." -- John F. Sowa [1].
> 
> An HTTP URI is an extremely powerful citation and attribution mechanism. 
> Incorporate Linked Data principles and the power increases exponentially.
> 
> It is okay to consume Linked Data from wherever and publish HTML documents 
> based on source data modulo discoverable original sources Linked Data URIs.
> 
> It isn't okay to consume publicly available Linked Data from sources such as 
> the LOD cloud and then republish the extracted content using HTML documents, 
> where the original source Linked Data URIs aren't discoverable by humans or 
> machines.
> 
> The academic community has always had a very strong regard for citations and 
> source references. Thus, there's no reason why the utility of Linked Data 
> URIs shouldn't be used to reinforce this best practice, at Web scale.
> 
> Links:
> 
> 1. http://ontolog.cim3.net/forum/ontolog-forum/2013-03/msg00084.html -- 
> ontolog list post.
> 
> -- 
> 
> Regards,
> 
> Kingsley Idehen    
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen
> 
> 
> 
> 
> 
