Re: Linking non-open data

Chris Bizer Fri, 18 Apr 2008 06:49:11 -0700


Hi Peter,

reading your "ramblings", they actually make a lot of sense to me and I think I even like them better than my own initial ideas on the problem, as your approach nicely avoids the owl:sameAs.


Does anybody else have further ideas?

Cheers

Chris


--
Chris Bizer
Freie Universität Berlin
+49 30 838 54057
[EMAIL PROTECTED]
www.bizer.de

----- Original Message -----
From: Peter Coetzee

To: Chris Bizer

Cc: Matthias Samwald ; [email protected] ; Tassilo Pellegrini ; Andreas Blumauer (Semantic Web Company)

Sent: Friday, April 18, 2008 11:06 AM
Subject: Re: Linking non-open data


Hi Chris,

I like the sound of this, as a neat and elegant way to work round the problem. The only concern I'd have is that it lacks any "backwards" links from the protected to the public data object. For example, if my agent finds the triple



http://mydomain//resource/myResource foaf:interest
http://yourDomain/resource/ProtectedDataAboutObjectX

in some document out there, and *doesn't* have (and cannot get) the credentials to access http://yourDomain/resource/ProtectedDataAboutObjectX, it has no way of knowing that it might be able to get some data about the "real" object (please excuse my loose language!) being discussed from http://yourDomain/resource/PublicDataAboutObjectX, or does it? Note, I'm assuming here that my agent hasn't encountered the owl:sameAs elsewhere on its 'travels'.

I guess there are two obvious solutions to me: either every time we refer to ProtectedDataAboutObjectX, we must also include the owl:sameAs to PublicDataAboutObjectX, or we must always refer to PublicDataAboutObjectX and rely on its linked-ness into ProtectedDataAboutObjectX to get at that data if we have credentials. Hmmm - both feel a little bit cumbersome to me, what do you think?

On a slightly separate (and less tangible) note, I feel slightly uncomfortable with the notion of "refer to that URI about ObjectX because I know what data it will serve" - in theory (when the whole world is passionate about interlinking their datasets ;) ), shouldn't it be ok to refer to any URI for the object, and (perhaps eventually) get to whichever data you seek? I recognise that in practice this would be unnecessarily inefficient, but stick with me for a minute! As an extension of that feeling, it strikes me as odd to mint two different URIs for the same thing, solely to get around a mechanical issue like authentication. Perhaps what I'm getting at then is something more along the lines of:

1. Use the resource http://yourDomain/resource/ObjectX to refer to the resource itself (always)

2. When someone dereferences http://yourDomain/resource/ObjectX, they are required to attempt to authenticate

3a. If the client fails to authenticate, they are presented with only the public data - perhaps by using a suitable redirect to http://yourDomain/resource/PublicDataAboutObjectX (note - no owl:sameAs needed, as we're always referring to http://yourDomain/resource/ObjectX)

3b. If the client provides sufficient credentials, they are presented with the protected data as well (again, either directly or through a redirect to http://yourDomain/resource/ProtectedDataAboutObjectX; whichever is deemed to be more "pure")

This mechanism would also permit the server on http://yourDomain/ to serve different facets on the data depending on the user who has authenticated (e.g. it may be that a "student" user can't see as much data as a "supervisor", etc). It also removes (I think?) the risk of agents reaching an unnecessary dead-end when they follow a link to http://yourDomain/resource/ProtectedDataAboutObjectX.
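The single-URI scheme with per-user facets can be sketched roughly as follows. This is only an illustration, not a proposed implementation: the role names beyond "student" and "supervisor", the predicates, the in-memory user table and the helper names (role_for, describe) are all assumptions made for the example, and a real server would sit behind HTTP authentication rather than take a password as a function argument.

```python
from typing import List, Optional, Tuple

OBJECT_X = "http://yourDomain/resource/ObjectX"

# Hypothetical ordering of roles, from least to most privileged.
ROLE_RANK = {"anonymous": 0, "student": 1, "supervisor": 2}

# Hypothetical data about ObjectX: (predicate, object, minimum role needed).
TRIPLES = [
    ("rdfs:label", "Object X", "anonymous"),
    ("ex:grade", "B+", "student"),
    ("ex:reviewNotes", "Needs follow-up", "supervisor"),
]

# Hypothetical credential store: username -> (password, role).
USERS = {"sam": ("s3cret", "student"), "sue": ("t0psecret", "supervisor")}


def role_for(username: Optional[str], password: Optional[str]) -> str:
    """Return the role for valid credentials, otherwise 'anonymous'."""
    record = USERS.get(username or "")
    if record is not None and record[0] == password:
        return record[1]
    return "anonymous"


def describe(username: Optional[str] = None,
             password: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Return the triples about ObjectX that the caller may see.

    Failed or missing authentication falls back to the public facet
    instead of producing an error, so the URI is never a dead end.
    """
    rank = ROLE_RANK[role_for(username, password)]
    return [
        (OBJECT_X, predicate, obj)
        for predicate, obj, min_role in TRIPLES
        if ROLE_RANK[min_role] <= rank
    ]
```

With this sketch, an unauthenticated call to describe() yields only the public triple, while describe("sue", "t0psecret") yields all three; every caller refers to the same URI throughout.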

Apologies for the fairly rambling train of thought - I hope it was vaguely coherent!


Any thoughts?

Cheers,
Peter



On Fri, Apr 18, 2008 at 4:21 AM, Chris Bizer <[EMAIL PROTECTED]> wrote:

Hi Peter,

One of the problems this presents though, is how to advertise the data that's available for a user. Perhaps something like the Semantic Web Sitemap Extension [1] could be used / extended to say what data is available behind this authentication, so that an agent knows whether or not it's interested in trying to find credentials for it (e.g. prompting a user)?

Building on the Sitemap Extension would be one option, but I think advertising could also work much more simply, just by setting RDF links to the access-protected resources.


So you could have something like this:

1. Use http://yourDomain/resource/PublicDataAboutObjectX to identify your resource and the public data about it.

2. If some client dereferences this URI, it would get the public data containing an RDF link like this:

http://yourDomain/resource/PublicDataAboutObjectX owl:sameAs
http://yourDomain/resource/ProtectedDataAboutObjectX

3. If the client then tried to dereference http://yourDomain/resource/ProtectedDataAboutObjectX, it would be asked to provide some credentials.

Using this mechanism, external data providers could also link to the protected data, for instance:


http://mydomain//resource/myResource foaf:interest
http://yourDomain/resource/ProtectedDataAboutObjectX
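How an agent might follow such links can be sketched as below. Nothing here touches the network: the DOCUMENTS table stands in for the server, the credentials ("alice", "secret") and the ex:internalNote predicate are made up for the example, and the 401 status is simply what one would expect from HTTP authentication.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
PUBLIC = "http://yourDomain/resource/PublicDataAboutObjectX"
PROTECTED = "http://yourDomain/resource/ProtectedDataAboutObjectX"

# Simulated server: URI -> (requires_auth, triples served for that URI).
DOCUMENTS = {
    PUBLIC: (False, [
        (PUBLIC, OWL_SAME_AS, PROTECTED),
        (PUBLIC, "rdfs:label", "Object X"),
    ]),
    PROTECTED: (True, [
        (PROTECTED, "ex:internalNote", "confidential"),
    ]),
}

VALID_CREDENTIALS = ("alice", "secret")  # hypothetical account


def dereference(uri, credentials=None):
    """Return (status, triples); protected documents answer 401 without valid credentials."""
    requires_auth, triples = DOCUMENTS[uri]
    if requires_auth and credentials != VALID_CREDENTIALS:
        return 401, []
    return 200, triples


def crawl(start, credentials=None):
    """Collect triples from start and from everything reachable via owl:sameAs."""
    seen, queue, collected = set(), [start], []
    while queue:
        uri = queue.pop()
        if uri in seen or uri not in DOCUMENTS:
            continue
        seen.add(uri)
        status, triples = dereference(uri, credentials)
        if status != 200:
            continue  # no access: skip this document and keep going
        collected.extend(triples)
        queue.extend(o for _, p, o in triples if p == OWL_SAME_AS)
    return collected
```

Starting from the public URI, an agent without credentials still collects the public triples, and one with credentials collects both sets. Starting from the protected URI without credentials yields nothing, because the link only points from public to protected.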

What do you think?

Cheers

Chris


--
Chris Bizer
Freie Universität Berlin
+49 30 838 54057
[EMAIL PROTECTED]
www.bizer.de

----- Original Message -----
From: Peter Coetzee

To: Chris Bizer ; Matthias Samwald

Cc: [email protected] ; Tassilo Pellegrini ; Andreas Blumauer (Semantic Web Company)

Sent: Thursday, April 17, 2008 2:03 PM
Subject: Re: Linking non-open data


Hi all,


On Thu, Apr 17, 2008 at 12:25 PM, Chris Bizer <[EMAIL PROTECTED]> wrote:


Hi Matthias,

A question that will surely arise in many places when more people get to know about the linked data initiative and the growing infrastructure of linked open data is: how can these principles be applied to organizational data that might not / only partially be open to the public web?

I think applying the Linked Data principles within a corporate intranet does not pose any specific requirements. It is just that the data is not accessible from the outside.

It sounds to me like deploying linked data over an intranet would be towards the "trivial" side of solutions - what about when data is out on (dare I say, in? ;) ) the web fully, but you need to control access to it (i.e. the authentication Matthias describes). I like the idea of using standard HTTP authentication for this - it just seems like the "right" mechanism to use. One of the problems this presents though, is how to advertise the data that's available for a user. Perhaps something like the Semantic Web Sitemap Extension [1] could be used / extended to say what data is available behind this authentication, so that an agent knows whether or not it's interested in trying to find credentials for it (e.g. prompting a user)?

People will soon try to develop practices for selectively protecting parts of their linked data with fine-grained access rights. Could simple HTTP authentication be useful for linked data?

As Linked Data heavily relies on HTTP anyway, I think HTTP authentication should be the first choice, and people having these requirements should check if they can go with HTTP auth.
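For illustration, this is roughly what a client-side request with HTTP Basic authentication looks like using Python's standard library. It is a sketch only: the URL is the placeholder used elsewhere in this thread, the username and password are invented, and the request is built but never sent. Basic auth transmits credentials in an easily decoded form, so it would normally be combined with HTTPS.

```python
import base64
import urllib.request


def basic_auth_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying HTTP Basic credentials."""
    # Basic auth is "username:password", base64-encoded, after the word "Basic".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    # Ask for RDF rather than HTML; the media type here is one common choice.
    request.add_header("Accept", "application/rdf+xml")
    return request


# Hypothetical usage; calling urllib.request.urlopen(request) would actually send it.
request = basic_auth_request(
    "http://yourDomain/resource/ProtectedDataAboutObjectX", "alice", "secret"
)
```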

How does authentication work for SPARQL endpoints containing several named graphs?

Of course you can always make things as difficult as you like. But I guess for many use cases an all-or-nothing approach is good enough, which would allow HTTP authentication to be used again.

If you wanted slightly more fine-grained control, I don't see any reason you can't still use HTTP auth: pass the authenticated user details through to whatever framework you're using on the backend to handle SPARQL, and then check "does this user have permissions" for each of the named graphs mentioned in the query.
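A rough sketch of that per-graph check is below. It is deliberately naive and rests on several assumptions: the graph URIs and the PERMISSIONS table are invented, the user name is taken to have been established already by HTTP auth, and a regular expression stands in for a real SPARQL parser (it will be fooled by graph names inside comments or string literals). Queries that use a variable graph such as GRAPH ?g are refused outright, since the graphs they touch cannot be known in advance.

```python
import re

# Matches FROM <g>, FROM NAMED <g> and GRAPH <g>, capturing the graph IRI.
GRAPH_REF = re.compile(r"\b(?:FROM\s+NAMED|FROM|GRAPH)\s*<([^>]+)>", re.IGNORECASE)
# Matches GRAPH ?g / GRAPH $g, i.e. a graph chosen at query time.
VARIABLE_GRAPH = re.compile(r"\bGRAPH\s*[?$]", re.IGNORECASE)

# Hypothetical access table: authenticated user -> graphs they may read.
PERMISSIONS = {
    "alice": {"http://yourDomain/graph/public", "http://yourDomain/graph/staff"},
    "bob": {"http://yourDomain/graph/public"},
}


def graphs_in(query: str) -> set:
    """Return the set of graph IRIs explicitly named in the query text."""
    return set(GRAPH_REF.findall(query))


def is_allowed(user: str, query: str) -> bool:
    """True only if every named graph in the query is readable by the user."""
    if VARIABLE_GRAPH.search(query):
        return False  # cannot tell which graphs a variable will bind to
    return graphs_in(query) <= PERMISSIONS.get(user, set())


STAFF_QUERY = """
SELECT ?s FROM NAMED <http://yourDomain/graph/staff>
WHERE { GRAPH <http://yourDomain/graph/staff> { ?s ?p ?o } }
"""
```

Here is_allowed("alice", STAFF_QUERY) holds and is_allowed("bob", STAFF_QUERY) does not. A query naming no graph at all passes this check, so access to the default graph would need its own policy.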

Can we use RDF vocabularies to represent access rights? Should such vocabularies be standardized?

Sure, but I think all work in this area should be based on clearly motivated real-world use cases, and collecting these use cases should be the first step before starting to define vocabularies.

Is there any ongoing work on defining such practices (or even 'best practices')?

There is lots of work on using RDF, OWL and different rules languages to represent access control policies. See for instance Rei, KAoS and Protune or the SemWeb policy workshop at http://www.l3s.de/~olmedilla/events/2006/SWPW06/ ; for older work, see also http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/

But I guess a lot of this will be a bit over-the-top for the common linked data use cases.


Cheers

Chris





Cheers,
Matthias Samwald

I've given some thought to this before, but not put much down on paper yet - it occurs to me that this kind of standardisation would be a powerful component for a semantic web publishing framework of some description. I don't know if Virtuoso is playing in this space at all (other than brief sessions toggling with it from the client side, I've not really explored its potential yet!), but I've had a project on the back burner for a couple of months to try and handle some issues like this (as well as the content negotiation, and some other aspects) easily for people wishing to publish data in the semantic web. If there's likely to be some interest in such a (probably free / oss) project, I can dust it off when I get some time and see about bringing it to some kind of completion!


Cheers,
Peter

[1] - http://sw.deri.org/2007/07/sitemapextension/
