Re: Facebook Linked Data

Sebastian Schaffert Mon, 03 Oct 2011 09:02:54 -0700

Dear Norman,

Sorry for the late reply; I was a bit busy with other things ...



On 28.09.2011 at 19:13, Norman Gray wrote:

> 
> Sebastian, hello.
> 
> On 27 Sep 2011, at 13:43, Sebastian Schaffert wrote:
> 
>>> I think you're disappointed because your expectations may be wrong.
>> 
>> My expectations are my expectations. But I accept that the world maybe does 
>> not satisfy them ;-)
> 
> I often have the same feeling -- *sigh* -- I've come to think of it as the 
> tragedy of adulthood....
> 
>> But from my experience in developing software together with industry 
>> partners out there I have a good guess that my expectations will 
>> more-or-less match with the expectations of other developers. Especially 
>> those who are not very deep in Semantic Web technologies. 
> 
> I'm nervous of opening up a potentially long discussion, but I've never 
> understood what's so hard about httpRange-14.  Any time I've explained it to 
> someone -- including some pretty SemWeb-sceptical RDBMS people -- they've got 
> the idea and its importance pretty promptly.  I may have given one RDBMS 
> colleague their SemWeb insight that way.
> 
> I do appreciate that in certain circumstances, where one doesn't have good 
> control over the data being LODified, there's no option but to say, in effect
> 
>    <http://example.org/foo>
>        a foaf:Person;
>        a foaf:Document.
> 
> (I haven't looked at it, but I imagine that dbpedia either suffers from this 
> or else has had to be very clever with domains to get round it).
> 
> According to httpRange-14, of course, one of those statements must simply be 
> false.  So clients have to be smart to deal with this punning; but life is 
> hard and we know this is the wild wild web: the httpRange-14 dogma cannot be 
> absolute.

In practice I would even argue that the "inconsistency" in this data is rarely 
a problem, because applications will simply ignore the information that is 
irrelevant to them. Inconsistent information - from my perspective - is only a 
problem when a kind of reasoning is applied that specifically takes both facts 
into account, so that the "ex falso quodlibet" problem of logic strikes. On the 
WWW - as you say - we will have to live with inconsistencies anyway. So better 
to welcome them and build applications that do not easily propagate errors in 
the data :)

My argument here is also that there is not really a URI identity crisis, unless 
you make the "mistake" of having both a document and a concept behind 
"http://example.org/foo". DBpedia and other Linked Data servers have an IMHO 
clean approach to this problem:
- if you request http://example.org/foo as text/html, you are redirected to 
http://example.org/page/foo, which is the actual document that contains a 
human readable description of http://example.org/foo
- if you request http://example.org/foo as RDF data, you are redirected to 
http://example.org/data/foo, which is the actual document containing the 
machine readable description of http://example.org/foo

Now if you want to speak about the human readable document or the RDF document, 
you can easily do so by using the respective URIs. The connection between the 
documents and the concepts is modelled using the HTTP redirect and thus clear 
to the client. From my perspective, this is a much cleaner and human-friendly 
approach to the problem than httpRange-14.
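
As a rough sketch of the redirect scheme just described (Python, purely 
illustrative): the /page/ and /data/ prefixes are taken from the example URIs 
above, while the function name, the sets of media types and the simplified 
Accept handling are assumptions and not how DBpedia is actually implemented.

```python
# Illustrative only: maps a resource URI plus an Accept header to the
# document URI a server following the scheme above would redirect to.
HTML_TYPES = {"text/html", "application/xhtml+xml"}   # assumed set
RDF_TYPES = {"application/rdf+xml", "text/turtle"}    # assumed set


def redirect_target(resource_uri, accept):
    """Return the document URI to redirect to, or None if nothing matches."""
    base, _, local = resource_uri.rpartition("/")
    # Walk the media types in the order listed; q-values are ignored here.
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in HTML_TYPES:
            return f"{base}/page/{local}"
        if media_type in RDF_TYPES:
            return f"{base}/data/{local}"
    return None


print(redirect_target("http://example.org/foo", "text/html"))
print(redirect_target("http://example.org/foo", "application/rdf+xml"))
```

The resource URI itself never returns a document in this sketch; it only 
points to one of the two document URIs.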

In our Linked Media Framework, we go even a step further by taking into account 
the MIME type. This will result in redirects like
- http://example.org/foo, Accept: text/html; rel=content -> 
http://example.org/content/text/html/foo
- http://example.org/foo, Accept: image/jpeg; rel=content -> 
http://example.org/content/image/jpeg/foo
- http://example.org/foo, Accept: application/rdf+xml; rel=meta -> 
http://example.org/meta/application/rdf+xml/foo
- http://example.org/foo, Accept: text/html; rel=meta -> 
http://example.org/meta/text/html/foo
In this case, all four documents are different descriptions of the person 
http://example.org/foo (e.g. a text, an image, an RDF document, and tabular 
metadata in HTML).
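
The URI pattern behind these four redirects can be sketched as follows 
(Python). This only reproduces the pattern visible in the examples above; the 
function name and the parsing are hypothetical and are not the actual Linked 
Media Framework code. What happens when no "rel" parameter is given is not 
covered by the examples, so the sketch simply refuses that case.

```python
def lmf_redirect(resource_uri, accept):
    """Build the redirect target <base>/<rel>/<mime type>/<local name>."""
    base, _, local = resource_uri.rpartition("/")
    media_type, _, params = accept.partition(";")
    media_type = media_type.strip()
    rel = None
    # Look for the rel=content / rel=meta parameter in the Accept value.
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key.strip() == "rel" and value.strip():
            rel = value.strip()
    if rel is None:
        # Assumption: behaviour without "rel" is unspecified here.
        raise ValueError("Accept value carries no rel parameter")
    return f"{base}/{rel}/{media_type}/{local}"


print(lmf_redirect("http://example.org/foo", "text/html; rel=content"))
print(lmf_redirect("http://example.org/foo", "application/rdf+xml; rel=meta"))
```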


Btw, your foaf:Person / foaf:Document snippet above is not inconsistent in 
itself. It would be if we said that foaf:Person and foaf:Document are disjoint 
and applied some sort of advanced semantics (i.e. OWL, not RDF/RDFS) to it, 
something we implicitly do because we think the distinction is reasonable. But 
it is not explicitly stated.
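
A toy illustration of that point (Python, with triples as plain tuples and no 
real reasoner involved): the two type statements only clash once a 
disjointness axiom is added. The axiom here is added by hand for the sake of 
the example; I am not claiming that FOAF itself states it, and the ex:foo 
name and the helper function are made up.

```python
RDF_TYPE = "rdf:type"
OWL_DISJOINT = "owl:disjointWith"


def clashes(triples):
    """List subjects typed with two classes that are declared disjoint."""
    types = {}
    disjoint = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif p == OWL_DISJOINT:
            disjoint.add(frozenset((s, o)))
    found = []
    for subject, classes in types.items():
        for pair in disjoint:
            if pair <= classes:  # both disjoint classes asserted
                found.append((subject, *sorted(pair)))
    return found


data = [
    ("ex:foo", RDF_TYPE, "foaf:Person"),
    ("ex:foo", RDF_TYPE, "foaf:Document"),
]
# Hypothetical axiom, added only to show when the clash appears.
axiom = ("foaf:Person", OWL_DISJOINT, "foaf:Document")

print(clashes(data))            # no axiom, nothing to report
print(clashes(data + [axiom]))  # the clash shows up only now
```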


> 
>>> When you dereference the URL for a person (such as .../561666514#), you get 
>>> back RDF.  Our _expectation_, of course, is that that RDF will include some 
>>> remarks about that person (.../561666514#), but there can be no guarantee 
>>> of this, and no guarantee that it won't include more information than you 
>>> asked for.  All you can reliably expect is that _something_ will come back, 
>>> which the service believes to be true and hopes will be useful.  You add 
>>> this to your knowledge of the world, and move on.
>> 
>> There I have my main problem. If I ask for "A", I am not really interested 
>> in "B".
> 
> But if one does accept the logic of httpRange-14, then 'A' is something like 
> 'B#', and it is _impossible_, as a consequence of the way HTTP is defined, to 
> dereference specifically 'A', and thus any client which exists in a world 
> with httpRange-14 in it, must necessarily be able to deal with the fact that 
> what is described in the response may not be precisely what it did the HTTP 
> transaction on.
> 
> It presumably knows that it was asked to find out about 'A' = 'B#', so it can 
> do its filtering process with that in mind, no?

In the case that I tried, I was asking for .../sebastian.schaffert and I got 
back .../561666514#. There was no redirect and no information on how 
sebastian.schaffert is related to 561666514#.

If I had requested .../561666514 and got back .../561666514#, the situation 
could have been a bit simpler, but there would still be a lot of open points.

One of the most important ones is that the "#" character in URIs itself is 
interpreted differently depending on the syntax used and on the client that 
uses it:
- in HTML, it refers to the HTML anchor in the page, identified by the <a 
name="..."> tag
- in XML, it refers to the XML id of an element in the page, in a way that is 
often incompatible with RDF/XML (just imagine an "id" attribute on an RDF/XML 
property...)
- Web browsers often use the "#" on the client side to represent stateful 
information in JavaScript
- with multimedia files, the "#" often identifies the fragment of the 
multimedia file 
  (see the work of the media fragments group: 
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec) 

So the semantics of the "#" itself is not very well defined and left to the 
browser, and in most cases it actually is used to identify a *fragment* and not 
a different thing. Using it to distinguish between document and object is in my 
opinion an abuse of the original specification.
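
To make the client-side nature of the "#" concrete, here is a small Python 
sketch using the standard library's urllib.parse.urldefrag. It splits a URI 
into the part that goes into the HTTP request and the fragment that stays 
with the client. The example.org URIs and the helper name are made up for 
illustration.

```python
from urllib.parse import urldefrag


def split_for_request(uri):
    """Return (URI sent to the server, fragment kept on the client)."""
    request_uri, fragment = urldefrag(uri)
    return request_uri, fragment


# Hypothetical URIs; the fragment never reaches the server.
print(split_for_request("http://example.org/561666514#"))
print(split_for_request("http://example.org/doc#me"))
```

In both cases the server only sees the part before the "#", so whatever the 
fragment is supposed to mean has to be worked out by the client.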


> I agree, by the way, that we shouldn't expect that everyone in the world, 
> including RDBMS diehards and junior programmers, should be expected to 
> understand the formal subtleties here.  But that's what libraries and layers 
> are for, surely.  You do understand the difference, so you write a thin layer 
> which provides API-users with the information they expect.  Am I 
> misunderstanding a constraint?

The argument with libraries does not really hold, because in the end developers 
have to understand the underlying model to be able to use the library. No one 
can use a relational database without knowing the basic concepts of tables, and 
only with sufficient knowledge of the relational model are developers able to 
use relational databases correctly. Sophisticated libraries like Hibernate only 
make things easier for people who already understand the model.


> 
>> - the data I get back is not about the resource I requested (discussion 
>> above), because there are competing philosophies about httpRange-14 (which 
>> is IMHO a never ending problem, unsolvable and also unnecessary in most 
>> situations), because there are several different recommendations about how 
>> to publish data on the web, or because some service somehow decides that 
>> some other data might be more useful or interesting than the one I asked for
> 
> I'm prolonging this discussion because I'm trying to publish linked data 
> myself (I just need to twist a few more SemWeb-sceptical arms), I believe I 
> thoroughly understand the pattern and the point, and so I would be interested 
> to find out if I've somehow drifted away from the mainstream.
> 
Not necessarily from the mainstream, which seems to have mostly adopted 
httpRange-14. But there are also some people criticising this decision; see 
e.g. the good summary at:
- http://dfdf.inesc-id.pt/tr/web-arch


Sebastian
-- 
| Dr. Sebastian Schaffert          [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg
