Am 27.09.2011 um 09:44 schrieb Norman Gray:
>> 
>> I am disappointed because I asked for data about 
>> http://graph.facebook.com/561666514 and got back data about 
>> http://graph.facebook.com/561666514# - this is my main concern. Maybe I 
>> should ask for http://graph.facebook.com/561666514# in the first place and 
>> manually remove the trailing "#" like a browser does. But I prefer a 
>> predictable, well-defined, and universal (over all LD services) behaviour.
> 
> I think you're disappointed because your expectations may be wrong.

My expectations are my expectations. But I accept that maybe the world does not 
satisfy them ;-)

But from my experience developing software together with industry partners out 
there, I have a good guess that my expectations will more or less match the 
expectations of other developers, especially those who are not very deep into 
Semantic Web technologies. 

We are working together with many IT companies (with excellent software 
developers) and trying to convince them that Semantic Web technologies are 
superior for information integration. They are already overwhelmed when they 
have to understand that a database ID for an object is not enough. If they have 
to start distinguishing between the data object and the real world entity the 
object might be representing, they will be lost completely.


> 
> When you dereference the URL for a person (such as .../561666514#), you get 
> back RDF.  Our _expectation_, of course, is that that RDF will include some 
> remarks about that person (.../561666514#), but there can be no guarantee of 
> this, and no guarantee that it won't include more information than you asked 
> for.  All you can reliably expect is that _something_ will come back, which 
> the service believes to be true and hopes will be useful.  You add this to 
> your knowledge of the world, and move on.

There I have my main problem. If I ask for "A", I am not really interested in 
"B". What our client implementation therefore does is throw away everything 
that is about B and keep only the data about A. Which, in the case of the FB 
data, is nothing. The reason we do this is that you will often get back a large 
amount of (to us) irrelevant data even if you only requested information about 
a specific resource. I am not interested in the 999 other resources the service 
might also want to offer information about; I am only interested in the data I 
asked for. Also, you need some kind of "handle" on how to start working with 
the data you get back, like:
1. I ask for information about A, and the server gives me back what it knows 
about A (there, my expectation again ...)
2. From the data I get, I specifically ask for some common properties, like A 
foaf:name ?N, and do something with the bindings of ?N. Now how would I even 
know how to formulate the query if I ask for A but get back B?
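The filtering step described above can be sketched in a few lines. This is only an illustration, not our actual client code; the triples and the person's name below are invented, and triples are simplified to plain (subject, predicate, object) string tuples:

```python
# Hypothetical sketch of the client-side filtering described above: keep
# only the triples whose subject is exactly the resource we asked for.
# The example triples and the name "Some Person" are invented.

REQUESTED = "http://graph.facebook.com/561666514"

response_triples = [
    # The service talks about the hash URI, not the URI we requested:
    ("http://graph.facebook.com/561666514#",
     "http://xmlns.com/foaf/0.1/name",
     "Some Person"),
    ("http://example.org/other-resource",
     "http://xmlns.com/foaf/0.1/name",
     "Somebody Else"),
]

def triples_about(triples, subject):
    """Keep only statements whose subject is exactly `subject`."""
    return [t for t in triples if t[0] == subject]

# Filtering on the URI we asked for leaves nothing, because every
# returned statement is about the "#" variant:
print(triples_about(response_triples, REQUESTED))  # -> []
```

Which is exactly the empty result I saw with the FB data: the single character difference between the two URIs makes the filter drop everything.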

Of course, instead of asking for http://graph.facebook.com/561666514, I should 
have asked for the person "http://graph.facebook.com/561666514#", stripped the 
trailing hash, and then applied my filtering to the result. My mistake, but 
this was also not obvious in the service description sent out by Jesse (ok, my 
"httpRange-14 alarm" should have signaled a potential danger ...).


The concept of "knowledge of the world" is too abstract for practical 
implementations: the fact that I can only expect that "something" comes back 
that (in one of the many different syntaxes) somehow corresponds to the RDF 
model is a very weak contract. It does not really go beyond what e.g. the 
Facebook OpenGraph API or other services that are not using Semantic Web 
technologies already offer.

I sometimes have the feeling that most of the Linked Data world is currently 
concerned with "somehow publishing all data out there" without being too clear 
about the "somehow" and without taking into account the people who are supposed 
to *use* that data. The "somehow" currently includes:
- about 10 different syntaxes (RDF/XML, N3, Turtle, RDFa, JSON-LD, RDF/JSON, 
...), many of which cannot really be distinguished via content negotiation 
(e.g. JSON-LD and RDF/JSON both have content type application/json, and N3 and 
N-Triples have content type text/plain (or sometimes text/rdf+n3; level=XY))
- the data I get back is not about the resource I requested (discussion above), 
because there are competing philosophies about httpRange-14 (which is IMHO a 
never-ending problem, unsolvable and also unnecessary in most situations), 
because there are several different recommendations about how to publish data 
on the web, or because some service somehow decides that some other data might 
be more useful or interesting than what I asked for
- the data I get back uses different, unconnected vocabularies for the same 
thing (try getting information about the same person from DBPedia, Freebase, 
Facebook, and that person's FOAF file - getting the *name* alone is a serious 
issue with many workarounds)
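The content negotiation problem in the first point can be made concrete with a small dispatch table. This is a hypothetical sketch (the table entries reflect the media type situation I described, not any particular library): a client that wants to pick a parser from the Content-Type header alone simply cannot do so for the overloaded types.

```python
# Hypothetical Content-Type -> RDF syntax dispatch table, illustrating why
# content negotiation alone cannot select a parser: several syntaxes
# share the same media type.
CONTENT_TYPE_TO_SYNTAXES = {
    "application/rdf+xml": ["RDF/XML"],
    "text/turtle":         ["Turtle"],
    "application/json":    ["JSON-LD", "RDF/JSON"],  # ambiguous
    "text/plain":          ["N-Triples", "N3"],      # ambiguous
}

def pick_parser(content_type):
    """Return the syntax for a media type, or fail when it is ambiguous."""
    candidates = CONTENT_TYPE_TO_SYNTAXES.get(content_type, [])
    if len(candidates) != 1:
        # The media type alone does not identify the syntax; the client
        # would have to sniff the payload or guess.
        raise ValueError("ambiguous or unknown media type: " + content_type)
    return candidates[0]

print(pick_parser("text/turtle"))  # -> Turtle
```

For application/json or text/plain this function can only raise, which is exactly the situation a generic Linked Data client is in.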

I am not really complaining, I just wanted to point out issues that still need 
to be solved. And of course the problem is not really only the Linked Data 
published by Jesse and Facebook; this was just a starting point because I ran 
into trouble there.

> 
> How much or how little information comes back is an engineering or UI 
> decision on the part of the service.

... but this obviously is a serious factor in the usefulness of the service. 
Which was my initial point.


> 
> Or, put another way:
> 
>> But Linked Data could do better: there could be a uniform way of accessing 
>> the data and a unified contract about what comes back.
> 
> 
> There _is_ a uniform way of accessing the data: you dereference the 
> non-fragment bit of a thing's name and read what comes back.  And there is a 
> uniform contract: the RDF that comes back is something the service believes 
> may be useful/interesting to you, and should include further places to look.
> 
> Yes, it would be _nice_ if the contract were stronger, but this is the web, 
> and the LD pattern's key insight is that this degree of _very_ loose coupling 
> is practical and useful.


In principle I agree. But the usefulness has yet to be proven, and I fear that 
the very weak contract is not enough to show the advantages over competing 
technologies. Maybe this is not necessary as long as the data somehow becomes 
more easily accessible. But as a Semantic Web community we also have a certain 
hypothesis that the technologies *we* come up with are better than what is 
already out there.

Btw, the claim of _very_ loose coupling is, for me, in total contradiction 
with the httpRange-14 discussion: for instance, someone who is interested in 
"elephants" would probably simply link in his FOAF file to 
http://dbpedia.org/resource/Elephant, which is of course NOT the proper 
identifier for the elephant but only for the document containing the data. In 
the same way, I would probably link in my FOAF file to my Facebook account 
using foaf:holdsAccount http://graph.facebook.com/561666514 and not 
http://graph.facebook.com/561666514# ...


Greetings,

Sebastian
-- 
| Dr. Sebastian Schaffert          [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg

