Re: Facebook Linked Data

Kingsley Idehen Tue, 27 Sep 2011 06:42:30 -0700

On 9/27/11 7:43 AM, Sebastian Schaffert wrote:

On 27.09.2011 at 09:44, Norman Gray wrote:

I am disappointed because I asked for data about http://graph.facebook.com/561666514 and 
got back data about http://graph.facebook.com/561666514# - this is my main concern. Maybe 
I should ask for http://graph.facebook.com/561666514# in the first place and manually 
remove the trailing "#" like a browser does. But I prefer a predictable, 
well-defined, and universal (over all LD services) behaviour.

I think you're disappointed because your expectations may be wrong.

My expectations are my expectations. But I accept that the world maybe does not 
satisfy them ;-)


But from my experience in developing software together with industry partners 
out there I have a good guess that my expectations will more-or-less match with 
the expectations of other developers. Especially those who are not very deep in 
Semantic Web technologies.

We are working together with many IT companies (with excellent software 
developers) and trying to convince them that Semantic Web technologies are 
superior for information integration. They are already overwhelmed when they 
have to understand that a database ID for an object is not enough.

If they understand what a Database Object Identifier is, then they surely understand what a Data Object Identifier is. And from there it's trivial for them to grok why the use of de-referencable URIs == SuperKeys++. As I stated recently, RDBMS identifiers such as primary and foreign keys promise a lot, but in reality deliver very little. URIs, on the other hand, deliver on endless promise. We have the World Wide Web as exhibit #1.

As is typically the case these days, you can take an alternative approach by completely reinventing terminology that comes across as gobbledygook to folks that have come to understand these matters in different realms.

  If they have to start distinguishing between the data object and the real 
world entity the object might be representing, they will be lost completely.

This is all they have to do, which most in the IT realm have actually grokked for eons, modulo use of HTTP based de-referencable URIs:


A Data Object can represent a real-world Entity.
The Data Object must be unambiguously Named.

Its actual Representation (at expression and serialization time) is best served via an EAV/SPO based directed graph.

Accessing the actual Data Object (its Representation) occurs via an Address.

You can use URIs as unambiguous Object Names.

You can use URLs (a kind of URI) to unambiguously Name the location of a Data Object.

A Data Object Name is distinct from Data Object access Address.
Courtesy of indirection, you can access a Data Object by Name or Address.

Indirection remains a key mechanism for solving problems in computing. That didn't start with Linked Data and won't end with Linked Data.

HTTP URIs make resolvable (de-referencable) Name based indirection cheap (albeit somewhat unintuitive) due to HTTP based WWW ubiquity.
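
The Name/Address split above can be sketched in a few lines of Python. This is only an illustration, assuming the hash-based naming used for the Facebook URIs in this thread; the function name address_of is made up for the example and is not part of any service's API.

```python
from urllib.parse import urldefrag

def address_of(name: str) -> str:
    """Return the Address (URL) a client fetches for a Data Object Name (URI).

    Assumption: hash-based naming, where the Address is simply the Name
    with its fragment part removed, as an HTTP client does before a request.
    """
    address, _fragment = urldefrag(name)
    return address

# The Name of the person, as used earlier in this thread.
name = "http://graph.facebook.com/561666514#"
address = address_of(name)

print(address)          # the location of the description document
print(name != address)  # Name and Address are distinct identifiers
```

The point of the sketch is that the indirection costs one string operation on the client side: the Name stays stable, and the Address is derived from it.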

The answers don't lie in Semantic Web literature, far from it; you have to look to the broader realm of computer science for that.

Linked Data, like the Web in general, boils down to ingenious use of Hyperlinks to extend the scale of old concepts.

If Data Object Names and Addresses weren't distinct, we wouldn't even be able to send email or use any other computer program. It just so happens that what's hidden by the combination of operating systems and programming languages is being exposed to a higher level, courtesy of WWW ubiquity. Trouble is that in this higher level of exposure we have a broad spectrum of audience skills and experience levels re. computer science fundamentals and industry history.

Imagine if we didn't end up with "resource" as a tactical replacement for "object", with regards to terminology. Imagine that?

Imagine the same if we had EAV triples + power of URIs instead of SPO triples where the "Object" simply adds yet another chunk of confusion re. computer science. The Object of a literary sentence != computer Object, but when you bring it into a computing space, conflation occurs, and we end up with 12+ years trying to untangle the mess.



Links:

1. http://www.cs.cmu.edu/afs/cs.cmu.edu/user/clamen/OODBMS/Manifesto/htManifesto/node4.html - Object Identity

2. http://www.w3.org/Addressing/rfc1630.txt -- Universal Resource Identifiers in WWW

3. http://www.w3.org/People/Connolly/9703-web-apps-essay.html - Distributed objects are the very heart of the Web, and have been since its invention -- Dan Connolly essay from way back

4. http://goo.gl/y7Gq4 -- my G+ note that deconstructs how Facebook have implemented a Linked Data Space without disruption to their existing infrastructure or business model.



Kingsley

When you dereference the URL for a person (such as .../561666514#), you get 
back RDF.  Our _expectation_, of course, is that that RDF will include some 
remarks about that person (.../561666514#), but there can be no guarantee of 
this, and no guarantee that it won't include more information than you asked 
for.  All you can reliably expect is that _something_ will come back, which the 
service believes to be true and hopes will be useful.  You add this to your 
knowledge of the world, and move on.

There I have my main problem. If I ask for "A", I am not really interested in "B". What
our client implementation therefore does is to throw away everything that is about B and only keeps data
about A. Which is - in case of the FB data - nothing. The reason why we do this is that often you will get
back a large amount of irrelevant (to us) data even if you only requested information about a specific
resource. I am not interested in the 999 other resources the service might also want to offer information
about, I am only interested in the data I asked for. Also, you need to have some kind of "handle"
on how to start working with the data you get back, like:
1. I ask for information about A, and the server gives me back what it knows
about A (there, my expectation again ...)
2. From the data I get, I specifically ask for some common properties, like A
foaf:name ?N and do something with the bindings of N. Now how would I know how
to even formulate the query if I ask for A but get back B?
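
A rough sketch of the filtering described above, and of why it keeps nothing for the FB data. The triples are invented placeholders (the real response is not reproduced here), and the helper names are hypothetical; the fallback to the "#" variant is one possible workaround, not something the service documents.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

def about(subject: str, triples: List[Triple]) -> List[Triple]:
    """Keep only the triples whose subject is exactly the given URI."""
    return [t for t in triples if t[0] == subject]

def about_with_hash_fallback(requested: str, triples: List[Triple]) -> List[Triple]:
    """Try the requested URI first, then the same URI with a trailing '#'."""
    kept = about(requested, triples)
    if not kept and "#" not in requested:
        kept = about(requested + "#", triples)
    return kept

requested = "http://graph.facebook.com/561666514"

# Placeholder response: the data describes <...#>, not the requested URI.
response = [
    (requested + "#", FOAF_NAME, "Example Person"),
    ("http://example.org/other", FOAF_NAME, "Someone Else"),
]

strict = about(requested, response)
lenient = about_with_hash_fallback(requested, response)
print(len(strict), len(lenient))
```

With strict matching the client ends up with an empty result; with the fallback it finds the one triple about the person, at the price of baking a service-specific convention into the client.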

Of course, instead of asking for http://graph.facebook.com/561666514, I should have asked for the person
"http://graph.facebook.com/561666514#", stripped the trailing hash, and then applied my
filtering to the result. My mistake, but this was also not obvious in the service description sent out
by Jesse (ok, my "httpRange-14 alarm" should have signaled a potential danger ...).

The concept of "knowledge of the world" is too abstract for practical implementations:
the fact that I can only expect that "something" comes back that (in one of the many
different syntaxes) somehow corresponds to the RDF model is a very weak contract. It does not
really go beyond what e.g. the Facebook OpenGraph API or other services that are not using Semantic
Web technologies already offer.

I sometimes have the feeling that most of the Linked Data world is currently concerned with "somehow
publishing all data out there" without being too clear about the "somehow" and without taking
into account the people who are supposed to *use* that data. The "somehow" currently includes:
- about 10 different syntaxes (RDF/XML, N3, Turtle, RDFa, JSON-LD, RDF/JSON,
...), many of which are not really solvable via content negotiation (e.g.
JSON-LD and RDF/JSON both have content type application/json, while N3 and N-Triples
have content type text/plain (or sometimes text/rdf+n3; level=XY))
- the data I get back is not about the resource I requested (discussion above),
because there are competing philosophies about httpRange-14 (which is IMHO a
never ending problem, unsolvable and also unnecessary in most situations),
because there are several different recommendations about how to publish data
on the web, or because some service somehow decides that some other data might
be more useful or interesting than the one I asked for
- the data I get back uses different, unconnected vocabularies for the same
thing (try getting information about the same person from DBPedia, Freebase,
Facebook, and that person's FOAF file - getting the *name* alone is a serious
issue with many workarounds)
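
On the syntax point, the most a client can do is state its preferences in the Accept header and then check what actually comes back. A minimal sketch of building such a header follows; the helper name and the particular preference order are assumptions for illustration, and nothing here claims that any given service honours them.

```python
from typing import List, Tuple

def accept_header(preferences: List[Tuple[str, float]]) -> str:
    """Build an HTTP Accept header value from (media type, quality) pairs."""
    parts = []
    for media_type, q in preferences:
        # A quality of 1.0 is the default and can be left implicit.
        parts.append(media_type if q == 1.0 else f"{media_type};q={q:g}")
    return ", ".join(parts)

# Hypothetical preference order: Turtle first, then RDF/XML, then JSON.
header = accept_header([
    ("text/turtle", 1.0),
    ("application/rdf+xml", 0.8),
    ("application/json", 0.5),
])
print(header)
```

Note that the last entry shows the problem from the list above: asking for application/json does not tell the server which of the JSON-based RDF syntaxes is wanted.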

I am not really complaining, I just wanted to point out issues that still need
to be solved. And of course the problem is not really only the Linked Data
published by Jesse and Facebook, this was just a starting point because I ran
into troubles there.

How much or how little information comes back is an engineering or UI decision 
on the part of the service.

... but this obviously is a serious factor in the usefulness of the service. 
Which was my initial point.

Or, put another way:

But Linked Data could do better: there could be a uniform way of accessing the 
data and a unified contract about what comes back.


There _is_ a uniform way of accessing the data: you dereference the 
non-fragment bit of a thing's name and read what comes back.  And there is a 
uniform contract: the RDF that comes back is something the service believes may 
be useful/interesting to you, and should include further places to look.

Yes, it would be _nice_ if the contract were stronger, but this is the web, and 
the LD pattern's key insight is that this degree of _very_ loose coupling is 
practical and useful.


In principle I agree. But the usefulness has yet to be proven, and I fear that 
the very weak contract is not enough to show the advantages over competing 
technologies. Maybe this is not necessary as long as the data somehow gets more 
easily accessible. But as a Semantic Web community we also have a certain 
hypothesis that the technologies *we* are coming up with are better than what 
is already out there.

Btw, the statement of _very_ loose coupling is for me in total contradiction with the 
httpRange-14 discussion: for instance, someone who is interested in "elephants" 
would probably simply link in his FOAF file to http://dbpedia.org/resource/Elephant, 
which is of course NOT the proper identifier for the elephant but only the document 
containing the data. In the same way, I would probably link in my FOAF file to my 
Facebook account using foaf:holdsAccount http://graph.facebook.com/561666514 and not 
http://graph.facebook.com/561666514# ...


Greetings,

Sebastian



--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
