Peter Saint-Andre wrote:
On 05/12/2008 12:53 PM, Dan Brickley wrote:
I've been hacking around with
the use of XMPP as a data bus for RDF querying using SPARQL (as Peter
well knows, being my XMPP helpline). Some notes on that at [1].

[1] http://danbri.org/words/2008/02/11/278

See also http://crschmidt.net/semweb/sparqlxmpp/

Yup, Chris wrote this up based on IRC chats after the original design discussions I had with you. He beat me to running code :) The mapping of the SPARQL result set format to XMPP IQ markup mutated a bit over time, so there currently isn't clean interop between his Python code and my jqbus Java code. But it's all wrong anyway, since large result sets are too big for one IQ; we need to move to some batching or attachments-based model.
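
To make the batching idea a bit more concrete, here is a minimal Python sketch. It is not what jqbus or Chris's code does: the function name batch_rows, the batch size, and the seq/last fields are all hypothetical, and nothing here touches XMPP itself. It only shows splitting a result set into numbered chunks so that each one could travel in its own stanza.

```python
def batch_rows(rows, batch_size):
    """Split result rows into numbered batches; the final batch is flagged."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # An empty result set still produces one (empty) final batch.
    chunks = [rows[i:i + batch_size]
              for i in range(0, len(rows), batch_size)] or [[]]
    return [
        {"seq": n, "last": n == len(chunks) - 1, "rows": chunk}
        for n, chunk in enumerate(chunks)
    ]


# Five made-up result rows, sent two at a time (hypothetical URIs).
batches = batch_rows([{"x": f"http://example.org/m{i}"} for i in range(5)], 2)
for b in batches:
    print(b["seq"], b["last"], len(b["rows"]))
```

The receiver would collect batches until it sees the one marked last; how that maps onto actual IQ or message stanzas is left open here.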

I still don't quite understand SPARQL, but I'm sure that's because I'm
missing the appropriate synapses. :)

OK basic idea of SPARQL:

1. understand the RDF 'nodes and arcs' data model. Quick messy potted version here, sorry if this is too hasty...

An RDF graph encodes a collection of simple statements or claims, which can be visualised as an edge-labelled graph, where the edges between nodes correspond to properties/relationships/attributes. Each edge links something either to another thing or to a literal value. So each node is either a literal (which may be tagged with either a language code or a datatype URI) or a non-literal. Non-literals may be labelled with a URI, or may be 'blank' (actual representations of the graph structure often give blank nodes a private-to-the-graph identifier, but this is invisible to RDF). The label on each edge is itself a URI, for the usual namespacing reasons.
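
As a rough sketch of that model in Python: a graph is just a set of (subject, property, value) triples. The class names URI, Blank and Literal are made up for illustration and do not come from any RDF library. The sample data is the movie example used further down in this message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Blank:
    label: str  # private to this graph; invisible to RDF itself


@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None      # language code, e.g. "en"
    datatype: Optional[URI] = None  # or a datatype URI


EG = "http://eg.example.com/abcd#"
RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

movie = Blank("r1")
graph = {
    (movie, RDF_TYPE, URI(EG + "Movie")),
    (movie, URI(EG + "homepage"), URI("http://ironmanmovie.marvel.com/")),
    (movie, URI(EG + "title"), Literal("Iron Man")),
    (movie, URI(EG + "starring"), Literal("Robert Downey Jr.")),
}

# Every edge label is a URI; values are either things or literals.
for s, p, o in sorted(graph, key=repr):
    print(s, p, o)
```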

2. think of RDF graphs as descriptions of the world; sets of claims which may or may not be accurate. The graph can be written/published/exchanged in any of the various concrete RDF syntaxes. RDF/XML is a common if ugly one. We also have RDFa, Turtle/N3, and GRDDL, which uses XSLT to turn colloquial XML into RDF graphs.

3. think of an RDF dataset as a collection of one or more of these graphs; a system dealing with multiple such graphs identifies each with a URI.

4. RDF querying in SPARQL is all about asking questions of one or more such graphs. As such, an RDF query is conceptually a bit like an RDF document, except bits can be marked as missing and labelled with variable names.

So an RDF/XML document might encode a graph that says something equiv to:

'there exists a Movie; its :homepage property has a value which is the URI <http://ironmanmovie.marvel.com/>; its :title property has the literal value "Iron Man" and its :starring property has the literal value "Robert Downey Jr.".'

(I'll spare you the XML version here)

By contrast, SPARQL uses a non-XML notation to express questions. You might write SPARQL which says,

'OK, give me values ?x and ?y, where ?x is the URI of the movie's :homepage and ?y is its :title, wherever some thing that is a Movie has a :starring property with value "Robert Downey Jr.".'

(that's in pseudo-SPARQL English for now)

And depending on the dataset you ran the query against, you might get one row back, with ?x=http://ironmanmovie.marvel.com/ ?y="Iron Man".
Or you might get a load more rows describing other movies.

5. For now, just focus on this part of SPARQL. The bit that looks most like SQL. We have a query which is asking for variable-to-value bindings, tabular ... against some target data. It returns a set of 'hits' just like SQL over JDBC/ODBC/DBI etc. There are detail differences, but conceptually it is similar.

SPARQL defines XML and JSON bindings for these result sets. The XMPP/SPARQL binding work is an attempt to flow these through XMPP instead of the more traditional HTTP-based bindings.

http://www.w3.org/TR/rdf-sparql-XMLres/
http://www.w3.org/TR/rdf-sparql-json-res/ ...are the formats, there's also a protocol spec, http://www.w3.org/TR/rdf-sparql-protocol/ which also covers the HTTP binding. By far the most work is in the query language spec though, http://www.w3.org/TR/rdf-sparql-query/
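
For a feel of the XML result format, here is a small Python sketch that serialises one result row using only the standard library. The element names and namespace follow the XML results spec linked above; the function name results_to_xml and the (kind, value) row layout are assumptions for illustration, and blank nodes and typed or language-tagged literals are left out.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2005/sparql-results#"


def results_to_xml(variables, rows):
    """Serialise rows (dicts of variable name -> (kind, value)) as XML."""
    ET.register_namespace("", NS)  # emit the namespace as the default one
    root = ET.Element(f"{{{NS}}}sparql")
    head = ET.SubElement(root, f"{{{NS}}}head")
    for name in variables:
        ET.SubElement(head, f"{{{NS}}}variable", {"name": name})
    results = ET.SubElement(root, f"{{{NS}}}results")
    for row in rows:
        result = ET.SubElement(results, f"{{{NS}}}result")
        for name in variables:
            if name not in row:
                continue  # unbound variables are simply omitted
            kind, value = row[name]  # kind is "uri" or "literal"
            binding = ET.SubElement(result, f"{{{NS}}}binding", {"name": name})
            ET.SubElement(binding, f"{{{NS}}}{kind}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = results_to_xml(
    ["x", "y"],
    [{"x": ("uri", "http://ironmanmovie.marvel.com/"),
      "y": ("literal", "Iron Man")}],
)
print(xml_text)
```

It is this kind of payload that the XMPP binding would carry inside a stanza instead of an HTTP response body.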

6. er that's it. I guess I should write the above stuff in RDF/XML and SPARQL here.

target data:

_:r1 rdf:type eg:Movie .
_:r1 eg:homepage <http://ironmanmovie.marvel.com/> .
_:r1 eg:title "Iron Man" .
_:r1 eg:starring "Robert Downey Jr." .

this could also be written

[ a eg:Movie;
  eg:homepage <http://ironmanmovie.marvel.com/> ;
  eg:title "Iron Man";
  eg:starring "Robert Downey Jr.";
] .

...in Turtle/SPARQL notation. The RDF/XML snippet for this might be

<eg:Movie xmlns:eg="http://eg.example.com/abcd#"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <eg:homepage rdf:resource="http://ironmanmovie.marvel.com/"/>
 <eg:title>Iron Man</eg:title>
 <eg:starring>Robert Downey Jr.</eg:starring>
</eg:Movie>

A key point is that SPARQL doesn't care which notation the source data was originally in. Everything gets normalised into the triple/graph model, and queries are written in terms of it.

So our sample query in SPARQL might be:

PREFIX eg: <http://eg.example.com/abcd#>
SELECT ?x ?y
WHERE {
[ a eg:Movie;
  eg:homepage ?x ;
  eg:title ?y ;
  eg:starring "Robert Downey Jr." ;
] .
}

# here 'a' is short for rdf:type, and the [ ... ] bracket notation is a way of writing a blank node

and the results are like this:

?x=http://ironmanmovie.marvel.com/ ?y="Iron Man"
?x=http://someothermovie.com/ ?y="Some Other Movie"
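
The 'graph with bits missing' idea can be sketched in a few lines of Python. This is a toy matcher under loud assumptions, not how a real SPARQL engine works: terms are plain strings, anything starting with "?" is a variable, URIs and literals are not distinguished, and the blank node in the query is written as an extra variable ?m. The names match_triple and solve are hypothetical.

```python
EG = "http://eg.example.com/abcd#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Target data as plain tuples; "_:r1" stands in for the blank node.
data = [
    ("_:r1", RDF_TYPE, EG + "Movie"),
    ("_:r1", EG + "homepage", "http://ironmanmovie.marvel.com/"),
    ("_:r1", EG + "title", "Iron Man"),
    ("_:r1", EG + "starring", "Robert Downey Jr."),
]

# The query has the same shape as the data, with "?name" for the missing bits.
pattern = [
    ("?m", RDF_TYPE, EG + "Movie"),
    ("?m", EG + "homepage", "?x"),
    ("?m", EG + "title", "?y"),
    ("?m", EG + "starring", "Robert Downey Jr."),
]


def match_triple(pat, triple, bindings):
    """Extend bindings so that pat equals triple, or return None."""
    new = dict(bindings)
    for p, t in zip(pat, triple):
        if p.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def solve(patterns, triples, bindings=None):
    """Yield every set of bindings that satisfies all the patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in triples:
        extended = match_triple(patterns[0], triple, bindings)
        if extended is not None:
            yield from solve(patterns[1:], triples, extended)


rows = [(b["?x"], b["?y"]) for b in solve(pattern, data)]
print(rows)  # [('http://ironmanmovie.marvel.com/', 'Iron Man')]
```

Run against just the four sample triples this gives the single Iron Man row; further rows would only appear if the dataset described other matching movies.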

Hmm, OK, that was a bit rambling... does it help at all? The main point I was aiming at is that understanding basic SPARQL is a small step from being comfortable with the RDF graph model, since queries are really just RDF graphs with bits labelled missing ("?x" etc.), and query results are the values from the graph that fit the pattern specified.

There is a lot of extra detail, e.g. around the GRAPH clause, which lets you match specific graphs in the target dataset, or around optionals, filters, datatyping etc. But the core is pretty simple.

Hope that helps :)

cheers,

Dan

--
http://danbri.org/
