Peter Saint-Andre wrote:
> On 05/12/2008 12:53 PM, Dan Brickley wrote:
>> I've been hacking around with
>> the use of XMPP as a data bus for RDF querying using SPARQL (as Peter
>> well knows, being my XMPP helpline). Some notes on that at [1].
>> [1] http://danbri.org/words/2008/02/11/278
> See also http://crschmidt.net/semweb/sparqlxmpp/
Yup, Chris wrote this up based on IRC chats after the original design
discussions I had with you. He beat me to running code :) The mapping of
SPARQL result set format to XMPP IQ markup mutated a bit over time, so
there isn't clean interop currently between his python and my jqbus java
stuff. But it's all wrong anyway since large resultsets are too big for
one IQ; need to move to some batching or attachments-based model.
> I still don't quite understand SPARQL, but I'm sure that's because I'm
> missing the appropriate synapses. :)
OK basic idea of SPARQL:
1. understand the RDF 'nodes and arcs' data model. Quick messy potted
version here, sorry if this is too hasty...
An RDF graph encodes a collection of simple statements or claims, which
can be visualised as an edge-labelled graph, where the inter-node edges
in this graph correspond to properties/relationships/attributes. Each
edge links something either to another thing, or to a literal value. So
each node is either a literal (which may be tagged either with a
language code or with a datatype URI), or a non-literal. Non-literals
may be labelled with a URI, or may be 'blank' (actual representations
of the graph often give blank nodes a private-to-the-graph identifier,
but this is invisible to RDF). The
label on each edge is itself a URI, for the usual namespacing reasons.
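A rough sketch of that data model in Python, in case code reads more easily than prose. The class names (URI, Blank, Literal, Triple) are made up for illustration and are not taken from any RDF toolkit; the eg: namespace is the example one used later in this mail.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Blank:
    label: str  # private to the graph; carries no meaning outside it


@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None      # language tag, e.g. "en"
    datatype: Optional[URI] = None  # datatype URI

    def __post_init__(self):
        # A literal is tagged with a language code or a datatype, not both.
        if self.lang is not None and self.datatype is not None:
            raise ValueError("literal cannot have both lang and datatype")


@dataclass(frozen=True)
class Triple:
    subject: Union[URI, Blank]            # edges start at a non-literal
    predicate: URI                        # the edge label is always a URI
    obj: Union[URI, Blank, Literal]       # edges end at a thing or a literal


EG = "http://eg.example.com/abcd#"  # example namespace
movie = Blank("r1")
title = Triple(movie, URI(EG + "title"), Literal("Iron Man"))
```

A graph is then just a set of such Triple values.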
2. think of RDF graphs as descriptions of the world; sets of claims
which may or may not be accurate. The graph can be
written/published/exchanged in any of the various concrete RDF syntaxes.
RDF/XML is a common if ugly one. Also we have RDFa, Turtle/N3, and GRDDL,
which uses XSLT to turn colloquial XML into RDF graphs.
3. think of an RDF dataset as a collection of one or more of these
graphs; a system dealing with multiple such graphs identifies each with
a URI.
4. RDF querying in SPARQL is all about asking questions of one or more
such graphs. As such, an RDF query is conceptually a bit like an RDF
document, except bits can be marked as missing and labelled with
variable names.
So an RDF/XML document might encode a graph that says something equiv to:
'there exists a Movie; its :homepage property has a value which is the
URI <http://ironmanmovie.marvel.com/>; its :title property has the
literal value "Iron Man" and its :starring property has the literal
value "Robert Downey Jr.".'
(I'll spare you the XML version here)
By contrast SPARQL uses a non-XML notation to express questions. You
might write sparql which says,
'OK give me values ?x and ?y where ?x is the URI of the movie's
:homepage, and ?y is the title, wherever some thing that is a Movie has a
:starring property with value "Robert Downey Jr.".'
(that's in pseudo-sparql english for now)
And depending on the dataset you ran the query against, you might get
one row back, with ?x=http://ironmanmovie.marvel.com/ ?y="Iron Man".
Or you might get a load more rows describing other movies.
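To make the 'graph with bits missing' idea concrete, here is a toy matcher in plain Python. It is only a sketch: it is not how a real SPARQL engine works, the function names (is_var, match) are invented for this example, and URIs and literals are both written as bare strings with prefixes left unexpanded.

```python
# Target data: a set of (subject, predicate, object) triples.
graph = {
    ("_:r1", "rdf:type", "eg:Movie"),
    ("_:r1", "eg:homepage", "http://ironmanmovie.marvel.com/"),
    ("_:r1", "eg:title", "Iron Man"),
    ("_:r1", "eg:starring", "Robert Downey Jr."),
}


def is_var(term):
    """Terms starting with '?' are the 'missing bits' of the pattern."""
    return term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield each variable binding under which every pattern is in the graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


# The query: same shape as the data, with ?m, ?x and ?y left open.
query = [
    ("?m", "rdf:type", "eg:Movie"),
    ("?m", "eg:homepage", "?x"),
    ("?m", "eg:title", "?y"),
    ("?m", "eg:starring", "Robert Downey Jr."),
]

rows = [(b["?x"], b["?y"]) for b in match(query, graph)]
```

Against this one-movie graph, rows holds a single (homepage, title) pair; a bigger dataset would give more rows.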
5. For now, just focus on this part of SPARQL. The bit that looks most
like SQL. We have a query which is asking for variable-to-value
bindings, tabular ... against some target data. It returns a set of
'hits' just like SQL over JDBC/ODBC/DBI etc. There are detail
differences, but conceptually it is similar.
SPARQL defines XML and JSON bindings for these result sets. The
XMPP/SPARQL binding work is an attempt to flow these through XMPP
instead of the more traditional HTTP-based bindings.
http://www.w3.org/TR/rdf-sparql-XMLres/
http://www.w3.org/TR/rdf-sparql-json-res/ ...are the formats, there's
also a protocol spec, http://www.w3.org/TR/rdf-sparql-protocol/ which
also covers the HTTP binding. By far the most work is in the query
language spec though, http://www.w3.org/TR/rdf-sparql-query/
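For a feel of what a result set looks like on the wire, here is a small Python sketch that builds the JSON form for the one-row Iron Man answer. The head/vars and results/bindings layout follows the JSON results format linked above; the helper name to_sparql_json and its input shape are my own invention for this example.

```python
import json


def to_sparql_json(variables, rows):
    """Build a SPARQL-JSON-style result document.

    Each row is a list of (kind, value) pairs, one per variable, where
    kind is "uri", "literal" or "bnode".
    """
    bindings = []
    for row in rows:
        bindings.append({
            var: {"type": kind, "value": value}
            for var, (kind, value) in zip(variables, row)
        })
    return {"head": {"vars": list(variables)},
            "results": {"bindings": bindings}}


doc = to_sparql_json(
    ["x", "y"],
    [[("uri", "http://ironmanmovie.marvel.com/"), ("literal", "Iron Man")]],
)
text = json.dumps(doc, indent=2)
```

An XMPP binding has to carry the same information (variable names, then one set of typed bindings per row), whatever markup it ends up using inside the IQ.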
6. er that's it. I guess I should write the above stuff in RDF/XML and
SPARQL here.
target data:
_:r1 rdf:type eg:Movie .
_:r1 eg:homepage <http://ironmanmovie.marvel.com/> .
_:r1 eg:title "Iron Man" .
_:r1 eg:starring "Robert Downey Jr." .
this could also be written
[ a eg:Movie ;
  eg:homepage <http://ironmanmovie.marvel.com/> ;
  eg:title "Iron Man" ;
  eg:starring "Robert Downey Jr." ;
]
...in Turtle/SPARQL notation. The RDF/XML snippet for this might be
<eg:Movie xmlns:eg="http://eg.example.com/abcd#"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <eg:homepage rdf:resource="http://ironmanmovie.marvel.com/"/>
  <eg:title>Iron Man</eg:title>
  <eg:starring>Robert Downey Jr.</eg:starring>
</eg:Movie>
A key point is that SPARQL doesn't care which notation the source data
was originally in. Everything gets normalised into the triple/graph
model, and queries are written in terms of it.
So our sample query in SPARQL might be:
PREFIX eg: <http://eg.example.com/abcd#>
SELECT ?x ?y
WHERE {
  [ a eg:Movie ;
    eg:homepage ?x ;
    eg:title ?y ;
    eg:starring "Robert Downey Jr." ;
  ] .
}
# here 'a' is short for rdf:type, and the [ chunk bracket ] notation is
# a way of writing a blank node
and the results are like this:
?x=http://ironmanmovie.marvel.com/ ?y="Iron Man"
?x=http://someothermovie.com/ ?y="Some Other Movie"
Humm ok that was a bit rambling, ... does it help at all? Main point I
was aiming at here is that understanding basic SPARQL is a small step
from being comfortable with the RDF graph model, since queries are
really just RDF graphs with bits labelled missing ("?x" etc), and query
results are the values from the graph that fit the pattern specified.
There is a lot of extra detail, eg. around the GRAPH clause, which lets you
match specific graphs in the target dataset, or for optionals, filters,
datatyping etc. But the core is pretty simple.
Hope that helps :)
cheers,
Dan
--
http://danbri.org/