Re: Querying only the default graph from the data store

Lee Feigenbaum Fri, 07 Sep 2012 07:38:13 -0700

[moved to public-sparql-dev]

I have a related question -- do all quad stores / named graph stores include a default graph? If the store that you develop or use does have a default graph, does that graph also have a name (URI)?

Answering for Anzo: Anzo does not have a default graph. All graphs are named with URIs.


Lee

On 9/7/2012 7:44 AM, Barry Bishop wrote:

Hello Axel,

On 05/09/12 21:14, Polleres, Axel wrote:
Thanks Barry,
Since you confirm that the response addresses your comment, please consider this reply informal (chair-hat off).
I feel this is a shame, as two different implementations can
produce different output from the simplest of queries, e.g.
SELECT * { ?s ?p ?o }
I personally find this quite normal... different endpoints respond differently to such a query since they refer to different default datasets, i.e. naturally when I query dbpedia.org I query a different dataset than data.semanticweb.org, etc.
Well, dbpedia.org and data.semanticweb.org sparql endpoints make different data available, so I suppose you would naturally get different results to the same query. However, this is not what I was getting at. In fact, I'm not sure I have managed to get my point across at all. Perhaps another hypothetical example:
Suppose you run a development team that builds an application that interacts with some public sparql endpoint, say http://xyz.org/sparql. Then one day xyz.org start to have scalability problems and decide to upgrade their RDF database to some expensive new thing. Both the old and the new RDF databases are fully compliant with the W3C specifications, but after the upgrade your application is completely broken, only because the two database implementations construct their RDF dataset differently when no FROM clauses are given. I am sure you wouldn't find it so natural in this case.
There are some workarounds, as you say, but not in all cases. When you are using someone else's database and don't get to decide how they partition their data into separate graphs, then you can be completely stuck. As fabulous as the query language is (and I do think it is a tremendous achievement), this ambiguity over constructing a dataset when there are no FROMs is a bit of a hole.
Notably, I'd also like to point you to another document within the SPARQL 1.1 specification,
i.e. the service-description document at
http://www.w3.org/TR/sparql11-service-description/
which provides means to describe which graphs compose the default
dataset of a particular service endpoint.
Particularly, the property
http://www.w3.org/TR/sparql11-service-description/#sd-defaultDataset
is intended to provide a description of the default dataset that an endpoint uses. Note also that the service description vocabulary is extensible: what we specify now is only a core, and other vocabularies can be used to extend it (e.g. VoID).
All well and good, if this feature is actually provided by an endpoint. However, it requires quite a lot of programming for a client to work all this out and re-write queries accordingly. And actually, it still doesn't help: e.g. if the endpoint you want to use constructs the dataset as an RDF merge of all graphs (when no FROM clauses are given [I need to find an abbreviation for this]) and you only want to query the default graph, then you just can't do it. There is no way to tell such an endpoint that you only want the default graph using the query language.
The problem is basically that the default graph is special: because it doesn't have an identifier it cannot be used in the same way as named graphs....
... in the query language. However, in the update language the appropriate syntax has already been created and would be the perfect complement to the query language, e.g. if I can do this:
    CLEAR DEFAULT

why can't I do this:

    SELECT *
    FROM DEFAULT
    {...}
and specify absolutely unambiguously that I want my query to execute *only* over the default graph in the database. No matter how an implementation constructs its dataset when no FROM clauses are given, this syntax should always work in the expected way.
Since I am rambling on, the related keywords from the update language would also be very useful, e.g. one can clear all graphs like this:
    CLEAR ALL

so why not be able to do this:

    SELECT *
    FROM ALL
    {...}
This would help in the opposite case, when an implementation constructs the dataset using only the default graph (when no FROM clauses are given). In this situation, it is not possible to query for the graph names (using select distinct ?g {graph ?g {?s ?p ?o}}), so the above would say: "please merge all graphs for input to my query, even though I don't know what their names are and have no way of finding out (using the query language)".
These things might not seem important, but they are life and death to application programmers. Right now, building an application that needs to interact with a sparql endpoint that is only known at runtime is fraught with difficulties, not the least of which is that if your application is required to query data only from the default graph, then there is no way to write a query that is guaranteed to do this on all (W3C compliant) sparql endpoints.
Which I still feel is a bit of a shame.

barry
As for the rest of your response, we seem to agree that what you're aiming at is a new feature rather than something this working group can address within its current charter and resources.

Best regards,
Axel
-----Original Message-----
From: Barry Bishop [mailto:[email protected]]
Sent: Wednesday, 05 September 2012 19:49
To: Polleres, Axel
Cc: [email protected]
Subject: Re: Querying only the default graph from the data store

Hello Axel,

Thanks for taking the time to reply. I realise this thread is
somewhat out of place given the status/progress of the WG.

Your reply does address my initial post. It does not resolve
it, but this is perhaps not the time. However, for the
purpose of clarity I will make further comments inline:

On 05/09/12 04:11, Polleres, Axel wrote:
Hi Barry,

This is in response to
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Aug/0011.html
The working draft does not specify how the RDF dataset is constructed when no FROM and FROM NAMED clauses are present in the SPARQL query.
Implementations are therefore able to construct the dataset differently, e.g.
a. dataset default graph contains only the data store's default graph
b. dataset default graph contains the RDF merge of all graphs in the data store
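To make options a and b concrete, here is a minimal Python sketch that models a store as a dictionary from graph name to a set of triples, with the key None standing for the unnamed default graph. The store contents and function names are hypothetical, and blank nodes are ignored, so the RDF merge reduces to a plain set union. Under these assumptions the same SELECT * { ?s ?p ?o } query yields one solution under option a and three under option b:

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical quad store: key None is the store's unnamed default graph,
# every other key is a named-graph IRI.
STORE: Dict[Optional[str], Set[Triple]] = {
    None: {("ex:a", "ex:p", "ex:b")},
    "ex:g1": {("ex:c", "ex:p", "ex:d")},
    "ex:g2": {("ex:e", "ex:p", "ex:f")},
}

def default_graph_of_dataset(store: Dict[Optional[str], Set[Triple]], policy: str) -> Set[Triple]:
    """Dataset default graph for a query with no FROM / FROM NAMED clauses."""
    if policy == "a":  # a. only the data store's default graph
        return set(store.get(None, set()))
    if policy == "b":  # b. RDF merge of all graphs (no blank nodes, so a set union)
        merged: Set[Triple] = set()
        for triples in store.values():
            merged |= triples
        return merged
    raise ValueError(f"unknown policy: {policy!r}")

def select_all(default_graph: Set[Triple]) -> Set[Triple]:
    """SELECT * { ?s ?p ?o }: one solution per triple in the default graph."""
    return set(default_graph)

for policy in ("a", "b"):
    print(policy, len(select_all(default_graph_of_dataset(STORE, policy))))
# prints "a 1" then "b 3"
```

Both policies are legitimate readings of the draft, which is exactly why the two counts differ for an identical query.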
It is correct that how the concrete default dataset of a SPARQL endpoint is constructed is left open to implementations. Since different endpoints and implementations support different behaviours in this regard (e.g. in some implementations the default graph of the default dataset is the union of all named graphs whereas in others this is not the case), the working group does not feel that there is a unique standard behavior to be advocated this time around.

I feel this is a shame, as two different implementations can
produce different output from the simplest of queries, e.g.
SELECT * { ?s ?p ?o }

However, this is a separate issue.
As soon as a single FROM or FROM NAMED clause is used, the data store's default graph is excluded from the query's dataset.

Which means that there is no portable way to define a SPARQL query so that it executes only against the default graph in the data store, or even against a combination of the default graph and one or more named graphs.
Please note that a) querying the default graph in the data store is the standard behavior when no explicit FROM or FROM NAMED clauses are given. b) the combination of querying named graphs and the default graph of the endpoint's default dataset is supported via GRAPH graph patterns.

a) This is rather inconsistent. Above you say that the
construction of the default RDF dataset (when no FROM/FROM
NAMED clauses are given) is not defined, but here you say
constructing it using the default graph only is the 'standard
behaviour'. One of the motivations for this post is that
there are good reasons not to have only the default graph in
the 'default dataset', e.g. you wouldn't be able to do this
to find out the graph names when presented with an unknown endpoint:

SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o } }

Anyway, the point here is that there is no *portable* way to
query just the default graph.
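A toy sketch of why the graph-name query above depends on how the default dataset is built, assuming a dataset is modelled as one default graph plus a dictionary of named graphs (all names and data here are hypothetical). GRAPH ?g can only bind ?g to the dataset's named graphs, so a dataset built from the store's default graph alone reports no graph names at all:

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

@dataclass
class Dataset:
    default: Set[Triple] = field(default_factory=set)
    named: Dict[str, Set[Triple]] = field(default_factory=dict)

def distinct_graph_names(ds: Dataset) -> Set[str]:
    """SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }.

    ?g binds only to named graphs of the dataset that hold at least one triple;
    the default graph is never visible through GRAPH.
    """
    return {name for name, triples in ds.named.items() if triples}

store_default = {("ex:a", "ex:p", "ex:b")}
store_named = {"ex:g1": {("ex:c", "ex:p", "ex:d")}, "ex:g2": {("ex:e", "ex:p", "ex:f")}}

# Dataset holding only the store's default graph: the named graphs are invisible.
only_default = Dataset(default=set(store_default))
# Dataset that also exposes the store's named graphs.
with_named = Dataset(default=set(store_default), named={k: set(v) for k, v in store_named.items()})

print(sorted(distinct_graph_names(only_default)))  # []
print(sorted(distinct_graph_names(with_named)))    # ['ex:g1', 'ex:g2']
```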

b) yes, but you can't query the RDF merge of the default
graph and a named graph in the same way with two named
graphs, e.g. FROM ex:g1 FROM ex:g2. Instead one would need to
use a triple and graph pattern union, which for complex
queries becomes cumbersome. Put another way, any combination
of named graphs can be merged and explored with query triple
patterns, but this can't be done with any combination of
named graphs and the default graph.
See also examples below.
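A minimal Python sketch of the difference described in b), assuming a toy evaluator (hypothetical, not any real engine) that joins triple patterns within a single graph. Merging the default graph with g1 lets a pattern join across the two graphs, whereas evaluating the whole pattern once over the default graph and once inside GRAPH g1, then taking the UNION, finds nothing, because each branch only sees one graph:

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def match(pattern: Triple, graph: Set[Triple]) -> List[Binding]:
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    results: List[Binding] = []
    for triple in graph:
        binding: Binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results

def bgp(patterns: List[Triple], graph: Set[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern: join the pattern matches within one graph."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            {**left, **right}
            for left, right in product(solutions, match(pattern, graph))
            if all(left.get(var, val) == val for var, val in right.items())
        ]
    return solutions

default_graph = {("ex:alice", "ex:knows", "ex:bob")}
g1 = {("ex:bob", "ex:name", "Bob")}
query = [("?x", "ex:knows", "?y"), ("?y", "ex:name", "?n")]

# Like the proposed (hypothetical) "FROM DEFAULT FROM ex:g1": one merged default graph.
merged = bgp(query, default_graph | g1)
# Like { BGP } UNION { GRAPH ex:g1 { BGP } }: each branch sees one graph only.
union_per_graph = bgp(query, default_graph) + bgp(query, g1)

print(len(merged), len(union_per_graph))  # 1 0
```

Approximating the merged behaviour with UNION means wrapping every individual triple pattern in its own { tp } UNION { GRAPH ex:g1 { tp } } (and accepting duplicate solutions when a triple occurs in both graphs), which is where the cumbersomeness for complex queries comes from.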
This is a problem that often confuses users of RDF data stores and is likely to lead to implementations that provide their own specific means to achieve this, e.g.
http://www.openrdf.org/issues/browse/SES-850

Inspired by the update language's use of the 'DEFAULT' keyword for graph manipulation, I suggest an extension to the query language that allows "FROM DEFAULT" to be used, e.g.

SELECT *
FROM DEFAULT
WHERE { ..... }

=> dataset contains a default graph made up of the data store's default graph only
Please note that this is the standard behaviour when no FROM clause is given, i.e. this corresponds to

SELECT *
WHERE { ..... }       <--- (no use of GRAPH keyword)
I don't think this is "standard behaviour", rather it is
common behaviour. It can not be standard when the
construction of the dataset is implementation dependent when
no FROM clause is given.
This construct can be used with any number of FROM <uri> or FROM NAMED <uri> clauses, e.g.

SELECT *
FROM DEFAULT
FROM <http://example.com#g1>
WHERE { ..... }

=> dataset contains a default graph made up of the data store's default graph merged with the contents of the data store's g1 graph
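A minimal sketch of the proposed semantics, assuming the same dictionary-of-graphs model of a store (key None is the unnamed default graph). FROM DEFAULT is only a proposal in this thread and is not SPARQL 1.1 syntax; the ALL token models the FROM ALL form suggested elsewhere in the thread, and the function and constant names are hypothetical. Each FROM entry contributes its graph to the merge, so the store's default graph is treated like any other graph:

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Store = Dict[Optional[str], Set[Triple]]

# Hypothetical keywords proposed in this thread; neither exists in SPARQL 1.1.
DEFAULT = "DEFAULT"
ALL = "ALL"

def build_default_graph(store: Store, from_clauses: Iterable[str]) -> Set[Triple]:
    """Merge the graphs named by FROM clauses into the query's default graph.

    DEFAULT selects the store's unnamed default graph (key None), ALL selects
    every graph in the store, and any other entry is a named-graph IRI.
    Unknown IRIs contribute nothing (a simplifying assumption), and blank
    nodes are ignored, so the RDF merge is a plain set union.
    """
    merged: Set[Triple] = set()
    for clause in from_clauses:
        if clause == ALL:
            for triples in store.values():
                merged |= triples
        else:
            key = None if clause == DEFAULT else clause
            merged |= store.get(key, set())
    return merged

store: Store = {
    None: {("ex:a", "ex:p", "ex:b")},
    "http://example.com#g1": {("ex:c", "ex:p", "ex:d")},
    "http://example.com#g2": {("ex:e", "ex:p", "ex:f")},
}

print(len(build_default_graph(store, [DEFAULT])))                           # 1
print(len(build_default_graph(store, [DEFAULT, "http://example.com#g1"])))  # 2
print(len(build_default_graph(store, [ALL])))                               # 3
```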
This would be a fairly trivial change for existing sparql processor implementations, but would provide a big improvement in functionality/flexibility by allowing a data store's default graph to be used/queried/merged in the same way as any of its named graphs.
Note that similar to the example above, you can query the
default graph and named graphs within the default dataset in
a data store side by side by using GRAPH graph patterns, i.e.
   SELECT *
   WHERE
   {
     .....                              # (no use of GRAPH) matches the default graph
     GRAPH <http://ex.com#g1> { .... }  # matches named graph g1 (assuming g1 is a named graph in the default dataset)
   }
Consider an application that needs to execute queries over various subsets of a database's contents, where the subsets are defined using various combinations of named graphs. It would certainly be useful to have standard queries which only require the appropriate "FROM g1 FROM g2 etc" to be prepended. This is easy to do, unless one of the graphs is the default graph.
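As an illustration of the pattern described above, a minimal Python helper (the name with_from_clauses is hypothetical) that keeps one WHERE clause fixed and varies only the FROM list. FROM clauses go between the SELECT clause and the WHERE clause. The helper can only be given IRIs, which is exactly the gap being described: there is nothing to pass for the store's default graph.

```python
from typing import Sequence

def with_from_clauses(select_clause: str, graph_iris: Sequence[str], where_clause: str) -> str:
    """Build a query whose default graph is the merge of the listed graphs.

    Each IRI becomes one FROM clause, placed between the SELECT and WHERE clauses.
    """
    from_lines = [f"FROM <{iri}>" for iri in graph_iris]
    return "\n".join([select_clause, *from_lines, where_clause])

query = with_from_clauses(
    "SELECT *",
    ["http://example.com#g1", "http://example.com#g2"],
    "WHERE { ?s ?p ?o }",
)
print(query)
```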
Finally, note that it is not possible in SPARQL 1.1 to construct a *new* dataset composed of *parts* of the default dataset of an endpoint plus possible external graphs; such a feature is currently not foreseen in the features addressed in this round of SPARQL, but has been suggested before [1].
The features being worked on in this round of
standardization have been decided in a voting process at the
beginning of the WG and are documented in the following
document: http://www.w3.org/TR/sparql-features/
Additionally, a list of work items and features postponed
to a future working group are being collected by the group in
a dedicated wiki page [2] which also contains the features
discussed in the beginning of the WG which have not been
considered for this round [3].

Yes, I will be more timely next time and will endeavour to progress this topic in the proper way. My apologies for the 'noise'.

Regards,
barry
Among this list, the feature "Composite Datasets" [1] might
partially capture what you have in mind and a future WG might
possibly work out the details of such feature.
We'd kindly ask you to confirm by a reply to this list that
this addresses your comment.
Axel Polleres, on behalf of the SPARQL WG

1. http://www.w3.org/2009/sparql/wiki/Feature:CompositeDatasets
2. http://www.w3.org/2009/sparql/wiki/Future_Work_Items
3. http://www.w3.org/2009/sparql/wiki/Category:Features
