> -----Original Message-----
> From: Peter Ansell [mailto:ansell.pe...@gmail.com]
> Sent: 22 November 2008 21:54
> To: Kingsley Idehen
> Cc: dbpedia-discuss...@lists.sourceforge.net; virtuoso-
> us...@lists.sourceforge.net
> Subject: Re: [Dbpedia-discussion] DBPedia 3.2 Load in Virtuoso 5.0.9 -
> Reporting on results, and some questions

> > Duplicates!
> > Can someone please explain this?
> >
> > As an aside, when I run this from isql on my newly installed local DBpedia
> > I get no duplicates (I haven't tried Jena against my local copy).
> >
> >
> > <eom>
> >
> >

Kingsley wrote:
> Marvin,
>
> You will see why when you run:
>
> select *
> where {graph ?g {
> ?s
>  <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }}
>
> As you can see, there are two graphs:
> 1. http://dbpedia.org
> 2. http://dbpedia.org/resource/<entity> (this one results from cache
> activity associated with client interactions with Virtuoso)
>
> Solutions:
> -- Being specific about source Graph by specifying Graph IRI
> select ?s
> where {graph <http://dbpedia.org> {
> ?s
>  <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }}
> OR
>
> select ?s
> from <http://dbpedia.org>
> where {
> ?s
>  <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }
> -- Using DISTINCT
>
> select distinct ?s
> where {
> ?s
>  <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }
>

Peter wrote:
> What is the instruction to give with Jena/Other clients etc. to make it
> behave in the same way as the HTTP SPARQL page interface and not resolve
> triples from the cache graphs.

For Jena, when a call of:

qexec = QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql", q);

is made, the query is passed as-is to the SPARQL endpoint.  The result set
comes back in the SPARQL results format and is parsed to produce the local
programming objects.  There is no additional processing client-side.
Duplicates should not come back from that pattern, but the client-side code
does not check that the endpoint is functioning correctly.

In SPARQL, matching a basic graph pattern or a triple pattern with one
variable does not give duplicates because an RDF graph is a set of triples.
(Duplicates are only possible if the pattern includes a blank node - think of
that as a variable that is projected away and, like any projection, it can
result in duplicates across the narrower intermediate result.)
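
To make the projection point concrete, here is a minimal sketch in plain Java
(no Jena involved). Triples are modelled as three-element lists, and the class
name, method name and ex: terms are all made up for illustration. Matching
(?s, p, ?o) and then keeping only ?s repeats a subject once for every object it
has, even though the graph itself holds no duplicate triples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProjectionDuplicates {

    /**
     * Match the pattern (?s, predicate, ?o) against a graph and keep only ?s.
     * Each triple is a three-element list: subject, predicate, object.
     * (Illustrative helper, not part of any RDF library.)
     */
    public static List<String> subjectsWithPredicate(List<List<String>> graph,
                                                     String predicate) {
        List<String> subjects = new ArrayList<>();
        for (List<String> triple : graph) {
            if (triple.get(1).equals(predicate)) {
                // The object (?o) is dropped here; that is the projection.
                subjects.add(triple.get(0));
            }
        }
        return subjects;
    }

    public static void main(String[] args) {
        // Three distinct triples; ex:a appears as subject of two of them.
        List<List<String>> graph = Arrays.asList(
                Arrays.asList("ex:a", "ex:influenced", "ex:x"),
                Arrays.asList("ex:a", "ex:influenced", "ex:y"),
                Arrays.asList("ex:b", "ex:influenced", "ex:x"));

        // prints [ex:a, ex:a, ex:b]
        System.out.println(subjectsWithPredicate(graph, "ex:influenced"));
    }
}
```

With the object fixed to a single term instead, as in the Chris_Rock query,
each subject can match at most once per graph, so no duplicates arise from the
pattern itself.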

If a union of other graphs underlies the virtual graph, then the compound
graph should still appear to be a set of statements, which will not produce
duplicates.  By just passing over the query as-is, there's an assumption that
the endpoint will respect those semantics.

It would require changing the query to suppress duplicates, e.g. by using DISTINCT.

In Jena this happens in quite a few places: we have union graphs, and the
inference engines would produce duplicates if they didn't suppress them.  (The
storage layers SDB and TDB [*] both support query over the union of named
graphs in an RDF dataset, and both suppress any duplicates that occur to give
the set-of-triples view.)
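
As a rough illustration of that set-of-triples view, the sketch below merges
several graphs into one set, so a triple held in more than one graph shows up
once. It is plain Java and is not how SDB or TDB implement it; the class name,
method name and ex: terms are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UnionGraphView {

    /**
     * Build the union of several graphs as a set of triples.
     * Each triple is a three-element list (subject, predicate, object), so
     * List.equals/hashCode give value equality and the set drops repeats.
     * (Illustrative helper, not a Jena API.)
     */
    public static Set<List<String>> unionOf(Collection<List<List<String>>> graphs) {
        Set<List<String>> union = new LinkedHashSet<>();
        for (List<List<String>> graph : graphs) {
            union.addAll(graph);
        }
        return union;
    }

    public static void main(String[] args) {
        List<String> shared = Arrays.asList("ex:a", "ex:influenced", "ex:c");

        // The main graph and a second graph that repeats one of its triples.
        List<List<String>> mainGraph = Arrays.asList(
                shared,
                Arrays.asList("ex:b", "ex:influenced", "ex:c"));
        List<List<String>> secondGraph = Arrays.asList(shared);

        Set<List<String>> union = unionOf(Arrays.asList(mainGraph, secondGraph));

        // prints 2: the shared triple is counted once
        System.out.println(union.size());
    }
}
```

A query answered over such a union sees each statement once, which is the
behaviour the client assumes when it passes the query through unchanged.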

        Andy

[*] In the SVN only.  It didn't make the last release.

>
> Cheers,
>
> Peter
