On 28/01/15 12:28, Qiaser Mehmood wrote:
The query SELECT (count(distinct ?p) AS ?count ) { ?s ?p ?o } returns 21 in
both cases. listOfPropertiesInDataset is just the name of the actual query,
which I execute and store in a model: mdl = qry.execConstruct()
However, if I run the following code and get triples count for that query:
qry = QueryExecutionFactory.sparqlService(endpoint, query);
int count = 0;
Iterator<Triple> triples = qry.execConstructTriples();
while (triples.hasNext()) {
    triples.next();
    count++;
}
System.out.print("Triples count value is" + count);
The count value is different: 42 for Fuseki and 740444 for Sesame, although
the data is the same in both stores. What could be the reason for this
difference?
execConstructTriples is an important detail.
Try execConstruct and model.size(). Should be 42 in each case.
execConstructTriples is passing back the low level stream of triples
received. It may include duplicates if the server sends duplicates.
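If the streaming call has to stay, the count can be made comparable by collecting the stream into a set on the client side. The following is a minimal sketch in plain Java, not Jena code: strings stand in for triples, the helper name countDistinct is made up for illustration, and it assumes the element type has value-based equals() and hashCode().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class DistinctCount {
    // Drain an iterator and count the distinct elements it produced.
    // Relies on the element type implementing equals() and hashCode().
    static <T> int countDistinct(Iterator<T> it) {
        Set<T> seen = new HashSet<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Strings stand in for triples; each of the two triples arrives twice.
        Iterator<String> stream = Arrays.asList(
            ":d void:propertyPartition :p1",
            ":p1 void:property :p",
            ":d void:propertyPartition :p1",
            ":p1 void:property :p").iterator();
        System.out.println(countDistinct(stream)); // prints 2
    }
}
```

Counting this way holds every distinct element in memory, which is the same trade-off as building a model.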
There are 21 unique predicates in your data.
CONSTRUCT
{ <datasetUri> void:propertyPartition ?pUri .
?pUri void:property ?p . }
is 42 triples in a set of triples. 2 for each ?p : ?pUri is calculated
from ?p. Note *set*. Sets do not show duplicates.
Fuseki, the Jena server you are querying, sends back a stream of
triples from a set: duplicates have been suppressed.
Sesame does not: it sends back the results of each template instantiation
for every match of the pattern. No duplicates have been suppressed.
Look at the start of the Sesame triple stream of results. I would
expect to see repeated triples.
There are 740444/2 = 370222 triples in your data.
370222 matches of ?s ?p ?o
so 370222 matches of "WHERE {}"
but your construct template does not depend on ?s or ?o. The same ?p and
?pUri from different matches of ?s ?p ?o occur over and over again.
Hence Sesame returns 740444 triples with many duplicates.
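The arithmetic above can be reproduced at small scale. This is a rough sketch in plain Java under stated assumptions: strings stand in for triples, the class and method names are hypothetical, and "<part/...>" is only a placeholder for the MD5-based IRI built by BIND. It instantiates the two-triple template once per match without suppressing duplicates, then compares the stream length with the size of the resulting set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConstructCounts {
    // Instantiate the two-triple template once per matched predicate,
    // with no duplicate suppression.
    static List<String> instantiate(List<String> matchedPredicates) {
        List<String> out = new ArrayList<>();
        for (String p : matchedPredicates) {
            String pUri = "<part/" + p + ">";   // placeholder for the MD5-based IRI
            out.add("<datasetUri> void:propertyPartition " + pUri);
            out.add(pUri + " void:property " + p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Six matches of ?s ?p ?o, but only two distinct predicates.
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            matches.add(":p1");
            matches.add(":p2");
        }
        List<String> stream = instantiate(matches);
        Set<String> graph = new HashSet<>(stream);
        System.out.println(stream.size()); // 12 = 6 matches * 2 template triples
        System.out.println(graph.size());  // 4  = 2 predicates * 2 template triples
    }
}
```

With 370222 matches and 21 predicates the same two expressions give 740444 and 42.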
Compare
SELECT ?p ?pUri WHERE {}
and
SELECT DISTINCT ?p ?pUri WHERE {}
Projecting out just ?p gives 370222 rows but only 21 distinct values.
This data creates an RDF graph of one triple :
@prefix : <http://example/> .
:s :p "abc" .
:s :p "abc" .
:s :p "abc" .
Andy
PS I think Sesame may have changed this behaviour in recent versions, at
least I recall some discussion.
On Wednesday, January 28, 2015 11:13 AM, Andy Seaborne wrote:
On 28/01/15 10:49, Qiaser Mehmood wrote:
Thanks Andy. I forgot to mention that I am using Jena to query both Fuseki
and Sesame; moreover, I loaded the same data into both stores.
So you mean that the difference in results over the same data is due to the
particular engine: one returns duplicates (i.e. Sesame) and the other returns
a set with no duplicates (i.e. Fuseki)?
Thanks, Qaiser.
So what does
SELECT (count(distinct ?p) AS ?count ) { ?s ?p ?o }
return in each case?
And how are you counting results (listOfPropertiesInDataset is not Jena
code).
Andy
On Tuesday, January 27, 2015 8:50 PM, Andy Seaborne wrote:
On 27/01/15 17:32, Qiaser Mehmood wrote:
What could be the reason for a difference in results (listOfPropertiesInDataset)
when the same query runs on two different engines, e.g. Fuseki and Sesame? I
loaded the Kegg data into Fuseki and Sesame, and when I run the following query
the results vary.
PREFIX void: <http://rdfs.org/ns/void#>
CONSTRUCT { <datasetUri> void:propertyPartition ?pUri .
            ?pUri void:property ?p . }
WHERE { ?s ?p ?o .
        BIND(IRI(CONCAT(STR(<baseUri>),MD5(STR(?p)))) AS ?pUri) }
In Fuseki it returns 42 and in Sesame it returns 740444.
Best, Qaiser.
I guess there are 42 different predicates in the data.
SELECT (count(distinct ?p) AS ?count ) { ?s ?p ?o }
Jena returns a model, a set of triples. Set means no duplicates.
It looks like you are using the form of execution in Sesame that returns
an iterator over a stream of triples. No suppression of duplicates.
In your query:
PREFIX void: <http://rdfs.org/ns/void#>
CONSTRUCT
{ <http://example/base/datasetUri> void:propertyPartition ?pUri .
?pUri void:property ?p .}
WHERE
{ ?s ?p ?o
BIND(iri(concat(str(<http://example/base/baseUri>), MD5(str(?p))))
AS ?pUri)
}
Your query produces massive duplication: the template ignores ?s and ?o,
so every match with the same ?p yields the same two triples.
Many ?s ?p ?o, few distinct ?p
Try this:
WHERE
{ SELECT DISTINCT ?p ?pUri {
?s ?p ?o
BIND(iri(concat(str(<http://example/base/baseUri>), MD5(str(?p))))
AS ?pUri)
}
}
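The effect of the DISTINCT subquery can be imitated in a few lines. This is a small sketch in plain Java, not what either server actually executes: strings stand in for triples, the names are hypothetical, and "<part/...>" is a placeholder for the MD5-based IRI. Because ?pUri is computed from ?p, removing duplicate ?p values from the solutions is enough, and the template is then instantiated once per distinct predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DistinctFirst {
    // Imitates SELECT DISTINCT followed by template instantiation:
    // duplicates are removed from the solutions, not from the output triples.
    static List<String> construct(List<String> matchedPredicates) {
        List<String> out = new ArrayList<>();
        for (String p : new LinkedHashSet<>(matchedPredicates)) {
            String pUri = "<part/" + p + ">";   // placeholder for the MD5-based IRI
            out.add("<datasetUri> void:propertyPartition " + pUri);
            out.add(pUri + " void:property " + p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Five matches, two distinct predicates.
        List<String> matches = Arrays.asList(":p1", ":p2", ":p1", ":p1", ":p2");
        System.out.println(construct(matches).size()); // prints 4
    }
}
```

The output stream then has the same length whether or not the receiver suppresses duplicates.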
Andy