Re: results vary for the same query on same dataset for different engine

Qiaser Mehmood Wed, 28 Jan 2015 04:31:41 -0800

The query SELECT (count(distinct ?p) AS ?count) { ?s ?p ?o } returns 21 in
both cases. listOfPropertiesInDataset is just the name of the actual query, which I
execute and store in a model (mdl = qry.execConstruct();).
However, if I run the following code to count the triples returned by that query:
    qry = QueryExecutionFactory.sparqlService(endpoint, query);
    int count = 0;
    Iterator<Triple> triples = qry.execConstructTriples();
    while (triples.hasNext()) {
        triples.next();
        count++;
    }
    System.out.print("Triples count value is " + count);
the count value is different: 42 for Fuseki and 740444 for Sesame, although
the data is the same in both stores. What could be the reason for this difference?
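As a rough consistency check of the numbers above (a worked sketch, not taken from the original posts): the CONSTRUCT template has two triple patterns. Assuming every solution of { ?s ?p ?o } instantiates both of them, set semantics give two triples per distinct predicate, while a duplicate-preserving stream gives two triples per solution:

```latex
\begin{align*}
N_{\text{set}}    &= 2 \times |\{\, ?p \,\}| = 2 \times 21 = 42 \\
N_{\text{stream}} &= 2 \times |\{\, (?s, ?p, ?o) \,\}| = 740444
  \;\Rightarrow\; |\{\, (?s, ?p, ?o) \,\}| = 370222
\end{align*}
```

The figure 370222 is only what the arithmetic implies under that assumption; it was not reported in the thread.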


 

     On Wednesday, January 28, 2015 11:13 AM, Andy Seaborne <[email protected]> wrote:
   

 On 28/01/15 10:49, Qiaser Mehmood wrote:
> Thanks Andy, I forgot to mention that I am using Jena to query both
> Fuseki and Sesame; moreover, I dumped the same data into both stores.
> So you mean that the difference in results over the same data is due to the
> particular engine, which returns either duplicates (i.e. Sesame) or a set with
> no duplicates (i.e. Fuseki)?
> Thanks, Qaiser.

So what does

SELECT (count(distinct ?p) AS ?count ) { ?s ?p ?o }

return in each case?


And how are you counting results (listOfPropertiesInDataset is not Jena 
code).

    Andy

>
>      On Tuesday, January 27, 2015 8:50 PM, Andy Seaborne <[email protected]> wrote:
>
>
>  On 27/01/15 17:32, Qiaser Mehmood wrote:
>> What could be the reason for the difference in results (listOfPropertiesInDataset)
>> for the same query run on two different engines, e.g. Fuseki and
>> Sesame? I dumped the KEGG data into Fuseki and Sesame, and when I run the
>> following query the results vary:
>> PREFIX void: <http://rdfs.org/ns/void#>
>> CONSTRUCT {
>>     <datasetUri> void:propertyPartition ?pUri .
>>     ?pUri void:property ?p .
>> }
>> WHERE {
>>     ?s ?p ?o .
>>     BIND(IRI(CONCAT(STR(<baseUri>), MD5(STR(?p)))) AS ?pUri)
>> }
>>
>> In Fuseki it returns 42 and in Sesame it returns 740444.
>> Best, Qaiser.
>>
>
> I guess there are 42 different predicates in the data.
>
> SELECT (count(distinct ?p) AS ?count) { ?s ?p ?o }
>
> Jena's execConstruct() returns a Model, a set of triples. Set means no duplicates.
>
> It looks like you are using the form of execution in Sesame that returns
> an iterator over a stream of triples. There is no suppression of duplicates.
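> To illustrate the distinction, here is a small standalone Java sketch
> (plain JDK, no Jena; the helper names countAll and countDistinct are
> hypothetical). countAll does what the while-loop over execConstructTriples()
> does; countDistinct collects the elements into a HashSet first, which mimics
> the set semantics of a Model. Jena's Triple implements equals and hashCode,
> so the same approach should carry over to an Iterator<Triple>, though that is
> an assumption not checked against a specific Jena version.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class CountTriples {
    // Counts every element the iterator yields, duplicates included:
    // this is what the while-loop over execConstructTriples() does.
    static <T> int countAll(Iterator<T> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    // Counts only distinct elements by collecting them into a set,
    // mirroring the set semantics of a Model returned by execConstruct().
    static <T> int countDistinct(Iterator<T> it) {
        Set<T> seen = new HashSet<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Hypothetical triples written as strings; each triple appears twice.
        List<String> stream = List.of(
            "<ds> void:propertyPartition <p1>",
            "<p1> void:property rdf:type",
            "<ds> void:propertyPartition <p1>",
            "<p1> void:property rdf:type");
        System.out.println(countAll(stream.iterator()));      // 4
        System.out.println(countDistinct(stream.iterator())); // 2
    }
}
```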
>
> In your query:
>
> PREFIX  void: <http://rdfs.org/ns/void#>
>
> CONSTRUCT
>    { <http://example/base/datasetUri> void:propertyPartition ?pUri .
>      ?pUri void:property ?p .}
> WHERE
>    { ?s ?p ?o
>      BIND(iri(concat(str(<http://example/base/baseUri>), MD5(str(?p)))) AS ?pUri)
>    }
>
> Your query produces massive numbers of duplicates: it projects out ?s and ?o.
>
> Many ?s ?p ?o, few distinct ?p
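> A minimal Java simulation of why projecting out ?s and ?o causes duplicates,
> assuming hypothetical rows and a simple placeholder in place of the MD5-based
> ?pUri: each solution instantiates both template triples, so rows that share a
> predicate yield identical triples.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConstructSimulation {
    // One solution of the WHERE pattern { ?s ?p ?o } (hypothetical data).
    record Row(String s, String p, String o) {}

    // Instantiates the two-triple CONSTRUCT template for one row.
    // ?s and ?o are not used by the template, so rows that share ?p
    // produce identical triples.
    static List<String> instantiate(Row row) {
        String pUri = "<part/" + row.p() + ">"; // placeholder for the MD5-based IRI
        return List.of(
            "<datasetUri> void:propertyPartition " + pUri,
            pUri + " void:property " + row.p());
    }

    // Stream semantics: every instantiated triple is kept.
    static List<String> asStream(List<Row> rows) {
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            out.addAll(instantiate(r));
        }
        return out;
    }

    // Set semantics: duplicates collapse, as in a Jena Model.
    static Set<String> asSet(List<Row> rows) {
        return new HashSet<>(asStream(rows));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("s1", "rdf:type", "o1"),
            new Row("s2", "rdf:type", "o2"),
            new Row("s3", "rdfs:label", "o3"));
        System.out.println(asStream(rows).size()); // 6 = 2 triples x 3 rows
        System.out.println(asSet(rows).size());    // 4 = 2 triples x 2 distinct ?p
    }
}
```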
>
> Try this:
>
> WHERE
>    { SELECT DISTINCT ?p ?pUri {
>      ?s ?p ?o
>      BIND(iri(concat(str(<http://example/base/baseUri>), MD5(str(?p)))) AS ?pUri)
>      }
>    }
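> The rewritten query removes duplicates before the template is applied. This
> hypothetical Java simulation mimics that order of operations: it computes
> ?pUri as the base IRI concatenated with the lowercase hex MD5 of the predicate
> IRI (as the BIND does), keeps each predicate once (like the inner SELECT
> DISTINCT), and then instantiates the template with no deduplication at all.
> Under these assumptions the stream count and the set count agree.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DistinctFirst {
    static final String BASE = "http://example/base/baseUri";

    // Lowercase hex MD5 of a string, like SPARQL's MD5() function.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Mirrors the inner SELECT DISTINCT ?p ?pUri: each predicate kept once.
    static Set<String> distinctPredicates(List<String[]> triples) {
        Set<String> ps = new LinkedHashSet<>();
        for (String[] t : triples) {
            ps.add(t[1]);
        }
        return ps;
    }

    // Instantiates the CONSTRUCT template over the already-distinct rows,
    // keeping every output triple (stream semantics, no deduplication).
    static List<String> construct(List<String[]> triples) {
        List<String> out = new ArrayList<>();
        for (String p : distinctPredicates(triples)) {
            String pUri = "<" + BASE + md5Hex(p) + ">";
            out.add("<http://example/base/datasetUri> void:propertyPartition " + pUri);
            out.add(pUri + " void:property <" + p + ">");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
            new String[] {"s1", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "o1"},
            new String[] {"s2", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "o2"},
            new String[] {"s3", "http://www.w3.org/2000/01/rdf-schema#label", "o3"});
        System.out.println(construct(data).size()); // 4: no duplicates left to suppress
    }
}
```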
>
>
>      Andy
