Re: [Virtuoso-users] speeding up a query

Bart Vandewoestyne Fri, 04 Apr 2014 04:21:08 -0700

On 2014-04-03 15:00, Kingsley Idehen wrote:
> On 4/3/14 7:47 AM, Bart Vandewoestyne wrote:
>> Hello list,
>>
>> INITIAL REMARK: if this is not the appropriate mailing list for this
>> question, please let me know the best place to ask questions regarding
>> SPARQL queries and their optimization.
>>
>> I'm a beginner when it comes to writing SPARQL queries.  I am trying to
>> speed up a certain query that I got from someone, which has the
>> following form:
>>
>> SELECT ?val (COUNT(DISTINCT ?id) as ?vc)
>> WHERE
>> {
>>     ?id <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?val .
>>     ?id ?property ?property_value.
>>     ?property_value bif:contains "'foo'".
>>     ?id ?property1 ?property_value1.
>>     ?property_value1 bif:contains "'bar'".
>> }
>> GROUP BY ?val
>> ORDER BY DESC(?vc)
>>
>>
>> First of all, I noticed that I can write it more elegantly as follows:
>>
>> SELECT ?val (COUNT(DISTINCT ?id) as ?vc)
>> WHERE
>> {
>>     ?id <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?val ;
>>         ?property1 ?property_value1 ;
>>         ?property2 ?property_value2 .
>>
>>     ?property_value1 bif:contains "'foo'" .
>>
>>     ?property_value2 bif:contains "'bar'" .
>> }
>> GROUP BY ?val
>> ORDER BY DESC(?vc)
>>
>> Secondly, looking at
>> http://docs.openlinksw.com/virtuoso/queryingftcols.html  my educated
>> guess was that I could replace this query with
>>
>> SELECT ?val (COUNT(DISTINCT ?id) as ?vc)
>> WHERE
>> {
>>     ?id <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?val ;
>>         ?property ?property_value .
>>     ?property_value bif:contains 'foo or bar' .
>> }
>> GROUP BY ?val
>> ORDER BY DESC(?vc)
>>
>> and my hope was that this version would run a little faster (don't ask
>> me why... just a wild guess that I would try)
>>
>> Unfortunately, this last version seems to give different query results.
>>
>> My two questions:
>>
>> 1) Why is this last query giving different results?  Am I
>> misinterpreting something?
>>
>> 2) Is there a way I can rewrite the original query so that it runs
>> faster?
>>
>> Thanks!
>> Bart
>
> One little issue here is that you don't indicate the role of named
> graphs i.e., do you want this query to be scoped to every named graph or
> to specific named graphs? As you can imagine, this has impact on the
> perceived performance.
>
> You could use the public LOD Cloud instance to demonstrate your quest,
> and share a SPARQL query results URL for accelerated analysis (on our
> side).
>
> [1] http://lod.openlinksw.com/sparql -- LOD Cloud Cache Instance (50
> Billion+ Triples)


Hello Kingsley,

I'm afraid I'm still too new to SPARQL to completely understand your
answer. I'm not really familiar with named graphs yet, but after some
googling, I think I can conclude that I'm not using any named graphs,
for two reasons:

1) My SPARQL queries are as written above, not using a FROM NAMED or 
GRAPH keyword.

2) I seem to have no named graphs in my triple store, as demonstrated by 
the following query:


SQL> sparql select distinct ?g where { graph ?g {?s ?p ?o} };
g
LONG VARCHAR
_______________________________________________________________________________


0 Rows. -- 1 msec.


So my two questions remain:

1) Why is the rewritten query

SELECT ?val (COUNT(DISTINCT ?id) as ?vc)
WHERE
{
     ?id <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?val ;
         ?property ?property_value .
     ?property_value bif:contains 'foo or bar' .
}
GROUP BY ?val
ORDER BY DESC(?vc)

giving me different results than the original one:

SELECT ?val (COUNT(DISTINCT ?id) as ?vc)
WHERE
{
     ?id <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?val .
     ?id ?property ?property_value.
     ?property_value bif:contains "'foo'".
     ?id ?property1 ?property_value1.
     ?property_value1 bif:contains "'bar'".
}
GROUP BY ?val
ORDER BY DESC(?vc)
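
To make the difference concrete, here is how I currently read the two
queries, written as a toy Python model. Everything in it is made up for
illustration: the sample data, the helper names, and the crude whole-word
test standing in for bif:contains. It only models the and/or logic as I
understand it, not how Virtuoso actually evaluates full-text matches.

```python
# Toy model: each subject maps to the literal values of its properties.
# All data and helper names are hypothetical.
data = {
    "s1": ["foo only"],
    "s2": ["bar only"],
    "s3": ["some foo text", "some bar text"],  # words in different values
    "s4": ["foo and bar together"],            # both words in one value
    "s5": ["nothing relevant"],
}


def contains(value, word):
    """Crude stand-in for a full-text match: whole-word test."""
    return word in value.split()


def original_query(data):
    """Two patterns on the same ?id: one value has 'foo' AND one has 'bar'."""
    return {
        s for s, values in data.items()
        if any(contains(v, "foo") for v in values)
        and any(contains(v, "bar") for v in values)
    }


def rewritten_query(data):
    """One pattern with 'foo or bar': a single value has 'foo' OR 'bar'."""
    return {
        s for s, values in data.items()
        if any(contains(v, "foo") or contains(v, "bar") for v in values)
    }


print(sorted(original_query(data)))   # ['s3', 's4']
print(sorted(rewritten_query(data)))  # ['s1', 's2', 's3', 's4']
```

If this reading is right, the rewritten query also counts subjects that
match only one of the two words, which would explain the larger counts.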


2) Is there a way to rewrite the original query so that it runs faster?
Or are there other tricks I can apply? As far as I understand from the
Virtuoso docs, I cannot speed this query up with extra indexes, because
the default indexing scheme on RDF_QUAD is already active and should
suffice. Is that correct?


If you need further information, feel free to ask.

Kind regards,
Bart

