Hi Robert,
We are working on a new open source snapshot build that we can provide to you.
It may improve performance somewhat, although there are no guarantees, as the
ORDER BY will always result in a full table scan first to order the rows, and
only then is the LIMIT applied to the results.
Which datasets do you have loaded in your store? Is there a sample dataset we
could load locally to see this issue first hand?
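To illustrate the ORDER BY point, here is a minimal Python sketch. It is not Virtuoso code, and the rows and their "created" values are made-up sample data. It only shows that ordering has to look at every row before a LIMIT can be applied, and that limiting first can give a different answer:

```python
import heapq

# Hypothetical sample rows standing in for the matched triples.
rows = [
    {"title": "A", "created": "2011-06-01"},
    {"title": "B", "created": "2011-06-13"},
    {"title": "C", "created": "2011-06-07"},
]

def order_then_limit(rows, n):
    # ORDER BY DESC(?created) LIMIT n: every row is examined and sorted.
    return sorted(rows, key=lambda r: r["created"], reverse=True)[:n]

def limit_then_order(rows, n):
    # LIMIT n applied first: only the first n rows are ever sorted.
    return sorted(rows[:n], key=lambda r: r["created"], reverse=True)

def top_n(rows, n):
    # Same result as order_then_limit while keeping only n rows at a time;
    # it still has to read every row once.
    return heapq.nlargest(n, rows, key=lambda r: r["created"])

newest = order_then_limit(rows, 1)
first_seen = limit_then_order(rows, 1)
```

With this sample data, order_then_limit picks the row with the latest date, while limit_then_order simply returns whichever row happened to come first.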
For the query you were getting the syntax error with, try running the following:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT *
from <http://dbpedia.org>
WHERE
{{
SELECT DISTINCT ?title ?u ?created
WHERE
{
?prog dc:title ?title;
dc:created ?created.
?u dbpprop:name ?name .
FILTER (bif:isnotnull (bif:strstr ( str(?name), ?title)))
} LIMIT 1
}} ORDER BY DESC (?created)
Note the double curly brackets, i.e. "{{ ... }}". I also had to add ?created to
the result list of the inner SELECT, otherwise you get another error: "SP031:
SPARQL compiler: Variable 'created' is used in the query result set but not
assigned".
Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink
On 13 Jun 2011, at 23:05, Robert Globisch wrote:
> Hi Hugh,
>
> thank you!
>
> There seems to be no noticeable decrease in execution time when I use this
> ORDER BY clause.
>
> In the meantime I did some further tests. It seems to me that the ORDER BY
> clause causes these massive performance slowdowns.
> When I use the following query without the ORDER BY clause, the performance is
> quite acceptable, even with the dc:created property included.
>
> [Note: to improve performance I now use dbpprop:name instead of rdfs:label.
> I think it will avoid labels of concepts/ontologies etc. that I do not need
> for my results.]
>
>
> See:
>
> PREFIX dc: <http://purl.org/dc/elements/1.1/>
> select distinct ?title ?u
> from <http://dbpedia.org>
> WHERE
> {
> ?prog dc:title ?title;
> dc:created ?created.
>
> ?u dbpprop:name ?name .
> FILTER (bif:isnotnull (bif:strstr ( str(?name), ?title)))
> }
> LIMIT 1
>
> > execution time on my 4gb quadcore system: about 60s with LIMIT 1 (2, 4) ;
> > and about 120s with LIMIT 8
>
> When I add the ORDER BY clause (within the search pattern, or outside it as in
> your proposal), it's not usable anymore (execution time: about 10 minutes).
> My aim is to find the result(s) for my most recently created triple (the
> newest one); that's why I added the dc:created property.
>
> As far as I understand, the ORDER BY clause sorts the whole table first,
> right? Not only the limited results?
> Could this be bypassed somehow?
>
> Maybe by putting all my dc:title triples into a separate graph, sorting them by
> ?created and putting them into a subquery that uses the bif:strstr function?
> Phil M pointed out a similar solution on
> http://stackoverflow.com/questions/1154546/sorting-sparql-results-by-date?answertab=votes#tab-top
> storing the creation date in a separate graph.
>
> In this case i will refer to my question on semanticweb.com too:
>
> http://answers.semanticweb.com/questions/10014/limit-before-order-by-clause
>
> Unfortunately this subquery does not work at the second SELECT clause ->
> syntax error at 'SELECT' before 'distinct'
>
>
> PREFIX dc: <http://purl.org/dc/elements/1.1/>
> SELECT DISTINCT *
> from <http://dbpedia.org>
> WHERE
> {
> SELECT DISTINCT?title ?u
> WHERE
> {
> ?prog dc:title ?title;
> dc:created ?created.
>
> ?u dbpprop:name ?name .
> FILTER (bif:isnotnull (bif:strstr ( str(?name), ?title)))
> } LIMIT 1
> } ORDER BY DESC (?created)
>
>
> Best regards,
>
> Robert
>
>
> On 13.06.2011 11:19, Hugh Williams wrote:
>>
>> Hi Robert,
>>
>> Development suggests that the query:
>>
>> sparql
>> PREFIX dc: <http://purl.org/dc/elements/1.1/>
>> select distinct ?title ?u
>> from <http://dbpedia.org>
>> WHERE
>> {
>> ?prog dc:title ?title .
>> ?u rdfs:label ?label .
>> FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
>> }
>> ORDER BY DESC ((select ?created where { ?prog dc:created ?created. } ))
>> LIMIT 1
>>
>> should be the fastest. Note the FROM <graph> clause, and the implicit hint to
>> the optimizer that ?created can be calculated later and does not affect
>> filtering (i.e. the presence of ?created is not essential).
>>
>> Best Regards
>> Hugh Williams
>>
>> On 12 Jun 2011, at 15:28, Robert Globisch wrote:
>>
>>> Hi Hugh,
>>>
>>> that's me, yes. Hello :)
>>>
>>> When I remove the dc:created property (bound to my ?prog variable) it
>>> gets a lot faster on my Thinkpad (TP) and QuadCore system.
>>> I need the dc:created property to order the results by the creation date
>>> (the time you tuned in to a channel) of my files loaded into the store.
>>> As you can see, removing it improves execution time massively.
>>>
>>> I ran the explain function for the following query using the virtuoso.db
>>> loaded with the whole en.dbpedia dataset.
>>> I hope that's what you wanted.
>>>
>>>
>>> ************************************************************************************
>>> ************************************************************************************
>>> PREFIX po: <http://purl.org/ontology/po/>
>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>> PREFIX dc: <http://purl.org/dc/elements/1.1/>
>>> PREFIX dbpprop: <http://dbpedia.org/property/>
>>> PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
>>> PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>> PREFIX dcterms: <http://purl.org/dc/terms/>
>>>
>>> select distinct ?title ?u
>>> WHERE
>>> {
>>> ?prog dc:title ?title;
>>> dc:created ?created.
>>>
>>>
>>> ?u rdfs:label ?label.
>>>
>>> FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
>>>
>>>
>>> }
>>> ORDER BY DESC (?created)
>>> LIMIT 1
>>>
>>>
>>>
>>> ************************************************************************************
>>> ************************************************************************************
>>>
>>>
>>> Result:
>>>
>>> REPORT
>>> VARCHAR
>>> _______________________________________________________________________________
>>>
>>> {
>>> Subquery 21
>>> {
>>> Fork 50
>>> {
>>> END Node
>>>
>>> After test:
>>> 0: if ( 0 1(=) 1 ) then 10 else 3 unkn 10
>>> 3: if ( 0 1(=) 1 ) then 10 else 6 unkn 10
>>> 6: if ( 0 1(=) 1 ) then 10 else 9 unkn 10
>>> 9: BReturn 1
>>> 10: BReturn 0
>>> from DB.DBA.RDF_QUAD by RDF_QUAD_POGS 1.1e+002 rows
>>> Key RDF_QUAD_POGS ASC ($27 "s-13-1-t0.O", $26 "s-13-1-t0.S")
>>> inlined <col=554 P = #dc/elements/1.1/title >
>>> Local Test
>>> 0: if ( 0 1(=) 1 ) then 4 else 3 unkn 4
>>> 3: BReturn 1
>>> 4: BReturn 0
>>>
>>>
>>> Precode:
>>> 0: $30 "__ro2sq" := Call __ro2sq ($27 "s-13-1-t0.O")
>>> 5: BReturn 0
>>> from DB.DBA.RDF_QUAD by RDF_QUAD 0.23 rows
>>> Key RDF_QUAD ASC ($32 "s-13-1-t1.O")
>>> inlined <col=554 P = #dc/elements/1.1/created > , <col=553 S = $26
>>> "s-13-1-t0.S">
>>>
>>> from DB.DBA.RDF_QUAD by RDF_QUAD 9.6e+006 rows
>>> Key RDF_QUAD ASC ($37 "s-13-1-t2.O", $36 "s-13-1-t2.S")
>>> inlined <col=554 P = #label >
>>>
>>>
>>> After test:
>>> 0: $40 "__ro2sq" := Call __ro2sq ($37 "s-13-1-t2.O")
>>> 5: $41 "strstr" := Call strstr ($40 "__ro2sq", $30 "__ro2sq")
>>> 10: $42 "isnotnull" := Call isnotnull ($41 "strstr")
>>> 15: if ( 0 1(=) $42 "isnotnull") then 19 else 18 unkn 19
>>> 18: BReturn 1
>>> 19: BReturn 0
>>>
>>> After code:
>>> 0: $43 "__id2i" := Call __id2i ($36 "s-13-1-t2.S")
>>> 5: BReturn 0
>>> Distinct (HASH) ($27 "s-13-1-t0.O", $36 "s-13-1-t2.S")
>>>
>>> Precode:
>>> 0: $49 "__ro2sq" := Call __ro2sq ($32 "s-13-1-t1.O")
>>> 5: BReturn 0
>>> Sort (HASH) (TOP 1 ) ($49 "__ro2sq") -> ($30 "__ro2sq", $43 "__id2i")
>>>
>>> }
>>> top order by node
>>>
>>> After code:
>>> 0: $22 "title" := := artm $30 "__ro2sq"
>>> 4: $23 "u" := := artm $43 "__id2i"
>>> 8: BReturn 0
>>> Subquery Select($22 "title", $23 "u", <$39 "<DB.DBA.RDF_QUAD s-13-1-t2>"
>>> spec 5>, <$34 "<DB.DBA.RDF_QUAD s-13-1-t1>" spec 5>, <$29 "<DB.DBA.RDF_QUAD
>>> s-13-1-t0>" spec 5>)
>>> }
>>>
>>>
>>> After code:
>>> 0: $70 "title" := Call __ro2sq ($22 "title")
>>> 5: $71 "u" := Call __ro2sq ($23 "u")
>>> 10: BReturn 0
>>> Select ($70 "title", $71 "u")
>>> }
>>>
>>> 69 Rows. -- 328 msec.
>>>
>>> ************************************************************************************
>>> ************************************************************************************
>>>
>>>
>>> Best regards,
>>>
>>> Robert
>>>
>>>
>>>
>>> On 12.06.2011 15:39, Hugh Williams wrote:
>>>>
>>>> Hi Robert,
>>>>
>>>> I presume you are also "Robbet <[email protected]>" who posted similar
>>>> questions on the vos mailing list?
>>>>
>>>> Can you use the Virtuoso explain function to generate a compiler query
>>>> execution plan, so we can see how this is being constructed, as detailed at:
>>>>
>>>> http://docs.openlinksw.com/virtuoso/fn_explain.html
>>>>
>>>> It is also not clear to me what the figures you state in the following
>>>> mean:
>>>>
>>>>>> As soon as i remove the dc:created property query gets about 10-100x
>>>>>> faster
>>>>>> (TP: from 3,5mins > 30s / Quad core: 7 mins > 5,5mins).
>>>>
>>>> What is TP and what are the timing difference with and without the
>>>> dc:created property ?
>>>>
>>>> Best Regards
>>>> Hugh Williams
>>>>
>>>> On 12 Jun 2011, at 12:43, Kingsley Idehen wrote:
>>>>
>>>>> On 6/12/11 1:22 AM, Robert Globisch wrote:
>>>>>>
>>>>>> Hello Kingsley,
>>>>>>
>>>>>> I need your help once again. Actually I'm a bit frustrated :/
>>>>>>
>>>>>> During the last few hours I ran some tests to find out how my
>>>>>> query performs:
>>>>>>
>>>>>> First I created a new virtuoso.db with only the labels_en.nt dbpedia
>>>>>> dataset (virtuoso.db size about 2.6GB).
>>>>>> I added some of my own triples, only a few, with some dc: and po:
>>>>>> properties (see attachment - example file).
>>>>>>
>>>>>> Afterwards I ran the following query with the free text search index
>>>>>> disabled / enabled to get matching title strings within dbpedia:
>>>>>>
>>>>>> SELECT distinct ?title ?label
>>>>>>
>>>>>> WHERE
>>>>>> {
>>>>>>
>>>>>> ?prog dc:title ?title;
>>>>>> dc:created ?created.
>>>>>>
>>>>>> ?dbpedia rdfs:label ?label
>>>>>>
>>>>>> FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
>>>>>>
>>>>>> }
>>>>>> LIMIT 1
>>>>>>
>>>>>>
>>>>>> Execution time on an Intel QuadCore system with 4gb of ram (as already
>>>>>> discussed) was about 7 minutes (with free text enabled / disabled).
>>>>>> I ran the same query on the whole de.dbpedia data set (separate
>>>>>> virtuoso.db - size about 8,5 GB) on a small Thinkpad (AMD Dual Core with
>>>>>> 4gb ram)
>>>>>> and it took about 3,5 minutes to execute. One interesting thing I
>>>>>> noticed: as soon as I remove the dc:created property, the query gets about
>>>>>> 10-100x faster
>>>>>> (TP: from 3,5mins > 30s / Quad core: 7 mins > 5,5mins).
>>>>>>
>>>>>>
>>>>>> Is there anything left i could do to increase performance besides
>>>>>> hosting it on a more powerful system?
>>>
>>
>