Hi Hugh,

thank you!

There seems to be no noticeable decrease in execution time when I use this ORDER BY clause.

In the meantime I did some further tests. It seems to me that the ORDER BY clause causes this massive slowdown. When I run the following query without the ORDER BY clause, performance is quite acceptable, even with the /dc:created/ property included.

[Note: to improve performance I now use /dbpprop:name/ instead of /rdfs:label/. I think this avoids matching labels of concepts, ontologies, etc. that I do not need in my results.]


See:

   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   select distinct ?title ?u
   from <http://dbpedia.org>
   WHERE
   {
   ?prog dc:title ?title;
   dc:created ?created.

   ?u dbpprop:name ?name .
   FILTER (bif:isnotnull (bif:strstr ( str(?name), ?title)))
   }
   LIMIT     1


> execution time on my 4gb quadcore system: about 60s with LIMIT 1 (2, 4) ; and about 120s with LIMIT 8

When I add the ORDER BY clause (either within the search pattern or outside it, as in your proposal), the query is no longer usable (execution time: about 10 minutes). My aim is to find the result(s) for my most recently created triple (the newest one); that is why I added the /dc:created/ property.

As far as I understand, the ORDER BY clause sorts the whole result set first, right? Not only the limited results?
Could this be bypassed somehow?

Maybe by putting all my /dc:title/ triples into a separate graph, sorting them by ?created, and feeding them into a subquery that uses the bif:strstr function? Phil M pointed out a similar solution on http://stackoverflow.com/questions/1154546/sorting-sparql-results-by-date?answertab=votes#tab-top that stores the creation date in a separate graph.

On this point I also refer to my question on semanticweb.com:

http://answers.semanticweb.com/questions/10014/limit-before-order-by-clause

Unfortunately, the following subquery fails at the second SELECT clause -> syntax error at 'SELECT' before 'distinct'


PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT DISTINCT *
        from <http://dbpedia.org>
        WHERE
        {
            SELECT DISTINCT ?title ?u
            WHERE
            {
              ?prog dc:title ?title;
              dc:created ?created.

              ?u dbpprop:name ?name .
              FILTER (bif:isnotnull (bif:strstr ( str(?name), ?title)))
            } LIMIT 1
        } ORDER BY DESC (?created)
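
A possible rearrangement of the nested query, as an untested sketch rather than a confirmed fix: in the form above the inner LIMIT 1 is applied before any ordering, and ?created is not projected out of the inner SELECT, so the outer ORDER BY has nothing to sort on. The variant below moves ORDER BY and LIMIT into a subquery that first picks the newest title and only then matches it against the names, with the subquery wrapped in its own pair of braces. The dbpprop: prefix declaration is taken from the query quoted further down in this thread; whether a particular Virtuoso build accepts this form, and whether it is actually faster, has not been checked here.

```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT DISTINCT ?title ?u
FROM <http://dbpedia.org>
WHERE
{
  # Inner query: pick only the most recently created title.
  # The sort is done here, over the dc:created triples only,
  # before any name matching takes place.
  {
    SELECT ?title
    WHERE
    {
      ?prog dc:title ?title ;
            dc:created ?created .
    }
    ORDER BY DESC (?created)
    LIMIT 1
  }

  # Outer pattern: substring match of that single title
  # against the names (bif:strstr is Virtuoso-specific).
  ?u dbpprop:name ?name .
  FILTER (bif:isnotnull (bif:strstr (str(?name), ?title)))
}
```

Unlike the LIMIT 1 queries above, this returns every ?u that matches the newest title, and nothing at all if that title has no match; an outer LIMIT can be added if only one row is wanted.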


Best regards,

Robert


On 13.06.2011 11:19, Hugh Williams wrote:
Hi Robert,

Development suggests that the query:

sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>
select distinct ?title ?u
from <http://dbpedia.org>
WHERE
{
?prog dc:title ?title .
?u rdfs:label ?label .
FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
}
ORDER BY DESC ((select ?created where { ?prog dc:created ?created. } ))
LIMIT     1

should be the fastest. Note FROM <graph> and an implicit hint to the
optimizer that ?created can be calculated later and does not affect
filtering (i.e. the presence of ?created is not essential).

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink

On 12 Jun 2011, at 15:28, Robert Globisch wrote:

Hi Hugh,

that's me, yes. Hello :)

When I remove the /dc:created/ property (bound to my ?prog variable), the query gets a lot faster on both my Thinkpad (TP) and my quad-core system. I need the /dc:created/ property to order the results by the creation date (the time you tuned in to a channel) of my files loaded into the store.
As you can see, removing it improves execution time massively.

I ran the explain function for the following query using the virtuoso.db loaded with the whole en.dbpedia dataset.
I hope that's what you wanted.


************************************************************************************
************************************************************************************

    PREFIX po: <http://purl.org/ontology/po/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX dbpprop: <http://dbpedia.org/property/>
    PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dcterms: <http://purl.org/dc/terms/>

    select distinct ?title ?u
    WHERE
    {
    ?prog dc:title ?title;
    dc:created ?created.


    ?u rdfs:label ?label.

    FILTER (bif:isnotnull (bif:strstr (?label, ?title)))


    }
    ORDER BY DESC (?created)
    LIMIT     1




************************************************************************************
************************************************************************************


Result:

REPORT
VARCHAR
_______________________________________________________________________________

{
Subquery 21
{
Fork 50
{
END Node

After test:
      0: if ( 0  1(=)  1 ) then 10 else 3 unkn 10
      3: if ( 0  1(=)  1 ) then 10 else 6 unkn 10
      6: if ( 0  1(=)  1 ) then 10 else 9 unkn 10
      9: BReturn 1
      10: BReturn 0
from DB.DBA.RDF_QUAD by RDF_QUAD_POGS   1.1e+002 rows
Key RDF_QUAD_POGS  ASC ($27 "s-13-1-t0.O", $26 "s-13-1-t0.S")
 inlined <col=554 P =  #dc/elements/1.1/title >
 Local Test
      0: if ( 0  1(=)  1 ) then 4 else 3 unkn 4
      3: BReturn 1
      4: BReturn 0


Precode:
      0: $30 "__ro2sq" := Call __ro2sq ($27 "s-13-1-t0.O")
      5: BReturn 0
from DB.DBA.RDF_QUAD by RDF_QUAD       0.23 rows
Key RDF_QUAD  ASC ($32 "s-13-1-t1.O")
inlined <col=554 P = #dc/elements/1.1/created > , <col=553 S = $26 "s-13-1-t0.S">

from DB.DBA.RDF_QUAD by RDF_QUAD   9.6e+006 rows
Key RDF_QUAD  ASC ($37 "s-13-1-t2.O", $36 "s-13-1-t2.S")
 inlined <col=554 P =  #label >


After test:
      0: $40 "__ro2sq" := Call __ro2sq ($37 "s-13-1-t2.O")
      5: $41 "strstr" := Call strstr ($40 "__ro2sq", $30 "__ro2sq")
      10: $42 "isnotnull" := Call isnotnull ($41 "strstr")
      15: if ( 0  1(=) $42 "isnotnull") then 19 else 18 unkn 19
      18: BReturn 1
      19: BReturn 0

After code:
      0: $43 "__id2i" := Call __id2i ($36 "s-13-1-t2.S")
      5: BReturn 0
Distinct (HASH) ($27 "s-13-1-t0.O", $36 "s-13-1-t2.S")

Precode:
      0: $49 "__ro2sq" := Call __ro2sq ($32 "s-13-1-t1.O")
      5: BReturn 0
Sort (HASH) (TOP  1  ) ($49 "__ro2sq") -> ($30 "__ro2sq", $43 "__id2i")

}
top order by node

After code:
      0: $22 "title" :=  := artm $30 "__ro2sq"
      4: $23 "u" :=  := artm $43 "__id2i"
      8: BReturn 0
Subquery Select($22 "title", $23 "u", <$39 "<DB.DBA.RDF_QUAD s-13-1-t2>" spec 5>, <$34 "<DB.DBA.RDF_QUAD s-13-1-t1>" spec 5>, <$29 "<DB.DBA.RDF_QUAD s-13-1-t0>" spec 5>)
}


After code:
      0: $70 "title" := Call __ro2sq ($22 "title")
      5: $71 "u" := Call __ro2sq ($23 "u")
      10: BReturn 0
Select ($70 "title", $71 "u")
}

69 Rows. -- 328 msec.

************************************************************************************
************************************************************************************


Best regards,

Robert



On 12.06.2011 15:39, Hugh Williams wrote:
Hi Robert,

I presume you are also "Robbet <[email protected]>" who posted similar questions on the vos mailing list?

Can you use the Virtuoso explain function to generate the compiler's query execution plan, so we can see how the query is being constructed, as detailed at:

http://docs.openlinksw.com/virtuoso/fn_explain.html


It is also not clear to me what the figures you state in the following mean:



As soon as i remove the dc:created property query gets about 10-100x faster
(TP: from 3,5mins > 30s / Quad core: 7 mins > 5,5mins).



What is TP, and what are the timing differences with and without the dc:created property?


Best Regards
Hugh Williams

On 12 Jun 2011, at 12:43, Kingsley Idehen wrote:

On 6/12/11 1:22 AM, Robert Globisch wrote:
Hello Kingsley,

I need your help once again. Actually, I'm a bit frustrated :/

During the last few hours I ran some tests to find out how my query performs:

First I created a new virtuoso.db containing only the /labels_en.nt/ DBpedia dataset (virtuoso.db size about 2.6 GB). I added some of my own triples, only a few, with some dc: and po: properties (see attachment - example file).

Afterwards I ran the following query, with the free-text search index both disabled and enabled, to find matching title strings within DBpedia:

    SELECT distinct ?title ?label

    WHERE
    {

    ?prog dc:title ?title;
    dc:created ?created.

    ?dbpedia rdfs:label ?label

    FILTER (bif:isnotnull (bif:strstr (?label, ?title)))

    }
    LIMIT 1



Execution time on an Intel quad-core system with 4 GB of RAM (as already discussed) was about 7 minutes (with free text enabled or disabled). I ran the same query on the whole de.dbpedia dataset (separate virtuoso.db, size about 8.5 GB) on a small Thinkpad (AMD dual core with 4 GB of RAM) and it took about 3.5 minutes to execute. One interesting thing I noticed: as soon as I remove the dc:created property, the query gets about 10-100x faster
(TP: from 3,5mins > 30s / Quad core: 7 mins > 5,5mins).


Is there anything else I could do to increase performance, besides hosting it on a more powerful system?


