Re: [Virtuoso-users] bif:contains - using a string variable as search term

Robert Globisch Mon, 13 Jun 2011 22:24:40 +0000

Hi Hugh,

thank you!

There seems to be no noticeable decrease in execution time when I use this ORDER BY clause.

In the meantime I did some further tests. It seems to me that the ORDER BY clause causes this massive slowdown. When I use the following query without the ORDER BY clause, the performance is quite acceptable, even with the /dc:created/ property included.

[Note: to improve performance I now use /dbpprop:name/ instead of /rdfs:label/. I think this avoids the labels of concepts, ontologies etc. that I do not need in my results.]



See:

   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   select distinct ?title ?u
   from <http://dbpedia.org>
   WHERE
   {
   ?prog dc:title ?title;
   dc:created ?created.

   ?u dbpprop:name ?name .
   FILTER (bif:isnotnull (bif:strstr ( str(?name), ?title)))
   }
   LIMIT     1

Execution time on my 4 GB quad-core system: about 60 s with LIMIT 1 (2, 4), and about 120 s with LIMIT 8.

When I add the ORDER BY clause (within the search pattern, or outside it using your suggestion) the query is not usable anymore (execution time: about 10 minutes). My aim is to find the result(s) for my most recently created triple (the newest one); that is why I added the /dc:created/ property.

As far as I understand, the ORDER BY clause sorts the whole table first, right? Not only the limited results?

Could this be bypassed somehow?
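
A minimal sketch of the ORDER BY / LIMIT question, in plain Python rather than SPARQL. ORDER BY applies to the whole solution set and LIMIT is applied afterwards, so every matching row has to be examined even when only one is returned, although a top-k selection avoids a full sort. Putting LIMIT in an inner query and ORDER BY outside orders only the rows that happened to survive the LIMIT. The rows and function names are hypothetical and only illustrate the semantics, not how Virtuoso executes the query.

```python
import heapq

# Hypothetical rows standing in for query solutions: (title, created).
rows = [
    ("Programme A", "2011-06-10T20:00:00"),
    ("Programme B", "2011-06-12T20:15:00"),
    ("Programme C", "2011-06-11T18:00:00"),
]


def order_then_limit(rows, k):
    """ORDER BY DESC(created) LIMIT k.

    Every row is examined once, but only the k newest are kept
    (a top-k selection instead of a full sort).
    """
    return heapq.nlargest(k, rows, key=lambda r: r[1])


def limit_then_order(rows, k):
    """LIMIT k inside a subquery, ORDER BY outside.

    Only the first k rows (in arbitrary engine order) are sorted,
    so the newest row can be missed entirely.
    """
    return sorted(rows[:k], key=lambda r: r[1], reverse=True)


newest = order_then_limit(rows, 1)
arbitrary = limit_then_order(rows, 1)
print(newest)
print(arbitrary)
```

With this sample data the first call returns Programme B (the newest), while the second returns whichever row came first.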

Maybe by putting all my /dc:title/ triples into a separate graph, sorting them by ?created and putting them into a subquery that uses the bif:strstr function? Phil M pointed out a similar solution (storing the creation date in a separate graph) at
http://stackoverflow.com/questions/1154546/sorting-sparql-results-by-date?answertab=votes#tab-top
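
The idea of sorting the few title triples first and only then running the substring match can be sketched in plain Python. This is an illustration under assumptions: the data, the function names and the test counter are hypothetical, and `title in name` stands in for bif:strstr(str(?name), ?title).

```python
# Hypothetical data: programmes as (title, created), labels as (uri, name).
programmes = [
    ("Alpha", "2011-06-10"),
    ("Beta", "2011-06-12"),
    ("Gamma", "2011-06-11"),
]
labels = [
    ("u1", "Alpha Centauri"),
    ("u2", "Beta decay"),
    ("u3", "Gamma ray"),
    ("u4", "Beta (letter)"),
]


def join_then_pick_newest(programmes, labels):
    """Match every title against every label, then keep the newest matches."""
    tests = 0
    matches = []
    for title, created in programmes:
        for uri, name in labels:
            tests += 1
            if title in name:  # stand-in for the strstr filter
                matches.append((created, title, uri))
    newest_created = max(m[0] for m in matches)
    result = sorted((t, u) for c, t, u in matches if c == newest_created)
    return result, tests


def pick_newest_then_join(programmes, labels):
    """Pick the newest programme first, then scan the labels only once."""
    title, _ = max(programmes, key=lambda p: p[1])
    tests = 0
    matches = []
    for uri, name in labels:
        tests += 1
        if title in name:
            matches.append((title, uri))
    return sorted(matches), tests


slow_result, slow_tests = join_then_pick_newest(programmes, labels)
fast_result, fast_tests = pick_newest_then_join(programmes, labels)
print(slow_result, slow_tests)
print(fast_result, fast_tests)
```

Both variants return the same rows here, but the second performs one label scan instead of one per title. The two are not equivalent in general: if the newest programme matches no label, the second variant returns nothing, while the first falls back to the newest programme that does match.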


In this context I would also refer to my question on semanticweb.com:

http://answers.semanticweb.com/questions/10014/limit-before-order-by-clause

Unfortunately this subquery fails at the second SELECT clause -> syntax error at 'SELECT' before 'distinct'



PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT DISTINCT *
        from <http://dbpedia.org>
        WHERE
        {
            SELECT DISTINCT ?title ?u
            WHERE
            {
              ?prog dc:title ?title;
              dc:created ?created.

              ?u dbpprop:name ?name .
              FILTER (bif:isnotnull (bif:strstr ( str(?name), ?title)))
            } LIMIT 1
        } ORDER BY DESC (?created)


Best regards,

Robert


On 13.06.2011 11:19, Hugh Williams wrote:

Hi Robert,

Development suggests the query:

sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>
select distinct ?title ?u
from <http://dbpedia.org>
WHERE
{
?prog dc:title ?title .
?u rdfs:label ?label .
FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
}
ORDER BY DESC ((select ?created where { ?prog dc:created ?created. } ))
LIMIT     1

should be the fastest. Note the FROM <graph> clause and the implicit hint
to the optimizer that ?created can be calculated later and does not affect
filtering (i.e. the presence of ?created is not essential).

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink

On 12 Jun 2011, at 15:28, Robert Globisch wrote:

Hi Hugh,

that's me, yes. Hello :)
When I remove the /dc:created/ property (bound to my ?prog variable) it gets a lot faster on my Thinkpad (TP) and on the quad-core system. I need the /dc:created/ property to order the results by the creation date (the time you tuned in to a channel) of the files I loaded into the store.
As you can see it improves execution time massively.
I ran the explain function for the following query, using the virtuoso.db loaded with the whole en.dbpedia dataset.
Hope that's what you wanted to have.


************************************************************************************
************************************************************************************

    PREFIX po: <http://purl.org/ontology/po/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX dbpprop: <http://dbpedia.org/property/>
    PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dcterms: <http://purl.org/dc/terms/>

    select distinct ?title ?u
    WHERE
    {
    ?prog dc:title ?title;
    dc:created ?created.

    ?u rdfs:label ?label.

    FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
    }
    ORDER BY DESC (?created)
    LIMIT     1




************************************************************************************
************************************************************************************


Result:

REPORT
VARCHAR
_______________________________________________________________________________

{
Subquery 21
{
Fork 50
{
END Node

After test:
      0: if ( 0  1(=)  1 ) then 10 else 3 unkn 10
      3: if ( 0  1(=)  1 ) then 10 else 6 unkn 10
      6: if ( 0  1(=)  1 ) then 10 else 9 unkn 10
      9: BReturn 1
      10: BReturn 0
from DB.DBA.RDF_QUAD by RDF_QUAD_POGS   1.1e+002 rows
Key RDF_QUAD_POGS  ASC ($27 "s-13-1-t0.O", $26 "s-13-1-t0.S")
 inlined <col=554 P =  #dc/elements/1.1/title >
 Local Test
      0: if ( 0  1(=)  1 ) then 4 else 3 unkn 4
      3: BReturn 1
      4: BReturn 0


Precode:
      0: $30 "__ro2sq" := Call __ro2sq ($27 "s-13-1-t0.O")
      5: BReturn 0
from DB.DBA.RDF_QUAD by RDF_QUAD       0.23 rows
Key RDF_QUAD  ASC ($32 "s-13-1-t1.O")
inlined <col=554 P = #dc/elements/1.1/created > , <col=553 S = $26 "s-13-1-t0.S">
from DB.DBA.RDF_QUAD by RDF_QUAD   9.6e+006 rows
Key RDF_QUAD  ASC ($37 "s-13-1-t2.O", $36 "s-13-1-t2.S")
 inlined <col=554 P =  #label >


After test:
      0: $40 "__ro2sq" := Call __ro2sq ($37 "s-13-1-t2.O")
      5: $41 "strstr" := Call strstr ($40 "__ro2sq", $30 "__ro2sq")
      10: $42 "isnotnull" := Call isnotnull ($41 "strstr")
      15: if ( 0  1(=) $42 "isnotnull") then 19 else 18 unkn 19
      18: BReturn 1
      19: BReturn 0

After code:
      0: $43 "__id2i" := Call __id2i ($36 "s-13-1-t2.S")
      5: BReturn 0
Distinct (HASH) ($27 "s-13-1-t0.O", $36 "s-13-1-t2.S")

Precode:
      0: $49 "__ro2sq" := Call __ro2sq ($32 "s-13-1-t1.O")
      5: BReturn 0
Sort (HASH) (TOP  1  ) ($49 "__ro2sq") -> ($30 "__ro2sq", $43 "__id2i")

}
top order by node

After code:
      0: $22 "title" :=  := artm $30 "__ro2sq"
      4: $23 "u" :=  := artm $43 "__id2i"
      8: BReturn 0
Subquery Select($22 "title", $23 "u", <$39 "<DB.DBA.RDF_QUAD s-13-1-t2>" spec 5>, <$34 "<DB.DBA.RDF_QUAD s-13-1-t1>" spec 5>, <$29 "<DB.DBA.RDF_QUAD s-13-1-t0>" spec 5>)
}


After code:
      0: $70 "title" := Call __ro2sq ($22 "title")
      5: $71 "u" := Call __ro2sq ($23 "u")
      10: BReturn 0
Select ($70 "title", $71 "u")
}

69 Rows. -- 328 msec.

************************************************************************************
************************************************************************************
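
One hedged way to read the row estimates in the plan above: assuming they are per-step estimates that multiply across nested loops (an assumption; the plan output does not state this), the product approximates how many strstr tests the filter performs. The figures are taken from the plan; the variable names are illustrative only.

```python
# Row estimates copied from the explain output above.
title_rows = 1.1e2         # dc:title scan on RDF_QUAD_POGS
created_per_title = 0.23   # dc:created lookup per title subject
label_rows = 9.6e6         # label scan, repeated for each outer row

# Assumption: nested-loop evaluation, so the estimates multiply.
estimated_strstr_calls = title_rows * created_per_title * label_rows
print(f"{estimated_strstr_calls:.1e}")
```

Under that reading the filter is evaluated on the order of a few hundred million times, which would be consistent with the label scan dominating the run time.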


Best regards,

Robert



On 12.06.2011 15:39, Hugh Williams wrote:
Hi Robert,
I presume you are also "Robbet <[email protected]>" who posted similar questions on the vos mailing list?
Can you use the Virtuoso explain function to generate the compiler's query execution plan, so we can see how this is being constructed, as detailed at:
http://docs.openlinksw.com/virtuoso/fn_explain.html


It is also not clear to me what the figures you state in the following mean:

> As soon as i remove the dc:created property query gets about 10-100x faster
> (TP: from 3,5mins > 30s / Quad core: 7 mins > 5,5mins).

What is TP, and what are the timing differences with and without the dc:created property?


Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink

On 12 Jun 2011, at 12:43, Kingsley Idehen wrote:
On 6/12/11 1:22 AM, Robert Globisch wrote:
Hello Kingsley,

I will need your help once again. Actually I'm a bit frustrated :/
During the last few hours I ran some test examples to find out how my query performs:
First I created a new virtuoso.db with only the /labels_en.nt/ DBpedia dataset (virtuoso.db size about 2.6 GB). I added some of my own triples, only a few, with some dc: and po: properties (see attachment - example file).
Afterwards I ran the following query with the free-text search index disabled and enabled, to find matching title strings within DBpedia:
    SELECT distinct ?title ?label
    WHERE
    {
    ?prog dc:title ?title;
    dc:created ?created.

    ?dbpedia rdfs:label ?label

    FILTER (bif:isnotnull (bif:strstr (?label, ?title)))
    }
    LIMIT 1

Execution time on an Intel quad-core system with 4 GB of RAM (as already discussed) was about 7 minutes (with free text enabled or disabled). I ran the same query on the whole de.dbpedia dataset (a separate virtuoso.db, size about 8,5 GB) on a small Thinkpad (AMD dual-core with 4 GB RAM) and it took about 3,5 minutes to execute. One interesting fact I noticed: as soon as I remove the dc:created property, the query gets about 10-100x faster
(TP: from 3,5mins > 30s / Quad core: 7 mins > 5,5mins).
Is there anything left I could do to increase performance besides hosting it on a more powerful system?
