Re: Querying TDB takes ages

Julien Plu Mon, 02 Oct 2017 03:07:06 -0700

Thanks Rob for your quick reply!

Hmm, I see. What you are saying does make sense. So what you propose is a
query like this, with the LIMIT subquery moved to the start?


PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?p (GROUP_CONCAT(DISTINCT ?o;separator="-----") AS ?vals) ?id 
?pr ?link WHERE {
    {
        SELECT DISTINCT ?link (STR(?o3) AS ?id) (STR(?o2) AS ?pr) WHERE {
            ?link dbo:wikiPageRank ?o2 .
            ?link dbo:wikiPageID ?o3 .
            FILTER NOT EXISTS{?link dbo:wikiPageRedirects ?x} .
            FILTER NOT EXISTS{?link dbo:wikiPageDisambiguates ?y} .
        } LIMIT 1 OFFSET %offset
    }
    {
        ?link ?p ?o .
        FILTER(DATATYPE(?o) = xsd:string || LANG(?o) = "en") .
    } UNION {
        VALUES ?p {dbo:wikiPageRedirects dbo:wikiPageDisambiguates} .
        ?x ?p ?link .
        ?x rdfs:label ?o .
    } UNION {
        VALUES ?p {rdf:type} .
        ?link ?p ?o .
        FILTER(CONTAINS(STR(?o), "http://dbpedia.org/ontology/")) .
    }
} GROUP BY ?p ?id ?pr ?link
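
The %offset placeholder in the subquery is filled in once per loop iteration. As a rough sketch of that substitution only (the actual code is in the gist linked in the quoted message below; the use of Python, the shortened template, and the function name queries_for_offsets are assumptions made for illustration):

```python
from typing import Iterator

# Hypothetical, shortened stand-in for the full query above; only the
# "%offset" placeholder matters for this sketch.
QUERY_TEMPLATE = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?link WHERE {
    ?link dbo:wikiPageRank ?o2 .
} LIMIT 1 OFFSET %offset
"""


def queries_for_offsets(template: str, start: int, count: int) -> Iterator[str]:
    """Yield one query string per offset, replacing the %offset placeholder."""
    for offset in range(start, start + count):
        yield template.replace("%offset", str(offset))


if __name__ == "__main__":
    # Print the last line of the first three generated queries.
    for query in queries_for_offsets(QUERY_TEMPLATE, 0, 3):
        print(query.strip().splitlines()[-1])
```

Each generated string would then be run against the TDB dataset; because the subquery returns a single ?link, the UNION branches only need to look up triples for that one resource.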



Julien Plu
PhD Student, EURECOM
[email protected] | [email protected]
http://jplu.github.io
Campus SophiaTech
450 route des Chappes
06410 Biot, France
Phone: +33 (0) 4 93008103








> On 2 Oct 2017, at 11:58, Rob Vesse <[email protected]> wrote:
> 
> Julien
> 
> At a glance your query is very broad in that it effectively selects the 
> entire dataset and applies string filters over the data e.g. the CONTAINS 
> filter.
> 
> This will force TDB to read pretty much the entire dataset on every single 
> query. You may be better off moving the subquery with the limit on it to the
> start of your query as then TDB can probably use the single result to limit 
> the amount of data it has to read to answer the rest of your query.
> 
> Rob
> 
> On 02/10/2017 10:30, "Julien Plu" <[email protected] on behalf of 
> [email protected]> wrote:
> 
>    Hello,
> 
>    The code I'm using can be found here:
>    https://gist.github.com/jplu/9d3aa4075145e31c2882f3372b1be3e3
> 
>    My problem is that one iteration of my loop (line 88) takes a very long
>    time (between 3 and 5 minutes), and I don't understand why.
> 
>    I think it is because I'm certainly missing something in the usage of TDB,
>    but I don't see what.
> 
>    The dataset is DBpedia.
> 
>    Thanks in advance for any light.
> 
>    Regards.
> 
>    *Julien Plu*
>    PhD Student, EURECOM
>    [email protected] | [email protected]
>    *http://jplu.github.io* <http://jplu.github.io/>
>    Campus SophiaTech
>    450 route des Chappes
>    06410 Biot, France
>    Phone: +33 (0) 4 93008103
> 
> 
> 
> 
>
