Re: Querying TDB takes ages

Rob Vesse Mon, 02 Oct 2017 03:18:08 -0700

Yes, exactly.

Rob

From: Julien Plu <[email protected]>
Reply-To: <[email protected]>
Date: Monday, 2 October 2017 11:06
To: <[email protected]>
Subject: Re: Querying TDB takes ages

Thanks Rob for your quick reply!

Hmm, I see. What you are saying makes sense. So what you propose is a query 
like this?

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?p (GROUP_CONCAT(DISTINCT ?o;separator="-----") AS ?vals) ?id ?pr ?link WHERE {
    {
        SELECT DISTINCT ?link (STR(?o3) AS ?id) (STR(?o2) AS ?pr) WHERE {
            ?link dbo:wikiPageRank ?o2 .
            ?link dbo:wikiPageID ?o3 .
            FILTER NOT EXISTS{?link dbo:wikiPageRedirects ?x} .
            FILTER NOT EXISTS{?link dbo:wikiPageDisambiguates ?y} .
        } LIMIT 1 OFFSET %offset
    }
    {
        ?link ?p ?o .
        FILTER(DATATYPE(?o) = xsd:string || LANG(?o) = "en") .
    } UNION {
        VALUES ?p {dbo:wikiPageRedirects dbo:wikiPageDisambiguates} .
        ?x ?p ?link .
        ?x rdfs:label ?o .
    } UNION {
        VALUES ?p {rdf:type} .
        ?link ?p ?o .
        FILTER(CONTAINS(STR(?o), "http://dbpedia.org/ontology/")) .
    }
} GROUP BY ?p ?id ?pr ?link
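
The `%offset` placeholder in the query above has to be filled in once per loop iteration. The thread does not show how the original code does this, so the following is only a minimal sketch under assumptions: `QUERY_TEMPLATE` is a shortened stand-in for the full query, and `render_query` is a hypothetical helper, not part of Jena or TDB.

```python
# Hypothetical helper: the thread only shows the "%offset" placeholder,
# not how the original code substitutes it.
QUERY_TEMPLATE = """
SELECT DISTINCT ?link WHERE {
    ?link <http://dbpedia.org/ontology/wikiPageID> ?id .
} LIMIT 1 OFFSET %offset
"""


def render_query(template: str, offset: int) -> str:
    """Return the template with %offset replaced by a non-negative integer."""
    # Reject anything that is not a plain non-negative int so that no
    # arbitrary text can end up inside the query string.
    if isinstance(offset, bool) or not isinstance(offset, int) or offset < 0:
        raise ValueError("offset must be a non-negative integer")
    return template.replace("%offset", str(offset))


# One query string per loop iteration.
queries = [render_query(QUERY_TEMPLATE, i) for i in range(3)]
```

Note that SPARQL does not guarantee a stable solution order without `ORDER BY`, so paging with `LIMIT 1 OFFSET n` alone may, in principle, repeat or skip resources between iterations.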

Julien Plu 
PhD Student, EURECOM
[email protected] | [email protected]
http://jplu.github.io
Campus SophiaTech
450 route des Chappes
06410 Biot, France
Phone: +33 (0) 4 93008103

On 2 Oct 2017, at 11:58, Rob Vesse <[email protected]> wrote:

Julien

At a glance, your query is very broad: it effectively selects the entire 
dataset and applies string filters over the data, e.g. the CONTAINS filter.

This will force TDB to read pretty much the entire dataset on every single 
query. You may be better off moving the subquery with the limit on it to the 
start of your query, as TDB can then probably use the single result to limit 
the amount of data it has to read to answer the rest of your query.
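
The effect described above can be illustrated with a toy model. This is only a sketch under assumptions: a Python dictionary keyed by subject stands in for an indexed store, and the function names are invented for illustration. It does not reproduce how TDB actually plans or executes queries.

```python
from collections import defaultdict

ONTOLOGY = "http://dbpedia.org/ontology/"


def build_store(n_links):
    """Toy store: subject -> list of (predicate, object) pairs."""
    store = defaultdict(list)
    for i in range(n_links):
        link = f"link{i}"
        store[link].append(("wikiPageID", str(i)))
        store[link].append(("label", f"Label {i}"))
        store[link].append(("type", ONTOLOGY + "Thing"))
    return store


def filter_then_pick(store, offset):
    """Apply the string filter to every triple, then keep one link."""
    inspected = 0
    rows = []
    for link, pairs in store.items():
        for p, o in pairs:
            inspected += 1
            if p == "type" and ONTOLOGY in o:
                rows.append((link, p, o))
    chosen = sorted(store)[offset]
    return [r for r in rows if r[0] == chosen], inspected


def pick_then_filter(store, offset):
    """Pick one link first, then filter only that link's triples."""
    chosen = sorted(store)[offset]
    inspected = 0
    rows = []
    for p, o in store[chosen]:
        inspected += 1
        if p == "type" and ONTOLOGY in o:
            rows.append((chosen, p, o))
    return rows, inspected


store = build_store(1000)
rows_a, inspected_a = filter_then_pick(store, 5)
rows_b, inspected_b = pick_then_filter(store, 5)
print(rows_a == rows_b, inspected_a, inspected_b)
```

Both orders return the same row, but in this toy the first inspects every triple in the store while the second inspects only the triples of the single selected subject.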

Rob

On 02/10/2017 10:30, "Julien Plu" <[email protected] on behalf of 
[email protected]> wrote:

   Hello,

   The code I'm using can be found here:
   https://gist.github.com/jplu/9d3aa4075145e31c2882f3372b1be3e3

   My problem is that one iteration of my loop (line 88) takes a very long
   time (between 3 and 5 minutes), and I don't understand why.

   I think I am missing something in the way I use TDB, but I don't see
   what.

   The dataset is DBpedia.

   Thanks in advance for any insight.

   Regards.

   Julien Plu
   PhD Student, EURECOM
   [email protected] | [email protected]
   http://jplu.github.io
   Campus SophiaTech
   450 route des Chappes
   06410 Biot, France
   Phone: +33 (0) 4 93008103
