Yes exactly
Rob

From: Julien Plu <[email protected]>
Reply-To: <[email protected]>
Date: Monday, 2 October 2017 11:06
To: <[email protected]>
Subject: Re: Querying TDB takes ages

Thanks Rob for your quick reply!

Hmm, I see. What you are saying does make sense. So what you propose is to have a query like this?

    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT DISTINCT ?p (GROUP_CONCAT(DISTINCT ?o;separator="-----") AS ?vals) ?id ?pr ?link
    WHERE {
      {
        SELECT DISTINCT ?link (STR(?o3) AS ?id) (STR(?o2) AS ?pr)
        WHERE {
          ?link dbo:wikiPageRank ?o2 .
          ?link dbo:wikiPageID ?o3 .
          FILTER NOT EXISTS{?link dbo:wikiPageRedirects ?x} .
          FILTER NOT EXISTS{?link dbo:wikiPageDisambiguates ?y} .
        }
        LIMIT 1 OFFSET %offset
      }
      {
        ?link ?p ?o .
        FILTER(DATATYPE(?o) = xsd:string || LANG(?o) = "en") .
      }
      UNION
      {
        VALUES ?p {dbo:wikiPageRedirects dbo:wikiPageDisambiguates} .
        ?x ?p ?link .
        ?x rdfs:label ?o .
      }
      UNION
      {
        VALUES ?p {rdf:type} .
        ?link ?p ?o .
        FILTER(CONTAINS(STR(?o), "http://dbpedia.org/ontology/")) .
      }
    }
    GROUP BY ?p ?id ?pr ?link

Julien Plu
PhD Student, EURECOM
[email protected] | [email protected]
http://jplu.github.io
Campus SophiaTech
450 route des Chappes
06410 Biot, France
Phone: +33 (0) 4 93008103

On 2 Oct 2017, at 11:58, Rob Vesse <[email protected]> wrote:

Julien

At a glance your query is very broad in that it effectively selects the entire dataset and applies string filters over the data, e.g. the CONTAINS filter. This will force TDB to read pretty much the entire dataset on every single query.

You may be better off moving the subquery with the limit on it to the start of your query, as then TDB can probably use the single result to limit the amount of data it has to read to answer the rest of your query.
Rob

On 02/10/2017 10:30, "Julien Plu" <[email protected] on behalf of [email protected]> wrote:

Hello,

The code I'm using can be found here: https://gist.github.com/jplu/9d3aa4075145e31c2882f3372b1be3e3

My problem is that one iteration of my loop (line 88) takes a very long time (between 3 and 5 minutes), and I don't understand why. I think it is because I'm certainly missing something in the usage of TDB, but I don't see what. The dataset is DBpedia.

Thanks in advance for any light.

Regards.

Julien Plu
PhD Student, EURECOM
[email protected] | [email protected]
http://jplu.github.io
Campus SophiaTech
450 route des Chappes
06410 Biot, France
Phone: +33 (0) 4 93008103
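
The query in this thread carries a `%offset` placeholder in its subquery, which suggests that each loop iteration substitutes a new offset before running it. The thread does not show how that substitution is done, so the following is only a minimal, language-neutral sketch in Python. The names `with_offset` and `queries` are hypothetical, and the template is shortened to the subquery tail for illustration.

```python
from typing import Iterator

# Shortened stand-in for the full query text; only the placeholder matters here.
QUERY_TAIL = "LIMIT 1 OFFSET %offset"


def with_offset(template: str, offset: int) -> str:
    """Return the query text with the %offset placeholder replaced.

    Hypothetical helper: the original code's substitution is not shown in the thread.
    """
    if offset < 0:
        raise ValueError("OFFSET must be non-negative")
    return template.replace("%offset", str(offset))


def queries(template: str, count: int) -> Iterator[str]:
    """Yield one concrete query string per loop iteration (offsets 0..count-1)."""
    for i in range(count):
        yield with_offset(template, i)


if __name__ == "__main__":
    for q in queries(QUERY_TAIL, 3):
        print(q)
```

Each generated string would then be handed to whatever query execution call the loop already uses; that part is deliberately left out because it depends on the code in the gist.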
