RE: CMS diff: TDB Datasets
Hi Andy,

Thanks for the response. Your suggestion worked, and the query completed in a similar time to the union graph approach. I had tried moving the filter into the graph clause, but not swapping the graph order.

I added that update to the documentation so that it might help anyone else having similar problems. Do you still want me to create a JIRA for it?

More generally, is there a page or section with tips on writing queries to help optimisation? I searched but could only find descriptions of TDB's optimisation functionality and of extending query execution. I spent quite a while hunting for tips and trying different ways to influence the resolution order before I thought to try the union graph.

Thanks,

Greg

-----Original Message-----
From: Andy Seaborne
Sent: 19 June 2018 13:56
To: dev@jena.apache.org; Greg Albiston
Subject: Re: CMS diff: TDB Datasets

Greg,

Could you create a JIRA ticket for this please? It is something that looks addressable. The solution proposed (using union graph) is a bit specialised.

Andy

The query may be better if written (but the "..." may be making a difference.)

GRAPH dataset:SmallB {
    ?b rdf:type my:BThing.
    ?b my:hasData ?bData.
    FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral))
}

GRAPH dataset:BigA {
    ?a rdf:type my:AThing.
    ?a noa:hasGeometry ?aData.
}

FILTER(my:filterFunction1(?bData, ?aData))

On 19/06/18 10:59, Greg Albiston wrote:
> Clone URL (Committers only):
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb%2Fdatasets.mdtext
> [...]
Re: CMS diff: TDB Datasets
Greg,

Could you create a JIRA ticket for this please? It is something that looks addressable. The solution proposed (using union graph) is a bit specialised.

Andy

The query may be better if written (but the "..." may be making a difference.)

GRAPH dataset:SmallB {
    ?b rdf:type my:BThing.
    ?b my:hasData ?bData.
    FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral))
}

GRAPH dataset:BigA {
    ?a rdf:type my:AThing.
    ?a noa:hasGeometry ?aData.
}

FILTER(my:filterFunction1(?bData, ?aData))

On 19/06/18 10:59, Greg Albiston wrote:
> Clone URL (Committers only):
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb%2Fdatasets.mdtext
> [...]
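The effect of the rewrite suggested in this message (small graph first, with the single-variable filter moved inside its GRAPH block) can be illustrated with a toy model. The sketch below is not how TDB executes queries, and none of its names come from Jena: `JoinOrderSketch`, `joinThenFilter`, `filterSmallFirst` and the integer stand-in data are all hypothetical. It only counts how often a two-variable filter, playing the role of `my:filterFunction1`, would be called under each ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiPredicate;
import java.util.function.IntPredicate;

// Toy model only: plain Java lists stand in for the two named graphs.
final class JoinOrderSketch {

    /** Matching "a:b" pairs plus the number of two-variable filter calls made. */
    static final class Result {
        final Set<String> matches = new TreeSet<>();
        long pairFilterCalls = 0;
    }

    /** Builds the list 0, 1, ..., n-1 as stand-in data. */
    static List<Integer> range(int n) {
        List<Integer> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        return values;
    }

    /** Original shape: pair every A with every B, then apply both filters. */
    static Result joinThenFilter(List<Integer> bigA, List<Integer> smallB,
                                 IntPredicate bOnly,
                                 BiPredicate<Integer, Integer> pair) {
        Result result = new Result();
        for (int a : bigA) {
            for (int b : smallB) {
                result.pairFilterCalls++;
                if (pair.test(b, a) && bOnly.test(b)) {
                    result.matches.add(a + ":" + b);
                }
            }
        }
        return result;
    }

    /** Rewritten shape: filter the small side first, then pair survivors with A. */
    static Result filterSmallFirst(List<Integer> bigA, List<Integer> smallB,
                                   IntPredicate bOnly,
                                   BiPredicate<Integer, Integer> pair) {
        Result result = new Result();
        for (int b : smallB) {
            if (!bOnly.test(b)) {
                continue; // rejected before any pairing with the big side
            }
            for (int a : bigA) {
                result.pairFilterCalls++;
                if (pair.test(b, a)) {
                    result.matches.add(a + ":" + b);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> bigA = range(10_000);  // stand-in for dataset:BigA
        List<Integer> smallB = range(100);   // stand-in for dataset:SmallB
        IntPredicate bOnly = b -> b % 10 == 0;                         // like filterFunction2
        BiPredicate<Integer, Integer> pair = (b, a) -> a % 100 == b;   // like filterFunction1

        Result before = joinThenFilter(bigA, smallB, bOnly, pair);
        Result after = filterSmallFirst(bigA, smallB, bOnly, pair);

        System.out.println("same matches: " + before.matches.equals(after.matches));
        System.out.println("pair filter calls, join then filter: " + before.pairFilterCalls);
        System.out.println("pair filter calls, small side first: " + after.pairFilterCalls);
    }
}
```

With this stand-in data both strategies return the same matches, but the two-variable filter runs 1,000,000 times in the first shape and 100,000 times in the second. The real saving depends on how selective the single-variable filter is and on the plan the optimiser actually chooses.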
CMS diff: TDB Datasets
Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb%2Fdatasets.mdtext

Greg Albiston

Index: trunk/content/documentation/tdb/datasets.mdtext
===
--- trunk/content/documentation/tdb/datasets.mdtext (revision 1833775)
+++ trunk/content/documentation/tdb/datasets.mdtext (working copy)
@@ -51,6 +51,51 @@
 ...
 }
 
+### Named Graphs & Filters
+
+Named graphs provide a convenient way to organise and store your data.
+However, be aware that in certain situations named graphs can make it difficult for the query optimiser.
+
+For example, a query with the following structure took 29 minutes to complete:
+
+SELECT ?b ...
+WHERE {
+
+GRAPH dataset:BigA {
+?a rdf:type my:AThing.
+?a noa:hasGeometry ?aData.
+...
+}
+
+GRAPH dataset:SmallB {
+?b rdf:type my:BThing.
+?b my:hasData ?bData.
+...
+}
+
+FILTER(my:filterFunction1(?bData, ?aData))
+FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral) )
+
+}
+
+The completion duration was reduced to 7 seconds by applying the global TDB.symUnionDefaultGraph option (see above) to the dataset and modifying the query as follows:
+
+SELECT ?b ...
+WHERE {
+
+?a rdf:type my:AThing.
+?a noa:hasGeometry ?aData.
+...
+
+?b rdf:type my:BThing.
+?b my:hasData ?bData.
+...
+
+FILTER(my:filterFunction1(?bData, ?aData))
+FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral) )
+
+}
+
 ## Special Graph Names
 
 URI | Meaning
CMS diff: TDB Datasets
Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb%2Fdatasets.mdtext

Greg Albiston

Index: trunk/content/documentation/tdb/datasets.mdtext
===
--- trunk/content/documentation/tdb/datasets.mdtext (revision 1812597)
+++ trunk/content/documentation/tdb/datasets.mdtext (working copy)
@@ -65,5 +65,22 @@
 `urn:x-arq:UnionGraph` using `Dataset.getNamedModel("urn:x-arq:UnionGraph")` .
 
+## Dataset Inferencing
+Inferencing on a Model in a Dataset, using the [TDB Java API](java_api.html), follows the same pattern as an in-memory InfModel.
+The use of [TDB Transactions](tdb_transactions.html) is **strongly** recommended to avoid data corruption.
+    //Open TDB Dataset
+    String directory = ...
+    Dataset dataset = TDBFactory.createDataset(directory);
+
+    //Retrieve Named Graph from Dataset, or use Default Graph.
+    String graphURI = "http://example.org/myGraph";
+    Model model = dataset.getNamedModel(graphURI);
+
+    //Create RDFS Inference Model, or use other Reasoner e.g. OWL.
+    InfModel infModel = ModelFactory.createRDFSModel(model);
+
+    ...
+    //Perform operations on infModel.
+    ...
\ No newline at end of file