RE: CMS diff: TDB Datasets

2018-06-20 Thread Greg Albiston
Hi Andy,

Thanks for the response. Your suggestion worked and the query completed in a 
similar time to the union graph approach.
I'd tried moving the filter into the graph clause but not swapping the graph 
order.

I added that update to the documentation in case it helps anyone else who runs 
into similar problems.
Do you still want me to create a JIRA for it?

More generally, is there a page or section with tips on writing queries so that 
they optimise well? 
I searched but could only find descriptions of TDB's optimisation functionality 
and of extending query execution. I spent quite a while hunting for tips and 
trying different ways to influence the resolution order before I thought to try 
the union graph.

Thanks,

Greg 

-----Original Message-----
From: Andy Seaborne  
Sent: 19 June 2018 13:56
To: dev@jena.apache.org; Greg Albiston 
Subject: Re: CMS diff: TDB Datasets

Greg,

Could you create a JIRA ticket for this please?  It is something that looks 
addressable.  The solution proposed (using union graph) is a bit specialised.

 Andy

The query may be better if written (but the "..." may be making a
difference.)

GRAPH dataset:SmallB {
   ?b rdf:type my:BThing.
   ?b my:hasData ?bData.
   FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral))
}

GRAPH dataset:BigA {
   ?a rdf:type my:AThing.
   ?a noa:hasGeometry ?aData.
}
FILTER(my:filterFunction1(?bData, ?aData))





Re: CMS diff: TDB Datasets

2018-06-19 Thread Andy Seaborne

Greg,

Could you create a JIRA ticket for this please?  It is something that 
looks addressable.  The solution proposed (using union graph) is a bit 
specialised.


Andy

The query may be better if written (but the "..." may be making a 
difference.)


GRAPH dataset:SmallB {
   ?b rdf:type my:BThing.
   ?b my:hasData ?bData.
   FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral))
}

GRAPH dataset:BigA {
   ?a rdf:type my:AThing.
   ?a noa:hasGeometry ?aData.
}
FILTER(my:filterFunction1(?bData, ?aData))
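
As a rough illustration of why the ordering suggested above might help, here is a toy sketch in plain Java. It is not Jena, ARQ or TDB code and does not claim to show how the optimiser actually executes either query. It only compares two nested-loop strategies: scanning the big graph first and applying both filters at the end, versus scanning the small graph first and applying the constant filter before the join. The data sizes and the two filter functions (`constantFilter`, `pairFilter`) are hypothetical stand-ins for `my:filterFunction2` and `my:filterFunction1`.

```java
import java.util.stream.IntStream;

/** Toy model of join order and filter placement; an assumption-laden sketch, not Jena code. */
final class JoinOrderSketch {

    /** Number of matching (b, a) pairs and how many times the pairwise filter ran. */
    static final class Outcome {
        final long matches;
        final long pairFilterCalls;

        Outcome(long matches, long pairFilterCalls) {
            this.matches = matches;
            this.pairFilterCalls = pairFilterCalls;
        }
    }

    // Hypothetical stand-in for my:filterFunction2(?bData, constant): keeps 1 in 10 b values.
    static boolean constantFilter(int b) {
        return b % 10 == 0;
    }

    // Hypothetical stand-in for my:filterFunction1(?bData, ?aData).
    static boolean pairFilter(int b, int a) {
        return a % 100 == b;
    }

    /** Big graph in the outer loop, both filters applied only after the full cross product. */
    static Outcome bigFirstFiltersLast(int[] bigA, int[] smallB) {
        long matches = 0;
        long calls = 0;
        for (int a : bigA) {
            for (int b : smallB) {
                calls++;
                if (pairFilter(b, a) && constantFilter(b)) {
                    matches++;
                }
            }
        }
        return new Outcome(matches, calls);
    }

    /** Small graph first, constant filter applied before touching the big graph. */
    static Outcome smallFirstFilterEarly(int[] bigA, int[] smallB) {
        long matches = 0;
        long calls = 0;
        for (int b : smallB) {
            if (!constantFilter(b)) {
                continue; // this b can never match, so skip the scan of bigA
            }
            for (int a : bigA) {
                calls++;
                if (pairFilter(b, a)) {
                    matches++;
                }
            }
        }
        return new Outcome(matches, calls);
    }

    /** Returns the integers 0 .. n-1. */
    static int[] range(int n) {
        return IntStream.range(0, n).toArray();
    }

    public static void main(String[] args) {
        int[] bigA = range(10_000);  // stand-in for dataset:BigA
        int[] smallB = range(100);   // stand-in for dataset:SmallB

        Outcome slow = bigFirstFiltersLast(bigA, smallB);
        Outcome fast = smallFirstFilterEarly(bigA, smallB);

        System.out.println("big first:   matches=" + slow.matches
                + " pairFilterCalls=" + slow.pairFilterCalls);
        System.out.println("small first: matches=" + fast.matches
                + " pairFilterCalls=" + fast.pairFilterCalls);
    }
}
```

Both strategies return the same matches; the second simply calls the pairwise filter far less often because most `?b` candidates are discarded before the big graph is scanned. Whether a real query benefits to the same degree depends on the data and on what the "..." parts of the pattern contain.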



On 19/06/18 10:59, Greg Albiston wrote:

Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb%2Fdatasets.mdtext

Greg Albiston

Index: trunk/content/documentation/tdb/datasets.mdtext
===================================================================
--- trunk/content/documentation/tdb/datasets.mdtext (revision 1833775)
+++ trunk/content/documentation/tdb/datasets.mdtext (working copy)
@@ -51,6 +51,51 @@
  ...
  }
  
+### Named Graphs & Filters
+
+Named graphs provide a convenient way to organise and store your data.
+However, be aware that in certain situations named graphs can make it difficult for the query optimiser.
+
+For example, a query with the following structure took 29 minutes to complete:
+
+SELECT ?b ...
+WHERE {
+
+GRAPH dataset:BigA {
+?a rdf:type my:AThing.
+?a noa:hasGeometry ?aData.
+...
+}
+   
+GRAPH dataset:SmallB {
+?b rdf:type my:BThing.
+?b my:hasData ?bData.
+...
+}
+
+FILTER(my:filterFunction1(?bData, ?aData))
+FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral) )
+
+}
+
+The completion duration was reduced to 7 seconds by applying the global TDB.symUnionDefaultGraph option (see above) to the dataset and modifying the query as follows:
+
+SELECT ?b ...
+WHERE {
+
+?a rdf:type my:AThing.
+?a noa:hasGeometry ?aData.
+...
+
+?b rdf:type my:BThing.
+?b my:hasData ?bData.
+...
+
+FILTER(my:filterFunction1(?bData, ?aData))
+FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral) )
+
+}
+
  ## Special Graph Names
  
  URI | Meaning




CMS diff: TDB Datasets

2018-06-19 Thread Greg Albiston
Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb%2Fdatasets.mdtext

Greg Albiston

Index: trunk/content/documentation/tdb/datasets.mdtext
===================================================================
--- trunk/content/documentation/tdb/datasets.mdtext (revision 1833775)
+++ trunk/content/documentation/tdb/datasets.mdtext (working copy)
@@ -51,6 +51,51 @@
 ...
 }
 
+### Named Graphs & Filters
+
+Named graphs provide a convenient way to organise and store your data. 
+However, be aware that in certain situations named graphs can make it difficult for the query optimiser.
+
+For example, a query with the following structure took 29 minutes to complete:
+
+SELECT ?b ...
+WHERE {
+
+GRAPH dataset:BigA {
+?a rdf:type my:AThing.
+?a noa:hasGeometry ?aData.
+...
+}
+   
+GRAPH dataset:SmallB {
+?b rdf:type my:BThing.
+?b my:hasData ?bData.
+...
+}
+
+FILTER(my:filterFunction1(?bData, ?aData))
+FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral) )
+
+}
+ 
+The completion duration was reduced to 7 seconds by applying the global TDB.symUnionDefaultGraph option (see above) to the dataset and modifying the query as follows:
+
+SELECT ?b ...
+WHERE {
+
+?a rdf:type my:AThing.
+?a noa:hasGeometry ?aData.
+...
+
+?b rdf:type my:BThing.
+?b my:hasData ?bData.
+...
+
+FILTER(my:filterFunction1(?bData, ?aData))
+FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral) )
+
+}
+
 ## Special Graph Names
 
 URI | Meaning



CMS diff: TDB Datasets

2017-10-19 Thread Greg Albiston
Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb%2Fdatasets.mdtext

Greg Albiston

Index: trunk/content/documentation/tdb/datasets.mdtext
===================================================================
--- trunk/content/documentation/tdb/datasets.mdtext (revision 1812597)
+++ trunk/content/documentation/tdb/datasets.mdtext (working copy)
@@ -65,5 +65,22 @@
 `urn:x-arq:UnionGraph` using
 `Dataset.getNamedModel("urn:x-arq:UnionGraph")` .
 
+## Dataset Inferencing
 
+Inferencing on a Model in a Dataset, using the [TDB Java API](java_api.html), follows the same pattern as an in-memory InfModel.
+The use of [TDB Transactions](tdb_transactions.html) is **strongly** recommended to avoid data corruption.
 
+  //Open TDB Dataset
+  String directory = ...
+  Dataset dataset = TDBFactory.createDataset(directory);
+
+  //Retrieve Named Graph from Dataset, or use Default Graph.
+  String graphURI = "http://example.org/myGraph";
+  Model model = dataset.getNamedModel(graphURI);
+
+  //Create RDFS Inference Model, or use other Reasoner e.g. OWL.
+  InfModel infModel = ModelFactory.createRDFSModel(model);
+
+  ...
+  //Perform operations on infModel.
+  ...
\ No newline at end of file