On 10/01/17 21:56, Ganesh Selvaraj wrote:
Thank you.
Now I have loaded data using the method tdbloader.main(....), and it has
created the index and stats for me.
I have a query which I am executing, and I feel like the optimizer is not
optimising the query. Can you advise me if I am using it the right way?
What is this trying to do?
> Query query = QueryFactory.create(sparqlQueryString);
>
> //Tried with and without the Algebra step
>
> Op op = Algebra.compile(query) ;
>
> query = OpAsQuery.asQuery(op);
because
1/ BGP optimization does not occur in the algebra - it happens in the
TDB storage layer.
2/ Algebra.compile does not run the optimizer - it runs the query syntax
to algebra conversion as defined in the SPARQL spec. Use
Algebra.optimize to get the next stage.
3/ OpAsQuery is for when an application builds or manipulates algebra
and wants (if it's possible - it isn't always) a SPARQL query.
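A minimal sketch of the distinction drawn in points 2/ and 3/, assuming Jena 3.x or later (the org.apache.jena packages) is on the classpath. The class name AlgebraStages and the shortened query are illustrative and not from the thread. It prints the algebra produced by Algebra.compile and then the result of passing that through Algebra.optimize. No TDB dataset is involved, so the BGP reordering from point 1/ will not show up here.

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;

// Hypothetical helper class, for illustration only.
public class AlgebraStages {

    // Shortened form of the LUBM query from the thread.
    static final String QUERY =
          "PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#> "
        + "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
        + "SELECT ?X ?Y WHERE { "
        + "?X rdf:type ub:GraduateStudent . "
        + "?X ub:undergraduateDegreeFrom ?Y . }";

    // Syntax-to-algebra translation only; no optimizer is run.
    static Op compile(String sparql) {
        Query query = QueryFactory.create(sparql);
        return Algebra.compile(query);
    }

    // The next stage: apply the algebra-level optimizer to the compiled form.
    static Op optimize(String sparql) {
        return Algebra.optimize(compile(sparql));
    }

    public static void main(String[] args) {
        System.out.println("Compiled:");
        System.out.println(compile(QUERY));
        System.out.println("Optimized:");
        System.out.println(optimize(QUERY));
    }
}
```

For simply running a query against TDB, neither step should be needed: passing the parsed Query straight to QueryExecutionFactory.create(query, dataset) leaves optimization to the query engine and the storage layer.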
Andy
LUBM is an unrealistic benchmark because the universities do not link to
each other. The inventors of the benchmark have stated its limitations.
It is unrealistic for loading rates as well - every system seems to
report about 2x faster for LUBM than the more realistic BSBM. It's down
to LUBM having a lot of triples and few nodes, plus the lack of university
linking, which makes the data very local in the indexes.
Treat with care.
This is the method:
public void testLUBMQuery1_original() {
    long duration = 0L;
    Date startTime, endTime;
    startTime = new Date();
    String sparqlQueryString = "PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#> "
        + "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
        + "SELECT ?X ?Y ?Z WHERE { "
        + "?Z ub:subOrganizationOf ?Y. "
        + "?Y rdf:type ub:University. "
        + "?Z rdf:type ub:Department. "
        + "?X ub:memberOf ?Z. "
        + "?X rdf:type ub:GraduateStudent. "
        + "?X ub:undergraduateDegreeFrom ?Y. }";
    Query query = QueryFactory.create(sparqlQueryString);
    //Tried with and without the Algebra step
    Op op = Algebra.compile(query) ;
    query = OpAsQuery.asQuery(op);
    QueryExecution qexec = QueryExecutionFactory.create(query, dataset);
    ResultSet results = qexec.execSelect();
    ResultSetFormatter.out(results);
    endTime = new Date();
    duration = endTime.getTime() - startTime.getTime();
    System.out.println(query.toString());
    System.out.println("Original Query 1 Duration: " + duration );
}
Thanks Again.
Best,
Ganesh
On 10 January 2017 at 11:37, Andy Seaborne <[email protected]> wrote:
On 09/01/17 19:40, A. Soroka wrote:
The layout of the statistics file is documented here:
https://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file
tdbloader and tdbloader2 are the CLI utilities for building TDB
databases, but they are written in Java and can be used in Java.
https://jena.apache.org/documentation/tdb/commands.html#tdbloader
Jena is open source and Maven Central has source artifacts that your IDE
will automatically attach to your projects.
See the package:
org.apache.jena.tdb.solver.stats;
Andy
---
A. Soroka
The University of Virginia Library
On Jan 9, 2017, at 2:36 PM, Ganesh Selvaraj <[email protected]> wrote:
Hi All,
I am using Jena TDB for my work. So far I could not find much documentation
on data indexing and statistics building for Jena TDB.
I would prefer doing it via a Java API.
Any help/documentation is appreciated.
Thanks
Ganesh