On 10/01/17 21:56, Ganesh Selvaraj wrote:
Thank you.
Now I have loaded the data using tdbloader.main(....), and it has
created the indexes and stats for me.

I have a query which I am executing, and I feel like the optimizer is not
optimising the query. Can you advise me whether I am using it the right way?

What is this trying to do?

> Query query = QueryFactory.create(sparqlQueryString);
>
> //Tried with and without the Algebra step
>
> Op op = Algebra.compile(query) ;
>
> query = OpAsQuery.asQuery(op);

because

1/ BGP optimization does not occur in the algebra - it happens in the TDB storage layer.

2/ Algebra.compile does not run the optimizer - it runs the query syntax to algebra conversion as defined in the SPARQL spec. Use Algebra.optimize to get the next stage.

3/ OpAsQuery is for when an application builds or manipulates algebra and wants (if it's possible - it isn't always) a SPARQL query.
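To make the three points above concrete, here is a minimal sketch, assuming Jena 3.x is on the classpath. The class name AlgebraStagesSketch and its helper methods are made up for illustration, and the empty in-memory dataset is only a stand-in so the example runs without a TDB directory. It prints the algebra before and after Algebra.optimize for inspection only, then executes the parsed query directly, with no Algebra.compile / OpAsQuery round trip. For a plain basic graph pattern the two printed forms may well look alike, because BGP reordering happens in the TDB storage layer rather than in the algebra.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;

// Hypothetical class name; not part of Jena.
class AlgebraStagesSketch {

    // LUBM query 1 from the message, unchanged apart from layout.
    static final String QUERY =
        "PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#> "
        + "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
        + "SELECT ?X ?Y ?Z WHERE { "
        + "?Z ub:subOrganizationOf ?Y. "
        + "?Y rdf:type ub:University. "
        + "?Z rdf:type ub:Department. "
        + "?X ub:memberOf ?Z. "
        + "?X rdf:type ub:GraduateStudent. "
        + "?X ub:undergraduateDegreeFrom ?Y. }";

    // Syntax-to-algebra conversion only; no optimizer runs here.
    static Op compiled(Query query) {
        return Algebra.compile(query);
    }

    // The next stage: the algebra-level optimizer applied to the compiled form.
    static Op optimized(Query query) {
        return Algebra.optimize(Algebra.compile(query));
    }

    // Execute the query as parsed and count the rows it returns.
    static int countRows(Dataset dataset, Query query) {
        try (QueryExecution qexec = QueryExecutionFactory.create(query, dataset)) {
            return ResultSetFormatter.consume(qexec.execSelect());
        }
    }

    public static void main(String[] args) {
        Query query = QueryFactory.create(QUERY);

        // For inspection only; neither Op is fed back into execution.
        System.out.println(compiled(query));
        System.out.println(optimized(query));

        // Assumption: an empty in-memory dataset so the sketch is runnable.
        Dataset dataset = DatasetFactory.create();
        System.out.println("Rows: " + countRows(dataset, query));
    }
}
```

To run this against the loaded data, the in-memory stand-in would be replaced by something like TDBFactory.createDataset(directory), where directory is the location tdbloader wrote to; the rest of the sketch stays the same.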

        Andy

LUBM is an unrealistic benchmark because the universities do not link to each other. The inventors of the benchmark have stated its limitations.

It is unrealistic for loading rates as well - every system seems to report about x2 faster for LUBM than for the more realistic BSBM. It's down to LUBM having a lot of triples and few nodes, plus the lack of university linking, which makes the data very local in the indexes.

Treat with care.


This is the method:

public void testLUBMQuery1_original() {
    long duration = 0L;
    Date startTime, endTime;
    startTime = new Date();

    String sparqlQueryString = "PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#> "
        + "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
        + "SELECT ?X ?Y ?Z WHERE { "
        + "?Z ub:subOrganizationOf ?Y. "
        + "?Y rdf:type ub:University. "
        + "?Z rdf:type ub:Department. "
        + "?X ub:memberOf ?Z. "
        + "?X rdf:type ub:GraduateStudent. "
        + "?X ub:undergraduateDegreeFrom ?Y. }";

    Query query = QueryFactory.create(sparqlQueryString);

    //Tried with and without the Algebra step
    Op op = Algebra.compile(query) ;
    query = OpAsQuery.asQuery(op);

    QueryExecution qexec = QueryExecutionFactory.create(query, dataset);
    ResultSet results = qexec.execSelect();
    ResultSetFormatter.out(results);

    endTime = new Date();
    duration = endTime.getTime() - startTime.getTime();

    System.out.println(query.toString());
    System.out.println("Original Query 1 Duration: " + duration );
}


Thanks Again.


Best,

Ganesh

On 10 January 2017 at 11:37, Andy Seaborne <[email protected]> wrote:



On 09/01/17 19:40, A. Soroka wrote:

The layout of the statistics file is documented here:

https://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file

tdbloader and tdbloader2 are the CLI utilities for building TDB
databases, but they are written in Java and can be used in Java.

https://jena.apache.org/documentation/tdb/commands.html#tdbloader


Jena is open source and Maven Central has source artifacts that your IDE
will automatically attach to your projects.

See the package:
org.apache.jena.tdb.solver.stats;

        Andy


---
A. Soroka
The University of Virginia Library

On Jan 9, 2017, at 2:36 PM, Ganesh Selvaraj <[email protected]>
wrote:

Hi All,

I am using Jena TDB for my work. So far I could not find much
documentation on data indexing and statistics building for Jena TDB.

I would prefer doing it via a Java API.

Any help/documentation is appreciated.

Thanks
Ganesh



