Re: Jena TDB indexing and stats building

Andy Seaborne Wed, 11 Jan 2017 00:53:33 -0800


On 10/01/17 21:56, Ganesh Selvaraj wrote:

Thank you.
I have now loaded the data using tdbloader.main(....), and it has
created the index and stats for me.

I have a query which I am executing, and I feel like the optimizer is not
optimising the query. Can you advise me whether I am using it the right way?


What is this trying to do?

> Query query = QueryFactory.create(sparqlQueryString);
>
> //Tried with and without the Algebra step
>
> Op op = Algebra.compile(query) ;
>
> query = OpAsQuery.asQuery(op);

because

1/ BGP optimization does not occur in the algebra - it happens in the TDB storage layer.

2/ Algebra.compile does not run the optimizer - it runs the query syntax to algebra conversion as defined in the SPARQL spec. Use Algebra.optimize to get the next stage.

3/ OpAsQuery is for when an application builds or manipulates algebra and wants (if it's possible - it isn't always) a SPARQL query.
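
A minimal sketch of the three points above, for illustration only. It assumes Jena 3.x package names and uses an empty in-memory dataset in place of the TDB dataset; the class name, method names and the sample query are hypothetical and not taken from the original code. It shows the two algebra stages side by side and runs the query without the compile/OpAsQuery round trip, since the BGP reordering happens in the storage layer at execution time.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;

public class AlgebraStages {

    // Stage 1: syntax-to-algebra conversion only; no optimizer is run.
    static Op compileOnly(String sparql) {
        return Algebra.compile(QueryFactory.create(sparql));
    }

    // Stage 2: the general algebra optimizer applied to the compiled form.
    // This is still not the TDB BGP reordering, which happens at execution.
    static Op compileAndOptimize(String sparql) {
        return Algebra.optimize(compileOnly(sparql));
    }

    // Execute the parsed query directly against the dataset and count rows.
    static int countRows(Dataset dataset, String sparql) {
        Query query = QueryFactory.create(sparql);
        try (QueryExecution qexec = QueryExecutionFactory.create(query, dataset)) {
            return ResultSetFormatter.consume(qexec.execSelect());
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample query, used only to have something to print.
        String sparql = "SELECT ?s WHERE { ?s ?p ?o . FILTER(?o = 1) }";

        System.out.println("Compiled:");
        System.out.println(compileOnly(sparql));
        System.out.println("Optimized:");
        System.out.println(compileAndOptimize(sparql));

        // Assumption: an empty in-memory dataset stands in for the TDB one.
        Dataset dataset = DatasetFactory.create();
        System.out.println("Rows: " + countRows(dataset, sparql));
    }
}
```

Printing the Op is only useful for inspection; for normal execution the Query object from QueryFactory.create can be passed straight to QueryExecutionFactory.create.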


        Andy

LUBM is an unrealistic benchmark because the universities do not link to each other. The inventors of the benchmark have stated its limitations.

It is unrealistic for loading rates as well - every system seems to report about x2 faster for LUBM than for the more realistic BSBM. It's down to LUBM having a lot of triples and few nodes, plus the lack of university linking making data very local in indexes.


Treat with care.


This is the method:



public void testLUBMQuery1_original() {
    long duration = 0L;
    Date startTime, endTime;
    startTime = new Date();

    String sparqlQueryString = "PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#> "
            + "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
            + "SELECT ?X ?Y ?Z WHERE { "
            + "?Z ub:subOrganizationOf ?Y. "
            + "?Y rdf:type ub:University. "
            + "?Z rdf:type ub:Department. "
            + "?X ub:memberOf ?Z. "
            + "?X rdf:type ub:GraduateStudent. "
            + "?X ub:undergraduateDegreeFrom ?Y. }";

    Query query = QueryFactory.create(sparqlQueryString);

    //Tried with and without the Algebra step
    Op op = Algebra.compile(query) ;
    query = OpAsQuery.asQuery(op);

    QueryExecution qexec = QueryExecutionFactory.create(query, dataset);
    ResultSet results = qexec.execSelect();
    ResultSetFormatter.out(results);

    endTime = new Date();
    duration = endTime.getTime() - startTime.getTime();

    System.out.println(query.toString());
    System.out.println("Original Query 1 Duration: " + duration );
}


Thanks Again.


Best,

Ganesh

On 10 January 2017 at 11:37, Andy Seaborne <[email protected]> wrote:



On 09/01/17 19:40, A. Soroka wrote:

The layout of the statistics file is documented here:

https://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file

tdbloader and tdbloader2 are the CLI utilities for building TDB
databases, but they are written in Java and can be used in Java.

https://jena.apache.org/documentation/tdb/commands.html#tdbloader

Jena is open source and Maven Central has source artifacts that your IDE
will automatically attach to your projects.

See the package:
org.apache.jena.tdb.solver.stats;

        Andy


---

A. Soroka
The University of Virginia Library

On Jan 9, 2017, at 2:36 PM, Ganesh Selvaraj <[email protected]> wrote:

Hi All,

I am using Jena TDB for my work. So far I could not find much
documentation on data indexing and statistics building for Jena TDB.

I would prefer doing it via a Java API.

Any help/documentation is appreciated.

Thanks
Ganesh
