Re: Rebuilding TDB index and updating stats file

Andy Seaborne Sun, 01 Jul 2012 08:33:56 -0700

On 30/06/12 21:04, Sarven Capadisli wrote:

On 2012-06-30 12:48, Andy Seaborne wrote:

On 29/06/12 02:49, Sarven Capadisli wrote:

On 2012-06-28 20:25, Andy Seaborne wrote:

On 28/06/12 10:11, Sarven Capadisli wrote:

I was wondering if there is a way to rebuild the TDB index from
command-line and have it consequently update the stats file?


There isn't a way to rebuild just one of the indexes from another in
the TDB distribution.  Is that what you want to do?

tdbstats calculates the stats.


I want to optimize query response times.

I can't get a satisfactory solution with tdbstats because it doesn't let
me optimize for each named graph in the store.


What sort of queries are you asking the store?


For a store with 165 million triples, here are some real examples:

SELECT DISTINCT ?o WHERE { ?s a ?o }

Time: 159.359 sec (100 sec in second time round)


SELECT DISTINCT ?o WHERE { GRAPH <http://worldbank.270a.info/graph/meta>
{ ?s a ?o } }

Time: 0.394 sec

SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-finances> { ?s a ?o } }

Time: 1.946 sec

SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-climates> { ?s a ?o } }

Time: 46.967 sec

SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-development-indicators> { ?s a
?o } }

Time: 61.323 sec

SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-projects-and-operations> {
?s a ?o } }

Time: 0.559 sec

A quick note on this: when I run the query where the default graph is
the union of all graphs, it takes much longer in total than the total
time for queries with different named graphs.

Other examples:

SELECT DISTINCT ?p ?o WHERE { GRAPH <g> { ?s ?p ?o } } --time=10m

SELECT DISTINCT ?g WHERE { GRAPH ?g { } } --time=60s (49s in second
round, 55s on third..)

The low-level optimizer, stats or otherwise, reorders the triple
patterns within a basic graph pattern.  In your example, there is only
one triple pattern so there are no choices of ordering and the
optimizer will make no difference.


SELECT DISTINCT ?o WHERE { ?s a ?o }

over the union default graph is an access to the POSG index.  P first
because P = rdf:type is fixed.  TDB uses 3 indexes for the (real)
default graph, 6 for named graphs, which means any access of G/S/P/O
can be found from an index but not in every possible sort order (cf.
hexastore, which has 6 indexes for the single graph).  It would take 24
(= 4*3*2*1) indexes to cover all possible orderings for named graphs.
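To make the index choice concrete, here is a minimal Python sketch of how a lookup can be matched to an index by its bound leading slots.  It is an illustration only, not TDB code: the index names listed are my assumption of the default TDB layout (only POSG is named in this thread), and bound_prefix_len, pick_index and covers_every_pattern are hypothetical helpers.

```python
from itertools import combinations, permutations

# Assumed default index orders (only POSG is named in this thread).
TRIPLE_INDEXES = ["SPO", "POS", "OSP"]
QUAD_INDEXES = ["GSPO", "GPOS", "GOSP", "SPOG", "POSG", "OSPG"]


def bound_prefix_len(index, bound):
    """Count how many leading slots of the index are fixed by the pattern."""
    n = 0
    for slot in index:
        if slot not in bound:
            break
        n += 1
    return n


def pick_index(indexes, bound):
    """Pick the index whose leading slots cover the most bound positions."""
    return max(indexes, key=lambda ix: bound_prefix_len(ix, bound))


def covers_every_pattern(indexes, slots):
    """True if every combination of bound slots is a prefix of some index."""
    for r in range(len(slots) + 1):
        for combo in combinations(slots, r):
            bound = set(combo)
            if not any(bound_prefix_len(ix, bound) == len(bound) for ix in indexes):
                return False
    return True


# { ?s a ?o } over the union graph: only P is bound.
union_choice = pick_index(QUAD_INDEXES, {"P"})
# { GRAPH <g> { ?s a ?o } }: G and P are bound.
named_choice = pick_index(QUAD_INDEXES, {"G", "P"})
# Number of indexes needed to have every sort order for quads.
all_quad_orders = len(list(permutations("GSPO")))
```

Under these assumptions, six quad indexes are enough for every pattern to find a matching prefix, while every possible sort order would need all 24 permutations.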

And when it is the union graph, the results have to be reduced to
unique triples so { ?s a ?o } becomes what is effectively


DISTINCT ?s ?o { GRAPH ?g { ?s a ?o } }

Each triple pattern has to have the distinct-ness applied so it puts
stress on memory as well.  If it were cleverer, it would know it could
use a cheaper filter to calculate distinct-ness.

Also the system isn't smart enough to notice you have a DISTINCT of a
unique expression and it does not need the outer DISTINCT.
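A toy Python sketch of the two layers of distinct-ness described above may help.  It is not how TDB is implemented: the quads are made-up data, and union_type_pattern and distinct_objects are hypothetical names.  The point is that the intermediate step has to remember every (?s, ?o) pair it has seen, which is where the memory goes.

```python
# Made-up quads (graph, subject, predicate, object) for illustration only.
quads = [
    ("g1", "s1", "rdf:type", "A"),
    ("g2", "s1", "rdf:type", "A"),   # same triple stored in a second graph
    ("g2", "s2", "rdf:type", "B"),
    ("g1", "s2", "rdfs:label", "x"),
]


def union_type_pattern(quads):
    """{ ?s a ?o } over the union graph: drop ?g, keep unique (?s, ?o)."""
    seen = set()  # grows with the number of unique pairs
    for g, s, p, o in quads:
        if p == "rdf:type" and (s, o) not in seen:
            seen.add((s, o))
            yield s, o


def distinct_objects(rows):
    """The outer SELECT DISTINCT ?o."""
    return sorted({o for _, o in rows})


pairs = list(union_type_pattern(quads))
with_inner_step = distinct_objects(pairs)
# Same outer DISTINCT, but without the intermediate de-duplication.
without_inner_step = distinct_objects(
    (s, o) for g, s, p, o in quads if p == "rdf:type"
)
```

On this toy data both routes give the same list of types; only the amount of intermediate state differs.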


Something similar happens for

 SELECT DISTINCT ?g WHERE { GRAPH ?g { } }

The thing that will most help performance is RAM.  How much RAM and on
what sort of OS are you running?


        Andy
