On 30/06/12 21:04, Sarven Capadisli wrote:
On 2012-06-30 12:48, Andy Seaborne wrote:
On 29/06/12 02:49, Sarven Capadisli wrote:
On 2012-06-28 20:25, Andy Seaborne wrote:
On 28/06/12 10:11, Sarven Capadisli wrote:
I was wondering if there is a way to rebuild the TDB index from
command-line and have it consequently update the stats file?

There isn't a way to rebuild just one of the indexes from another in the
TDB distribution.  Is that what you want to do?

tdbstats calculates the stats.

I want to optimize query response times.

I can't get a satisfactory solution with tdbstats because it doesn't let
me optimize for each named graph in the store.

What sort of queries are you asking the store?

For a store with 165 million triples, some real examples:

SELECT DISTINCT ?o WHERE { ?s a ?o }

Time: 159.359 sec (100 sec the second time round)


SELECT DISTINCT ?o WHERE { GRAPH <http://worldbank.270a.info/graph/meta>
{ ?s a ?o } }

Time: 0.394 sec

SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-finances> { ?s a ?o } }

Time: 1.946 sec

SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-climates> { ?s a ?o } }

Time: 46.967 sec

SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-development-indicators> { ?s a
?o } }

Time: 61.323 sec

SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-projects-and-operations> {
?s a ?o } }

Time: 0.559 sec

A quick note on this: when I run the query where the default graph is
the union of all graphs, it takes much longer in total than the total
time for queries with different named graphs.

Other examples:

SELECT DISTINCT ?p ?o WHERE { GRAPH <g> { ?s ?p ?o } } --time=10m

SELECT DISTINCT ?g WHERE { GRAPH ?g { } } --time=60s (49s in second
round, 55s on third..)

The low-level optimizer, stats or otherwise, reorders the triple patterns within a basic graph pattern. In your example, there is only one triple pattern, so there is no choice of ordering and the optimizer makes no difference.
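
To make the point about reordering concrete, here is a minimal sketch in Python. It is not TDB's optimizer: the heuristic (run the pattern with the fewest variables first) and the names count_variables and reorder_bgp are assumptions made up for illustration. It only shows that a one-pattern BGP leaves nothing to reorder.

```python
def count_variables(pattern):
    """Count the terms in a triple pattern that are variables (start with '?')."""
    return sum(1 for term in pattern if term.startswith("?"))


def reorder_bgp(patterns):
    """Order triple patterns so the ones with fewest variables run first.

    Hypothetical heuristic for illustration only; a real optimizer
    (fixed or stats-based) uses its own weighting.
    """
    # sorted() is stable, so patterns with equal counts keep their order.
    return sorted(patterns, key=count_variables)


single = [("?s", "a", "?o")]
pair = [("?s", "?p", "?o"), ("?s", "a", "ex:Country")]

print(reorder_bgp(single))  # one pattern: nothing to reorder
print(reorder_bgp(pair))    # the more selective pattern moves to the front
```

With a single triple pattern the input and output are the same list, which is why neither stats nor any other reordering strategy changes the queries above.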

SELECT DISTINCT ?o WHERE { ?s a ?o }

over the union default graph is an access to the POSG index, P first because P = rdf:type is fixed. TDB uses 3 indexes for the (real) default graph and 6 for named graphs, which means any combination of bound G/S/P/O can be looked up in some index, but not in every possible sort order (cf. Hexastore, which has 6 indexes for the single graph). Covering every possible sort order for named graphs would take 24 (= 4*3*2*1) indexes.
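
As a rough illustration of the index choice described above, the following Python sketch picks the quad index whose leading positions are all bound. The six index names are the TDB named-graph indexes; the function choose_index and its selection rule are assumptions for illustration, not TDB's actual code.

```python
from itertools import permutations

# The six quad indexes TDB keeps for named graphs.
QUAD_INDEXES = ["GSPO", "GPOS", "GOSP", "SPOG", "POSG", "OSPG"]

# Every possible ordering of G/S/P/O: 4*3*2*1 = 24 of them.
ALL_ORDERINGS = ["".join(p) for p in permutations("GSPO")]


def choose_index(bound, indexes=QUAD_INDEXES):
    """Return the index with the longest prefix made of bound positions.

    `bound` is a set of letters from "GSPO" naming the fixed slots of a
    quad pattern. Only a leading run of bound slots can be used to seek
    into a sorted index. Ties go to the first index in the list.
    """
    def prefix_len(name):
        n = 0
        for letter in name:
            if letter not in bound:
                break
            n += 1
        return n

    return max(indexes, key=prefix_len)


print(len(ALL_ORDERINGS))        # 24
print(choose_index({"P"}))       # POSG
print(choose_index({"G", "P"}))  # GPOS
```

For { ?s a ?o } over the union graph only P is fixed, so the sketch lands on POSG, as in the explanation above. The results then come back sorted by O within P, not in any order you might ask for.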

And when it is the union graph, the results have to be reduced to unique triples so { ?s a ?o } becomes what is effectively

SELECT DISTINCT ?s ?o { GRAPH ?g { ?s a ?o } }

Each triple pattern has to have the distinct-ness applied so it puts stress on memory as well. If it were cleverer, it would know it could use a cheaper filter to calculate distinct-ness.
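
A small Python sketch of why this costs memory. It is not how TDB implements it: the function union_graph_matches and the tuple layout (g, s, p, o) are assumptions for illustration. The point is that the set of seen (s, o) pairs grows with the number of distinct matches.

```python
def union_graph_matches(quads, predicate):
    """Yield unique (s, o) pairs matching `predicate` across all graphs.

    The same triple may sit in several named graphs, so every match is
    checked against a set of pairs already seen. That set is the memory
    cost of treating the union of named graphs as the default graph.
    """
    seen = set()
    for g, s, p, o in quads:
        if p != predicate:
            continue
        if (s, o) not in seen:
            seen.add((s, o))
            yield s, o


quads = [
    ("g1", "ex:a", "rdf:type", "ex:Country"),
    ("g2", "ex:a", "rdf:type", "ex:Country"),  # same triple, other graph
    ("g2", "ex:b", "rdf:type", "ex:Region"),
    ("g2", "ex:b", "ex:name", "B"),
]

print(list(union_graph_matches(quads, "rdf:type")))
```

The duplicate from g2 is dropped, but only because the first occurrence was kept in memory. With a store of this size, that set can get large.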

Also the system isn't smart enough to notice you have a DISTINCT of a unique expression and it does not need the outer DISTINCT.

Something similar happens for

 SELECT DISTINCT ?g WHERE { GRAPH ?g { } }

The thing that will most help performance is RAM. How much RAM and on what sort of OS are you running?

        Andy
