On 30/06/12 21:04, Sarven Capadisli wrote:
On 2012-06-30 12:48, Andy Seaborne wrote:
On 29/06/12 02:49, Sarven Capadisli wrote:
On 2012-06-28 20:25, Andy Seaborne wrote:
On 28/06/12 10:11, Sarven Capadisli wrote:
I was wondering if there is a way to rebuild the TDB index from
command-line and have it consequently update the stats file?
There isn't a way to rebuild just one of the indexes from another in
the TDB distribution. Is that what you want to do?
tdbstats calculates the stats.
I want to optimize query response times.
I can't get a satisfactory solution with tdbstats because it doesn't let
me optimize for each named graph in the store.
What sort of queries are you asking the store?
For a store with 165 million triples, some real examples:
SELECT DISTINCT ?o WHERE { ?s a ?o }
Time: 159.359 sec (100 sec in second time round)
SELECT DISTINCT ?o WHERE { GRAPH <http://worldbank.270a.info/graph/meta>
{ ?s a ?o } }
Time: 0.394 sec
SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-finances> { ?s a ?o } }
Time: 1.946 sec
SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-climates> { ?s a ?o } }
Time: 46.967 sec
SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-development-indicators> { ?s a
?o } }
Time: 61.323 sec
SELECT DISTINCT ?o WHERE { GRAPH
<http://worldbank.270a.info/graph/world-bank-projects-and-operations> {
?s a ?o } }
Time: 0.559 sec
A quick note on this: when I run the query where the default graph is
the union of all graphs, it takes much longer in total than the total
time for queries with different named graphs.
Other examples:
SELECT DISTINCT ?p ?o WHERE { GRAPH <g> { ?s ?p ?o } } --time=10m
SELECT DISTINCT ?g WHERE { GRAPH ?g { } } --time=60s (49s in second
round, 55s on third..)
The low-level optimizer, stats or otherwise, reorders the triples within
a basic graph pattern. In your example, there is only one triple
pattern so there are no choices of ordering and the optimizer will make
no difference.
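To illustrate why a single triple pattern leaves nothing to reorder, here is a minimal sketch in Python. The heuristic (run the pattern with more fixed terms first) is a toy stand-in chosen for illustration and is not TDB's actual optimizer; the function name reorder and the term ex:City are hypothetical.

```python
def reorder(patterns):
    """Toy reordering of a basic graph pattern.

    Each pattern is a (subject, predicate, object) tuple of strings;
    terms starting with '?' are variables. Patterns with more fixed
    terms are moved to the front. This is an illustrative heuristic,
    not the rule set or stats-based logic that TDB uses.
    """
    def fixed_terms(pattern):
        return sum(1 for term in pattern if not term.startswith("?"))

    # sorted() is stable, so patterns with equal scores keep their order.
    return sorted(patterns, key=fixed_terms, reverse=True)


# One pattern: there is only one possible order, so nothing changes.
single = [("?s", "a", "?o")]
# Two patterns: the more specific one is moved to the front.
pair = [("?s", "?p", "?o"), ("?s", "a", "ex:City")]
```

With single, reorder returns the input unchanged, which is the situation in the queries above.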
SELECT DISTINCT ?o WHERE { ?s a ?o }
over the union default graph is an access to the POSG index, P first
because P = rdf:type is fixed. TDB uses 3 indexes for the (real)
default graph and 6 for named graphs, which means any access of G/S/P/O can
be found from an index, but not in every possible sort order (cf.
hexastore, which has 6 indexes for the single graph). It would take 24 (=
4*3*2*1) indexes to cover all possible orderings for named graphs.
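The index choice can be sketched in a few lines of Python. The index names below (SPO, POS, OSP for triples; GSPO, GPOS, GOSP, SPOG, POSG, OSPG for quads) are assumed to be TDB's default layout and are written out by hand, not read from a store; the helper names are hypothetical, and the matching rule is a simplification of what a real store does.

```python
from itertools import permutations

# Assumed default TDB index orderings (hard-coded for illustration).
TRIPLE_INDEXES = ["SPO", "POS", "OSP"]
QUAD_INDEXES = ["GSPO", "GPOS", "GOSP", "SPOG", "POSG", "OSPG"]


def covering_indexes(bound, indexes):
    """Return the indexes whose leading columns are exactly the bound positions.

    `bound` is a string of the fixed positions, e.g. "P" or "GP".
    """
    wanted = set(bound)
    return [ix for ix in indexes if set(ix[:len(wanted)]) == wanted]


def all_orderings(columns):
    """Every possible column order, e.g. 24 of them for 'GSPO'."""
    return ["".join(p) for p in permutations(columns)]
```

Under these assumptions, { ?s a ?o } over the union graph fixes only P and lands on POSG, while the same pattern inside GRAPH <g> fixes G and P and lands on GPOS.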
And when it is the union graph, the results have to be reduced to unique
triples so { ?s a ?o } becomes what is effectively
DISTINCT ?s ?o { GRAPH ?g { ?s a ?o } }
Each triple pattern has to have the distinct-ness applied so it puts
stress on memory as well. If it were cleverer, it would know it could
use a cheaper filter to calculate distinct-ness.
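A rough sketch of the two approaches, in Python. The first function is a generic DISTINCT that remembers every triple seen so far. The second is one possible reading of the "cheaper filter": if rows arrive in POSG order, copies of the same triple from different graphs are adjacent, so only the previous key needs to be remembered. Both function names are hypothetical and neither is TDB code.

```python
from itertools import groupby


def distinct_with_set(quads):
    """Generic DISTINCT over (g, s, p, o) rows.

    Memory grows with the number of unique triples, which is the
    pressure described above.
    """
    seen = set()
    for _g, s, p, o in quads:
        if (s, p, o) not in seen:
            seen.add((s, p, o))
            yield (s, p, o)


def distinct_adjacent(rows_in_posg_order):
    """Cheaper filter for rows already sorted as (p, o, s, g).

    Duplicates of one triple sit next to each other, so grouping on
    the first three columns removes them with constant extra memory.
    """
    for (p, o, s), _group in groupby(rows_in_posg_order, key=lambda row: row[:3]):
        yield (s, p, o)
```

The second version only works because of the sort order; on unsorted input it would let non-adjacent duplicates through.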
Also the system isn't smart enough to notice you have a DISTINCT of a
unique expression and it does not need the outer DISTINCT.
Something similar happens for
SELECT DISTINCT ?g WHERE { GRAPH ?g { } }
The thing that will most help performance is RAM. How much RAM and on
what sort of OS are you running?
Andy