Re: TDB: node table VS index disk usage

Andy Seaborne Tue, 05 Jul 2022 05:23:12 -0700



On 05/07/2022 11:53, Vilnis Termanis wrote:

Hi,

A previous thread about TDB disk usage
(https://markmail.org/message/u45e63nmwkqnsykc) says: "Nodes are not
garbage collected so old blank nodes (and unused URIs and literals)
remain in the node table."

For TDB2, the indexes are MVCC copy-on-write trees. Changes do not
overwrite previous transactions; new blocks are allocated instead.


Existing outstanding read transactions continue executing without locking.

(In fact, until compaction happens, the entire state of all database
transactions is in the indexes and node table. The ability is not
exposed, but it would be possible to reset to the end of any previous
transaction.)
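
The copy-on-write behaviour described above can be illustrated with a
toy model. This is only a sketch and not TDB2's implementation: the
class CowStore and the function delete_then_insert are hypothetical
names, and a single "block" stands in for a whole index, whereas a real
copy-on-write tree would copy only the blocks on the modified path.

```python
class CowStore:
    """Toy append-only store: commits add blocks and never overwrite old ones."""

    def __init__(self):
        self.blocks = []  # append-only storage; each block is an immutable set
        self.roots = []   # roots[i] = block index holding the state after commit i

    def commit(self, new_state):
        # Copy-on-write: a fresh block is allocated even if the content
        # is identical to the previous state.
        self.blocks.append(frozenset(new_state))
        self.roots.append(len(self.blocks) - 1)
        return len(self.roots) - 1  # transaction id

    def snapshot(self, txn=-1):
        # A reader keeps using the root it started with, so it sees a
        # stable state without taking a lock.
        if not self.roots:
            return frozenset()
        return self.blocks[self.roots[txn]]


def delete_then_insert(store, quads):
    # Delete and re-insert the same quads: the logical content is
    # unchanged, but the commit still allocates new storage.
    state = set(store.snapshot())
    state -= set(quads)
    state |= set(quads)
    return store.commit(state)


store = CowStore()
quad = ("some:graph", "s", "p", "o")
first = store.commit({quad})
for _ in range(3):
    delete_then_insert(store, [quad])

print(len(store.blocks))                           # storage grew on every commit
print(store.snapshot() == store.snapshot(first))   # logical content did not change
```

In this model every earlier state stays reachable through its root,
which is the same reason the files keep growing until compaction
discards the old blocks.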

As we're deleting & re-inserting data into named graphs (where often
the end result is no change in triples), we've observed growth of
indexes GOSP, GPOS, GSPO, OSPG, POSG and SPOG. But not: GOSP, GPOS &
GSPO.

All of the 4-element quad indexes may grow. Whether it triggers
allocating an 8M segment or touching part of a sparse file is another
matter.
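
Because of the sparse-file point above, the size a directory listing
reports for an index file and the space it actually occupies on disk
can differ. The following is a minimal sketch for comparing the two; the
function name file_usage is hypothetical, and st_blocks is only
available on POSIX-like systems, so the allocated size is reported as
None elsewhere.

```python
import os
import tempfile


def file_usage(path):
    """Return (apparent_bytes, allocated_bytes or None) for one file."""
    st = os.stat(path)
    apparent = st.st_size
    # st_blocks counts 512-byte units actually allocated; it can be
    # smaller than st_size / 512 when the file has holes.
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return apparent, allocated


def make_sparse_file(path, size):
    # Seek past the end and write one byte; on filesystems that support
    # holes, most of the file is then not backed by disk blocks.
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.dat")  # stand-in file, not a real TDB index
    make_sparse_file(demo, 8 * 1024 * 1024)
    apparent, allocated = file_usage(demo)
    print("apparent:", apparent, "allocated:", allocated)
```

Running file_usage over the files in a database directory before and
after an update is one way to check whether growth is real allocation
or only a larger apparent size.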


The 3 indexes are not touched by the update shown.

    Andy


I can reproduce this (v4.5.0) by repeatedly running:

DELETE { GRAPH ?g { ?s ?p ?o } }
INSERT { GRAPH ?g { ?s ?p ?o } }
WHERE {
VALUES ?g { some:graph }
VALUES (?s ?p ?o) {
# Set of 1000 fixed triples with URI objects
}
}

Is that expected behaviour, i.e. that the indexes and not just the
node table will grow? (Or maybe I have misunderstood the definition of
"node table")
And: Is there a reason why GOSP, GPOS & GSPO don't grow with such a
delete/insert clause?

(I appreciate that TDB2 offers compaction and TDB space could be
reclaimed by re-creating the database.)

Regards,
Vilnis
