On 05/07/2022 11:53, Vilnis Termanis wrote:
Hi,
A previous thread about TDB disk usage
(https://markmail.org/message/u45e63nmwkqnsykc) says: "Nodes are not
garbage collected so old blank nodes (and unused URIs and literals)
remain in the node table."
For TDB2, the indexes are MVCC copy-on-write trees. Changes do not
overwrite the blocks of previous transactions; new blocks are allocated
instead.
Existing outstanding read transactions continue executing without locking.
(In fact, until compaction happens, the entire state of all database
transactions is in the indexes and node table. The ability is not
exposed, but it would be possible to reset to the end of any previous
transaction.)
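To illustrate why rewriting identical data still consumes space, here is a minimal toy model of copy-on-write storage. It is a sketch only, not TDB2's actual data structures: the class CowStore and its methods are hypothetical names, and a whole "index" is collapsed into a single block per commit.

```python
from dataclasses import dataclass, field


@dataclass
class CowStore:
    """Toy append-only store: a block is never modified once written."""
    blocks: list = field(default_factory=list)  # every block ever allocated
    roots: list = field(default_factory=list)   # one root block id per commit

    def commit(self, triples):
        # Copy-on-write: always allocate a new block, even if the
        # content is identical to the previous commit.
        self.blocks.append(frozenset(triples))
        self.roots.append(len(self.blocks) - 1)

    def snapshot(self, txn):
        # State as of the end of transaction number `txn` (0-based).
        return self.blocks[self.roots[txn]]


store = CowStore()
data = {("s1", "p", "o1"), ("s2", "p", "o2")}

store.commit(data)        # initial load
for _ in range(3):        # three delete + re-insert updates, no net change
    store.commit(data)

print(len(store.blocks))                       # 4
print(store.snapshot(0) == store.snapshot(3))  # True
```

The logical content is the same after every update, yet each commit allocates a new block, and every earlier root stays reachable until something (compaction, in TDB2's case) discards the old blocks.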
As we're deleting & re-inserting data into named graphs (where often
the end result is no change in triples), we've observed growth of
indexes OSPG, POSG and SPOG. But not: GOSP, GPOS &
GSPO.
All of the 4-element quad indexes may grow. Whether it triggers
allocating an 8M segment or touching part of a sparse file is another
matter.
The 3-element triple indexes are not touched by the update shown.
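As a rough illustration of why an index can gain blocks without its file visibly growing, here is a sketch that assumes the file is extended one whole segment at a time. The 8 MiB figure is a reading of the "8M segment" above; the 8 KiB block size and the function name file_size are assumptions made only for this example.

```python
SEGMENT = 8 * 1024 * 1024  # "8M segment", read here as 8 MiB
BLOCK = 8 * 1024           # assumed block size, for illustration only


def file_size(blocks_used: int) -> int:
    """Size of a file that is extended one whole segment at a time."""
    used = blocks_used * BLOCK
    segments = (used + SEGMENT - 1) // SEGMENT  # ceiling division
    return segments * SEGMENT


# 1024 blocks fit in one segment under these assumptions.
print(file_size(1000) == file_size(1020))  # True: growth stays inside the segment
print(file_size(1030) - file_size(1020))   # 8388608: a new segment was allocated
```

Under this model, two indexes that receive the same number of new blocks can show different on-disk growth, depending on how close each one was to a segment boundary.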
Andy
I can reproduce this (v4.5.0) by repeatedly running:
DELETE { GRAPH ?g { ?s ?p ?o } }
INSERT { GRAPH ?g { ?s ?p ?o } }
WHERE {
  VALUES ?g { some:graph }
  VALUES (?s ?p ?o) {
    # Set of 1000 fixed triples with URI objects
  }
}
Is that expected behaviour, i.e. that the indexes and not just the
node table will grow? (Or maybe I have misunderstood the definition of
"node table")
And: Is there a reason why GOSP, GPOS & GSPO don't grow with such a
delete/insert clause?
(I appreciate that TDB2 offers compaction and TDB space could be
reclaimed by re-creating the database.)
Regards,
Vilnis