Re: Fuseki TDB database size growth

Rob Vesse Mon, 21 Aug 2017 03:29:23 -0700

All the data structures used in TDB are, broadly speaking, append-only. This 
means that the database will tend to grow in size over time.


Certain ways of using the database can exacerbate this. In your example I would 
guess that you have a lot of blank nodes present in the data?

Each unique blank node generates a unique identifier inside the system and will 
continually expand the node table. TDB does not implement reference counting so 
even if you delete every triple that references a given RDF node it will never 
be removed from the node table.

Similarly, as the indexes are updated they do not reclaim space, so the B+Trees 
will continue to grow over time.

Reloading from scratch creates a smaller database because it is able to 
maximally pack the data into the data structures on disk and you do not have 
any unused identifiers allocated.
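
To make the effect concrete, here is a toy Python model of it. This is only a 
sketch and not TDB's actual code: the class and function names are made up for 
illustration, and it assumes (as guessed above) that every nightly rewrite 
mints fresh blank node labels for the same data.

```python
class AppendOnlyNodeTable:
    """Toy stand-in for a node table that only ever appends (hypothetical)."""

    def __init__(self):
        self._ids = {}      # node label -> identifier
        self._labels = []   # identifier -> node label

    def intern(self, node):
        # Reuse the identifier if the node is already known, otherwise append.
        if node not in self._ids:
            self._ids[node] = len(self._labels)
            self._labels.append(node)
        return self._ids[node]

    def label(self, node_id):
        return self._labels[node_id]

    def __len__(self):
        return len(self._labels)


class ToyStore:
    def __init__(self):
        self.nodes = AppendOnlyNodeTable()
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add(tuple(self.nodes.intern(n) for n in (s, p, o)))

    def delete_all(self):
        # No reference counting: deleting triples leaves the node table as is.
        self.triples.clear()


def nightly_rewrite(store, run, pages=100):
    # Assumption: each run re-exports the same pages with new blank node labels.
    store.delete_all()
    for i in range(pages):
        store.add(f"_:b{run}_{i}", "ex:label", f'"page {i}"')


def reload_from_scratch(old):
    # Copy only the triples that still exist; unused identifiers are not carried over.
    fresh = ToyStore()
    for s, p, o in old.triples:
        fresh.add(old.nodes.label(s), old.nodes.label(p), old.nodes.label(o))
    return fresh


store = ToyStore()
for run in range(10):
    nightly_rewrite(store, run)
fresh = reload_from_scratch(store)

print(len(store.triples), len(store.nodes))   # triples stay flat, nodes keep growing
print(len(fresh.triples), len(fresh.nodes))   # same triples, far fewer nodes
```

In this model the triple count stays at 100 after every run while the node 
table gains 100 entries per run, and the reloaded copy holds the same triples 
with only the nodes that are still referenced.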

Rob

On 21/08/2017 11:20, "Lorenzo Manzoni" <[email protected]> wrote:

    Hi,
    
    I'm writing to you because there is a behavior of Fuseki TDB that we 
    cannot understand:
    
    */the fuseki database filesystem size continues to grow even if the 
    number of triples does not increase substantially./*
    
    We are using the latest version of Fuseki (3.4.0) as the triple store of a 
    Semantic MediaWiki (MW 1.24, SMW 2.1.1), and every night we have a 
    scheduled job that updates the wiki pages and executes maintenance 
    scripts (e.g. 
    https://www.semantic-mediawiki.org/wiki/Help:Maintenance_script_%22rebuildData.php%22). 
    These scripts update the semantic data on the wiki and the triples in 
    Fuseki. Basically every triple is rewritten.
    
    We have observed that the Fuseki database size on the filesystem grew 
    over time to 20 GB, but when we recreate it from scratch the database 
    size is only 500 MB.
    
    After that, the Fuseki database grows by about 200 MB every day while the 
    number of triples does not change substantially.
    
    I originally assumed that the rebuild data script was the problem but 
    when I executed it alone the fuseki database space did not increase.
    
    We are running Fuseki on a 64-bit Red Hat machine.
    
    Can someone help us?
    
    Thanks in advance,
    
    Lorenzo
