Does the size stabilise?
If not, do some files stabilise in size and others not?

There are two places for growth:

nodes - does the new data have new RDF terms in it? Old terms are not deleted, just left around to be reused, so if you are adding new terms, the node table can grow. (Terms are not reference counted - that would be very expensive for such a small data item.)

TDB (current version) does not properly reuse freed-up space in indexes, but it should do so within a transaction: a put is a delete-then-add, so some space should be reused.

A proper fix to reuse space across transactions may require a database format change, but I haven't had time to work out the details. Off the top of my head, much reuse should be doable by moving the free-chain management onto the main database on a transaction, as it is single-active-writer. The code is currently too cautious about old-generation readers, which I now see it need not be.

        Andy

On 12/02/15 17:51, Trevor Donaldson wrote:
Any thoughts anyone? If I change my model every hour with new data or data
to replace - let's say over a period of inserting years' worth of triples -
should I persist potentially millions of triples at one time using
putModel? Committing one time seems to be the only way to stop the
directory growing exponentially.

On Thu, Feb 12, 2015 at 9:53 AM, Trevor Donaldson <[email protected]>
wrote:

Damian,

I am using du -ksh ./* on the databases directory.

I am getting
25M      ./test_store

On Thu, Feb 12, 2015 at 9:35 AM, Damian Steer <[email protected]> wrote:

On 12/02/15 13:49, Trevor Donaldson wrote:
On Thu, Feb 12, 2015 at 6:32 AM, Trevor Donaldson <[email protected]>
wrote:

Hi,

I am in the middle of updating our store from RDB to TDB. I have noticed
a significant increase in the amount of storage needed. Currently RDB is
able to hold all the data I need (4 third-party services and 4 years of
their data) and it equals ~12G. I started inserting data from 1 third-party
service, only 4 months of their data, into TDB and the TDB database size
has already reached 15G. Is this behavior expected?

Hi Trevor,

How are you measuring the space used? TDB files tend to be sparse, so
the disk use reported can be unreliable. Example from my system:

6.2M [...] 264M [...] GOSP.dat

The first number (6.2M) is essentially the disk space taken, the second
(264M!) is the 'length' of the file.
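To make the sparse-file distinction concrete, here is a minimal sketch (not Jena code; the file name is made up, and it assumes a filesystem that supports sparse files, such as ext4) showing how a file's apparent length can be far larger than the disk space actually allocated to it - the same gap as between plain `du` and the file 'length' above:

```python
import os

# Create a sparse file: extend its length without writing any data.
# The apparent size grows to 256 MB, but almost no blocks are allocated.
path = "sparse.dat"
with open(path, "wb") as f:
    f.truncate(256 * 1024 * 1024)   # 256 MB apparent length, nothing written

st = os.stat(path)
apparent = st.st_size               # what `ls -l` / `du --apparent-size` reports
allocated = st.st_blocks * 512      # what plain `du` reports (512-byte blocks)
print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
os.remove(path)
```

On a sparse-capable filesystem the allocated figure stays near zero, which is why plain `du` on a TDB directory can understate (or, after growth, wildly differ from) the summed file lengths.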

Damian

--
Damian Steer
Senior Technical Researcher
Research IT
+44 (0) 117 928 7057




