On 16/02/15 18:14, Trevor Donaldson wrote:
Hi, I think my question got lost. Is it correct to add millions of triples to
the model and then persist the model once using putModel? I didn't want to
get a timeout or anything like that.
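For reference, the pattern I have in mind looks roughly like this (a minimal
sketch; the Fuseki URL and the data loop are placeholders, and I am assuming
the Jena 2.x DatasetAccessor API):

    // Sketch: build the whole model in memory, then persist it in a
    // single HTTP PUT. The URL and the data loop are placeholders.
    import com.hp.hpl.jena.query.DatasetAccessor;
    import com.hp.hpl.jena.query.DatasetAccessorFactory;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Resource;

    public class BulkPut {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            // Stand-in for the real data feed: millions of triples.
            for (int i = 0; i < 1000000; i++) {
                Resource s = model.createResource("http://example.org/item/" + i);
                s.addProperty(model.createProperty("http://example.org/p"), "value " + i);
            }
            // One putModel for the whole batch, replacing the remote graph.
            DatasetAccessor accessor =
                DatasetAccessorFactory.createHTTP("http://localhost:3030/test_store/data");
            accessor.putModel(model);
        }
    }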

Well, the question was at:

On Feb 13, 2015 5:11 PM, "Trevor Donaldson" <[email protected]> wrote:

which is 22:11 my time on a Friday.

And today is Monday.

Sometimes, things take time to work through the email and provide an answer. Or a support contract.

        Andy



I am using Fuseki2. I thought it manages the transactions for me. Is this
not the case? I was using DatasetAccessorFactory to interact with Fuseki.
On Feb 13, 2015 12:10 PM, "Andy Seaborne" <[email protected]> wrote:

This may be related:

https://issues.apache.org/jira/browse/JENA-804

I say "may" because the exact patterns of use deeply affect the outcome. In
JENA-804 it is across transaction boundaries, which your "putModel" isn't.

(Are you really running without transactions?)
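For a local TDB dataset, an explicit write transaction looks roughly like this
(a sketch; the database directory is a placeholder, Jena 2.x API):

    // Sketch: replace the default graph inside one write transaction.
    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.ReadWrite;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class TxPut {
        public static void replaceDefaultGraph(Model newData) {
            Dataset ds = TDBFactory.createDataset("/path/to/databases/test_store");
            ds.begin(ReadWrite.WRITE);
            try {
                Model m = ds.getDefaultModel();
                m.removeAll();   // a put is delete-then-add
                m.add(newData);
                ds.commit();     // freed space can be reused within the transaction
            } finally {
                ds.end();
            }
        }
    }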

         Andy

On 13/02/15 16:56, Andy Seaborne wrote:

Does the size stabilise?
If not, do some files stabilise in size and others not?

There are two places for growth:

nodes - does the new data have new RDF terms in it?  Old terms are not
deleted, just left around to be reused, so if you are adding new terms, the
node table can grow.  (Terms are not reference counted - that would be
very expensive for such a small data item.)

indexes - TDB (the current version) does not properly reuse freed-up space
in indexes across transactions, but it should do so within a transaction.
A put is a delete-then-add, so some space should be reused.

A proper fix to reuse space across transactions may require a database format
change, but I haven't had time to work out the details. Off the top of my
head, much of the reuse should be doable by moving the free-chain management
onto the main database on a transaction, as it is the single active writer.
The code is currently too cautious about old-generation readers, which I now
see it need not be.
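To see which files stabilise between batches, something like this prints each
file's length (a sketch; the directory is a placeholder, and note that the
length can overstate disk use for sparse files):

    // Sketch: list the length of every file in a TDB database directory.
    import java.io.File;

    public class DbSizes {
        public static void main(String[] args) {
            File dir = new File("/path/to/databases/test_store");  // placeholder
            File[] files = dir.listFiles();
            if (files == null) {
                System.err.println("No such directory: " + dir);
                return;
            }
            long total = 0;
            for (File f : files) {
                System.out.printf("%10d  %s%n", f.length(), f.getName());
                total += f.length();
            }
            System.out.printf("%10d  total (file lengths, not disk blocks)%n", total);
        }
    }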

      Andy

On 12/02/15 17:51, Trevor Donaldson wrote:

Any thoughts, anyone? I change my model every hour with new data or data
to replace. Let's say, over a period of inserting years' worth of triples,
should I persist potentially millions of triples at one time using
putModel? Committing once seems to be the only way to keep the directory
from growing exponentially.

On Thu, Feb 12, 2015 at 9:53 AM, Trevor Donaldson <[email protected]>
wrote:

  Damian,

I am using du -ksh ./* on the databases directory.

I am getting
25M      ./test_store

On Thu, Feb 12, 2015 at 9:35 AM, Damian Steer <[email protected]>
wrote:

  On 12/02/15 13:49, Trevor Donaldson wrote:

On Thu, Feb 12, 2015 at 6:32 AM, Trevor Donaldson <[email protected]>
wrote:

Hi,

I am in the middle of updating our store from RDB to TDB. I have noticed
a significant size increase in the amount of storage needed. Currently,
RDB is able to hold all the data I need (4 third-party services and 4
years of their data) and it equals ~12G. I started inserting data from 1
third-party service, only 4 months of their data, into TDB, and the TDB
database size has already reached 15G. Is this behavior expected?


Hi Trevor,

How are you measuring the space used? TDB files tend to be sparse, so
the disk use reported can be unreliable. Example from my system:

6.2M [...] 264M [...] GOSP.dat

The first number (6.2M) is essentially the disk space taken; the second
(264M!) is the 'length' of the file.
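The effect is easy to reproduce (a sketch; the filename is a placeholder). On
a filesystem that supports sparse files, this creates a file whose reported
length is ~264M while 'du' shows it takes almost no disk:

    // Sketch: make a sparse file - large 'length', tiny disk footprint.
    // Compare 'ls -l sparse.bin' (length) with 'du -h sparse.bin' (blocks).
    import java.io.RandomAccessFile;

    public class SparseDemo {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile f = new RandomAccessFile("sparse.bin", "rw")) {
                f.seek(264L * 1024 * 1024);  // jump ~264M without writing
                f.write(1);                  // one real byte at the end
                System.out.println("length: " + f.length());
            }
        }
    }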

Damian

--
Damian Steer
Senior Technical Researcher
Research IT
+44 (0) 117 928 7057








