1. Size does not grow when using SPARQL queries to insert via Fuseki.
2. Should I use the Jena Model API to insert tens of thousands of triples over a period of years with one putModel? It seems the database only grows by about 200 bytes with each putModel call, so if I "flush" once, the size issue seems minimal.
To answer your question Andy, the same predicates are being used with potentially different subjects and potentially different objects.

On Feb 13, 2015 11:56 AM, "Andy Seaborne" <[email protected]> wrote:

> Does the size stabilise?
> If not, do some files stabilise in size and others not?
>
> There are two places for growth:
>
> nodes - does the new data have new RDF terms in it? Old terms are not
> deleted, just left around to be reused, so if you are adding terms, the
> node table can grow. (Terms are not reference counted - that would be
> very expensive for such a small data item.)
>
> TDB (current version) does not properly reuse freed-up space in indexes,
> but should do so within a transaction. put is delete-add, and some space
> should be reused.
>
> A proper fix to reuse space across transactions may require a database
> format change, but I haven't had time to work out the details. Off the
> top of my head, much reuse should be doable by moving the free chain
> management onto the main database on a transaction, as it's
> single-active-writer. The code is currently too cautious about
> old-generation readers, which I now see it need not be.
>
> Andy
>
> On 12/02/15 17:51, Trevor Donaldson wrote:
>
>> Any thoughts anyone? If I change my model every hour with new data or
>> data to replace, let's say over a period of inserting years' worth of
>> triples, should I persist potentially millions of triples at one time
>> using putModel? Committing one time seems to be the only way to
>> mitigate the directory growing exponentially.
>>
>> On Thu, Feb 12, 2015 at 9:53 AM, Trevor Donaldson <[email protected]>
>> wrote:
>>
>>> Damian,
>>>
>>> I am using du -ksh ./* on the databases directory.
>>>
>>> I am getting
>>> 25M ./test_store
>>>
>>> On Thu, Feb 12, 2015 at 9:35 AM, Damian Steer <[email protected]>
>>> wrote:
>>>
>>>> On 12/02/15 13:49, Trevor Donaldson wrote:
>>>>
>>>>> On Thu, Feb 12, 2015 at 6:32 AM, Trevor Donaldson
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am in the middle of updating our store from RDB to TDB. I have
>>>>>> noticed a significant size increase in the amount of storage
>>>>>> needed. Currently RDB is able to hold all the data I need (4 third
>>>>>> party services and 4 years of their data) and it equals ~12G. I
>>>>>> started inserting data from 1 third-party service, only 4 months
>>>>>> of their data, into TDB, and the TDB database size has already
>>>>>> reached 15G. Is this behavior expected?
>>>>
>>>> Hi Trevor,
>>>>
>>>> How are you measuring the space used? TDB files tend to be sparse, so
>>>> the disk use reported can be unreliable. Example from my system:
>>>>
>>>> 6.2M [...] 264M [...] GOSP.dat
>>>>
>>>> The first number (6.2M) is essentially the disk space taken; the
>>>> second (264M!) is the 'length' of the file.
>>>>
>>>> Damian
>>>>
>>>> --
>>>> Damian Steer
>>>> Senior Technical Researcher
>>>> Research IT
>>>> +44 (0) 117 928 7057
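Damian's point about sparse files can be checked directly. A minimal sketch (using a synthetic sparse file as a stand-in for a TDB index file such as the GOSP.dat from his example; `--apparent-size` is GNU du):

```shell
# Make a sparse file: 64 MB of "length" but almost no allocated blocks,
# similar to how TDB index files can look on disk.
truncate -s 64M demo.dat

ls -lh demo.dat                 # shows the apparent length (~64M)
du -h demo.dat                  # shows blocks actually allocated (near 0)
du -h --apparent-size demo.dat  # GNU du: report the length instead
```

So when comparing TDB's footprint against RDB, plain `du` (allocated blocks) is the fairer number; `ls -l` and `du --apparent-size` report the much larger file length.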
