I am using Fuseki2. I thought it manages the transactions for me. Is this
not the case? I was using DatasetFactory to interact with Fuseki.

On Feb 13, 2015 12:10 PM, "Andy Seaborne" <[email protected]> wrote:
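[For readers following along: a minimal sketch of what "letting Fuseki manage the transactions" looks like with the Jena 2.x client API. The package names assume Jena 2.x and the service URL is hypothetical; `putModel` against a remote Fuseki endpoint is executed server-side as a single write transaction.]

```java
import com.hp.hpl.jena.query.DatasetAccessor;
import com.hp.hpl.jena.query.DatasetAccessorFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class FusekiPut {
    public static void main(String[] args) {
        // Hypothetical service URL - adjust to your Fuseki instance.
        DatasetAccessor accessor =
            DatasetAccessorFactory.createHTTP("http://localhost:3030/test_store/data");

        Model m = ModelFactory.createDefaultModel();
        m.add(m.createResource("http://example.org/s"),
              m.createProperty("http://example.org/p"), "o");

        // Replaces the default graph; Fuseki wraps this in one transaction.
        accessor.putModel(m);
    }
}
```

(Requires the Jena client jars on the classpath and a running Fuseki server, so it is a sketch rather than a drop-in program.)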
> This may be related:
>
>   https://issues.apache.org/jira/browse/JENA-804
>
> I say "may" because the exact patterns of use deeply affect the outcome.
> In JENA-804 it is across transaction boundaries, which your "putModel"
> isn't.
>
> (Are you really running without transactions?)
>
>     Andy
>
> On 13/02/15 16:56, Andy Seaborne wrote:
>> Does the size stabilise?
>> If not, do some files stabilise in size and others not?
>>
>> There are two places for growth:
>>
>> Nodes - does the new data have new RDF terms in it? Old terms are not
>> deleted, just left around to be reused, so if you are adding terms, the
>> node table can grow. (Terms are not reference counted - that would be
>> very expensive for such a small data item.)
>>
>> TDB (current version) does not properly reuse freed-up space in indexes
>> across transactions, but should do so within a transaction. Put is
>> delete-then-add, so some space should be reused.
>>
>> A proper fix to reuse space across transactions may require a database
>> format change, but I haven't had time to work out the details. Off the
>> top of my head, much of it should be doable by moving the free-chain
>> management onto the main database at transaction commit, as it is the
>> single active writer. The code is currently too cautious about
>> old-generation readers, which I now see it need not be.
>>
>>     Andy
>>
>> On 12/02/15 17:51, Trevor Donaldson wrote:
>>> Any thoughts, anyone? If I change my model every hour with new or
>>> replacement data (say, inserting years' worth of triples over time),
>>> should I persist potentially millions of triples at one time using
>>> putModel? Committing once seems to be the only way to keep the
>>> directory from growing exponentially.
>>>
>>> On Thu, Feb 12, 2015 at 9:53 AM, Trevor Donaldson <[email protected]>
>>> wrote:
>>>
>>>> Damian,
>>>>
>>>> I am using du -ksh ./* on the databases directory.
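[For reference: the transactional pattern Andy is asking about, against a local TDB dataset. A write transaction around a delete-then-add lets TDB reuse the freed space within that transaction, per Andy's note above. This is a sketch assuming Jena 2.x package names; the database directory path is hypothetical.]

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TdbPut {
    public static void main(String[] args) {
        // Hypothetical on-disk location of the TDB database.
        Dataset dataset = TDBFactory.createDataset("databases/test_store");

        dataset.begin(ReadWrite.WRITE);
        try {
            Model m = dataset.getDefaultModel();
            m.removeAll();   // "put" is delete ...
            m.add(m.createResource("http://example.org/s"),
                  m.createProperty("http://example.org/p"), "o");  // ... then add
            dataset.commit(); // space freed by the delete can be reused
                              // by the add inside the same transaction
        } finally {
            dataset.end();
        }
    }
}
```

(Needs the Jena TDB jars; without `begin`/`commit`/`end`, updates run non-transactionally and hit the cross-transaction space-reuse issue described above.)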
>>>> I am getting
>>>> 25M ./test_store
>>>>
>>>> On Thu, Feb 12, 2015 at 9:35 AM, Damian Steer <[email protected]>
>>>> wrote:
>>>>
>>>>> On 12/02/15 13:49, Trevor Donaldson wrote:
>>>>>> On Thu, Feb 12, 2015 at 6:32 AM, Trevor Donaldson
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am in the middle of updating our store from RDB to TDB. I have
>>>>>>> noticed a significant size increase in the amount of storage
>>>>>>> needed. Currently RDB is able to hold all the data I need (4
>>>>>>> third-party services and 4 years of their data) and it equals
>>>>>>> ~12G. I started inserting data from 1 third-party service, only 4
>>>>>>> months of their data, into TDB and the TDB database size has
>>>>>>> already reached 15G. Is this behavior expected?
>>>>>
>>>>> Hi Trevor,
>>>>>
>>>>> How are you measuring the space used? TDB files tend to be sparse, so
>>>>> the disk use reported can be unreliable. Example from my system:
>>>>>
>>>>> 6.2M [...] 264M [...] GOSP.dat
>>>>>
>>>>> The first number (6.2M) is essentially the disk space taken, the
>>>>> second (264M!) is the 'length' of the file.
>>>>>
>>>>> Damian
>>>>>
>>>>> --
>>>>> Damian Steer
>>>>> Senior Technical Researcher
>>>>> Research IT
>>>>> +44 (0) 117 928 7057
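[Damian's sparse-file point can be demonstrated with plain Java, no Jena needed: the sketch below writes a single byte 64 MB into a temporary file, so on filesystems that support sparse files the reported *length* far exceeds the blocks `du` would count as allocated.]

```java
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SparseDemo {
    // Creates a sparse temp file and returns its reported length in bytes.
    static long createSparse() throws Exception {
        Path p = Files.createTempFile("sparse", ".dat");
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "rw")) {
            f.seek(64L * 1024 * 1024);  // jump 64 MB without writing anything
            f.write(0);                 // one real byte at the end
        }
        // Files.size() reports the file's length (64 MB + 1 byte), while
        // `du` on the same file would report only the allocated blocks.
        long length = Files.size(p);
        Files.delete(p);
        return length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("length = " + createSparse() + " bytes");
        // prints: length = 67108865 bytes
    }
}
```

This is why `du -ksh` (allocated blocks) and `ls -l` (length) disagree so dramatically on TDB's `.dat` files, as in Damian's `6.2M` vs `264M` example.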
