Hi, I think my question got lost. Is it correct to add millions of triples to the model and then persist the model once using putModel? I didn't want to get a timeout or anything like that.
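For concreteness, the pattern being asked about would look roughly like this. A minimal sketch, assuming the Jena 2.x-era DatasetAccessor HTTP client against Fuseki's graph store endpoint; the service URL and graph name are placeholders:

    import com.hp.hpl.jena.query.DatasetAccessor;
    import com.hp.hpl.jena.query.DatasetAccessorFactory;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class BulkPut {
        public static void main(String[] args) {
            // Accumulate everything in memory first ...
            Model model = ModelFactory.createDefaultModel();
            // ... add the millions of triples to 'model' here ...

            // ... then replace the remote graph with one HTTP PUT.
            DatasetAccessor accessor =
                DatasetAccessorFactory.createHTTP("http://localhost:3030/test_store/data");
            accessor.putModel("http://example.org/graph", model);
        }
    }

A single putModel is one request, so the server handles it as one update rather than many small ones; the in-memory model does have to fit in heap, though.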
On Feb 13, 2015 5:11 PM, "Trevor Donaldson" <[email protected]> wrote:

> I am using Fuseki2. I thought it manages the transactions for me. Is this
> not the case? I was using datasetfactory to interact with Fuseki.
>
> On Feb 13, 2015 12:10 PM, "Andy Seaborne" <[email protected]> wrote:
>
>> This may be related:
>>
>> https://issues.apache.org/jira/browse/JENA-804
>>
>> I say "may" because the exact patterns of use deeply affect the outcome.
>> In JENA-804 it is across transaction boundaries, which your "putModel"
>> isn't.
>>
>> (Are you really running without transactions?)
>>
>> Andy
>>
>> On 13/02/15 16:56, Andy Seaborne wrote:
>>
>>> Does the size stabilise?
>>> If not, do some files stabilise in size and others not?
>>>
>>> There are two places for growth:
>>>
>>> nodes - does the new data have new RDF terms in it? Old terms are not
>>> deleted, just left around to be reused, so if you are adding terms, the
>>> node table can grow. (Terms are not reference counted - that would be
>>> very expensive for such a small data item.)
>>>
>>> TDB (current version) does not properly reuse freed-up space in indexes,
>>> but should do within a transaction. put is delete-then-add, and some
>>> space should be reused.
>>>
>>> A proper fix to reuse space across transactions may require a database
>>> format change, but I haven't had time to work out the details. Off the
>>> top of my head, though, much reuse should be doable by moving the
>>> free-chain management onto the main database on a transaction, as it is
>>> the single active writer. The code is currently too cautious about
>>> old-generation readers, which I now see it need not be.
>>>
>>> Andy
>>>
>>> On 12/02/15 17:51, Trevor Donaldson wrote:
>>>
>>>> Any thoughts, anyone? If I change my model every hour with new or
>>>> replacement data, and over time insert years' worth of triples, should
>>>> I persist potentially millions of triples at one time using putModel?
>>>> Committing once seems to be the only way to keep the directory from
>>>> growing exponentially.
>>>>
>>>> On Thu, Feb 12, 2015 at 9:53 AM, Trevor Donaldson <[email protected]>
>>>> wrote:
>>>>
>>>>> Damian,
>>>>>
>>>>> I am using du -ksh ./* on the databases directory.
>>>>>
>>>>> I am getting
>>>>> 25M ./test_store
>>>>>
>>>>> On Thu, Feb 12, 2015 at 9:35 AM, Damian Steer <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On 12/02/15 13:49, Trevor Donaldson wrote:
>>>>>>
>>>>>>> On Thu, Feb 12, 2015 at 6:32 AM, Trevor Donaldson
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am in the middle of updating our store from RDB to TDB. I have
>>>>>>>> noticed a significant size increase in the amount of storage
>>>>>>>> needed. Currently RDB is able to hold all the data I need (4
>>>>>>>> third-party services and 4 years of their data) and it equals
>>>>>>>> ~ 12G. I started inserting data from 1 third-party service, only
>>>>>>>> 4 months of their data, into TDB and the TDB database size has
>>>>>>>> already reached 15G. Is this behavior expected?
>>>>>>
>>>>>> Hi Trevor,
>>>>>>
>>>>>> How are you measuring the space used? TDB files tend to be sparse, so
>>>>>> the disk use reported can be unreliable. Example from my system:
>>>>>>
>>>>>> 6.2M [...] 264M [...] GOSP.dat
>>>>>>
>>>>>> The first number (6.2M) is essentially the disk space taken; the
>>>>>> second (264M!) is the 'length' of the file.
>>>>>>
>>>>>> Damian
>>>>>>
>>>>>> --
>>>>>> Damian Steer
>>>>>> Senior Technical Researcher
>>>>>> Research IT
>>>>>> +44 (0) 117 928 7057
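Damian's sparse-file point is easy to verify. A rough sketch (the file path is a placeholder): Java's Files.size reports the apparent length, which is inflated for sparse files, and there is no portable Java API for the allocated size, so the sketch shells out to du for that half:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SparseCheck {
        public static void main(String[] args) throws IOException, InterruptedException {
            Path file = Paths.get("databases/test_store/GOSP.dat");
            // Apparent length: what ls -l reports; inflated for sparse files.
            System.out.printf("apparent length: %d bytes%n", Files.size(file));
            // Allocated blocks: ask du, since Java has no portable API for this.
            new ProcessBuilder("du", "-h", file.toString())
                    .inheritIO().start().waitFor();
        }
    }

If the two numbers diverge widely, the "growth" being measured may be mostly holes in sparse files rather than real data on disk.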

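Andy's aside about transactions ("Are you really running without transactions?") is worth illustrating. Fuseki2 generally wraps each request, including a putModel PUT, in its own transaction server-side; with embedded TDB you would manage the transaction yourself. A minimal sketch, assuming Jena 2.x-era packages and a placeholder store directory and graph name:

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.ReadWrite;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class TransactionalPut {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("databases/test_store");
            dataset.begin(ReadWrite.WRITE);
            try {
                // put semantics: delete the old graph content, add the new.
                Model graph = dataset.getNamedModel("http://example.org/graph");
                graph.removeAll();
                // graph.add(replacement);  // the new data would go here
                dataset.commit();
            } finally {
                dataset.end();
            }
        }
    }

As Andy notes above, the delete-then-add happening inside a single transaction is what lets TDB reuse some of the freed index space.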