1. The size does not grow when using SPARQL queries to insert via Fuseki.

2.  Should I use the Jena Model API to insert tens of thousands of triples
over a period of years with one putModel? It seems the database only grows
by about 200 bytes with each putModel call.  So I "flush" once, and the
size issue seems minimal.

To answer your question, Andy: the same predicates are being used with
potentially different subjects and potentially different objects.
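Damian's point further down the thread, that TDB's files are sparse and so plain `du` can under- or over-state what is really stored, can be checked directly. This is just an illustrative sketch with a throwaway file (the filename is made up, and it assumes GNU coreutils on Linux for `truncate` and `du --apparent-size`), not a real TDB index:

```shell
# Create a file with a 256 MB 'length' but no data blocks written to it.
truncate -s 256M sparse-demo.dat

# Blocks actually allocated on disk: close to zero for a sparse file.
du -k sparse-demo.dat

# The file 'length' (what ls -l reports): 262144 KB.
du -k --apparent-size sparse-demo.dat

rm sparse-demo.dat
```

Comparing the two `du` numbers for each file in the TDB database directory shows whether the reported growth is allocated space or just sparse-file length.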
On Feb 13, 2015 11:56 AM, "Andy Seaborne" <[email protected]> wrote:

> Does the size stabilise?
> If not, do some files stabilise in size and others not?
>
> There are two places for growth:
>
> nodes - does the new data have new RDF terms in it?  Old terms are not
> deleted, just left around to be reused, so if you are adding terms, the
> node table can grow.  (Terms are not reference counted - that would be very
> expensive for such a small data item.)
>
> TDB (current version) does not properly reuse freed-up space in indexes,
> but it should do so within a transaction. A put is a delete-then-add, so
> some space should be reused.
>
> A proper fix to reuse space across transactions may require a database
> format change, but I haven't had time to work out the details. Off the top
> of my head, much reuse should be doable by moving the free-chain management
> onto the main database per transaction, since there is a single active
> writer.  The code is currently too cautious about old-generation readers,
> which I now see it need not be.
>
>         Andy
>
> On 12/02/15 17:51, Trevor Donaldson wrote:
>
>> Any thoughts, anyone? If I change my model every hour with new or
>> replacement data, let's say over a period of inserting years' worth of
>> triples, should I persist potentially millions of triples at one time
>> using putModel? Committing only once seems to be the only way to keep the
>> directory from growing exponentially.
>>
>> On Thu, Feb 12, 2015 at 9:53 AM, Trevor Donaldson <[email protected]>
>> wrote:
>>
>>  Damian,
>>>
>>> I am using du -ksh ./* on the databases directory.
>>>
>>> I am getting
>>> 25M      ./test_store
>>>
>>> On Thu, Feb 12, 2015 at 9:35 AM, Damian Steer <[email protected]>
>>> wrote:
>>>
>>>  On 12/02/15 13:49, Trevor Donaldson wrote:
>>>>
>>>>> On Thu, Feb 12, 2015 at 6:32 AM, Trevor Donaldson <[email protected]
>>>>>>
>>>>>
>>>>>  wrote:
>>>>>>
>>>>>>  Hi,
>>>>>>>
>>>>>>> I am in the middle of updating our store from RDB to TDB. I have
>>>>>>>
>>>>>> noticed
>>>>
>>>>> a significant size increase in the amount of storage needed.
>>>>>>>
>>>>>> Currently RDB
>>>>
>>>>> is able to hold all the data I need (4 third party services and 4
>>>>>>>
>>>>>> years of
>>>>
>>>>> their data) and it equals ~ 12G. I started inserting data from 1 third
>>>>>>> party service, only 4 months of their data into TDB and the TDB
>>>>>>>
>>>>>> database
>>>>
>>>>> size has already reached 15G. Is this behavior expected?
>>>>>>>
>>>>>>
>>>> Hi Trevor,
>>>>
>>>> How are you measuring the space used? TDB files tend to be sparse, so
>>>> the disk use reported can be unreliable. Example from my system:
>>>>
>>>> 6.2M [...] 264M [...] GOSP.dat
>>>>
>>>> The first number (6.2M) is essentially the disk space taken, the second
>>>> (264M!) is the 'length' of the file.
>>>>
>>>> Damian
>>>>
>>>> --
>>>> Damian Steer
>>>> Senior Technical Researcher
>>>> Research IT
>>>> +44 (0) 117 928 7057
>>>>
>>>>
>>>
>>>
>>
>
