Hi, I think my question got lost. Is it correct to add millions of triples to the model and then persist the model once using putModel? I didn't want to get a timeout or anything like that.
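For concreteness, the pattern being asked about would look roughly like this. A minimal sketch, assuming the Jena 2.x-era DatasetAccessor HTTP client against Fuseki's graph store endpoint; the service URL and graph name are placeholders:

    import com.hp.hpl.jena.query.DatasetAccessor;
    import com.hp.hpl.jena.query.DatasetAccessorFactory;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class BulkPut {
        public static void main(String[] args) {
            // Accumulate everything in memory first ...
            Model model = ModelFactory.createDefaultModel();
            // ... add the millions of triples to 'model' here ...

            // ... then replace the remote graph with one HTTP PUT.
            DatasetAccessor accessor =
                DatasetAccessorFactory.createHTTP("http://localhost:3030/test_store/data");
            accessor.putModel("http://example.org/graph", model);
        }
    }

A single putModel is one request, so the server handles it as one update rather than many small ones; the in-memory model does have to fit in heap, though.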
On Feb 13, 2015 5:11 PM, "Trevor Donaldson" <[email protected]> wrote:

> I am using Fuseki2. I thought it manages the transactions for me. Is this
> not the case? I was using datasetfactory to interact with Fuseki.
>
> On Feb 13, 2015 12:10 PM, "Andy Seaborne" <[email protected]> wrote:
>
>> This may be related:
>>
>> https://issues.apache.org/jira/browse/JENA-804
>>
>> I say "may" because the exact patterns of use deeply affect the outcome.
>> In JENA-804 it is across transaction boundaries, which your "putModel"
>> isn't.
>>
>> (Are you really running without transactions?)
>>
>> Andy
>>
>> On 13/02/15 16:56, Andy Seaborne wrote:
>>
>>> Does the size stabilise?
>>> If not, do some files stabilise in size and others not?
>>>
>>> There are two places for growth:
>>>
>>> nodes - does the new data have new RDF terms in it? Old terms are not
>>> deleted, just left around to be reused, so if you are adding terms, the
>>> node table can grow. (Terms are not reference counted - that would be
>>> very expensive for such a small data item.)
>>>
>>> TDB (current version) does not properly reuse freed-up space in indexes,
>>> but should do within a transaction. put is delete-then-add, and some
>>> space should be reused.
>>>
>>> A proper fix to reuse space across transactions may require a database
>>> format change, but I haven't had time to work out the details. Off the
>>> top of my head, though, much reuse should be doable by moving the
>>> free-chain management onto the main database on a transaction, as it is
>>> the single active writer. The code is currently too cautious about
>>> old-generation readers, which I now see it need not be.
>>>
>>> Andy
>>>
>>> On 12/02/15 17:51, Trevor Donaldson wrote:
>>>
>>>> Any thoughts, anyone? If I change my model every hour with new or
>>>> replacement data, and over time insert years' worth of triples, should
>>>> I persist potentially millions of triples at one time using putModel?
>>>> Committing once seems to be the only way to keep the directory from
>>>> growing exponentially.
>>>>
>>>> On Thu, Feb 12, 2015 at 9:53 AM, Trevor Donaldson <[email protected]>
>>>> wrote:
>>>>
>>>>> Damian,
>>>>>
>>>>> I am using du -ksh ./* on the databases directory.
>>>>>
>>>>> I am getting
>>>>> 25M ./test_store
>>>>>
>>>>> On Thu, Feb 12, 2015 at 9:35 AM, Damian Steer <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On 12/02/15 13:49, Trevor Donaldson wrote:
>>>>>>
>>>>>>> On Thu, Feb 12, 2015 at 6:32 AM, Trevor Donaldson
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am in the middle of updating our store from RDB to TDB. I have
>>>>>>>> noticed a significant size increase in the amount of storage
>>>>>>>> needed. Currently RDB is able to hold all the data I need (4
>>>>>>>> third-party services and 4 years of their data) and it equals
>>>>>>>> ~ 12G. I started inserting data from 1 third-party service, only
>>>>>>>> 4 months of their data, into TDB and the TDB database size has
>>>>>>>> already reached 15G. Is this behavior expected?
>>>>>>
>>>>>> Hi Trevor,
>>>>>>
>>>>>> How are you measuring the space used? TDB files tend to be sparse, so
>>>>>> the disk use reported can be unreliable. Example from my system:
>>>>>>
>>>>>> 6.2M [...] 264M [...] GOSP.dat
>>>>>>
>>>>>> The first number (6.2M) is essentially the disk space taken; the
>>>>>> second (264M!) is the 'length' of the file.
>>>>>>
>>>>>> Damian
>>>>>>
>>>>>> --
>>>>>> Damian Steer
>>>>>> Senior Technical Researcher
>>>>>> Research IT
>>>>>> +44 (0) 117 928 7057
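Damian's sparse-file point is easy to verify. A rough sketch (the file path is a placeholder): Java's Files.size reports the apparent length, which is inflated for sparse files, and there is no portable Java API for the allocated size, so the sketch shells out to du for that half:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SparseCheck {
        public static void main(String[] args) throws IOException, InterruptedException {
            Path file = Paths.get("databases/test_store/GOSP.dat");
            // Apparent length: what ls -l reports; inflated for sparse files.
            System.out.printf("apparent length: %d bytes%n", Files.size(file));
            // Allocated blocks: ask du, since Java has no portable API for this.
            new ProcessBuilder("du", "-h", file.toString())
                    .inheritIO().start().waitFor();
        }
    }

If the two numbers diverge widely, the "growth" being measured may be mostly holes in sparse files rather than real data on disk.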

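Andy's aside about transactions ("Are you really running without transactions?") is worth illustrating. Fuseki2 generally wraps each request, including a putModel PUT, in its own transaction server-side; with embedded TDB you would manage the transaction yourself. A minimal sketch, assuming Jena 2.x-era packages and a placeholder store directory and graph name:

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.ReadWrite;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class TransactionalPut {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("databases/test_store");
            dataset.begin(ReadWrite.WRITE);
            try {
                // put semantics: delete the old graph content, add the new.
                Model graph = dataset.getNamedModel("http://example.org/graph");
                graph.removeAll();
                // graph.add(replacement);  // the new data would go here
                dataset.commit();
            } finally {
                dataset.end();
            }
        }
    }

As Andy notes above, the delete-then-add happening inside a single transaction is what lets TDB reuse some of the freed index space.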