> It needs a process to reclaim old space (a GC problem) although access to
> temporal versions could be considered an advantage as well.
The single most important use case I have in hand would benefit enormously from
this kind of persistence (although really at transaction granularity). So at
least some people out here would consider it an advantage! {grin}
---
A. Soroka
The University of Virginia Library
> On Aug 9, 2016, at 8:34 AM, Andy Seaborne <[email protected]> wrote:
>
> On 08/08/16 12:33, Dick Murray wrote:
>> Hello.
>>
>> Looking for ideas and if anyone else has come across this...
>>
>> I have a bulk load (same as the previous OOME question) which auto commits
>> after 25k quads have been added then begins a new write transaction. All of
>> the commits average 2 seconds but one takes 42 seconds. ~500K quads are
>> added with ~500MB on disk storage. I've changed the underlying storage from
>> HDD to SSD, to USB MS and I still get the same symptoms.
>>
>> Different files give different stalls, some have multiple stalls, typically
>> around 40 seconds but some are 2 minutes. iotop is not showing anything
>> "odd" and the GC isn't stressing. I can repeat this with a new TDB and a
>> 25M quad TDB.
>>
>> Is the TDB having to copy write new "blocks" to balance its storage at
>> some point? Whilst it will stall at some point the point is not always the
>> same.
>>
>> Jena 3.1, Ubuntu 16.04, 8 cores 16GB RAM, JVM Xmx 4GB G1GC.
>>
>> Log below shows consistent ~2 second commits bar one.
>>
>> TIA Dick.
>
> Hi there,
>
> The burstiness might be due to the commit batching, though interactions with
> the OS file system are also possible.
>
> Try setting
> TransactionManager.QueueBatchSize
> to 0, 2, and a few other small integers (the default is 10).
>
> If you could try that, it would be more data as to what is happening.
>
> This is to amalgamate small commits - it would be better to factor in the
> size of commits but it doesn't (the size of the journal is easy to determine
> so a simple threshold there could work).
>
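[Inline illustration of the batching effect described above, as a minimal
sketch. This is a toy model and not TDB's code: the class name, the cost
constants, and the flush rule (replay the queue once it reaches the batch
size) are all assumptions made for illustration. It only shows how a batch
size can leave the total work unchanged while concentrating it in an
occasional slow commit.]

```java
import java.util.ArrayList;
import java.util.List;

public class CommitBatchingSketch {
    // Hypothetical cost units, not measured TDB timings.
    static final int JOURNAL_COST = 1; // append one commit to the journal
    static final int REPLAY_COST = 3;  // write one queued commit into the main database

    /**
     * Returns the cost paid by each commit when committed transactions are
     * queued and replayed into the main database in batches.
     * Assumed rule: replay once the queue length reaches queueBatchSize
     * (so a size of 0 replays on every commit).
     */
    static List<Integer> simulate(int commits, int queueBatchSize) {
        List<Integer> costs = new ArrayList<>();
        int queued = 0;
        for (int i = 0; i < commits; i++) {
            int cost = JOURNAL_COST;   // every commit writes the journal
            queued++;
            if (queued >= queueBatchSize) {
                // This commit also pays for replaying everything queued so far.
                cost += queued * REPLAY_COST;
                queued = 0;
            }
            costs.add(cost);
        }
        return costs;
    }

    public static void main(String[] args) {
        // Same batch sizes as suggested above: 0, 2, and the default of 10.
        for (int batch : new int[] {0, 2, 10}) {
            System.out.println("batch=" + batch + " costs=" + simulate(20, batch));
        }
    }
}
```

In this model a batch size of 0 gives uniform commits, while larger sizes
give mostly cheap commits with a periodic expensive one. If the real stalls
move or shrink as the batch size changes, that would point at batching
rather than the file system.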
>
> Have you had a moment to try TDB2? It will behave differently here - the
> updates to the database happen as the transaction proceeds so they happen
> once and have OS-level write buffering going on, rather than happening
> exactly when told to. And they only write once, not once to the journal and
> once in a random access pattern to the main DB which is also potentially
> nasty.
>
> The only issue with TDB2 at the moment is that the database grows. It has all
> generations of the database available for all time. It needs a process to
> reclaim old space (a GC problem) although access to temporal versions could
> be considered an advantage as well.
>
> Andy
>