A couple of extra data points would be useful:
1/ How large does the journal get (in bytes)? It will go up and down in
size, so catching its size is tricky.
2/ Are there any readers about? As well as the QueueBatchSize setting,
active readers block writing the journal back to the main database.
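The journal shrinks again after each flush, so one way to catch its peak is to poll the file size while the load runs. This is a minimal plain-Java sketch. It assumes the TDB1 journal is the file journal.jrnl in the database directory (an assumption, so check your own directory), and the class name and sampling interval are illustrative. Polling can still miss a peak that comes and goes between samples.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Polls a file's size and remembers the largest value seen.
// ASSUMPTION: the TDB1 journal is <dbDir>/journal.jrnl; verify against your database directory.
class JournalSizeSampler {

    /** Peak size in bytes over `samples` polls spaced `intervalMs` apart. */
    static long peakSize(Path file, int samples, long intervalMs) throws InterruptedException {
        long peak = 0;
        for (int i = 0; i < samples; i++) {
            try {
                peak = Math.max(peak, Files.size(file));
            } catch (IOException e) {
                // File briefly missing or unreadable: count this sample as 0 bytes.
            }
            if (i < samples - 1) {
                Thread.sleep(intervalMs);
            }
        }
        return peak;
    }

    public static void main(String[] args) throws InterruptedException {
        Path journal = Paths.get(args.length > 0 ? args[0] : "DB", "journal.jrnl");
        // Sample every 100 ms for one minute while the bulk load is running.
        long peak = peakSize(journal, 600, 100);
        System.out.println("Peak journal size: " + peak + " bytes");
    }
}
```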
Andy
On 09/08/16 13:34, Andy Seaborne wrote:
On 08/08/16 12:33, Dick Murray wrote:
Hello.
I'm looking for ideas, and wondering whether anyone else has come across
this...
I have a bulk load (the same one as in the previous OOME question) which
auto-commits after 25k quads have been added and then begins a new write
transaction. The commits average 2 seconds, but one takes 42 seconds.
About 500K quads are added, using ~500MB of on-disk storage. I've changed
the underlying storage from HDD to SSD to USB mass storage and I still
get the same symptoms.
Different files give different stalls; some have multiple stalls,
typically around 40 seconds, but some last 2 minutes. iotop is not
showing anything "odd" and the GC isn't under stress. I can repeat this
with a new TDB and with a 25M-quad TDB.
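To pin down which commits stall, each commit can be timed and compared with the median. This sketch uses a hypothetical BatchStore interface standing in for the real dataset (for TDB it would wrap the commit plus the start of the next write transaction); only the timing and outlier logic is the point.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL stand-in for the real store; not a Jena interface.
interface BatchStore {
    void add(long item);
    void commitAndBegin(); // commit the current write transaction, then begin a new one
}

class CommitTimer {

    /** Loads `total` items, committing every `batchSize`, and returns each commit's duration in ms. */
    static List<Long> load(BatchStore store, long total, int batchSize) {
        List<Long> commitMillis = new ArrayList<>();
        for (long i = 1; i <= total; i++) {
            store.add(i);
            if (i % batchSize == 0 || i == total) {
                long start = System.nanoTime();
                store.commitAndBegin();
                commitMillis.add((System.nanoTime() - start) / 1_000_000);
            }
        }
        return commitMillis;
    }

    /** Indexes of commits taking more than `factor` times the median duration. */
    static List<Integer> stalls(List<Long> commitMillis, double factor) {
        List<Integer> slow = new ArrayList<>();
        if (commitMillis.isEmpty()) {
            return slow;
        }
        long[] sorted = commitMillis.stream().mapToLong(Long::longValue).sorted().toArray();
        long median = sorted[sorted.length / 2];
        for (int i = 0; i < commitMillis.size(); i++) {
            // Clamp the median to 1 ms so near-zero commits don't make every commit an outlier.
            if (commitMillis.get(i) > factor * Math.max(median, 1)) {
                slow.add(i);
            }
        }
        return slow;
    }
}
```

With ~500K quads and a 25k batch this yields about 20 timings, and a factor of 10 against a 2-second median would flag the 42-second commit.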
Is TDB having to copy/write new "blocks" to balance its storage at some
point? It always stalls at some point, but the point is not always the
same.
Jena 3.1, Ubuntu 16.04, 8 cores, 16GB RAM, JVM with Xmx 4GB and G1GC.
The log below shows consistent ~2 second commits, bar one.
TIA Dick.
Hi there,
The burstiness might be due to the commit batching, although
interactions with the OS file system are also possible.
Try setting TransactionManager.QueueBatchSize to 0, 2, and a few other
small integers (the default is 10).
If you could try that, it would give more data on what is happening.
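A simplified model of count-based batching shows why the cost arrives in bursts: commits pile up in the journal and a single commit pays for replaying all of them. This is not TransactionManager's code. The trigger condition (at least batchSize commits queued and no reader active) and treating 0 as "flush every commit" are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// SIMPLIFIED MODEL of count-based commit batching; not Jena's implementation.
class BatchingModel {

    /**
     * For each commit, the journal bytes replayed into the main database at the end of
     * that commit (0 when the flush is deferred).
     *
     * @param commitBytes  journal bytes written by each commit
     * @param readerActive whether a reader is open as each commit finishes (blocks the flush)
     * @param batchSize    flush once at least this many commits are queued (0 behaves like 1)
     */
    static List<Long> flushSizes(long[] commitBytes, boolean[] readerActive, int batchSize) {
        List<Long> flushed = new ArrayList<>();
        int queued = 0;
        long journalBytes = 0;
        for (int i = 0; i < commitBytes.length; i++) {
            queued++;
            journalBytes += commitBytes[i];
            if (queued >= Math.max(batchSize, 1) && !readerActive[i]) {
                flushed.add(journalBytes); // replay the whole journal, then truncate it
                queued = 0;
                journalBytes = 0;
            } else {
                flushed.add(0L);
            }
        }
        return flushed;
    }
}
```

With batchSize 10 and roughly equal commits, every tenth commit replays ten commits' worth of journal, and an open reader pushes that replay later and makes it bigger.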
This setting exists to amalgamate small commits. It would be better to
factor in the size of the commits, but it doesn't; the size of the
journal is easy to determine, so a simple threshold on that could work.
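The size-based threshold suggested here could look something like this hypothetical policy: flush when either enough commits are queued or the journal has grown past a byte limit. The class, parameter names and limits are illustrative only, not a Jena setting.

```java
// HYPOTHETICAL flush policy combining a commit count and a journal-size limit.
class FlushPolicy {
    private final int maxQueuedCommits;
    private final long maxJournalBytes;

    FlushPolicy(int maxQueuedCommits, long maxJournalBytes) {
        this.maxQueuedCommits = maxQueuedCommits;
        this.maxJournalBytes = maxJournalBytes;
    }

    /** True when the queued commits should be replayed from the journal into the main database. */
    boolean shouldFlush(int queuedCommits, long journalBytes) {
        return queuedCommits >= maxQueuedCommits || journalBytes >= maxJournalBytes;
    }
}
```

A byte limit bounds how much any single replay has to write, whatever the mix of small and large commits.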
Have you had a chance to try TDB2? It will behave differently here: the
updates to the database happen as the transaction proceeds, so they
happen once and benefit from OS-level write buffering rather than
happening exactly when told to. They also only write once, rather than
once to the journal and then again, in a random access pattern, to the
main DB, which is also potentially nasty.
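A toy cost model makes the "write once" point concrete. It assumes 8 KiB blocks and made-up block positions, and it is not how either TDB or TDB2 is implemented: in the journal-then-replay path each changed block is written twice, and the replay order jumps around the main file.

```java
// TOY MODEL, not TDB code: physical writes for one commit under journal-then-replay
// versus write-once.
class WriteCostModel {

    /** Bytes written for `changedBlocks` blocks of `blockSize` bytes. */
    static long bytesWritten(int changedBlocks, int blockSize, boolean journalThenReplay) {
        long once = (long) changedBlocks * blockSize;
        return journalThenReplay ? 2 * once : once; // journal append + write at home position
    }

    /** Jumps between non-adjacent blocks when writing at the given positions in order. */
    static int seeks(int[] blockPositions) {
        int seeks = 0;
        for (int i = 1; i < blockPositions.length; i++) {
            if (blockPositions[i] != blockPositions[i - 1] + 1) {
                seeks++;
            }
        }
        return seeks;
    }

    public static void main(String[] args) {
        int blockSize = 8192;                          // ASSUMPTION: 8 KiB blocks
        int[] replayOrder = {907, 12, 4410, 13, 2051}; // made-up home positions of changed blocks
        int n = replayOrder.length;
        System.out.println("journal+replay bytes: " + bytesWritten(n, blockSize, true));
        System.out.println("write-once bytes:     " + bytesWritten(n, blockSize, false));
        System.out.println("replay seeks:         " + seeks(replayOrder));
    }
}
```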
The only issue with TDB2 at the moment is that the database grows. It
keeps all generations of the database available for all time, so it
needs a process to reclaim old space (a GC problem), although access to
temporal versions could be considered an advantage as well.
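The generations point can be illustrated with a toy versioned store: every commit publishes a new immutable snapshot, old snapshots stay readable, and a separate reclaim step drops generations older than the oldest one still in use. It is only an illustration; the class and methods are hypothetical, and copying the whole map per commit is naive compared with a real copy-on-write structure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// TOY MODEL of "every generation stays available" plus explicit reclamation; not TDB2's storage.
class GenerationalStore {
    private final TreeMap<Long, Map<String, String>> generations = new TreeMap<>();
    private long current = 0;

    GenerationalStore() {
        generations.put(0L, Collections.<String, String>emptyMap());
    }

    /** Commits one change as a new generation and returns its number. */
    long commit(String key, String value) {
        Map<String, String> next = new HashMap<>(generations.get(current));
        next.put(key, value);
        current++;
        generations.put(current, Collections.unmodifiableMap(next));
        return current;
    }

    /** Reads `key` as of generation `gen`; old generations stay readable until reclaimed. */
    String read(long gen, String key) {
        Map<String, String> snapshot = generations.get(gen);
        if (snapshot == null) {
            throw new IllegalStateException("generation " + gen + " was reclaimed");
        }
        return snapshot.get(key);
    }

    /** Drops every generation older than `oldestInUse`; returns how many were removed. */
    int reclaimBefore(long oldestInUse) {
        long keepFrom = Math.min(oldestInUse, current); // never drop the latest generation
        Map<Long, Map<String, String>> old = generations.headMap(keepFrom, false);
        int removed = old.size();
        old.clear(); // clearing the head-map view removes those entries from the store
        return removed;
    }

    int generationCount() {
        return generations.size();
    }
}
```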
Andy