The answer is in the docs linked from the Jena 3.5 announcement: http://jena.apache.org/documentation/tdb2/tdb2_admin.html#compaction
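As a sketch of what that page describes (assuming a Jena 3.5+ distribution with its scripts on PATH; the exact tool name and flags may differ by version, so check the linked docs):

```shell
# Compact the TDB2 database at the given location.
# The path is an assumption -- substitute your own database directory.
tdb2.tdbcompact --loc /path/to/DB2
```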
2017-11-03 13:08 GMT+01:00 zPlus <[email protected]>:

> Does Jena have a way to compact TDB2 databases, maybe with some CLI
> tool to run manually? Or do TDB2 databases just grow indefinitely?
>
> ----- Original Message -----
> From: [email protected]
> To: [email protected]
> Cc:
> Sent: Fri, 3 Nov 2017 11:47:01 +0000
> Subject: TDB details - write transactions.
>
> This is a long message that explains the changes in TDB2 around the
> way write transactions work.
>
> TDB2 transactions are completely different from TDB1 transactions.
> The transaction coordinator is general purpose and works on a set of
> transaction components; each index is a separate component. In TDB1,
> the transaction manager works on the TDB1 database as a whole.
>
> ** TDB1
>
> In TDB1, a write transaction creates a number of changes to be made
> to the database. These are stored in the journal. They consist of
> replacement blocks (i.e. overwrites) and new blocks for the indexes.
> All later transactions (after the W commits) use the in-memory cache
> of the journal plus the main database.
>
> The node changes are written ahead to the node storage, which is
> append-only, so they don't need recording in the journal. They are
> inaccessible to earlier transactions because they are unreferenced
> via the node table indexes.
>
> The journal needs to be written to the main database. TDB1 is
> update-in-place. TDB1 is also lock-free. Writing to the main indexes
> requires that there are no other transactions using the database. If
> there are other active transactions, the work is not done but queued.
>
> This queue is checked whenever a transaction, read or write,
> finishes. If at that point the finishing transaction is the only one
> active, TDB1 writes the journal to the main database and clears the
> journal. That transaction can be a reader - the work done to
> write-back is incurred by the reader.
>
> This is the delayed replay queue.
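As a conceptual illustration only (this is not Jena code; every name here is invented), the journal-plus-delayed-write-back scheme described above can be sketched like this:

```python
# Conceptual sketch of TDB1-style delayed journal write-back.
# NOT Jena code; all names are invented for illustration.

class Tdb1Sketch:
    def __init__(self):
        self.main = {}      # "main database": block id -> content
        self.journal = {}   # committed but not-yet-written-back blocks
        self.active = 0     # number of in-flight transactions

    def begin(self):
        self.active += 1

    def read(self, block_id):
        # Later transactions see the journal overlay over the main DB.
        return self.journal.get(block_id, self.main.get(block_id))

    def commit_write(self, changes):
        # A writer's changes go to the journal, not the main DB.
        self.journal.update(changes)
        self.finish()

    def finish(self):
        # Called when any transaction (reader or writer) ends:
        # if it was the last one active, replay the journal into
        # the main database and clear it.
        self.active -= 1
        if self.active == 0 and self.journal:
            self.main.update(self.journal)  # delayed write-back
            self.journal.clear()
```

Note the point made in the message: the replay can be triggered by a *reader* finishing, so a reader may pay the write-back cost.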
> (Replay, because it's a write-ahead-log system and writing back the
> journal is replaying the changes.) Write transaction changes are
> always delayed, for efficiency, to amortize the overhead of the
> write-back costs.
>
> There will be layers: writers running with more changes to the
> database while earlier changes are still in the delayed replay
> queue, yet these may be in use by readers. A new layer is added for
> each new writer.
>
> Under load, the delayed replay queue grows. There isn't a moment to
> write back the changes to the main database.
>
> There are a couple of mechanisms to catch this: if the queue is over
> a certain length, or the total size of the journal is over a
> threshold, TDB1 holds back transactions as they begin, waits for the
> current ones to finish, then writes out the queue.
>
> ** TDB2
>
> In TDB2, data structures are "append-only" in the sense that, once
> written and committed, they are never changed. New data is written
> to new blocks, and either the root of the tree changes (in the case
> of the B+Trees - copy-on-write, also called "persistent data
> structures", where 'persistent' is not related to external storage;
> it's a different branch of computer science using the same word with
> a different meaning) or the visible length of the file changes
> (append-only .dat files).
>
> The only use of the journal is to transactionally manage small
> control data, such as the block id of the new tree root. A
> transaction is less than a disk block.
>
> Compared to TDB1, TDB2:
>
> + Writers write changes to the database as the writer proceeds.
>
> Write efficiency: changes go directly to the database, so there is
> only one write, not two (once to the journal, once to the database),
> and they get write-buffered by the operating system with all the
> usual efficiency the OS can provide in disk scheduling.
>
> This improves bulk loading to the point where tdb2.tdbloader isn't
> doing low-level file manipulation but is simply writing to the
> database.
> If low-level manipulation is an improvement, it can fit in there.
>
> No variable-size heap cache: large inserts and deletes go to the
> live database and can be any size. There is no caching of the
> old-style journal that depends on the size of the changes. No more
> running out of heap with a large transaction.
>
> + Readers only read.
>
> A read transaction does not need to do anything about the delayed
> replay queue. Readers just read the database, never write.
>
> Predictable read performance.
>
> Of course, there is a downside.
>
> The database grows faster and needs compaction. People will start
> asking why the database is so large. They already ask about TDB1,
> and TDB2 databases will be bigger.
>
> Maintaining compact databases while the system runs has costs,
> depending on how it is done. E.g. it's slower - with some kind of
> incremental maintenance overhead (disk/SSD I/O); transaction
> performance is less predictable; (very) complicated locking schemes,
> including system aborts when the DB detects a deadlock (and bugs,
> because it's complicated); large writes impact concurrent readers
> much more.
>
> TDB1 and TDB2 don't system-abort due to deadlock.
>
> Other: the TDB2 transaction coordinator is general, not TDB2
> specific, so it will be able to include text indexes in the future.
>
> ** TDB3
>
> An experiment, not part of Jena. Currently, it's working and not
> bad. Bulk loads are slower at 100m but the promise is that large
> loads (billion-triple range) are better. As an experiment, it may
> not be a good idea - and will make slow progress. There are no
> releases and none planned.
>
> TDB3 uses RocksDB -- http://rocksdb.org/.
>
> That means using SSTables, not CoW B+Trees. At the moment, there is
> one single SSTable for everything, because the storage data can be
> partitioned, so there is no need to have several RocksDB databases.
>
> Still needs compaction. That's an innate feature of SSTable and LSM
> (Log-Structured Merge) systems.
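The reason both designs grow is the same: old versions stay on disk until a compaction step copies the live data out. A conceptual sketch of copy-on-write versioning (invented names; not Jena or RocksDB code):

```python
# Conceptual sketch of copy-on-write storage: commits never modify
# existing blocks, they append new ones and swap the root pointer.
# NOT Jena/RocksDB code; all names are invented for illustration.

class CowStoreSketch:
    def __init__(self):
        self.blocks = []    # append-only "block file"
        self.root = None    # index of the current root block

    def commit(self, data):
        # Each commit appends a new block and moves the root pointer;
        # readers holding the old root still see a consistent snapshot.
        self.blocks.append(data)
        self.root = len(self.blocks) - 1

    def read(self, root=None):
        r = self.root if root is None else root
        return self.blocks[r]

    def compact(self):
        # Copy only the live block into a fresh file and drop the old
        # versions -- this is the step that shrinks the on-disk size.
        live = self.blocks[self.root]
        self.blocks = [live]
        self.root = 0
```

Every commit makes the file longer, even if it logically replaces earlier data, which is why such a store grows until compacted.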
> It is also based on work (RocksDB PR#1298) by Adam Retter to expose
> the RocksDB transaction system to Java.
>
> https://github.com/facebook/rocksdb/wiki/A-Tutorial-of-RocksDB-SST-formats
>
>     Andy

-- 
Jean-Marc Vanel
http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me#subject
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
