On 30/04/13 22:35, Stephen Allen wrote:
> P.S. Andy, please chime in where I've gotten something wrong (as I
> undoubtedly have, it's a pretty complex bit of the code)
The description is spot-on.
The only clarification is that the blocks in a transaction are always
written to the file-based journal, then the commit record is written
(this is the true point at which a transaction commits: a single disk
operation, the sync of the journal). The journal is then replayed to
update the database. The in-memory blocks are not written directly to
the database, even if there is only one writer around.
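The journal-then-replay write path can be sketched roughly like this. It is a minimal illustration, not the real journal code: the record layout (a 1-byte type plus a 4-byte block id), the tiny `BLOCK_SIZE`, and the function names `write_transaction` and `replay` are all hypothetical. What it tries to show is the ordering: dirty blocks go to the journal, then a commit record, then one sync, and only replay touches the database file.

```python
import os
import struct

BLOCK_SIZE = 16             # hypothetical tiny block size, for illustration only
BLOCK, COMMIT = 1, 2        # hypothetical record types
HEADER = struct.Struct(">BI")  # hypothetical header: record type, block id


def write_transaction(journal_path, blocks):
    """Append every dirty block, then a commit record, then sync once.

    The database file is never written here, even for a single writer.
    """
    with open(journal_path, "ab") as j:
        for block_id, data in blocks.items():
            if len(data) != BLOCK_SIZE:
                raise ValueError("block must be exactly BLOCK_SIZE bytes")
            j.write(HEADER.pack(BLOCK, block_id) + data)
        j.write(HEADER.pack(COMMIT, 0))
        j.flush()
        os.fsync(j.fileno())  # the single sync: the transaction is now committed


def replay(journal_path, db_path):
    """Apply committed transactions from the journal to the database in place.

    Blocks after the last commit record (e.g. a crash mid-write) are ignored.
    Returns the number of committed transactions applied.
    """
    pending = {}
    applied = 0
    with open(journal_path, "rb") as j, open(db_path, "r+b") as db:
        while True:
            header = j.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of journal, or a torn header
            kind, block_id = HEADER.unpack(header)
            if kind == BLOCK:
                data = j.read(BLOCK_SIZE)
                if len(data) < BLOCK_SIZE:
                    break  # torn block record: never committed
                pending[block_id] = data
            elif kind == COMMIT:
                # Only now are this transaction's blocks written to the database.
                for bid, data in pending.items():
                    db.seek(bid * BLOCK_SIZE)
                    db.write(data)
                pending.clear()
                applied += 1
            else:
                break  # unknown record type: treat as corruption and stop
        db.flush()
        os.fsync(db.fileno())
    return applied
```

Blocks of a transaction whose commit record never reached the journal are simply dropped on replay. A real system would typically also truncate the journal once replay has reached the database, which this sketch leaves out.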
Long term, it would be good to move to a single-write transaction system
where the data is written to the index files as append operations, rather
than written to the journal and then to the indexes in place. It is a
significant change to the file formats, but also to the way the B+Trees
work, because currently they don't need to understand transactions, only
that they are given a block manager (which is transactional).
Andy