Re: [sqlite] presentation about ordering and atomicity of filesystems

Nico Williams Mon, 15 Sep 2014 14:55:00 -0700

On Fri, Sep 12, 2014 at 7:21 PM, Richard Hipp <d...@sqlite.org> wrote:
> On Fri, Sep 12, 2014 at 8:07 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>>   one thing that annoys me about SQLite is that it needs to make a
>> journal file which isn't part of the database file.  Why ?  Why can't it
>> just write the journal to the database file it already has open ?  This
>> would reduce the problems where the OS prevents an application from
>> creating a new file because of permissions or sandboxing.
>>
>
> Where in the database does the journal information get stored?  At the
> end?  What happens then if the transaction is an INSERT and the size of the
> content has to grow?  Does that leave a big hole in the middle of the file
> when the journal is removed?  During recovery after a crash, where does the
> recovery process go to look for the journal information?   If the journal
> is at some arbitrary point in the file, where does it look.  Note that we
> cannot write the journal location in the file header because the header
> cannot be (safely) changed without first journaling it but we cannot
> journal the header without first writing the journal location into the
> header.


One answer is to use a COW (copy-on-write) pattern, with two or more
uberblocks that store the previous and current/next root of the DB;
each uberblock would also reference a free space map.  When you write,
you just allocate currently unused space from the previous uberblock's
free space map (or grow the file if necessary).  Then, when all writes
for a transaction have reached stable storage, you write a new
uberblock with a new free space map.
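To make the write path concrete, here is a minimal in-memory sketch in Python.  Everything in it is a hypothetical simplification and not SQLite's or ZFS's actual design: the CowFile and Uberblock names, blocks held in a Python list, one root block per transaction, two uberblock slots, and "lowest free block first" allocation.  A real implementation would also fsync the data blocks before publishing the new uberblock.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Uberblock:
    txg: int                  # transaction number; higher is newer
    root: Optional[int]       # block number of the DB root (None = empty DB)
    free: frozenset           # free space map: block numbers safe to reuse


class CowFile:
    """Hypothetical copy-on-write file: data blocks plus two uberblock slots."""

    def __init__(self, nblocks: int = 8) -> None:
        self.blocks: List[Optional[bytes]] = [None] * nblocks
        empty = Uberblock(txg=0, root=None, free=frozenset(range(nblocks)))
        self.slots = [empty, empty]   # the previous and current uberblocks

    def current(self) -> Uberblock:
        # The live state is whichever slot holds the highest txg.
        return max(self.slots, key=lambda u: u.txg)

    def commit(self, new_root: bytes) -> Uberblock:
        prev = self.current()
        free = set(prev.free)
        # Allocate from the previous uberblock's free space map ...
        if free:
            blk = min(free)
            free.remove(blk)
        else:
            # ... or grow the file if nothing is free.
            blk = len(self.blocks)
            self.blocks.append(None)
        # Copy-on-write: never overwrite a block the live uberblock references.
        self.blocks[blk] = new_root
        # (A real implementation would fsync the data here, before the uberblock.)
        if prev.root is not None:
            # The old root becomes reusable only from the *next* transaction on.
            free.add(prev.root)
        new = Uberblock(txg=prev.txg + 1, root=blk, free=frozenset(free))
        # Publishing the new uberblock is the commit point; it goes into the
        # slot that does not hold the current one.
        self.slots[new.txg % 2] = new
        return new
```

The design choice worth noticing is that a block freed by transaction N only appears in N's new free space map, so it cannot be reused until transaction N+1, by which time N's uberblock is already durable.  The newest durable uberblock therefore always points at intact data.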

That's a fairly standard COW model (ZFS, for example, does it).  I
believe you can find prior art going back a long time; e.g., the
4.4BSD LFS was a design sort of like this.  ZFS is, in fact, quite
similar to the 4.4BSD LFS, differing mostly in a handful of very
obvious ways: it doesn't use fixed-size log chunks, doesn't insist on
writing contiguous blocks, and doesn't need a cleaner (the trade-off
is fragmentation).  Other differences (snapshots, clones, ...) are
just icing on the cake that comes from reifying uberblocks.

This way you have no log as such, because the file is written in a
log-like manner anyway: the file *is* the log!
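On recovery, nothing has to search the file for a journal: the process reads the uberblock slots at fixed offsets and takes the newest one whose checksum verifies.  Here is a minimal Python sketch of that, assuming a hypothetical on-disk layout.  It uses two 512-byte slots at the start of the file, each holding a made-up magic number, the transaction number, the root block number and a CRC-32.  None of these constants come from SQLite or ZFS.

```python
import struct
import zlib
from typing import Optional, Tuple

SLOT_SIZE = 512          # hypothetical: one sector per uberblock slot
NSLOTS = 2               # slots live at fixed offsets at the start of the file
MAGIC = 0x00BAB10C       # hypothetical magic number marking a written slot
_BODY = struct.Struct("<QQQ")   # magic, txg, root block number
_CRC = struct.Struct("<I")      # CRC-32 of the body


def write_uberblock(image: bytearray, txg: int, root: int) -> None:
    """Write the uberblock for `txg` into slot txg % NSLOTS."""
    body = _BODY.pack(MAGIC, txg, root)
    slot = (body + _CRC.pack(zlib.crc32(body))).ljust(SLOT_SIZE, b"\0")
    off = (txg % NSLOTS) * SLOT_SIZE
    image[off:off + SLOT_SIZE] = slot


def recover(image: bytes) -> Optional[Tuple[int, int]]:
    """Return (txg, root) of the newest valid uberblock, or None."""
    best = None
    for i in range(NSLOTS):
        off = i * SLOT_SIZE
        body = bytes(image[off:off + _BODY.size])
        (crc,) = _CRC.unpack_from(image, off + _BODY.size)
        if crc != zlib.crc32(body):
            continue                      # checksum mismatch: torn or never written
        magic, txg, root = _BODY.unpack(body)
        if magic != MAGIC:
            continue                      # slot was never written
        if best is None or txg > best[0]:
            best = (txg, root)
    return best
```

Each new uberblock goes into the slot that does *not* hold the current one, so a torn write can only damage the older copy.  That sidesteps the chicken-and-egg problem of journaling a single header that is updated in place.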

Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
