On Fri, Sep 12, 2014 at 7:21 PM, Richard Hipp <d...@sqlite.org> wrote:
> On Fri, Sep 12, 2014 at 8:07 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>> one thing that annoys me about SQLite is that it needs to make a
>> journal file which isn't part of the database file. Why ? Why can't it
>> just write the journal to the database file it already has open ? This
>> would reduce the problems where the OS prevents an application from
>> creating a new file because of permissions or sandboxing.
>>
>
> Where in the database does the journal information get stored? At the
> end? What happens then if the transaction is an INSERT and the size of the
> content has to grow? Does that leave a big hole in the middle of the file
> when the journal is removed? During recovery after a crash, where does the
> recovery process go to look for the journal information? If the journal
> is at some arbitrary point in the file, where does it look. Note that we
> cannot write the journal location in the file header because the header
> cannot be (safely) changed without first journaling it but we cannot
> journal the header without first writing the journal location into the
> header.

One answer is to use a COW (copy-on-write) pattern, with two or more uberblocks that store the previous and current/next root of the DB; each uberblock would also reference a free space map. When you write, you just allocate currently unused space from the previous uberblock's free space map (or grow the file if necessary); then, once all writes for a transaction have reached stable storage, you write a new uberblock, with a new free space map.

That's a fairly standard COW model (e.g., ZFS does it). I believe you can find prior art going back a long time. E.g., the 4.4BSD LFS was a design sort of like this. ZFS is quite similar to the 4.4BSD LFS, in fact, mostly differing in a handful of very obvious ways: it doesn't use fixed-sized log chunks, doesn't insist on writing contiguous blocks, and doesn't need a cleaner (the trade-off is fragmentation); other differences (snapshots, clones, ...) are just icing on the cake that come from reifying uberblocks.

This way you have no log as such, because the file is written in a log-like manner anyway: the file *is* the log!

Nico
--
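
For illustration, the uberblock scheme described above could be sketched roughly as follows. This is a toy model under loud assumptions, not SQLite or ZFS code: the class name CowFile, the JSON-plus-CRC32 uberblock encoding, the two in-memory slots standing in for on-disk locations, and the idea of storing the whole database as a single root block are all hypothetical choices made only to keep the example short. It shows the two rules that matter: data is only ever written into space the current uberblock considers free (or into newly grown space), and a transaction becomes visible only when a new, checksummed uberblock lands in the other slot.

```python
import json
import zlib


class CowFile:
    """Toy copy-on-write store: two uberblock slots plus data blocks."""

    def __init__(self):
        self.slots = [None, None]  # raw uberblock bytes; slot index = txg % 2
        self.blocks = []           # data blocks; list index = block number
        self._write_uberblock({"txg": 0, "root": None, "free": []})

    @staticmethod
    def _decode(raw):
        # A slot is valid only if its stored CRC32 matches its payload.
        if raw is None or len(raw) < 4:
            return None
        payload = raw[4:]
        if zlib.crc32(payload) != int.from_bytes(raw[:4], "big"):
            return None
        return json.loads(payload)

    def _write_uberblock(self, ub):
        payload = json.dumps(ub, sort_keys=True).encode()
        raw = zlib.crc32(payload).to_bytes(4, "big") + payload
        # Alternating slots means the slot being overwritten never holds
        # the current uberblock, so a torn write cannot lose it.
        self.slots[ub["txg"] % 2] = raw

    def current(self):
        # Recovery rule: the valid uberblock with the highest txg wins.
        valid = [ub for ub in map(self._decode, self.slots) if ub is not None]
        if not valid:
            raise RuntimeError("no valid uberblock")
        return max(valid, key=lambda ub: ub["txg"])

    def read(self):
        root = self.current()["root"]
        return None if root is None else self.blocks[root]

    def commit(self, data, crash_before_uberblock=False):
        ub = self.current()
        free = list(ub["free"])
        if free:
            # Reuse space the current uberblock's free map lists as unused.
            blk = free.pop()
            self.blocks[blk] = data
        else:
            # Otherwise grow the "file".
            blk = len(self.blocks)
            self.blocks.append(data)
        if crash_before_uberblock:
            return  # simulated crash: data written, uberblock never updated
        if ub["root"] is not None:
            free.append(ub["root"])  # old root becomes free in the new map
        self._write_uberblock({"txg": ub["txg"] + 1, "root": blk, "free": free})
```

With this sketch, committing b"v1" and b"v2" and then calling commit(b"v3", crash_before_uberblock=True) leaves read() returning b"v2": the interrupted transaction wrote only into space that the surviving uberblock still considers free, so there is nothing to roll back and no separate journal to find. A real implementation would also need to flush data to stable storage before writing the uberblock, which this in-memory model does not attempt.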