I think the gist of Ben's proposal is as follows
(please correct me if I am wrong):

   Writes do not modify the main database file until
   they are ready to commit - meaning that readers can
   continue to read from the file during lengthy
   transactions.  The journal contains modified pages,
   not the original pages, and is thus a roll-forward
   journal rather than a roll-back journal.
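
For concreteness, each record in such a roll-forward
journal would presumably carry the new page image
together with its page number - something along these
lines (the names and layout here are invented for
illustration, not actual code):

    #include <stdint.h>

    #define PAGE_SIZE 1024  /* assumed page size */

    /* One record in a roll-forward journal: the NEW content
    ** of a single page.  Committing means copying each such
    ** page image back into the main database file.
    */
    typedef struct JournalRecord JournalRecord;
    struct JournalRecord {
      uint32_t pgno;             /* Page number in the main database */
      uint32_t cksum;            /* Checksum over aData */
      uint8_t aData[PAGE_SIZE];  /* New content of page pgno */
    };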

The main difficulty in this approach is locating
a particular page in the journal file when it needs
to be reloaded.  For example, suppose page X is read
from the main database, modified, then written into
the journal.  Due to cache pressure, that page is then
flushed from the in-memory cache.  Later on, we need
to read page X again.  How do we locate the position
in the journal where the current data for page X was
written so that we can reread it?

I think we can rule out a linear scan of the journal
file for efficiency reasons: every cache miss would
cost time proportional to the size of the journal.

Do you record the offset of each page in the journal
with an in-memory hash table, perhaps?  If so, consider
what happens when you do a massive update to a large
(say 10GB) database.  If we touch most pages in the
database - roughly ten million pages at a 1KB page
size - and assume (reasonably) that each entry in
the mapping table requires about 48 bytes, then the
resulting mapping table would require about 500MB of
RAM.  Granted, this is an extreme case - how often do
you modify every page of a 10GB database?  But it might
happen, and we would like SQLite to be able to do it
without having to malloc 0.5GB of memory to pull
it off.  (To be fair, SQLite already does some other
per-page allocations, but those would amount to less
than 50MB of RAM in the example above, an order of
magnitude less than a mapping table.)
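
To make the accounting concrete, here is a minimal
sketch of the kind of in-memory map I have in mind - a
chained hash table from page numbers to journal offsets
(all names invented for illustration):

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct PageMapEntry PageMapEntry;
    struct PageMapEntry {
      uint32_t pgno;        /* Page number in the main database */
      int64_t iOffset;      /* Offset of current page data in journal */
      PageMapEntry *pNext;  /* Next entry in the same hash bucket */
    };

    #define N_BUCKET 1024

    typedef struct PageMap PageMap;
    struct PageMap {
      PageMapEntry *aBucket[N_BUCKET];  /* Zeroed at transaction start */
    };

    /* Record that the current content of page pgno lives at
    ** byte offset iOffset in the journal.  Returns 0 on
    ** success, -1 on malloc failure.
    */
    static int pageMapSet(PageMap *p, uint32_t pgno, int64_t iOffset){
      PageMapEntry *pE;
      int h = pgno % N_BUCKET;
      for(pE=p->aBucket[h]; pE; pE=pE->pNext){
        if( pE->pgno==pgno ){ pE->iOffset = iOffset; return 0; }
      }
      pE = malloc(sizeof(*pE));
      if( pE==0 ) return -1;
      pE->pgno = pgno;
      pE->iOffset = iOffset;
      pE->pNext = p->aBucket[h];
      p->aBucket[h] = pE;
      return 0;
    }

    /* Return the journal offset holding the current content
    ** of page pgno, or -1 if that page is not in the journal.
    */
    static int64_t pageMapGet(PageMap *p, uint32_t pgno){
      PageMapEntry *pE;
      for(pE=p->aBucket[pgno % N_BUCKET]; pE; pE=pE->pNext){
        if( pE->pgno==pgno ) return pE->iOffset;
      }
      return -1;
    }

The payload of each entry plus malloc and bucket-array
overhead is where a per-entry cost on the order of the
48 bytes assumed above comes from.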

Another idea is to store the mapping in a separate,
temporary, unjournaled database file.  I have not
pursued that approach because of efficiency concerns.
Perhaps the mapping could be kept in memory until it
got too big, then spilled to an external file.  That
way it would be fast for the common case but memory
efficient for the occasional massive update.  Either
way, the mapping table would be the responsibility
of the pager layer, which (up until now) has known
nothing about the btree layer.  But if the mapping
lived in a database file, the pager layer would need
to call the btree layer recursively.  Thinking about
that is starting to make my head spin.
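
If the spill idea were pursued, the lookup side might
look something like the following - check the in-memory
tier first, then binary-search a spill file kept sorted
by page number.  This is only a sketch: PageMap and
pageMapGet() are from the sketch above, and the spill
format is invented.  (Each spill would also have to
merge with the existing file to keep it sorted, which
is omitted here.)

    #include <stdint.h>
    #include <unistd.h>  /* pread() */

    /* Fixed-size record in the sorted spill file. */
    typedef struct SpillRec SpillRec;
    struct SpillRec {
      uint32_t pgno;     /* Page number */
      int64_t iOffset;   /* Offset of page data in the journal */
    };

    /* Two-tier map: recent entries in memory, older entries
    ** spilled to a temporary file sorted by page number.
    */
    typedef struct SpillableMap SpillableMap;
    struct SpillableMap {
      PageMap mem;       /* Fast in-memory tier */
      int nEntry;        /* Entries currently held in memory */
      int nSpillAt;      /* Spill once nEntry exceeds this */
      int fdSpill;       /* Spill file descriptor, or -1 if none */
      int64_t nSpilled;  /* Number of records in the spill file */
    };

    /* Binary-search the spill file for page pgno. */
    static int64_t spillFileSearch(int fd, int64_t nRec, uint32_t pgno){
      int64_t lo = 0, hi = nRec-1;
      while( lo<=hi ){
        int64_t mid = lo + (hi-lo)/2;
        SpillRec r;
        if( pread(fd, &r, sizeof(r), mid*(int64_t)sizeof(r))
               != (ssize_t)sizeof(r) ){
          return -1;  /* I/O error */
        }
        if( r.pgno==pgno ) return r.iOffset;
        if( r.pgno<pgno ){ lo = mid+1; }else{ hi = mid-1; }
      }
      return -1;      /* Not in the spill file */
    }

    /* Look up page pgno: memory first (newer data shadows
    ** anything spilled earlier), then the spill file.
    */
    static int64_t spillableMapGet(SpillableMap *p, uint32_t pgno){
      int64_t iOff = pageMapGet(&p->mem, pgno);
      if( iOff<0 && p->fdSpill>=0 ){
        iOff = spillFileSearch(p->fdSpill, p->nSpilled, pgno);
      }
      return iOff;
    }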

Does anybody have any other ideas for doing this
mapping?  The constraints are that it needs to be
fast for the common case (where no more than a few
hundred or a few thousand pages change) but should
not require too much memory for the occasional
massive update.

--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565

