I think the gist of Ben's proposal is as follows (please correct me if I am wrong):
Writers do not modify the main database file until they are ready to commit, meaning that readers can continue to read from the file during lengthy transactions. The journal contains modified pages, not the original pages, and is thus a roll-forward journal rather than a roll-back journal.
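To make that concrete, a record in such a roll-forward journal might look something like this (the layout and the names are my own guess, not anything Ben specified):

    /* Hypothetical roll-forward journal record.  Each record holds the
    ** new (modified) image of one page, tagged with its page number.
    ** Records are simply appended to the journal as pages are written.
    */
    typedef unsigned int u32;
    #define PAGE_SIZE 1024            /* assuming the default 1KB page */

    typedef struct JournalRecord JournalRecord;
    struct JournalRecord {
      u32 pgno;                       /* Page number in the main database */
      char aData[PAGE_SIZE];          /* New content of that page */
    };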
The main difficulty in this approach is locating a particular page in the journal file when it needs to be reloaded. For example, suppose page X is read from the main database, modified, then written into the journal. Due to cache pressure, that page is then flushed from the in-memory cache. Later on, we need to read page X again. How do we locate the position in the journal where the current data for page X was written so that we can reread it?
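Put another way, the pager's read path would need something along these lines, where journalOffsetOf() is the hard part (all of these names are invented for illustration):

    typedef unsigned int u32;
    typedef long long i64;
    typedef struct Pager Pager;       /* Opaque pager handle */

    /* Hypothetical helpers: */
    i64 journalOffsetOf(Pager*, u32 pgno);  /* -1 if pgno not in journal */
    int readJournal(Pager*, i64 ofst, char *aBuf);
    int readDatabase(Pager*, u32 pgno, char *aBuf);

    static int pagerReadPage(Pager *pPager, u32 pgno, char *aBuf){
      i64 ofst = journalOffsetOf(pPager, pgno);
      if( ofst>=0 ){
        return readJournal(pPager, ofst, aBuf);  /* modified: journal copy */
      }
      return readDatabase(pPager, pgno, aBuf);   /* clean: main file copy */
    }

Everything here is easy except journalOffsetOf().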
I think we can rule out a linear scan of the journal file for efficiency reasons.
Do you record the offset of each page in the journal with an in-memory hash table, perhaps? If so, consider what happens when you do a massive update to a large (say 10GB) database. If we touch most pages in the database (with 1KB pages, that is roughly 10 million of them) and assume (reasonably) that each entry in the mapping table requires about 48 bytes, then the resulting mapping table would require 500MB of RAM. Granted, this is an extreme case - how often do you modify every page of a 10GB database? But it might happen, and we would like SQLite to be able to do it without having to malloc 0.5GB of memory to pull it off. (To be fair, SQLite already does some other per-page allocations, but those would amount to less than 50MB of RAM in the example above, an order of magnitude less than a mapping table.)
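For reference, a minimal entry in such a table might be (the struct is mine; sizes are estimates):

    typedef unsigned int u32;
    typedef long long i64;

    /* One entry in a hypothetical pgno->journal-offset hash table. */
    typedef struct OffsetEntry OffsetEntry;
    struct OffsetEntry {
      u32 pgno;             /* Page number (the hash key) */
      i64 iJrnlOfst;        /* Offset of current page image in journal */
      OffsetEntry *pNext;   /* Collision chain */
    };

The struct itself is about 24 bytes on a 64-bit platform; malloc overhead plus the hash bucket array bring each entry to something on the order of 48 bytes, and 10 million entries times 48 bytes is the ~500MB figure above.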
Another idea is to store the mapping in a separate, temporary, unjournaled database file. I have not pursued that approach because of efficiency concerns. Perhaps the mapping could be kept in memory until it got too big, then spilled to an external file. That way it would be fast for the common case but memory efficient for the occasional massive update. Either way, the mapping table would be the responsibility of the pager layer, which (up until now) has known nothing about the btree layer. But with this approach, the pager layer would need to call the btree layer recursively. Thinking about that is starting to make my head spin.
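The spill logic itself could be a simple size check at insert time. A rough, self-contained sketch (the threshold, field names, and helpers are all invented; it deliberately leaves open whether the spilled file is flat or a btree, which is where the head-spinning starts):

    typedef unsigned int u32;
    typedef long long i64;

    /* Minimal pager state for this sketch (fields are invented): */
    typedef struct Pager {
      i64 mapBytes;         /* Bytes of mapping table currently in memory */
      int mapSpilled;       /* True once the map has moved to a temp file */
    } Pager;

    #define SPILL_THRESHOLD (2*1024*1024)   /* spill past ~2MB (arbitrary) */

    /* Hypothetical helpers: */
    int spillMapToTempFile(Pager*);
    int tempFileMapInsert(Pager*, u32 pgno, i64 iJrnlOfst);
    int memoryMapInsert(Pager*, u32 pgno, i64 iJrnlOfst);

    static int mapInsert(Pager *pPager, u32 pgno, i64 iJrnlOfst){
      if( !pPager->mapSpilled && pPager->mapBytes>=SPILL_THRESHOLD ){
        int rc = spillMapToTempFile(pPager);  /* one-time migration */
        if( rc ) return rc;
        pPager->mapSpilled = 1;
      }
      return pPager->mapSpilled
           ? tempFileMapInsert(pPager, pgno, iJrnlOfst)
           : memoryMapInsert(pPager, pgno, iJrnlOfst);
    }

In the common case of a few thousand changed pages, mapBytes never reaches the threshold and everything stays in memory.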
Does anybody have any other ideas for doing this mapping? The constraints are that it needs to be fast for the common case (where no more than a few hundred or a few thousand pages change) but should not require too much memory for the occasional massive update.
--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565