On Tue, Mar 16, 2010 at 06:18:13PM +0300, Max Vlasov scratched on the wall:

> When I read the comments it was obvious that the algorithm uses very simple
> approach:
> Attach blank database, copy all data,  detach,  rename. Sure I might be
> wrong in details, but generally it looks like this.

  With one exception.  The temporary database is not renamed back to
  the original database name.  Rather, the data is copied back, page by
  page, from the temp database file back into the original database
  file.  The original database file is then truncated to the correct size.

> With this actions journal file(s) for the new database should not contain
> much data.

  That is correct.  The comments in the code support this.

> So my quess is one only need at most the size of the actual data
> from the source base plus very tiny journal file. But in fact (as everyone
> sees) it uses much more.

  Almost.  In addition to the original file, you need enough free space
  for the newly vacuumed database, as well as the journal files for
  both databases.

  As mentioned, the journal file for the temporary database is quite
  small.  However, the "copy back" operation creates a journal file on
  the original database.  This file is going to be similar in size to
  the original database, since every page in that file will be touched
  as part of the vacuum process.  The journal file is required, as any
  failure or error (including a process or power failure) will require
  the journal file to recreate the original pre-vacuum database file.
> So the question is what is so special about sqlite3RunVacuum that it needs
> more space than a simple emulation of its actions?

  The copy-back operation.

  Yes, it would be faster and easier to just copy the temp database
  back over to the original database name, but there are issues with
  First, I'm sure it is nearly impossible to do this as a
  guaranteed, atomic operation on most OSes and filesystems.  That's
  bad for data security.  
  Second, if this is meant to look like a cleanup operation on the
  original file, the original file (including any filesystem meta-data)
  should be kept in-tact.  Replacing the file by copying something on
  top of it won't do this.
  Last, file manipulations of this sort aren't supported by the current
  VFS interface, and (IMHO) with good reason.  The current vacuum
  process works regardless of the operating environment, including
  in-memory databases and any custom VFS modules.  Reaching outside
  of that context to things like filesystem and directory manipulations
  complicates this.  This makes modification of the current vacuum
  process unlikely.

  However, it might be nice to have a "VACUUM TO <file>" version of the
  command.  This would allow the user/application to vacuum a database
  to another non-temporary file and skip the copy-back operation.  This
  would save considerable space and some amount of time.  You could
  then do as you want with the new file.


Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
sqlite-users mailing list

Reply via email to