On Wed, Sep 01, 2010 at 11:41:00AM -0700, Taras Glek scratched on the wall:
>   Hi,
> Currently VACUUM takes care of sqlite-level fragmentation. Unfortunately 
> it does little for fs-level fragmentation since the same file is being 
> reused. It would be really beneficial for Mozilla performance if we 
> could get a vacuum/hotcopy mode.
> 
> As I understand it, currently vacuum works by copying out the data to a 
> new file, then copying it back in and truncating the original db file.
> 
> It would be really nice to instead do something like:
> 
>     copy the data to a new file

  You're good up to here.  This is basically the "VACUUM TO <file>"
  idea I threw out some months ago, and Max referenced.

>     swap the underlying filehandles to point at new file
> 
>     remove old file
> 
>     rename the new file to old name.

  ...and watch every other database connection that was accessing the
  database file get very, very confused.

  You're also breaking transactional integrity.  You need the option of
  backing-out of the operation right up until the moment it works, and
  this procedure can't do that.  For example, if you lose power right
  after "remove old file", your database is no longer there.

  ...And before someone says something like "but the application can just
  open the other one", I agree-- that it is an application problem.

> Currently we can copy the data to a new file, but it is a pretty 
> invasive change to swap all of the current sqlite connections to the new 
> file.

  You'd have to do that anyways, for all connections other than the one
  that issued the VACUUM command.  Coordinating FD swaps across multiple
  connections in the same process would be confusing enough-- there 
  would be no possible way to do it across database connections in
  multiple processes.

  And I really doubt you could do any of this in a transactional-safe way.
  It's application level stuff.

> Things like prepared statements, etc need to be updated for every 
> single db consumer. Thus it would make sense to have this feature on the 
> sqlite side.

  I'm not sure I agree that it makes sense.  As soon as you start to
  talk about filesystem operations (such as rename) you're way outside
  of the scope of SQLite.  Again, doing this kind of thing in a
  transactional-safe way is nearly impossible.



  Even if we stick with the existing procedure, I'm wondering if it is
  really that bad (that's an honest question; I don't have data one way
  or the other).  In the last step of the VACUUM process, the data is
  copied back to the original file, page by page.  If I understand it
  correctly, this is more or less a blanket write, blasting pages from
  the temp DB to the original DB, front to back (intermixed with copies
  to the journal file).  There is no reason to assume the filesystem
  will over-write the existing allocations, rather than just create new
  ones, especially if the pages are shuffled in groups... although if
  the goal is to defragment the file, we're kind of depending on that,
  which is unwise.

  Maybe there would be some way to pre-populate the rollback journal
  with the full contents of the original database.  Then the file could
  be truncated before the copy-back procedure.  That would make it
  clear to the OS that it is free to allocate whatever file blocks it
  wants, hopefully in better patterns.  The copy back could also be
  done in very large chunks.

  I'm not totally sure, but I'm guessing that would require several
  modifications and enhancements to the rollback and B-tree code.
  You would need the ability to force a full rollback copy and the
  ability do a blind, non-journaled copy, but it seems like it could
  be done without modifications to the existing VACUUM semantics, all
  while keeping the transactional integrity of the VACUUM command.



  On a personal level, I don't think it is worth it.  In the end, you're
  still hoping the OS and filesystem will make smart choices about block
  allocations.  An application shouldn't need to be spending a lot
  of time worrying about this level of filesystem performance.  No
  matter what, you're just hinting and setting up conditions that
  should allow the filesystem driver to do something smart and fast.
  It may, or it may not, actually do so.

    -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to