Re: [sqlite] Feature request: copying vacuum

2010-09-10 Thread Jim Wilcoxson
I'd also be interested in a VACUUM TO feature, more for performance aspect
than the fragmentation, although that's a plus too.  The backup program I'm
working on packs many files into archives, which are SQLite databases.  I
have run some vacuum tests here; the filesystem cache was purged before each
test:

cp 1GB archive: 44 seconds (for baseline comparison)

Vacuum 1GB archive w/sqlite3:
real    2m15.421s
user    0m8.776s
sys     0m34.205s

Dump and reload 1GB archive:
$ time sqlite3 arc.0.0.rm ".dump" | sqlite3 arc.0.0.new

real    0m52.174s
user    0m23.750s
sys     0m9.086s

Creating a new archive is more than twice as fast as doing a vacuum on an
existing archive, and nearly as fast as a straight cp.  While an extra
minute and a half for SQLite vacuum may not seem like a big deal, a backup
retention operation could affect many archives.  So 30 archives would
require an extra 45 minutes to vacuum.  I've had to add code to the backup
program to avoid doing vacuums whenever possible because they're slow.

I would suggest that the VACUUM TO feature take a read lock on the database
and create a new, vacuumed database, but leave it up to the application
whether to replace the original or not.  If the application decides to do a
rename over the original database, then yes, this could goof up other
connections, but that could happen anyway if an app decided to delete a
database: the other connections would keep on using the database, even
though it is unlinked from the filesystem.  For my single-user application,
VACUUM TO would be very useful.  In multi-connection cases, the app would
have to provide some kind of mechanism outside SQLite to coordinate things,
or just stick to the regular vacuum operation.
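
Something like the following could approximate VACUUM TO at the application
level today, using the online backup API (a sketch, error handling trimmed;
vacuum_to() and the idea of renaming afterwards are mine, and note that the
backup API copies pages as-is, so it yields a fresh, contiguous file but
does not shrink or repack it the way a real VACUUM does -- it only helps
with fs-level fragmentation):

    #include <sqlite3.h>

    /* Copy the live database into a brand-new file; the application
       then decides whether to rename it over the original. */
    static int vacuum_to(const char *src_path, const char *dst_path) {
        sqlite3 *src = 0, *dst = 0;
        sqlite3_backup *bk;
        int rc = SQLITE_ERROR;

        if (sqlite3_open(src_path, &src) == SQLITE_OK &&
            sqlite3_open(dst_path, &dst) == SQLITE_OK) {
            bk = sqlite3_backup_init(dst, "main", src, "main");
            if (bk) {
                sqlite3_backup_step(bk, -1);   /* copy every page */
                rc = sqlite3_backup_finish(bk);
            }
        }
        sqlite3_close(src);   /* harmless no-op if open failed */
        sqlite3_close(dst);
        return rc;
    }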

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup



Re: [sqlite] Feature request: copying vacuum

2010-09-09 Thread Taras Glek
  On 09/01/2010 11:41 AM, Taras Glek wrote:
> Hi,
> Currently VACUUM takes care of sqlite-level fragmentation. 
> Unfortunately it does little for fs-level fragmentation since the same 
> file is being reused. It would be really beneficial for Mozilla 
> performance if we could get a vacuum/hotcopy mode.
>
> As I understand it, currently vacuum works by copying out the data to 
> a new file, then copying it back in and truncating the original db file.
>
> It would be really nice to instead do something like:
>
> copy the data to a new file
>
> swap the underlying filehandles to point at new file
>
> remove old file
>
> rename the new file to old name.
>
> This yields two benefits:
>
> A less fragmented db
>
> ~50% vacuum speedup since the data is only copied once
>
> Currently we can copy the data to a new file, but it is a pretty 
> invasive change to swap all of the current sqlite connections to the 
> new file. Things like prepared statements, etc need to be updated for 
> every single db consumer. Thus it would make sense to have this 
> feature on the sqlite side.
>
> Is this a reasonable request?
>
> Thanks,
> Taras
I sent this last week; I wanted to address some issues that were raised
about my proposal.

Sorry for the weird (and late!) reply: I'm subscribed via a digest (it would
be so nice to have a gmane mirror), so I can't reply directly.

Richard Hipp:
> If other connections have the database file open while it is being vacuumed,
> then on unix the other connections will still be left open on the old
> unlinked version of the database file and will never see the new content.
> And on windows, the file swapping and renaming simply is not allowed while
> other connections have the database files open.
> The workaround is to modify SQLite so that it is constantly closing and
> reopening the database files.  But that adds rather large overheads that
> seem likely to be much greater than any savings seen through a reduction in
> disk FS fragmentation.
>
>
I agree with both points. A copying VACUUM should specify that it does
not support the multi-connection use case. It should abort if it detects
another db connection (or have this mentioned in the documentation if such
detection isn't possible).
The wins from avoiding disk fragmentation and copying less data are
significant. Punting seems like a reasonable alternative to forcing
sqlite to constantly close/open the db.


Jay A. Kreibich:

>You're also breaking transactional integrity.  You need the option of
>backing-out of the operation right up until the moment it works, and
>this procedure can't do that.  For example, if you lose power right
>after "remove old file", your database is no longer there.
>
You are right, my original sequence of events was flawed; it should be:

copy the data to a new file

swap the underlying filehandles to point at new file

rename the new file to the old name (this also removes the old file).

This takes care of the "remove old file" problem.
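
At the application level the last step would be a single atomic rename
(a sketch; the file names and helper are made up):

    #include <stdio.h>    /* rename(), perror() */

    /* Atomically replace the old database with the fully written,
       synced vacuumed copy.  rename() on POSIX leaves no window in
       which the database is missing; on failure the old file is
       untouched. */
    static int swap_in(void) {
        if (rename("main.db.new", "main.db") != 0) {
            perror("rename");
            return -1;
        }
        return 0;
    }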

>You'd have to do that anyways, for all connections other than the one
>that issued the VACUUM command.  Coordinating FD swaps across multiple
>connections in the same process would be confusing enough-- there
>would be no possible way to do it across database connections in
>multiple processes.
As I said above, I think restricting this to single-connection use cases
is reasonable.

>Maybe there would be some way to pre-populate the rollback journal
>with the full contents of the original database.  Then the file could
>be truncated before the copy-back procedure.  That would make it
>clear to the OS that it is free to allocate whatever file blocks it
>wants, hopefully in better patterns.  The copy back could also be
>done in very large chunks.
That sounds like a reasonable alternative to me. It does more IO than
copy+rename, but I'm mostly interested in avoiding fragmentation here,
and this solves that issue.

>On a personal level, I don't think it is worth it.  In the end, you're
>still hoping the OS and filesystem will make smart choices about block
>allocations.  An application shouldn't need to be spending a lot
>of time worrying about this level of filesystem performance.  No
>matter what, you're just hinting and setting up conditions that
>should allow the filesystem driver to do something smart and fast.
>It may, or it may not, actually do so.
I appreciate your feeling on the matter, but there are APIs (e.g.
fallocate) that are specifically designed to minimize fragmentation. The
underlying filesystem cannot be expected to guess every possible
workload correctly.
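
For instance, a sketch of the kind of hint I mean, using posix_fallocate
(the portable cousin of Linux's fallocate; the wrapper is hypothetical):

    #include <fcntl.h>

    /* Reserve the file's final size up front so an extent-based
       filesystem can try to hand out one contiguous run of blocks. */
    static int reserve_contiguous(int fd, off_t len) {
        return posix_fallocate(fd, 0, len);  /* 0 on success */
    }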

Scott Hess:
> I agree with Jay - while it is tempting to have SQLite bite off
> optimizing this kind of thing, it's pretty far out of scope.  Next
> we'll be talking about running SQLite on raw partitions!  The recent
> addition of SQLITE_FCNTL_CHUNK_SIZE is probably about all the hinting
> you can productively manage.

Re: [sqlite] Feature request: copying vacuum

2010-09-02 Thread Jay A. Kreibich
On Thu, Sep 02, 2010 at 05:42:17AM +0200, Ben Danper scratched on the wall:
> 
> On Wed, Sep 1, 2010 at 12:46 PM, Jay A. Kreibich  wrote:
> > There is no reason to assume the filesystem
> > will over-write the existing allocations, rather than just create new
> > ones, especially if the pages are shuffled in groups...
> 
> Actually there's no reason to do the opposite,
> as it would fragment files that were contiguous in the first place.

  If a filesystem is asked to write out a buffer that is large enough
  to consist of several allocation blocks, it makes sense to write that
  buffer out as a contiguous set of blocks and reassign the allocations
  if the existing blocks are already fragmented.  That wouldn't change
  a file that is already reasonably contiguous, but would tend to bring
  a file back together.  Such a behavior would almost never fully
  defragment the file, but it would help... assuming at least a few
  writes are large, logically contiguous writes.  I'm guessing SQLite
  doesn't tend to do that.
  
  There are also flash-optimized filesystems that move just about every
  write in an attempt to even out the flash write cycles.

> > Maybe there would be some way to pre-populate the rollback journal
> > with the full contents of the original database. Then the file could
> > be truncated before the copy-back procedure. That would make it
> > clear to the OS that it is free to allocate whatever file blocks it
> > wants, hopefully in better patterns. The copy back could also be
> > done in very large chunks.
> 
> This is a fantastic idea! Not only truncate - since you know the new
> size, you could also set the size before you start
> copying the pages (similar to SQLITE_FCNTL_CHUNK_SIZE). Most
> filesystems will try very hard to place it contiguously.

  Good idea.

> A more involved idea that would improve efficiency (two copies instead
> of three, and twice the database size instead of three times) would
> be to use the journal file directly as the new database

  That does sound a lot more involved.  You would more or less need
  to rewrite the whole pager to deal with two different file formats.
  The VACUUM copy process is not write-only, it logically rebuilds
  the database from the ground up using SQL commands.  That means it
  does stuff like issue "CREATE INDEX..." commands on fresh tables.
  You would need full read/write/update support for the "journal pager."

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson


Re: [sqlite] Feature request: copying vacuum

2010-09-02 Thread Shawn Wilsher
On Wed, Sep 1, 2010 at 12:46 PM, Jay A. Kreibich  wrote:
>  On a personal level, I don't think it is worth it.  In the end, you're
>  still hoping the OS and filesystem will make smart choices about block
>  allocations.  An application shouldn't need to be spending a lot
>  of time worrying about this level of filesystem performance.  No
>  matter what, you're just hinting and setting up conditions that
>  should allow the filesystem driver to do something smart and fast.
>  It may, or it may not, actually do so.
Right, but giving it more hints means it's more likely to do the smart
and fast thing.  For what it's worth, Taras is working on improving
the performance of SQLite in Firefox.  He has data that shows that
this can dramatically reduce the number of page faults (Taras, please
correct me if I'm misrepresenting things) when loading the database.
This leads to faster startup times for Firefox.

Cheers,

Shawn
Mozilla Developer


Re: [sqlite] Feature request: copying vacuum

2010-09-01 Thread Scott Hess
On Wed, Sep 1, 2010 at 10:09 PM, Max Vlasov  wrote:
>> I agree with Jay - while it is tempting to have SQLite bite off
>> optimizing this kind of thing, it's pretty far out of scope.  Next
>> we'll be talking about running SQLite on raw partitions!
>>
>
> Scott, I thought about it; ironically, the sqlite VFS is flexible enough to
> implement raw-partition sqlite in days if one _really_ needs it :) I wonder
> whether someone has already done it, at least for one OS.

That's not really ironic; it's the entire point!  But where something
like Oracle would use a raw partition for performance reasons, you'd
be more likely to do it in SQLite because you don't want to bother
dedicating ROM to a R/W filesystem module when the filesystem will
only contain a SQLite database and journal.

[Probably you could just start the database at block 0 counting
upwards, and start the journal at block N counting downwards, and
throw DISK_FULL if they ever meet.]
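
Concretely, the meeting check might look like this (a sketch; the names
are made up):

    #include <sqlite3.h>

    /* Database pages are allocated upward from block 0 and journal
       pages downward from the partition's last block; if the two
       frontiers meet, the raw partition is out of space. */
    static int check_space(long db_top_blk, long jrnl_bottom_blk) {
        return (db_top_blk >= jrnl_bottom_blk) ? SQLITE_FULL : SQLITE_OK;
    }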

-scott


Re: [sqlite] Feature request: copying vacuum

2010-09-01 Thread Max Vlasov
> I agree with Jay - while it is tempting to have SQLite bite off
> optimizing this kind of thing, it's pretty far out of scope.  Next
> we'll be talking about running SQLite on raw partitions!
>

Scott, I thought about it; ironically, the sqlite VFS is flexible enough to
implement raw-partition sqlite in days if one _really_ needs it :) I wonder
whether someone has already done it, at least for one OS.

Max


Re: [sqlite] Feature request: copying vacuum

2010-09-01 Thread Simon Slavin

On 2 Sep 2010, at 4:42am, Ben Danper wrote:

> On Wed, Sep 1, 2010 at 12:46 PM, Jay A. Kreibich  wrote:
>> Maybe there would be some way to pre-populate the rollback journal
>> with the full contents of the original database. Then the file could
>> be truncated before the copy-back procedure. That would make it
>> clear to the OS that it is free to allocate whatever file blocks it
>> wants, hopefully in better patterns. The copy back could also be
>> done in very large chunks.
> 
> This is a fantastic idea! Not only truncate - since you know the new size, 
> you could also set the size before you start copying the pages 
> (similar to SQLITE_FCNTL_CHUNK_SIZE). Most filesystems will try very hard to 
> place it contiguously.

What does SQLite actually do if you take a table with a primary key and run
the following on it?

INSERT OR REPLACE INTO myTable SELECT * FROM myTable;

I assume that at some level it realises that no writing actually needs to be
done.  But could you defeat that, and have that statement force a rewrite
of the entire table?

Simon.


Re: [sqlite] Feature request: copying vacuum

2010-09-01 Thread Ben Danper

On Wed, Sep 1, 2010 at 12:46 PM, Jay A. Kreibich  wrote:
> There is no reason to assume the filesystem
> will over-write the existing allocations, rather than just create new
> ones, especially if the pages are shuffled in groups...

Actually there's no reason to do the opposite, as it would fragment files that 
were contiguous in the first place. The only filesystems that do this that I'm 
aware of are those that do copy-on-write (and some move the old page, not the 
new one). It certainly doesn't help in EXT2/3/4, or NTFS on Windows.

> Maybe there would be some way to pre-populate the rollback journal
> with the full contents of the original database. Then the file could
> be truncated before the copy-back procedure. That would make it
> clear to the OS that it is free to allocate whatever file blocks it
> wants, hopefully in better patterns. The copy back could also be
> done in very large chunks.

This is a fantastic idea! Not only truncate - since you know the new size, you 
could also set the size before you start copying the pages (similar 
to SQLITE_FCNTL_CHUNK_SIZE). Most filesystems will try very hard to place it 
contiguously.

A more involved idea that would improve efficiency (two copies instead of 
three, and twice the database size instead of three times) would be to use the 
journal file directly as the new database (which right now is created in the 
temp folder). This involves writing an invalid journal header (typically 512 
bytes of zeros), and then treating it as a normal database file, only starting 
at offset 512, and with pages which are 8 bytes bigger, preceded by their page 
number (sequential in this case) and followed by the fast checksum. Once the 
new database is written (in journal format), the journal header can be made 
valid. At this point the transaction has committed, and you can truncate and 
copy to the original file as above (which is conceptually a rollback, only 
you're rolling back to the vacuumed database).
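
In other words, each page record in the journal-format file would look
something like this (a sketch of the layout described above; the names are
mine, and a real page size would come from the db header):

    #include <stdint.h>

    #define PAGE_SIZE 1024            /* example page size */

    /* 8 bytes of overhead per page: number in front, checksum behind. */
    struct journal_page_record {
        uint32_t pgno;                /* page number, big-endian on disk */
        uint8_t  data[PAGE_SIZE];     /* the page itself */
        uint32_t cksum;               /* the fast checksum */
    };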
  


Re: [sqlite] Feature request: copying vacuum

2010-09-01 Thread Scott Hess
On Wed, Sep 1, 2010 at 12:46 PM, Jay A. Kreibich  wrote:
> On Wed, Sep 01, 2010 at 11:41:00AM -0700, Taras Glek scratched on the wall:
>> Currently VACUUM takes care of sqlite-level fragmentation. Unfortunately
>> it does little for fs-level fragmentation since the same file is being
>> reused. It would be really beneficial for Mozilla performance if we
>> could get a vacuum/hotcopy mode.

>  On a personal level, I don't think it is worth it.  In the end, you're
>  still hoping the OS and filesystem will make smart choices about block
>  allocations.  An application shouldn't need to be spending a lot
>  of time worrying about this level of filesystem performance.  No
>  matter what, you're just hinting and setting up conditions that
>  should allow the filesystem driver to do something smart and fast.
>  It may, or it may not, actually do so.

I agree with Jay - while it is tempting to have SQLite bite off
optimizing this kind of thing, it's pretty far out of scope.  Next
we'll be talking about running SQLite on raw partitions!  The recent
addition of SQLITE_FCNTL_CHUNK_SIZE is probably about all the hinting
you can productively manage.
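
For reference, a sketch of using that hint (the wrapper is made up; the
file-control call itself is the real API):

    #include <sqlite3.h>

    /* Ask the VFS to grow/shrink the main database file in 1 MiB
       steps instead of a page at a time, so the filesystem sees
       fewer, larger allocations. */
    static void set_chunk_hint(sqlite3 *db) {
        int chunk = 1024 * 1024;
        sqlite3_file_control(db, "main", SQLITE_FCNTL_CHUNK_SIZE, &chunk);
    }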

[In the end, VACUUM itself is kind of a hack; I sometimes suspect that
more than half of the time when it is being used it is probably as a
result of a micro-benchmark finding, and in the long term may actually
be causing more harm than good.]

-scott


Re: [sqlite] Feature request: copying vacuum

2010-09-01 Thread Jay A. Kreibich
On Wed, Sep 01, 2010 at 11:41:00AM -0700, Taras Glek scratched on the wall:
>   Hi,
> Currently VACUUM takes care of sqlite-level fragmentation. Unfortunately 
> it does little for fs-level fragmentation since the same file is being 
> reused. It would be really beneficial for Mozilla performance if we 
> could get a vacuum/hotcopy mode.
> 
> As I understand it, currently vacuum works by copying out the data to a 
> new file, then copying it back in and truncating the original db file.
> 
> It would be really nice to instead do something like:
> 
> copy the data to a new file

  You're good up to here.  This is basically the "VACUUM TO <filename>"
  idea I threw out some months ago, and Max referenced.

> swap the underlying filehandles to point at new file
> 
> remove old file
> 
> rename the new file to old name.

  ...and watch every other database connection that was accessing the
  database file get very, very confused.

  You're also breaking transactional integrity.  You need the option of
  backing-out of the operation right up until the moment it works, and
  this procedure can't do that.  For example, if you lose power right
  after "remove old file", your database is no longer there.

  ...And before someone says something like "but the application can just
  open the other one", I agree -- it is an application problem.

> Currently we can copy the data to a new file, but it is a pretty 
> invasive change to swap all of the current sqlite connections to the new 
> file.

  You'd have to do that anyways, for all connections other than the one
  that issued the VACUUM command.  Coordinating FD swaps across multiple
  connections in the same process would be confusing enough-- there 
  would be no possible way to do it across database connections in
  multiple processes.

  And I really doubt you could do any of this in a transactional-safe way.
  It's application level stuff.

> Things like prepared statements, etc need to be updated for every 
> single db consumer. Thus it would make sense to have this feature on the 
> sqlite side.

  I'm not sure I agree that it makes sense.  As soon as you start to
  talk about filesystem operations (such as rename) you're way outside
  of the scope of SQLite.  Again, doing this kind of thing in a
  transactional-safe way is nearly impossible.



  Even if we stick with the existing procedure, I'm wondering if it is
  really that bad (that's an honest question; I don't have data one way
  or the other).  In the last step of the VACUUM process, the data is
  copied back to the original file, page by page.  If I understand it
  correctly, this is more or less a blanket write, blasting pages from
  the temp DB to the original DB, front to back (intermixed with copies
  to the journal file).  There is no reason to assume the filesystem
  will over-write the existing allocations, rather than just create new
  ones, especially if the pages are shuffled in groups... although if
  the goal is to defragment the file, we're kind of depending on that,
  which is unwise.

  Maybe there would be some way to pre-populate the rollback journal
  with the full contents of the original database.  Then the file could
  be truncated before the copy-back procedure.  That would make it
  clear to the OS that it is free to allocate whatever file blocks it
  wants, hopefully in better patterns.  The copy back could also be
  done in very large chunks.
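
  Roughly like this (a sketch with made-up handles; the real change
  would live inside the pager):

      #include <unistd.h>

      /* After the journal holds a full copy of the original: truncate
         the db so the filesystem can hand out fresh blocks, then write
         the vacuumed image back in large sequential chunks. */
      static void copy_back(int db_fd, int tmp_fd, off_t size,
                            char *buf, size_t chunk) {
          ftruncate(db_fd, 0);
          for (off_t off = 0; off < size; ) {
              ssize_t n = pread(tmp_fd, buf, chunk, off);
              if (n <= 0) break;
              pwrite(db_fd, buf, (size_t)n, off);
              off += n;
          }
      }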

  I'm not totally sure, but I'm guessing that would require several
  modifications and enhancements to the rollback and B-tree code.
  You would need the ability to force a full rollback copy and the
  ability do a blind, non-journaled copy, but it seems like it could
  be done without modifications to the existing VACUUM semantics, all
  while keeping the transactional integrity of the VACUUM command.



  On a personal level, I don't think it is worth it.  In the end, you're
  still hoping the OS and filesystem will make smart choices about block
  allocations.  An application shouldn't need to be spending a lot
  of time worrying about this level of filesystem performance.  No
  matter what, you're just hinting and setting up conditions that
  should allow the filesystem driver to do something smart and fast.
  It may, or it may not, actually do so.

-j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson


Re: [sqlite] Feature request: copying vacuum

2010-09-01 Thread Richard Hipp
On Wed, Sep 1, 2010 at 2:41 PM, Taras Glek  wrote:

>  Hi,
> Currently VACUUM takes care of sqlite-level fragmentation. Unfortunately
> it does little for fs-level fragmentation since the same file is being
> reused. It would be really beneficial for Mozilla performance if we
> could get a vacuum/hotcopy mode.
>
> As I understand it, currently vacuum works by copying out the data to a
> new file, then copying it back in and truncating the original db file.
>
> It would be really nice to instead do something like:
>
>copy the data to a new file
>
>swap the underlying filehandles to point at new file
>
>remove old file
>
>rename the new file to old name.
>
> This yields two benefits:
>
>A less fragmented db
>
>~50% vacuum speedup since the data is only copied once
>
> Currently we can copy the data to a new file, but it is a pretty
> invasive change to swap all of the current sqlite connections to the new
> file. Things like prepared statements, etc need to be updated for every
> single db consumer. Thus it would make sense to have this feature on the
> sqlite side.
>
> Is this a reasonable request?
>

If other connections have the database file open while it is being vacuumed,
then on unix the other connections will still be left open on the old
unlinked version of the database file and will never see the new content.
And on windows, the file swapping and renaming simply is not allowed while
other connections have the database files open.

The workaround is to modify SQLite so that it is constantly closing and
reopening the database files.  But that adds rather large overheads that
seem likely to be much greater than any savings seen through a reduction in
disk FS fragmentation.




>
> Thanks,
> Taras



-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Feature request: copying vacuum

2010-09-01 Thread Max Vlasov
> This yields two benefits:
>
>A less fragmented db
>
>~50% vacuum speedup since the data is only copied once
>
> Currently we can copy the data to a new file, but it is a pretty
> invasive change to swap all of the current sqlite connections to the new
> file. Things like prepared statements, etc need to be updated for every
> single db consumer. Thus it would make sense to have this feature on the
> sqlite side.
>
> Is this a reasonable request?
>
> Thanks,
> Taras
>


Taras, I once posted similar observations, and Jay A. Kreibich explained
the reasons why it is implemented the way it is.

You can read it here:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg50941.html

Max


[sqlite] Feature request: copying vacuum

2010-09-01 Thread Taras Glek
  Hi,
Currently VACUUM takes care of sqlite-level fragmentation. Unfortunately 
it does little for fs-level fragmentation since the same file is being 
reused. It would be really beneficial for Mozilla performance if we 
could get a vacuum/hotcopy mode.

As I understand it, currently vacuum works by copying out the data to a 
new file, then copying it back in and truncating the original db file.

It would be really nice to instead do something like:

copy the data to a new file

swap the underlying filehandles to point at new file

remove old file

rename the new file to old name.

This yields two benefits:

A less fragmented db

~50% vacuum speedup since the data is only copied once

Currently we can copy the data to a new file, but it is a pretty 
invasive change to swap all of the current sqlite connections to the new 
file. Things like prepared statements, etc need to be updated for every 
single db consumer. Thus it would make sense to have this feature on the 
sqlite side.

Is this a reasonable request?

Thanks,
Taras