On Tue, Mar 20, 2012 at 01:59:59PM -0700, Udi Karni scratched on the wall:

> Hello,
>
> I am creating large DBs - each with a single table (magnitude of a few
> hundred million rows / 100GB). It takes a few transformations to get to the
> final product. When done - I VACUUM the final result.
>
> I am noticing that VACUUM first makes a copy of the data (presumably what
> it wants to keep?) into a Temp file. Then it creates a Journal file which I
> suspect ends up becoming the final table. Then the Temp and Original go
> away - but for a period of time you need 3X the space and you go through 2
> complete copies of the data (more or less).
>
> Is there a way to go directly from "original" to "journal/final" -
> skipping the creation of the Temp version?
No, it requires all three copies. VACUUM has two requirements. First, it
must be ACID compliant and transaction safe. In short, even if the power
fails in the middle of a VACUUM, you must be able to recover the original
file, in place. Second, the VACUUM must result in the original file being
updated. Not just "a file with the same name", but the actual, original
file. Remember that other processes may have the database open during the
VACUUM process, and they must continue to work, using the same active file
handle, when the VACUUM finishes.

This means two copies are required. First, the original database must be
copied to a temp file. During this copy process, the database is cleaned
and defragmented. That temp copy must then be copied back on top of the
original file. That second copy needs to be transaction safe, however, so
we need to journal every change to the original file, just in case
something goes wrong. All that adds up to three copies-- the original, the
optimized temp copy, and the journal file to fix things up should the
"copy back" process fail.

Almost exactly two years ago I proposed a "VACUUM TO <filename>" version
of the command that did the first copy and then quit. Rather than building
an optimized temp copy, VACUUM TO would copy the current database to an
optimized file under the given name, and then skip the copy-back stage.
This would allow a system admin to shut down all database users, VACUUM
the database, swap files, and finally restart everything. The process
would require more manual work, but would only require 2x the drive space,
rather than 3x. Nobody spoke up about the idea, however.

If you look at the VACUUM code, it isn't black magic. Building an external
tool that did a "VACUUM TO" process wouldn't be that difficult.

> And a more general question. My PC has 8GB of RAM. I am considering
> getting a much larger machine that can take upwards of 100-200GB of RAM.
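[Editorial sketch of the external "VACUUM TO" tool mentioned above: one way
to build it, using only Python's standard-library sqlite3 module, is to dump
the original database as SQL and replay it into a fresh file. The name
`vacuum_to` is hypothetical, and an SQL-dump rebuild is slower than a
page-level copy, but the result is a packed, defragmented file and the
original is only ever read-- no temp copy, no journal, no copy-back.]

```python
import sqlite3

def vacuum_to(src_path, dst_path):
    """Rebuild the database at src_path into a fresh file at dst_path.

    Hypothetical helper, not part of SQLite. Every table and index is
    recreated from a SQL dump, so dst_path ends up packed roughly the
    way VACUUM would pack it. Run this only while all writers are shut
    down, as in the swap-files procedure described above.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # iterdump() yields the schema and data as SQL statements,
        # wrapped in BEGIN TRANSACTION ... COMMIT; replaying them
        # into an empty database produces a compact copy.
        dst.executescript("\n".join(src.iterdump()))
    finally:
        dst.close()
        src.close()
```

After the copy finishes, the admin swaps the new file into place and
restarts the database users against it.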
> It's a little pricey - so - will SQLite use it advantageously? I wouldn't
> want to get it only to find out that anything beyond X GB is a waste of
> money. Should a larger machine with multiple CPUs and lots of RAM be able
> to run 2 concurrent SQLites successfully? Any good recommendations on
> this?

You'll need a 64-bit version of SQLite. In the end, however, I'm not sure
you'll see much. You can crank up the page cache size, but that's not
going to buy you as much as you might think-- SQLite frequently flushes
the cache if more than one process has the database open at the same time.
The RAM might work nicely as a file cache at the OS level, but only for
read operations. Writes are still going to need to go to disk. Unless
reads are more frequent than writes-- by several orders of magnitude--
you are likely to get better performance by looking at fast storage,
rather than lots of RAM.

If you're really looking for that kind of heavy-hitting performance, I
might also consider a more traditional client/server RDBMS like
PostgreSQL, or a non-relational DB like Redis.

  -j

-- 
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

