On Wed, Sep 7, 2016 at 7:08 PM, Dan Kennedy <danielk1...@gmail.com> wrote:
> On 09/07/2016 11:08 PM, Dominique Devienne wrote:
>
>> Initial design was to copy the DB file (app is "shutdown", so no
>> connection to that DB file). Drop the 5 big tables. Vacuum it (the new
>> DB). Add the new tables. But the 25GB copy was deemed too long in that
>> case (several minutes). (After all, copying 95% of 25GB to discard all
>> those GBs is inefficient.)
>>

I've actually tried it now, with a 46 GB DB, and that took ~40 min on my
old Linux box: 25 min copy, 11.5 min drop (in one transaction), and
3.5 min vacuum, to go down to a 1.5 GB DB.

>> So the alternate design was to create the schema empty in the new DB
>> file, attach the old one, and insert the data from the 30 tables into
>> the new empty (with identical schema/structure) tables. But that's also
>> very slow apparently.
>>
> SQLite contains a special optimization for statements of the form:
>
>   INSERT INTO tbl SELECT * FROM tbl2;
>
> If both tables have the same set of columns, the same PK, the same
> indexes and identical UNIQUE constraints and no triggers, and if foreign
> keys are disabled, SQLite can copy records directly from one b-tree
> structure to another without unpacking the records. This can be much
> faster. And it writes the b-tree structures in order too, which leads to
> a more efficient use of the cache.
>
> So if you create your new db with the tables and indexes but no
> triggers, then attach it and your main db to the same handle so that you
> can use an "INSERT INTO ... SELECT ..." statement of the form above,
> things might run a bit faster.

And this approach took 1 min 11 s, much faster! The DB file also
happened to be a little smaller: 1.45 GB.

Dan, how does one know whether this optimization kicks in or not? Is
there any way to tell, using an EXPLAIN (QUERY PLAN) for example?

Note that I later learned they were reordering columns too, so it wasn't
an

  insert into SomeTable select * from old.SomeTable

like in my test (with decent enough performance).
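For what it's worth, here is a minimal sketch of the two variants, using
Python's stdlib sqlite3 module purely for illustration (the "SomeTable"
name and its columns are invented, not the real schema from this thread).
When the destination table matches the source exactly, the bare
"INSERT ... SELECT *" form is the one eligible for the direct b-tree copy
Dan describes; a reordered column list falls back to row-by-row copying.

```python
import os
import sqlite3
import tempfile

# Illustrative only: "SomeTable" and its columns are invented here.
tmp = tempfile.mkdtemp()
old_path = os.path.join(tmp, "old.db")
new_path = os.path.join(tmp, "new.db")

# Build a small "old" database to copy from.
src = sqlite3.connect(old_path)
src.execute("CREATE TABLE SomeTable (id INTEGER PRIMARY KEY, val TEXT)")
src.executemany("INSERT INTO SomeTable VALUES (?, ?)",
                [(i, "row%d" % i) for i in range(1000)])
src.commit()
src.close()

# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
dst = sqlite3.connect(new_path, isolation_level=None)
# Destination table must match the source exactly (columns, PK, indexes,
# UNIQUE constraints, no triggers) for the fast copy to be eligible.
dst.execute("CREATE TABLE SomeTable (id INTEGER PRIMARY KEY, val TEXT)")
dst.execute("ATTACH ? AS old", (old_path,))
dst.execute("BEGIN")
# Eligible form: bare "SELECT *", no column list, no WHERE clause.
dst.execute("INSERT INTO SomeTable SELECT * FROM old.SomeTable")
# A reordered column list, e.g.
#   INSERT INTO SomeTable (val, id) SELECT val, id FROM old.SomeTable
# would force SQLite back onto the slow row-by-row path.
dst.execute("COMMIT")
n = dst.execute("SELECT count(*) FROM SomeTable").fetchone()[0]
print(n)  # 1000
dst.close()
```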
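To avoid hand-writing one INSERT per table (the real case has ~30), the
new schema's DDL can be pulled straight out of the old database's
sqlite_master, which also guarantees the two tables are structurally
identical. A rough sketch, again with invented table names and Python's
stdlib sqlite3 standing in for the sqlite3 shell:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
old_path = os.path.join(tmp, "old.db")
new_path = os.path.join(tmp, "new.db")

# Build a toy "old" database with a few tables (names invented).
src = sqlite3.connect(old_path)
for t in ("foo", "bar", "baz"):
    src.execute("CREATE TABLE %s (id INTEGER PRIMARY KEY, val TEXT)" % t)
    src.execute("INSERT INTO %s VALUES (1, 'x'), (2, 'y')" % t)
src.commit()
src.close()

dst = sqlite3.connect(new_path, isolation_level=None)
dst.execute("ATTACH ? AS old", (old_path,))
dst.execute("BEGIN")
# Reuse each table's original DDL so old and new schemas match exactly,
# then bulk-copy inside one transaction. A real script would also
# recreate indexes (type = 'index' rows of sqlite_master), but not
# triggers, per Dan's advice above.
tables = dst.execute(
    "SELECT name, sql FROM old.sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'").fetchall()
for name, ddl in tables:
    dst.execute(ddl)  # creates the table in the new (main) database
    dst.execute("INSERT INTO %s SELECT * FROM old.%s" % (name, name))
dst.execute("COMMIT")
total = sum(dst.execute("SELECT count(*) FROM %s" % t).fetchone()[0]
            for t, _ in tables)
print(total)  # 6
dst.close()
```

(An unqualified table name resolves to the main database first, so the
INSERT targets the new table even though old.foo also exists.)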
Thus I suspect that's why it was too slow for them (but it's just a
guess). In any case, thanks for your answer Dan. --DD

PS: Here's my script:

.timer ON
attach 'DB.db' as old;
BEGIN;
insert into SomeTable select * from old.SomeTable;
...
COMMIT;

and here's the output:

[ddevienne]$ time sqlite3 DB-new.db < DB-new-insert-from.sql
Run Time: real 0.038 user 0.001999 sys 0.001000  (BEGIN)
Run Time: real 0.000 user 0.000000 sys 0.000000  (INSERT)
Run Time: real 0.001 user 0.000000 sys 0.000000
...
Run Time: real 0.000 user 0.000000 sys 0.001000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.018 user 0.000000 sys 0.000999
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.299 user 0.126981 sys 0.017998
Run Time: real 0.001 user 0.000000 sys 0.000000
Run Time: real 0.012 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.001000
Run Time: real 0.018 user 0.000000 sys 0.000000
Run Time: real 0.010 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.098 user 0.042993 sys 0.018997
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.001000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.015 user 0.006999 sys 0.001000
Run Time: real 0.009 user 0.000000 sys 0.000000
Run Time: real 0.009 user 0.001000 sys 0.000999
Run Time: real 69.194 user 25.081187 sys 4.877259
Run Time: real 0.021 user 0.003000 sys 0.001000
Run Time: real 0.001 user 0.000000 sys 0.000000  (INSERT)
Run Time: real 1.457 user 0.000999 sys 0.017997  (COMMIT)
25.267u 4.944s 1:11.21 42.4% 0+0k
3126056+2868824io 0pf+0w

One table takes up 99% of the new DB, with 27M rows, so that explains
the 69s, I guess.

*** Table FOO and all its indices ****************************

Percentage of total database...................... 99.0%
Number of entries................................. 27437811
Bytes of storage consumed......................... 1449852928
Bytes of payload.................................. 1315073619   90.7%
Average payload per entry......................... 47.93
Average unused bytes per entry.................... 0.43
Average fanout.................................... 150.00
Maximum payload per entry......................... 596
Entries that use overflow......................... 0             0.0%
Index pages used.................................. 2352
Primary pages used................................ 351616
Overflow pages used............................... 0
Total pages used.................................. 353968
Unused bytes on index pages....................... 334021        3.5%
Unused bytes on primary pages..................... 11385163      0.79%
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 11719184      0.81%

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users