On Wed, Sep 7, 2016 at 7:08 PM, Dan Kennedy <danielk1...@gmail.com> wrote:

> On 09/07/2016 11:08 PM, Dominique Devienne wrote:
>
>> Initial design was to copy the DB file (app is "shutdown", so no
>> connection
>> to that DB file).
>> Drop the 5 big tables. Vacuum it (the new DB). Add the new tables.
>> But the 25GB copy was deemed too long in that case (several minutes).
>> (after all, copying 95% of 25GB to discard all those GBs is inefficient)
>>
>
I've actually tried it now, with a 46 GB DB, and that took ~ 40min on my
old Linux box.
25min copy, 11.5min drop (in a transaction), and 3.5min vacuum, to go down
to a 1.5 GB DB.


> So alternate design was to create the schema empty in the new DB file,
>> attach the old one, and insert the data from the 30 tables into the new
>> empty (with identical schema/structure) tables. But that's also very slow
>> apparently.
>>
>
> SQLite contains a special optimization for statements of the form:
>
>   INSERT INTO tbl SELECT * FROM tbl2;
>
> If both tables have the same set of columns, the same PK, the same indexes
> and identical UNIQUE constraints and no triggers, and if foreign keys are
> disabled, SQLite can copy records directly from one b-tree structure to
> another without unpacking the records. This can be much faster. And it
> writes the b-tree structures in order too, which leads to a more efficient
> use of the cache.
>
> So if you create your new db with the tables and indexes but no triggers,
> then attach it and your main db to the same handle so that you can use an
> "INSERT INTO ... SELECT ..." statement of the form above, things might run
> a bit faster.
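
For readers following along, here's a minimal sketch of that recipe. I'm
using Python's stdlib sqlite3 bindings for convenience instead of the shell,
and a made-up one-table schema (table and column names are mine, not the
real 30-table schema):

```python
# Sketch of the fast-path transfer Dan describes: identical schema in the
# new DB (tables + indexes, no triggers), foreign keys off, and the bare
# "INSERT INTO t SELECT * FROM old.t" form.
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
old_path = os.path.join(tmp, "old.db")
new_path = os.path.join(tmp, "new.db")

# Build the "old" database we want to copy from.
con = sqlite3.connect(old_path, isolation_level=None)  # autocommit mode
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, val TEXT)")
con.execute("BEGIN")
con.executemany("INSERT INTO t(val) VALUES (?)",
                [("row%d" % i,) for i in range(1000)])
con.execute("COMMIT")
con.close()

# New database: same table definition, but no triggers, and foreign keys
# disabled -- the conditions under which the b-tree copy can apply.
con = sqlite3.connect(new_path, isolation_level=None)
con.execute("PRAGMA foreign_keys=OFF")
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, val TEXT)")
con.execute("ATTACH ? AS old", (old_path,))
con.execute("BEGIN")
con.execute("INSERT INTO t SELECT * FROM old.t")  # the eligible form
con.execute("COMMIT")
print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # 1000
con.close()
```

(`isolation_level=None` just puts Python's bindings in autocommit mode so
the explicit BEGIN/COMMIT behave like they do in the shell.)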


And this approach took 1min11s, much faster!
The DB file also happened to be a little smaller, 1.45 GB.

Dan, how does one know whether this optimization kicks in or not?
Is there any way to tell, using EXPLAIN or EXPLAIN QUERY PLAN for example?

Note that I later learned they were also reordering columns, so it wasn't a
plain
    insert into SomeTable select * from old.SomeTable
like in my test (which performed well enough).
So I suspect that's why it was so slow for them (but that's just a guess).
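
For what it's worth, the reordering does seem to show up in the VDBE
bytecode. This is only a heuristic I'm inferring from EXPLAIN output (again
via Python's stdlib sqlite3 for convenience), not documented behavior: when
the rows are copied whole, the program has no per-column Column opcodes,
whereas a column-reordering INSERT has to unpack every source row.

```python
# Heuristic probe (my guess, not documented behavior): compare the VDBE
# opcodes of the eligible "SELECT *" form against a column-reordering form.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys=OFF")
con.execute("CREATE TABLE src(a INTEGER PRIMARY KEY, b TEXT)")
con.execute("CREATE TABLE dst(a INTEGER PRIMARY KEY, b TEXT)")

def opcodes(sql):
    # EXPLAIN rows are (addr, opcode, p1, p2, p3, p4, p5, comment).
    return {row[1] for row in con.execute("EXPLAIN " + sql)}

fast = opcodes("INSERT INTO dst SELECT * FROM src")
slow = opcodes("INSERT INTO dst(b, a) SELECT b, a FROM src")

# Expect: False True, if the copy path applied to the first form only.
print("Column" in fast, "Column" in slow)
```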

In any case, thanks for your answer Dan. --DD

PS: Here's my script

.timer ON
attach 'DB.db' as old;
BEGIN;
insert into SomeTable select * from old.SomeTable;
...
COMMIT;

and here's the output:

[ddevienne]$ time sqlite3 DB-new.db < DB-new-insert-from.sql
Run Time: real 0.038 user 0.001999 sys 0.001000 (BEGIN)
Run Time: real 0.000 user 0.000000 sys 0.000000 (INSERT)
Run Time: real 0.001 user 0.000000 sys 0.000000 ...
Run Time: real 0.000 user 0.000000 sys 0.001000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.018 user 0.000000 sys 0.000999
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.299 user 0.126981 sys 0.017998
Run Time: real 0.001 user 0.000000 sys 0.000000
Run Time: real 0.012 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.001000
Run Time: real 0.018 user 0.000000 sys 0.000000
Run Time: real 0.010 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.098 user 0.042993 sys 0.018997
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.001000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.000 user 0.000000 sys 0.000000
Run Time: real 0.015 user 0.006999 sys 0.001000
Run Time: real 0.009 user 0.000000 sys 0.000000
Run Time: real 0.009 user 0.001000 sys 0.000999
Run Time: real 69.194 user 25.081187 sys 4.877259
Run Time: real 0.021 user 0.003000 sys 0.001000
Run Time: real 0.001 user 0.000000 sys 0.000000 (INSERT)
Run Time: real 1.457 user 0.000999 sys 0.017997 (COMMIT)
25.267u 4.944s 1:11.21 42.4%    0+0k 3126056+2868824io 0pf+0w

One table takes up 99% of the new DB, with 27M rows, which I guess explains
the 69s.

*** Table FOO and all its indices ****************************

Percentage of total database......................  99.0%
Number of entries................................. 27437811
Bytes of storage consumed......................... 1449852928
Bytes of payload.................................. 1315073619  90.7%
Average payload per entry......................... 47.93
Average unused bytes per entry.................... 0.43
Average fanout.................................... 150.00
Maximum payload per entry......................... 596
Entries that use overflow......................... 0            0.0%
Index pages used.................................. 2352
Primary pages used................................ 351616
Overflow pages used............................... 0
Total pages used.................................. 353968
Unused bytes on index pages....................... 334021       3.5%
Unused bytes on primary pages..................... 11385163     0.79%
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 11719184     0.81%
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
