Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

E.Pasma Sun, 02 Dec 2018 11:30:24 -0800

> 2 dec. 2018, Keith Medcalf:
> 
> 
> Well if it is unique and not null, then why not just make it the rowid?  In 
> either case, you would still have to permute the storage tree at insert time 
> if the inserts were not in-order.  So let us compare them shall we:
> 
> sqlite> create table x(value INTEGER PRIMARY KEY);
> sqlite> insert into x select random() from generate_series where start=1 and 
> stop=100000000;
> Run Time: real 185.795 user 184.265625 sys 0.343750
> 
> sqlite> create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;
> sqlite> insert into x select random() from generate_series where start=1 and 
> stop=100000000;
> Run Time: real 174.661 user 173.890625 sys 0.000000
> 
> sqlite> create table x(value INTEGER NOT NULL);
> sqlite> insert into x select random() from generate_series where start=1 and 
> stop=100000000;
> Run Time: real 20.287 user 20.265625 sys 0.000000
> sqlite> create unique index ix on x(value);
> Run Time: real 211.556 user 207.562500 sys 2.562500
> 
> sqlite> create table x(value INTEGER NOT NULL);
> sqlite> create unique index ix on x(value);
> sqlite> insert into x select random() from generate_series where start=1 and 
> stop=100000000;
> Run Time: real 196.719 user 195.437500 sys 0.000000
> 
> So, the winner is (in this case, with no other payload) that the INTEGER 
> PRIMARY KEY on a WITHOUT ROWID table is the most time efficient, followed by 
> being the INTEGER PRIMARY KEY of a ROWID table.  Those two also happen to be 
> the most space-efficient as well.  Interestingly it was overall faster to 
> build and maintain the separate index at insert time in this example than to 
> build the index seperately, but not by much.  Note this is for 100,000,000 
> records processed entitely in RAM in a single transaction ...


Thanks because this speed is what I had expected. Except that creating the 
index beforehand can be overall faster than adding it after the data is 
inserted.
Unfortunately, but almost funny, my measurements are almost the reverse of what 
you see. See below. 
Possible clues:
- I tested with a file database, not memory.
- I reduced the number of rows to 10.000.000
- I have only a basic system with 4Gb RAM and a single processor

.timer on
.load series
drop table if exists x;
create table x(value INTEGER PRIMARY KEY);

insert into x select random() from generate_series where start=1 and 
stop=10000000;
Run Time: real 87.673 user 37.691114 sys 43.527249

drop table x;
create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;

insert into x select random() from generate_series where start=1 and 
stop=10000000;
Run Time: real 88.759 user 36.276227 sys 44.190566

drop table x;
create table x(value INTEGER NOT NULL);

insert into x select random() from generate_series where start=1 and 
stop=10000000;
Run Time: real 3.980 user 2.266869 sys 0.124012

create unique index ix on x(value);
Run Time: real 10.131 user 7.623369 sys 0.797015

drop table x;
create table x(value INTEGER NOT NULL);
create unique index ix on x(value);

insert into x select random() from generate_series where start=1 and 
stop=10000000;
Run Time: real 107.647 user 41.249081 sys 50.869456


_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

Reply via email to