On 9/10/19, mailing lists <mailingli...@skywind.eu> wrote:

> What is the background that index creation is so much faster than insertion
> using indices?

Indexes are maintained in key order.  So inserts are happening at
random spots all across the index.  For each insert, the system has to
(1) read a 4096-byte page, (2) update the 10 or 20 bytes corresponding
to the index entry, and finally (3) write the 4096-byte page.

If you keep pages in cache, you might get lucky in update two or three
entries on each page before you have to write it back.  But even then,
you are doing a lot of I/O relative to the amount of content you are
changing.  Your I/O is hundreds of times more than the amount of
content being updated.

CREATE INDEX does an external merge source on the keys, such that keys
are always written in ascending order.  Hence, the pattern is (1) fill
up a 4096-byte page with new entries, then (2) write the page.  The
I/O to content change ratio is now close to 1.0, which is what you
want.  About a hundred times less I/O than random inserts.

The fact that all pages are written in order also helps, as
filesystems tend to be optimized for that case.

-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to