Greetings,

I'm running into some performance problems with my insert operations using
sqlite 3.35. OS is linux with a 2.69 kernel using an out-of-box ext3 file
system.

My schema consists of a single table with 12 columns with a numeric affinity
and 5 columns with text affinity. The table has a primary integer key on a
monotonically increasing row id number.

My first experiment proceeds as follows: Using a single thread, I insert 5
million rows containing around 0.5 KB of data to the table. These inserts
occur in transactions of 1000 inserts each.

In this case, performance is very fast. The entire experiment takes around
10 minutes to run. Typical transaction time is between 0.1 and 0.2 seconds
for every 1000 inserts.

Next, I add in an index on of my numeric columns. The value of this column
is a random 32 bit integer. This is where the problem comes in.

At first, the experiment is fast, and I see transaction times similar to
what I've observed without the index. Then, after around 800,000 entries, I
start seeing drastically larger transaction times of around 2-5 seconds per
transaction. After around 2,500,000 entries, these times are consistently
hovering around 5 seconds. At that point I killed the process.

With the help of I/O stat, it seems that at the point that things get slow
we're seeing an I/O wait of around 50%. Towards the end of the experiment,
I/O wait is around  85%. Additionally, at the beginning of the experiment,
I'm observing a negligible amount of I/O read. When it begins to get slow,
I/O occurs at around 1/10th the rate of I/O write.

I suspect that this had something to do with the way sqlite caches my index.
I modified the code to call sqlite3_enable_shared_cache. It then created a
dummy table, inserted a row, and then performed a select on it. In the
callback for the select, I then proceeded with the experiment (this is
similar to how Mozilla's storage module enables the shared cache).

In this case, performance was slightly worse at first (0.4 seconds for each
transaction, I suspect that's because it's cheaper to use the buffer cache).
As the experiment proceeded, however, the transaction times were consistent
until I filled SQLite's cache. Even with an absurdly high cache PRAGMA (~500
MB), it managed to fill after around 1,000,000 rows. At that point, I began
to see the dreaded 5 second transaction times again.

I tried a few more things (periodically dumping the cache by closing the
connections, then "priming" it by doing a select on the value of the index),
but I didn't have any luck speeding things up.

Anyone care to take a guess what's going on here? Is it a problem with
rebalancing the btree that's storing the index? It seems to me that
something is doing random I/O, and I was hoping that if I could "force" the
index's pages to stay in the cache, then I'd be able to avoid that.

Any help is appreciated,
-Stan

Reply via email to