[HACKERS] B-Tree contention?

Alan Li Sun, 07 Jun 2009 19:53:41 -0700

Hello,

I have multiple files with that have very similar distributions and I'm
seeing contention when concurrent COPY's are happening against a table with
a b-tree index on the timestamp column.  Each file look something like the
following:


~4M rows with timestamp1
~4M rows with timestamp2
...

To simulate the problem, I have a single file that has:

a int, t timestamp
[1-4000000],2007-01-01 00:00:01
[1-4000000], 2007-01-01 00:00:02

The machine is an eight-core system w/ 16gb of memory.  postgresql.conf is
configured with:

shared_buffers = 256MB     #256MB w/ one COPY, 512MB w/ two COPY's
wal_buffers = 16MB
checkpoint_segments = 100

When I COPY a single file into the table without an index on the time
column, it takes:

real    0m19.628s
user    0m0.001s
sys     0m0.002s

When I have two COPY's with the same file concurrently into the same table
without an index on the time column, it takes:

real    0m39.933s
user    0m0.001s
sys     0m0.000s

real    0m41.294s
user    0m0.000s
sys     0m0.003s

So it takes roughly twice the time and we're not getting any increase in
throughput for the parallel COPY's.  The top LWLock contentions shows that
it's the WALInsertLock.

postgres=# select * from lwlocks order by blk desc limit 20;
  pid  | lockid | share |  write  |  blk
-------+--------+-------+---------+--------
 14065 |      7 |     0 | 8000001 | 311884
 14062 |      7 |     0 | 8000001 | 289561
 14062 |     45 |     0 |   89783 |   1287
 14065 |     45 |     0 |   90104 |   1207
 14062 |      8 |     0 |     652 |     10
 14062 |     17 |     0 |   86469 |      7
 14065 |    370 |     0 |       7 |      6
 14062 |    370 |     0 |  251466 |      6
 14065 |      8 |     0 |     660 |      4
 14065 |     29 | 46010 |    5337 |      3
 14065 |     17 |     0 |   86467 |      3
 14062 |     35 |  8830 |    5279 |      1
 14062 |     40 |  6735 |    5139 |      1
 14062 |     37 | 12860 |    5304 |      1
 14062 |     29 | 46037 |    5387 |      1
 14065 |    521 |     0 |       1 |      1
 14065 |     28 |  6775 |    5236 |      1
 14065 |     37 | 12831 |    5057 |      1
 14062 |    506 |     0 |    2043 |      1
 14065 |     40 |  6763 |    5251 |      1
(20 rows)

When I COPY a single file into the table with an index on the time column,
it takes:

real    0m51.486s
user    0m0.000s
sys     0m0.002s

When I have two COPY's with the same file concurrently into the same table
with an index on the time column, it takes:

real    2m4.414s
user    0m0.001s
sys     0m0.001s

real    2m4.428s
user    0m0.000s
sys     0m0.002s

So it's taking more than twice the time when the timestamp index was added.
The top LWLock contentions shows that there's a bit more contention on the
WALInsertLock, but there's additional contention on two buffer pages.

postgres=# select * from lwlocks order by blk desc limit 20;
  pid  | lockid |  share   |  write   |  blk
-------+--------+----------+----------+--------
 13888 |      7 |        0 | 16027716 | 342522
 13889 |      7 |        0 | 16027881 | 339595
 13889 |    400 |  4000000 |  4000002 |  72875
 13889 |    388 |  4000001 |  4000000 |  72545
 13888 |    388 |  4000001 |  4000001 |  70554
 13888 |    400 |  4000000 |  4000002 |  68748
 13889 |     45 |        0 |    89402 |   1340
 13888 |     45 |        0 |    89259 |   1290
 13889 |     33 | 12117131 |     6772 |   1102
 13888 |     33 | 12118267 |     7178 |   1054
 13888 |     29 |  4232749 |     7159 |    415
 13889 |     29 |  4233831 |     7057 |    353
 13888 |     34 |  4190329 |     6907 |    318
 13889 |     34 |  4191618 |     6870 |    292
 13888 |     38 |  4185303 |     7004 |    283
 13889 |     38 |  4186310 |     7084 |    259
 13889 |    922 |        0 |      278 |     86
 13889 |    940 |        0 |      207 |     80
 13889 | 102316 |        0 |      346 |     72
 13889 |    854 |        0 |      194 |     70
(20 rows)

The two hotspots seem to represent the first BTree pages that contain the
'2007-01-01 00:00:01' and '2007-01-01 00:00:02' keys.  I suspect it's due to
concurrent _bt_doinsert(), which tries to acquire both a read lock
(_bt_search) and a write lock (_bt_findinsertloc) on those first BTree
pages.  Does this sound like a reasonable explanation for the contention?

Btw, this is against a 8.4beta2 build as of yesterday, and I'm pretty sure
that this happens in REL8_3_STABLE as well.

Thanks, Alan

[HACKERS] B-Tree contention?

Reply via email to