Hello, I have multiple files with that have very similar distributions and I'm seeing contention when concurrent COPY's are happening against a table with a b-tree index on the timestamp column. Each file look something like the following:
~4M rows with timestamp1 ~4M rows with timestamp2 ... To simulate the problem, I have a single file that has: a int, t timestamp [1-4000000],2007-01-01 00:00:01 [1-4000000], 2007-01-01 00:00:02 The machine is an eight-core system w/ 16gb of memory. postgresql.conf is configured with: shared_buffers = 256MB #256MB w/ one COPY, 512MB w/ two COPY's wal_buffers = 16MB checkpoint_segments = 100 When I COPY a single file into the table without an index on the time column, it takes: real 0m19.628s user 0m0.001s sys 0m0.002s When I have two COPY's with the same file concurrently into the same table without an index on the time column, it takes: real 0m39.933s user 0m0.001s sys 0m0.000s real 0m41.294s user 0m0.000s sys 0m0.003s So it takes roughly twice the time and we're not getting any increase in throughput for the parallel COPY's. The top LWLock contentions shows that it's the WALInsertLock. postgres=# select * from lwlocks order by blk desc limit 20; pid | lockid | share | write | blk -------+--------+-------+---------+-------- 14065 | 7 | 0 | 8000001 | 311884 14062 | 7 | 0 | 8000001 | 289561 14062 | 45 | 0 | 89783 | 1287 14065 | 45 | 0 | 90104 | 1207 14062 | 8 | 0 | 652 | 10 14062 | 17 | 0 | 86469 | 7 14065 | 370 | 0 | 7 | 6 14062 | 370 | 0 | 251466 | 6 14065 | 8 | 0 | 660 | 4 14065 | 29 | 46010 | 5337 | 3 14065 | 17 | 0 | 86467 | 3 14062 | 35 | 8830 | 5279 | 1 14062 | 40 | 6735 | 5139 | 1 14062 | 37 | 12860 | 5304 | 1 14062 | 29 | 46037 | 5387 | 1 14065 | 521 | 0 | 1 | 1 14065 | 28 | 6775 | 5236 | 1 14065 | 37 | 12831 | 5057 | 1 14062 | 506 | 0 | 2043 | 1 14065 | 40 | 6763 | 5251 | 1 (20 rows) When I COPY a single file into the table with an index on the time column, it takes: real 0m51.486s user 0m0.000s sys 0m0.002s When I have two COPY's with the same file concurrently into the same table with an index on the time column, it takes: real 2m4.414s user 0m0.001s sys 0m0.001s real 2m4.428s user 0m0.000s sys 0m0.002s So it's taking more than twice the time when the timestamp index was added. The top LWLock contentions shows that there's a bit more contention on the WALInsertLock, but there's additional contention on two buffer pages. postgres=# select * from lwlocks order by blk desc limit 20; pid | lockid | share | write | blk -------+--------+----------+----------+-------- 13888 | 7 | 0 | 16027716 | 342522 13889 | 7 | 0 | 16027881 | 339595 13889 | 400 | 4000000 | 4000002 | 72875 13889 | 388 | 4000001 | 4000000 | 72545 13888 | 388 | 4000001 | 4000001 | 70554 13888 | 400 | 4000000 | 4000002 | 68748 13889 | 45 | 0 | 89402 | 1340 13888 | 45 | 0 | 89259 | 1290 13889 | 33 | 12117131 | 6772 | 1102 13888 | 33 | 12118267 | 7178 | 1054 13888 | 29 | 4232749 | 7159 | 415 13889 | 29 | 4233831 | 7057 | 353 13888 | 34 | 4190329 | 6907 | 318 13889 | 34 | 4191618 | 6870 | 292 13888 | 38 | 4185303 | 7004 | 283 13889 | 38 | 4186310 | 7084 | 259 13889 | 922 | 0 | 278 | 86 13889 | 940 | 0 | 207 | 80 13889 | 102316 | 0 | 346 | 72 13889 | 854 | 0 | 194 | 70 (20 rows) The two hotspots seem to represent the first BTree pages that contain the '2007-01-01 00:00:01' and '2007-01-01 00:00:02' keys. I suspect it's due to concurrent _bt_doinsert(), which tries to acquire both a read lock (_bt_search) and a write lock (_bt_findinsertloc) on those first BTree pages. Does this sound like a reasonable explanation for the contention? Btw, this is against a 8.4beta2 build as of yesterday, and I'm pretty sure that this happens in REL8_3_STABLE as well. Thanks, Alan