Heikki Linnakangas wrote:
I still need to test the worst-case performance, with input that has a lot of escapes.

Ok, I've done some more performance testing with this. I tested COPY FROM with a table with a single "text" column. There was a million rows in the table, with a 1000 character long string:

postgres=# CREATE TABLE narrowtable2 (id text);
CREATE TABLE
postgres=# INSERT INTO narrowtable2 SELECT repeat(E'\\', 1000) FROM generate_series(1, 1000000);
INSERT 0 1000000

After that, I dumped that to a file, and loaded it back using COPY FROM:

time ~/installations/cvshead/bin/psql postgres -c "BEGIN; TRUNCATE narrowtable2; COPY narrowtable2 FROM '/home/perftester/narrowtable3.tbl'; ROLLBACK;"

I repeated the test with different frequencies of backslashes in the string, with and without the patch, and the took the smallest number of each test case:

backslashes     with    without patch
all             24.9    15.6
every 4th       12.7    11.4
every 8th       10.4    10.7
every 16th      8.7     10.3
none            6.8     9.8

So the overhead of using memchr slows us down if there's a lot of escape or quote characters. The breakeven point seems to be about 1 in 8 characters. I'm not sure if that's a good tradeoff or not...


I also tested a table with single integer column, and found no meaningful difference (10.5 without patch vs 10.6 with patch). oprofile shows that in this test case, only ~5% of the CPU time is spent in CopyReadLineText, and the patch doesn't change that.

Without patch:
samples % image name app name symbol name 7563 12.7220 no-vmlinux postgres (no symbols)
4050      6.8127  postgres                 postgres                 DoCopy
3334 5.6083 postgres postgres LWLockAcquire 3238 5.4468 postgres postgres CopyReadLine 2900 4.8782 postgres postgres LWLockRelease 2781 4.6780 libc-2.7.so postgres __GI_____strtoll_l_internal 2778 4.6730 postgres postgres heap_formtuple
2636      4.4341  postgres                 postgres                 hash_any
2087 3.5106 no-vmlinux no-vmlinux (no symbols)
1748      2.9404  libc-2.7.so              postgres                 memset
1724 2.9000 postgres postgres PinBuffer 1670 2.8092 postgres postgres PageAddItem 1645 2.7671 postgres postgres heap_insert 1459 2.4542 postgres postgres UnpinBuffer 1457 2.4509 postgres postgres ReadBuffer_common 1321 2.2221 postgres postgres hash_search_with_hash_value 1278 2.1498 postgres postgres MarkBufferDirty 1219 2.0505 oprofiled oprofiled (no symbols) 972 1.6350 postgres postgres pg_verify_mbstr_len 756 1.2717 postgres postgres RelationPutHeapTuple
665       1.1186  postgres                 postgres                 pg_atoi
631 1.0614 postgres postgres RelationGetBufferForTuple 613 1.0312 postgres postgres AllocSetReset
...

With patch:
samples % image name app name symbol name 42720 18.1450 no-vmlinux postgres (no symbols)
15367     6.5270  postgres                 postgres                 DoCopy
11831 5.0251 postgres postgres LWLockAcquire 11500 4.8845 no-vmlinux no-vmlinux (no symbols) 10182 4.3247 postgres postgres LWLockRelease 9912 4.2100 libc-2.7.so postgres __GI_____strtoll_l_internal
9811      4.1671  postgres                 postgres                 hash_any
8824 3.7479 postgres postgres heap_formtuple 7459 3.1682 postgres postgres CopyReadLine 7187 3.0526 postgres postgres PageAddItem
6313      2.6814  libc-2.7.so              postgres                 memset
5842 2.4813 postgres postgres PinBuffer 5230 2.2214 postgres postgres UnpinBuffer 5160 2.1917 postgres postgres heap_insert 4838 2.0549 postgres postgres ReadBuffer_common 4819 2.0468 postgres postgres hash_search_with_hash_value 4691 1.9925 postgres postgres MarkBufferDirty
3675      1.5609  libc-2.7.so              postgres                 memchr
3617 1.5363 postgres postgres AllocSetAlloc 3585 1.5227 postgres postgres pg_verify_mbstr_len 3326 1.4127 postgres postgres AllocSetReset
...


These tests were on a test server with a dual-core 64-bit Intel Xeons. I'd still like to hear reports from other platforms.

Another thing that seems like an obvious win is to merge CopyReadLine and CopyReadAttributesText/CSV so that we do just one pass over the input. But that seems suspiciously obvious, I wonder if I'm missing something.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-patches

Reply via email to