On Wed, 18 Dec 2024 at 23:42, John Naylor wrote:
> The difference is small enough that normally I'd say it's likely
> unrelated to the patch, but on the other hand it's consistent with
> saving (3 * 10 * 10 million) cycles because of 1 less multiplication
> each, which is not nothing, but for shoving bytes into /dev/null it's
> not exciting either. The lookup for the 64-bit case has grown to 1024
> bytes, which will compete for cache space. I don't have a strong
> reason to be either for or against this patch. Anyone else want to
> test?

I tried it out too on my Zen4 machine. I don't doubt David saw a
speedup when testing the performance in isolation, but I can't detect
anything going faster when using it in Postgres.

Maybe we can revisit if we make COPY TO faster someday. As of today,
it's a pretty inefficient lump of code.

My results:

$ echo master && ./intbench.sh
master
NOTICE: relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 246.294 ms
latency average = 243.167 ms
latency average = 245.620 ms
latency average = 247.135 ms
latency average = 248.206 ms
latency average = 253.433 ms
latency average = 259.296 ms
latency average = 248.856 ms
latency average = 247.518 ms
latency average = 259.581 ms
latency average = 244.426 ms
latency average = 244.553 ms
latency average = 249.909 ms
latency average = 244.079 ms
latency average = 246.422 ms
latency average = 248.763 ms
latency average = 247.318 ms
latency average = 249.675 ms
latency average = 245.192 ms
latency average = 253.975 ms
$ echo patched && ./intbench.sh
patched
NOTICE: relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 253.964 ms
latency average = 257.463 ms
latency average = 250.506 ms
latency average = 252.401 ms
latency average = 260.806 ms
latency average = 250.120 ms
latency average = 251.539 ms
latency average = 262.180 ms
latency average = 252.349 ms
latency average = 251.332 ms
latency average = 249.490 ms
latency average = 252.696 ms
latency average = 251.895 ms
latency average = 248.466 ms
latency average = 255.839 ms
latency average = 253.334 ms
latency average = 250.548 ms
latency average = 288.164 ms
latency average = 252.587 ms
latency average = 256.059 ms

perf top:

master:
  16.59%  postgres   [.] CopyAttributeOutText
  15.63%  libc.so.6  [.] __memmove_avx512_unaligned_erms
  12.94%  postgres   [.] pg_ltoa
   9.85%  postgres   [.] CopyOneRowTo
   6.86%  postgres   [.] AllocSetAlloc
   6.73%  postgres   [.] tts_buffer_heap_getsomeattrs

patched:
  19.53%  libc.so.6  [.] __memmove_avx512_unaligned_erms
  12.52%  postgres   [.] pg_ltoa
  11.76%  postgres   [.] CopyAttributeOutText
  11.40%  postgres   [.] CopyOneRowTo
   6.96%  postgres   [.] tts_buffer_heap_getsomeattrs
   6.35%  postgres   [.] AllocSetAlloc

I can't think of anything we have that exercises pg_ltoa() or
pg_ultoa_n() more heavily. timestamp_out() might, but that's lots of
small ints.

David

#!/bin/bash

dbname=postgres

# Create a 1M-row table with 10 random columns (skipped if it already exists).
psql -c "create table if not exists tmp as select random(1,1e9) c1,
random(1,1e9) c2,random(1,1e9) c3,random(1,1e9) c4,random(1,1e9)
c5,random(1,1e9) c6,random(1,1e9) c7,random(1,1e9) c8,random(1,1e9)
c9,random(1,1e9) c10 from generate_series(1,1000000);" "$dbname"

# 20 runs of 10 seconds each, with a 10 second pause before every run.
for i in {1..20}
do
    sleep 10
    echo "copy tmp to '/dev/null'" > bench.sql
    pgbench -n -f bench.sql -T 10 "$dbname" | grep latency
done
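
If it helps anyone comparing runs, the 20 "latency average" lines per
build could be reduced to a mean and a median rather than eyeballed.
This is only a minimal sketch: summarize_latency is a hypothetical
helper, not part of the script above, and it assumes the pgbench output
format shown in the results (the value is the fourth whitespace-separated
field). The median is there because one slow run, like the 288 ms one in
the patched results, can drag the mean around.

```shell
#!/bin/bash
# Hypothetical helper: reads pgbench "latency average = N ms" lines on
# stdin and prints the number of runs, the mean, and the median (in ms).
# Lines that are not latency lines (NOTICE, CREATE TABLE AS) are ignored.
summarize_latency() {
    grep 'latency average' | awk '{ print $4 }' | sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no latency lines found"; exit 1 }
            # Input is sorted, so the median is the middle value, or the
            # average of the two middle values for an even count.
            if (NR % 2) median = v[(NR + 1) / 2]
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "n=%d mean=%.3f median=%.3f\n", NR, sum / NR, median
        }'
}
```

Something like "./intbench.sh | summarize_latency" should then print one
summary line per build, assuming the function has been sourced into the
shell first.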