I am seeking advice. For now I hope for a suggestion about changes from 17beta1 to 17beta2 that might cause the problem -- assuming there is a problem, and not a mistake in my testing.
One of the sysbench microbenchmarks that I run does a table scan with a WHERE clause that filters out all rows. The WHERE clause is there to reduce network IO. While running it on a server with 16 real cores, 12 concurrent queries and a cached database, the query takes ~5% more time on 17beta2 than on 17beta1 or 16.3. Alas, this is a Google Cloud server and perf doesn't work there. On the small servers I have at home I can reproduce the problem without concurrent queries, and there 17beta2 is 5% to 10% slower.

The SQL statement for the scan microbenchmark is (%s is the table name):

  SELECT * from %s WHERE LENGTH(c) < 0

I will call my small home servers SER4 and PN53. They are described here:
https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html

The SER4 is a SER 4700u from Beelink and the PN53 is an ASUS ExpertCenter PN53. Both use an 8-core AMD CPU with SMT disabled and run Ubuntu 22.04. The SER4 has an older, slower CPU than the PN53.

In all cases I compile from source using a configure command line like:

  ./configure --prefix=$pfx --enable-debug CFLAGS="-O2 -fno-omit-frame-pointer"

I used perf to get flamegraphs during the scan microbenchmark and they are archived here:
https://github.com/mdcallag/mytools/tree/master/bench/bugs/pg17beta2/24Jul5.sysbench.scan

For both SER4 and PN53 the time to finish the scan microbenchmark is ~10% longer in 17beta2 than it was in 17beta1 and 16.3. On the PN53 the query takes ~20 seconds with 16.3 and 17beta1 vs ~22.5 seconds for 17beta2 when the table has 60M rows.

From the SVG files for SER4 and 17beta2 I see ~2X more time in slot_getsomeattrs_int vs 17beta1 or 16.3, with all of that time spent in its child, tts_buffer_heap_getsomeattrs. That function is defined in src/backend/executor/execTuples.c and that file has not changed from 17beta1 to 17beta2. I don't keep up with individual commits to Postgres, so I won't guess at the root cause.

But the SVG files for PN53 don't show the same problem:
- for 16.3 I see 85.24% in ExecInterpExpr vs 11.64% in SeqNext
- for 17beta1 I see 82.82% in ExecInterpExpr vs 14.51% in SeqNext
- for 17beta2 I see 85.03% in ExecInterpExpr vs 12.31% in SeqNext
- for 17beta1 and 17beta2 the flamegraphs show time spent handling page faults during SeqNext, and that isn't visible on the 16.3 flamegraph

And for PN53, looking at slot_getsomeattrs_int (a child of ExecInterpExpr):
- for 16.3 I see 6.99% in slot_getsomeattrs_int
- for 17beta1 I see 4.29% in slot_getsomeattrs_int
- for 17beta2 I see 3.99% in slot_getsomeattrs_int

So at this point I am confused. I am repeating the test with a slightly larger table, while trying to keep the table small enough to fit in the Postgres buffer pool. I also have results from tables that are much larger than memory, and even in that case the problem can be reproduced.
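In case it helps anyone trying to reproduce or bisect this, here is roughly the perf + FlameGraph recipe behind the SVG files above (the exact flags may differ; $PID is the backend running the scan, and stackcollapse-perf.pl / flamegraph.pl are from Brendan Gregg's FlameGraph repo, assumed to be on the PATH):

  perf record -F 99 -g -p $PID -- sleep 60
  perf script | stackcollapse-perf.pl | flamegraph.pl > scan.svg

And, assuming the usual REL_17_BETA1 and REL_17_BETA2 tags in the Postgres git repo, one way to confirm that execTuples.c is unchanged and to list the commits between the betas that touch the executor or the heap access paths:

  git diff REL_17_BETA1..REL_17_BETA2 -- src/backend/executor/execTuples.c
  git log --oneline REL_17_BETA1..REL_17_BETA2 -- src/backend/executor src/backend/access/heap

-- 
Mark Callaghan
mdcal...@gmail.com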