Re: Some performance degradation in REL_16 vs REL_15
Hi,

On 2023-11-15 10:09:06 -0500, Tom Lane wrote:
> "Anton A. Melnikov" writes:
> > I can't understand why i get the opposite results on my pc and on the
> > server. It is clear that the absolute TPS values will be different for
> > various configurations. This is normal. But the differences?
> > It seems likely that some kind of reference configuration is needed to
> > accurately measure the difference in performance. Probably something is
> > wrong with my pc, but now i can not figure out what.
> > Would be very grateful for any advice or comments to clarify this problem.
>
> Benchmarking is hard :-(.

Indeed.

> IME it's absolutely typical to see variations of a couple of percent even
> when "nothing has changed", for example after modifying some code that's
> nowhere near any hot code path for the test case. I usually attribute this
> to cache effects, such as a couple of bits of hot code now sharing or not
> sharing a cache line.

FWIW, I think we're overusing that explanation in our community. Of course you
can encounter things like this, but the replacement policies of CPU caches
have gotten a lot better, and the caches have gotten bigger too.

IME this kind of thing is typically dwarfed by much bigger variations from
things like:

- CPU scheduling - whether the relevant pgbench thread is colocated on the
  same core as the relevant backend can make a huge difference, particularly
  when CPU power saving modes are not disabled. Just looking at TPS from a
  fully cached read-only pgbench, with a single client:

    Power savings enabled, same core:       37493
    Power savings enabled, different core:  28539
    Power savings disabled, same core:      38167
    Power savings disabled, different core: 37365

- whether transparent huge pages can be used for the executable mapping.
  On newer kernels Linux (and some filesystems) can use huge pages for the
  executable. To what degree that succeeds is a large factor in performance.
  Single-threaded read-only pgbench:

    postgres mapped without huge pages: 37155 TPS
    with 2MB of postgres as huge pages: 37695 TPS
    with 6MB of postgres as huge pages: 42733 TPS

  The really annoying thing about this is that it's entirely unpredictable
  whether huge pages are used or not: building the same way, sometimes 0,
  sometimes 2MB, sometimes 6MB are mapped huge, even though the on-disk
  contents are precisely the same. And it can even change without rebuilding,
  if the binary is evicted from the page cache. This alone makes benchmarking
  extremely annoying - it basically can't be controlled and has huge effects.

- how long the server has been running. If e.g. after one restart you run your
  benchmark on the first connection to a database, but after another restart
  something else connects first (e.g. autovacuum starts up beforehand), you
  can get a fairly different memory layout and cache situation, due to [not]
  using the relcache init file: in the one case you start with a populated
  catcache, in the other you don't.

  Another mean one is whether you start your benchmark within a relatively
  short time of the server starting.
  Read-only pgbench with a single client, started immediately after the
  server:

    progress: 12.0 s, 37784.4 tps, lat 0.026 ms stddev 0.001, 0 failed
    progress: 13.0 s, 37779.6 tps, lat 0.026 ms stddev 0.001, 0 failed
    progress: 14.0 s, 37668.2 tps, lat 0.026 ms stddev 0.001, 0 failed
    progress: 15.0 s, 32133.0 tps, lat 0.031 ms stddev 0.113, 0 failed
    progress: 16.0 s, 37564.9 tps, lat 0.027 ms stddev 0.012, 0 failed
    progress: 17.0 s, 37731.7 tps, lat 0.026 ms stddev 0.001, 0 failed

  There's a dip at 15s - odd. It turns out that's due to bgwriter writing a
  WAL record, which triggers walwriter to write it out and then initialize
  the whole of WAL buffers with 0s - that happens once. In this case I've
  exaggerated the effect a bit by using a 1GB wal_buffers, but it's visible
  otherwise too. Whether your benchmark period includes that dip or not adds
  a fair bit of noise.

  You can even see the effects of autovacuum workers launching - even if
  there's nothing to do! Not as a huge dip, but enough to add some
  "run to run" variation.

- how much other dirty data there is in the kernel page cache. If you e.g.
  just built a new binary, even with just minor changes, the kernel will need
  to flush those pages eventually, which may contend for IO and increase page
  faults. Rebuilding an optimized build generates something like 1GB of dirty
  data; particularly with ccache, that'll typically not yet be flushed by the
  time you run a benchmark. That's not nothing, even with a decent NVMe SSD.

- many more, unfortunately

Greetings,

Andres Freund
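One rough way to observe the huge-page lottery described above is to inspect
the executable's mappings in /proc - a sketch, assuming Linux, a kernel new
enough to report FilePmdMapped in smaps, and a binary path containing
bin/postgres:

    # the postmaster PID is the first line of postmaster.pid
    PID=$(head -1 "$PGDATA/postmaster.pid")
    # sum the huge-page-backed portions of the binary's mappings
    grep -A 25 'bin/postgres' "/proc/$PID/smaps" |
        awk '/FilePmdMapped/ { kb += $2 }
             END { print kb + 0, "kB of the executable mapped huge" }'

Seeing 0 kB on one run and several thousand kB on another, with identical
on-disk binaries, is exactly the effect described above.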
Re: Some performance degradation in REL_16 vs REL_15
Hi,

On 2023-11-15 11:33:44 +0300, Anton A. Melnikov wrote:
> The configure options and test scripts on my pc and server were the same:
> export CFLAGS="-O2"
> ./configure --enable-debug --with-perl --with-icu --enable-depend --enable-tap-tests
> #reinstall
> #reinitdb
> #create database bench
> for ((i=0; i<100; i++)); do pgbench -U postgres -i -s8 bench > /dev/null 2>&1;
> psql -U postgres -d bench -c "checkpoint"; RES=$(pgbench -U postgres -c6 -T20 -j6 bench;

Even with scale 8 you're likely significantly impacted by contention - and
obviously by WAL write latency. See below for why that matters.

> I can't understand why i get the opposite results on my pc and on the
> server. It is clear that the absolute TPS values will be different for
> various configurations. This is normal. But the differences?
> It seems likely that some kind of reference configuration is needed to
> accurately measure the difference in performance. Probably something is
> wrong with my pc, but now i can not figure out what.

One very common reason for symptoms like this is power-saving measures by the
CPU. In workloads where the CPU is not meaningfully utilized, the CPU will go
into a power-saving mode, which can badly affect latency-sensitive workloads:
initially the CPU will just work at a lower frequency, and it takes time to
shift back to a higher frequency.

Here's an example. I bound the server and psql to the same CPU core (nothing
else is allowed to use that core) and ran the following:

    \o /dev/null
    SELECT 1; SELECT 1; SELECT 1; SELECT pg_sleep(0.1); SELECT 1; SELECT 1; SELECT 1;

    Time: 0.181 ms
    Time: 0.085 ms
    Time: 0.071 ms
    Time: 100.474 ms
    Time: 0.153 ms
    Time: 0.077 ms
    Time: 0.069 ms

You can see how the first query timing was slower, the next two were faster,
and then after the pg_sleep() it's slow again.

    # tell the CPU to optimize for performance, not power
    cpupower frequency-set --governor performance
    # disable going to lower power states
    cpupower idle-set -D0
    # disable turbo mode for consistent performance
    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

Now the timings are:

    Time: 0.038 ms
    Time: 0.028 ms
    Time: 0.025 ms
    Time: 1000.262 ms (00:01.000)
    Time: 0.027 ms
    Time: 0.024 ms
    Time: 0.023 ms

Fast and reasonably consistent timings. Switching back:

    Time: 0.155 ms
    Time: 0.123 ms
    Time: 0.074 ms
    Time: 1001.235 ms (00:01.001)
    Time: 0.120 ms
    Time: 0.077 ms
    Time: 0.068 ms

The perverse thing is that this often means that *reducing* the number of
instructions executed leads to *worse* behaviour under non-sustained load,
because from the CPU's point of view there is less need to increase the clock
speed.

To show how much of a difference that can make, I ran pgbench with a single
client on one core, and the server on another (so the CPU is idle in between):

    numactl --physcpubind 11 pgbench -n -M prepared -P1 -S -c 1 -T10

With power-optimized configuration:

    latency average = 0.035 ms
    latency stddev = 0.002 ms
    initial connection time = 5.255 ms
    tps = 28434.334672 (without initial connection time)

With performance-optimized configuration:

    latency average = 0.025 ms
    latency stddev = 0.001 ms
    initial connection time = 3.544 ms
    tps = 40079.995935 (without initial connection time)

That's a whopping 1.4x in throughput!
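For reference, the per-core binding used in these runs can be set up along
these lines (a sketch; the core numbers are arbitrary examples):

    # server bound to core 10 - backends inherit the postmaster's CPU mask
    numactl --physcpubind=10 pg_ctl -D "$PGDATA" -l logfile start
    # client on a different core ...
    numactl --physcpubind=11 pgbench -n -M prepared -P1 -S -c 1 -T10
    # ... or on the same core, to measure the colocated case
    numactl --physcpubind=10 pgbench -n -M prepared -P1 -S -c 1 -T10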
Now, the same thing, except that I used a custom workload where pgbench
transactions are executed in a pipelined fashion, 100 read-only transactions
in one script execution:

With power-optimized configuration:

    latency average = 1.055 ms
    latency stddev = 0.125 ms
    initial connection time = 6.915 ms
    tps = 947.985286 (without initial connection time)

(this means we actually executed 94798.5286 read-only pgbench transactions/s)

With performance-optimized configuration:

    latency average = 1.376 ms
    latency stddev = 0.083 ms
    initial connection time = 3.759 ms
    tps = 726.849018 (without initial connection time)

Suddenly the super-duper performance-optimized settings are worse (though note
that stddev is down)! I suspect the problem is that now, because we disabled
idle states, the CPU ends up clocking *lower*, due to power usage.

If I change just the relevant *cores* to the performance-optimized
configuration:

    cpupower -c 10,11 idle-set -D0
    cpupower -c 10,11 frequency-set --governor performance

    latency average = 0.940 ms
    latency stddev = 0.061 ms
    initial connection time = 3.311 ms
    tps = 1063.719116 (without initial connection time)

it wins again.

Now, realistically you'd never use -D0 (i.e. disabling all idle states, not
just the deeper ones) - the power differential is quite big, and as shown here
it can hurt performance as well. On an idle system, looking at the CPU power
usage with "powerstat -D -R 5 1000":

    Time     User Nice  Sys  Idle  IO Run Ctxt/s IRQ/s Fork Exec Exit Watts  pkg-0  dram  pkg-1
    09:45:03  0.6  0.0  0.2  99.2 0.0   1   2861  2823    0    0    0 46.84  24.82  3.68  18.33
    09:45:08  1.0  0.0  0.1  99.0
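A pipelined script of the shape described above can be generated along these
lines (a sketch, not necessarily the exact workload used; \startpipeline and
\endpipeline require pgbench 14+ and the extended query protocol, and the
database name and the single \set are illustrative):

    # emit a script that runs 100 read-only queries per script execution,
    # all sent in one pipeline
    {
        printf '%s\n' '\set aid random(1, 100000 * :scale)' '\startpipeline'
        for i in $(seq 100); do
            printf '%s\n' 'SELECT abalance FROM pgbench_accounts WHERE aid = :aid;'
        done
        printf '%s\n' '\endpipeline'
    } > pipeline.sql

    pgbench -n -M prepared -P1 -f pipeline.sql -c 1 -T10 bench

pgbench counts one script execution as one transaction, which is why the
947.99 tps above corresponds to ~94799 individual read-only transactions per
second.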
Re: Some performance degradation in REL_16 vs REL_15
"Anton A. Melnikov" writes: > I can't understand why i get the opposite results on my pc and on the server. > It is clear that the absolute > TPS values will be different for various configurations. This is normal. But > differences? > Is it unlikely that some kind of reference configuration is needed to > accurately > measure the difference in performance. Probably something wrong with my pc, > but now > i can not figure out what's wrong. > Would be very grateful for any advice or comments to clarify this problem. Benchmarking is hard :-(. IME it's absolutely typical to see variations of a couple of percent even when "nothing has changed", for example after modifying some code that's nowhere near any hot code path for the test case. I usually attribute this to cache effects, such as a couple of bits of hot code now sharing or not sharing a cache line. If you use two different compiler versions then that situation is likely to occur all over the place even with exactly the same source code. NUMA creates huge reproducibility problems too on multisocket machines (which your server is IIUC). When I had a multisocket workstation I'd usually bind all the server processes to one socket if I wanted more-or-less-repeatable numbers. I wouldn't put a lot of faith in the idea that measured pgbench differences of up to several percent are meaningful at all, especially when comparing across different hardware and different OS+compiler versions. There are too many variables that have little to do with the theoretical performance of the source code. regards, tom lane
Re: Some performance degradation in REL_16 vs REL_15
On 30.10.2023 22:51, Andres Freund wrote:
> There's really no point in comparing performance with assertions enabled
> (leaving aside assertions that cause extreme performance differences, making
> development harder). We very well might have added assertions making things
> more expensive, without affecting performance in optimized/non-assert
> builds.

Thanks for the advice! I repeated the measurements on my pc without asserts
and with CFLAGS="-O2". Also i reduced the number of clients to -c6, to leave a
reserve of two cores from my 8-core cpu, and used -j6 accordingly. The results
were similar: on my pc REL_10_STABLE (c18c12c9) was faster than REL_16_STABLE
(07494a0d), but the effect became weaker: REL_10_STABLE gives ~965+-15 TPS
(+-2%) while REL_16_STABLE gives ~920+-30 TPS (+-3%) in the test:

    pgbench -s8 -c6 -T20 -j6

So 10 is faster than 16 by ~5%. (See raw-my-pc.txt attached for the raw data.)

Then, thanks to my colleagues, i carried out similar measurements on a more
powerful 24-core standalone server. There REL_10_STABLE gives 8260+-100 TPS
(+-1%) while REL_16_STABLE gives 8580+-90 TPS (+-1%) in the same test:

    pgbench -s8 -c6 -T20 -j6

The test gave the opposite result! On that server, 16 is faster than 10 by
~4%. When i scaled the test on the server to get the same reserve of two
cores, the results became: REL_10_STABLE gives ~16000+-300 TPS (+-2%) while
REL_16_STABLE gives ~18500+-200 TPS (+-1%) in the scaled test:

    pgbench -s24 -c22 -T20 -j22

Here the difference is more noticeable: 16 is faster than 10 by ~15%.
(raw-server.txt)

The configure options and test scripts on my pc and the server were the same:

    export CFLAGS="-O2"
    ./configure --enable-debug --with-perl --with-icu --enable-depend --enable-tap-tests
    #reinstall
    #reinitdb
    #create database bench
    for ((i=0; i<100; i++)); do pgbench -U postgres -i -s8 bench > /dev/null 2>&1;
    psql -U postgres -d bench -c "checkpoint"; RES=$(pgbench -U postgres -c6 -T20 -j6 bench;

Configurations:

my pc: 8-core AMD Ryzen 7 4700U @ 1.4GHz, 64GB RAM, NVMe M.2 SSD drive.
Linux 5.15.0-88-generic #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

server: 2x 12-hyperthreaded-core Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 24GB RAM, RAID from SSD drives.
Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

I can't understand why i get the opposite results on my pc and on the server.
It is clear that the absolute TPS values will be different for various
configurations. This is normal. But the differences?
It seems likely that some kind of reference configuration is needed to
accurately measure the difference in performance. Probably something is wrong
with my pc, but now i can not figure out what.

Would be very grateful for any advice or comments to clarify this problem.

With the best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

raw-my-pc.txt:

REL_10_STABLE c18c12c983a84d55e58b176969782c7ffac3272b
pgbench -s8 -c6 -T20 -j6

    940.47198  954.621585 902.319686 965.387517 959.44536  970.882218 922.012141 969.642272
    964.549628 935.639076 958.835093 976.912892 975.618375 960.599515 981.900039 973.34447
    964.563699 960.321335 962.643262 975.631214 971.78315  965.226256 961.106572 968.520002
    973.825485 978.49579  963.863368 973.906058 966.676175 965.186708 954.572371 977.620229
    981.419347 962.751969 963.676599 967.966257 974.68403  955.342462 957.832817 984.065968
    972.364891 977.489316 957.352265 969.463156 974.320994 949.679765 969.081674 963.190554
    962.394456 966.84177  975.840044 954.471689 977.764019 968.67597  963.203923 964.752374
    965.957151 979.17749  915.412491 975.120789 962.105916 980.343235 957.180492 953.552183
    979.783099 967.906392 966.926945 962.962301 965.53471  971.030289 954.21045  961.266889
    973.367193 956.736464 980.317352 911.188865 979.274233 980.267316 982.029926 977.731543
    937.327052 978.161778 978.575841 962.661776 914.896072 966.902901 973.539272 980.418576
    966.073472 963.196341 962.718863 977.062467 958.303102 959.937588 959.52382  934.876632
    971.899844 979.71     964.154208 960.051284

REL_16_STABLE 07494a0df9a66219e5f1029de47ecabc34c9cb55
pgbench -s8 -c6 -T20 -j6

    952.061203 905.964458 921.009294 921.970342 924.810464 935.988344 917.110599 925.62075
    933.423024 923.445651 932.740532 927.994569 913.773152 922.955946 917.680486 923.145074
    925.133017 922.36253  907.656249 927.980182 924.249294 933.355461 923.359649 919.694726
    923.178731 929.250348 921.643735 916.546247 930.960814 913.333819 773.157522 945.293028
    924.839608 932.228796 912.846096 924.01411  924.341422 909.823505 882.105606 920.337305
    930.297982 909.238148 880.839643 939.582115 927.263785 927.921499 932.897521 931.77316
    943.261293 853.433421 921.813303 916.463003 919.652647 914.662188 912.137913 923.279822
    922.967526 936.344334 946.281347 801.718759 950.571673 928.845848 888.181388 885.603875
    939.763546 896.841216 934.904546 929.369005 884.065433 874.953048 933.411683 930.654935
    952.833611 942.193108 930.491705
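For checking figures like the ~965+-15 TPS above, raw values like these can be
summarized with a small awk sketch (assuming a file holding just the
whitespace-separated TPS numbers; the filename is hypothetical):

    tr -s ' \t' '\n' < raw-tps.txt | awk '
        NF { n++; s += $1; ss += $1 * $1 }
        END {
            m = s / n
            # population standard deviation; divide by n - 1 for the corrected version
            printf "n=%d avg=%.1f stddev=%.1f\n", n, m, sqrt(ss / n - m * m)
        }'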
Re: Some performance degradation in REL_16 vs REL_15
Hi,

On 2023-10-30 15:28:53 +0300, Anton A. Melnikov wrote:
> For REL_16_STABLE at 7cc2f59dd the average TPS was: 2020+-70,
> for REL_10_STABLE at c18c12c98 - 2260+-70
>
> The percentage difference was approximately 11%.
> Please see the 16vs10.png picture with the graphical representation of the
> data obtained. Also there are the raw data in the raw_data_s21.txt.
>
> In some days i hope to perform additional measurements that were mentioned
> above in this letter. It would be interesting to establish the reason for
> this difference. And i would be very grateful if you could advise me what
> other settings can be tweaked.

There's really no point in comparing performance with assertions enabled
(leaving aside assertions that cause extreme performance differences, making
development harder). We very well might have added assertions making things
more expensive, without affecting performance in optimized/non-assert builds.

Greetings,

Andres Freund
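As an aside, whether a running server was built with assertions can be checked
directly, since debug_assertions is a read-only GUC that reports it:

    # prints "on" for an --enable-cassert build, "off" otherwise
    psql -c 'SHOW debug_assertions;'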
Re: Some performance degradation in REL_16 vs REL_15
邱宇航 (Yuhang Qiu) writes:
> I wrote a script and test on branch REL_[10-16]_STABLE, and do see
> performance drop in REL_13_STABLE, which is about 1~2%.

I'm really skeptical that we should pay much attention to these numbers.
You've made several of the mistakes that we typically tell people not to make
when using pgbench:

* scale <= number of sessions means you're measuring a lot of row-update
  contention

* once you crank up the scale enough to avoid that problem, running with the
  default shared_buffers seems like a pretty poor choice

* 10-second runtime is probably an order of magnitude too small to get
  useful, reliable numbers

On top of all that, discrepancies on the order of a percent or two commonly
arise from hard-to-control-for effects like the cache alignment of hot spots
in different parts of the code. That means that you can see changes of that
size from nothing more than day-to-day changes in completely unrelated parts
of the code.

I'd get excited about say a 10% performance drop, because that's probably more
than noise; but I'm not convinced that any of the differences you show here
are more than noise.

regards, tom lane
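Put together, that advice corresponds to an invocation along these lines (a
sketch - the specific numbers are illustrative, not prescriptions from the
thread):

    # scale well above the session count, so rows are rarely contended
    pgbench -i -s 100 bench
    # bump shared_buffers away from the 128MB default, e.g. in postgresql.conf:
    #   shared_buffers = '8GB'
    # then measure for minutes rather than seconds, with progress reporting
    pgbench -c 20 -j 8 -T 300 -P 10 bench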
Re: Some performance degradation in REL_16 vs REL_15
Hi,

On 2023-10-16 11:04:25 +0300, Anton A. Melnikov wrote:
> On 13.10.2023 05:05, Andres Freund wrote:
> > Could you provide a bit more details about how you ran the benchmark? The
> > reason I am asking is that ~330 TPS is pretty slow for -c20. Even on
> > spinning rust and using the default settings, I get considerably higher
> > results.
> >
> > Oh - I do get results closer to yours if I use pgbench scale 1, causing a
> > lot of row level contention. What scale did you use?
>
> I use default scale of 1.

That means you're largely going to be bottlenecked due to row level
contention. For read/write pgbench you normally want to use a scale that's
bigger than the client count, best by at least 2x.

Have you built postgres with assertions enabled or such? What is the server
configuration for both versions?

> And run the command sequence:
> $pgbench -i bench
> $sleep 1
> $pgbench -c20 -T10 -j8

I assume you also specify the database name here, given you specified it for
pgbench -i?

As you're not doing a new initdb here, the state of the cluster will
substantially depend on what has run before. This can matter substantially,
because a cluster with prior substantial write activity will already have
initialized WAL files and can reuse them cheaply, whereas one without that
activity needs to initialize new files. Although that matters a bit less with
scale 1, because there's just not a whole lot of writes.

At the very least you should trigger a checkpoint before or after pgbench -i.
Performance differs substantially between having a checkpoint during the
pgbench run or not, and if you're not triggering one explicitly, it'll be up
to random chance whether it happens during the run or not. It's less important
if you run pgbench for an extended time, but if you do it just for 10s...

E.g. on my workstation, if there's no checkpoint, I get around 633 TPS across
repeated runs, but if there's a checkpoint between pgbench -i and the pgbench
run, it's around 615 TPS.

Greetings,

Andres Freund
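Concretely, the suggested sequence would look something like this (a sketch,
using the database name from earlier in the thread):

    pgbench -i bench
    # force the checkpoint to happen now, rather than at a random point mid-run
    psql -d bench -c 'CHECKPOINT;'
    pgbench -c20 -T10 -j8 bench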
Re: Some performance degradation in REL_16 vs REL_15
I wrote a script and tested branches REL_[10-16]_STABLE, and do see a
performance drop in REL_13_STABLE, which is about 1~2%.

    scale round      10      11      12      13      14      15      16
    1     1      7922.2  8018.3  8102.8  7838.3  7829.2  7870.0  7846.1
          2      7922.4  7923.5  8090.3  7887.7  7912.4  7815.2  7865.6
          3      7937.6  7964.9  8012.8  7918.5  7879.4  7786.4  7981.1
          4      8000.4  7959.5  8141.1  7886.3  7840.9  7863.5  8022.4
          5      7921.8  7945.5  8005.2  7993.7  7957.0  7803.8  7899.8
          6      7893.8  7895.1  8017.2  7879.8  7880.9  7911.4  7909.2
          7      7879.3  7853.5  8071.7  7956.2  7876.7  7863.3  7986.3
          8      7980.5  7964.1  8119.2  8015.2  7877.6  7784.9  7923.6
          9      8083.9  7946.4  7960.3  7913.9  7924.6  7867.7  7928.6
          10     7971.2  7991.8  7999.5  7812.4  7824.3  7831.0  7953.4
          AVG    7951.3  7946.3  8052.0  7910.2  7880.3  7839.7  7931.6
          MED    7930.0  7952.9  8044.5  7900.8  7878.5  7847.1  7926.1
    10    1     41221.5 41394.8 40926.8 40566.6 41661.3 40511.9 40961.8
          2     40974.0 40697.9 40842.4 40269.2 41127.7 40795.5 40814.9
          3     41453.5 41426.4 41066.2 40890.9 41018.6 40897.3 40891.7
          4     41691.9 40294.9 41189.8 40873.8 41539.7 40943.2 40643.8
          5     40843.4 40855.5 41243.8 40351.3 40863.2 40839.6 40795.5
          6     40969.3 40897.9 41380.8 40734.7 41269.3 41301.0 41061.0
          7     40981.1 41119.5 41158.0 40834.6 40967.1 40790.6 41061.6
          8     41006.4 41205.9 40740.3 40978.7 40742.4 40951.6 41242.1
          9     41089.9 41129.7 40648.3 40622.1 40782.0 40460.5 40877.9
          10    41280.3 41462.7 41316.4 40728.0 40983.9 40747.0 40964.6
          AVG   41151.1 41048.5 41051.3 40685.0 41095.5 40823.8 40931.5
          MED   41048.2 41124.6 41112.1 40731.3 41001.3 40817.6 40926.7
    100   1     43429.0 43190.2 44099.3 43941.5 43883.3 44215.0 44604.9
          2     43281.7 43795.2 44963.6 44331.5 43559.7 43571.5 43403.9
          3     43749.0 43614.1 44616.7 43759.5 43617.8 43530.3 43362.4
          4     43362.0 43197.3 44296.7 43692.4 42020.5 43607.3 43081.8
          5     43373.4 43288.0 44240.9 43795.0 43630.6 43576.7 43512.0
          6     43637.0 43385.2 45130.1 43792.5 43635.4 43905.2 43371.2
          7     43621.2 43474.2 43735.0 43592.2 43889.7 43947.7 43369.8
          8     43351.0 43937.5 44285.6 43877.2 43771.1 43879.1 43680.4
          9     43481.3 43700.5 44119.9 43786.9 43440.8 44083.1 43563.2
          10    43238.7 43559.5 44310.8 43406.0 44306.6 43376.3 43242.7
          AVG   43452.4 43514.2 44379.9 43797.5 43575.6 43769.2 43519.2
          MED   43401.2 43516.8 44291.2 43789.7 43633.0 43743.2 43387.5

The script looks like:

    initdb data >/dev/null 2>&1                      # initdb on every round
    pg_ctl -D data -l logfile start >/dev/null 2>&1  # start without changing any setting
    pgbench -i -s $scale postgres >/dev/null 2>&1
    sleep 1
    pgbench -c20 -T10 -j8

And here is the pg_config output:

    ...
    CONFIGURE = '--enable-debug' '--prefix=/home/postgres/base' '--enable-depend' 'PKG_CONFIG_PATH=/usr/local/lib64/pkgconfig::/usr/lib/pkgconfig'
    CC = gcc
    CPPFLAGS = -D_GNU_SOURCE
    CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2
    CFLAGS_SL = -fPIC
    LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/home/postgres/base/lib',--enable-new-dtags
    LDFLAGS_EX =
    LDFLAGS_SL =
    LIBS = -lpgcommon -lpgport -lz -lreadline -lpthread -lrt -ldl -lm
    VERSION = PostgreSQL 16.0

--
Yuhang Qiu
Re: Some performance degradation in REL_16 vs REL_15
On 13.10.2023 05:05, Andres Freund wrote:
> Could you provide a bit more details about how you ran the benchmark? The
> reason I am asking is that ~330 TPS is pretty slow for -c20. Even on
> spinning rust and using the default settings, I get considerably higher
> results.
>
> Oh - I do get results closer to yours if I use pgbench scale 1, causing a
> lot of row level contention. What scale did you use?

I use the default scale of 1. And run the command sequence:

    $pgbench -i bench
    $sleep 1
    $pgbench -c20 -T10 -j8

in a loop, to get similar initial conditions for every "pgbench -c20 -T10 -j8"
run.

Thanks for your interest!

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Some performance degradation in REL_16 vs REL_15
Hi,

On 2023-10-12 11:00:22 +0300, Anton A. Melnikov wrote:
> Found that simple test pgbench -c20 -T20 -j8 gives approximately
> for REL_15_STABLE at 5143f76: 336+-1 TPS
> and
> for REL_16_STABLE at 4ac7635f: 324+-1 TPS
>
> The performance drop is approximately 3.5% while the corrected standard
> deviation is only 0.3%. See the raw_data.txt attached.

Could you provide a bit more details about how you ran the benchmark? The
reason I am asking is that ~330 TPS is pretty slow for -c20. Even on spinning
rust and using the default settings, I get considerably higher results.

Oh - I do get results closer to yours if I use pgbench scale 1, causing a lot
of row level contention. What scale did you use?

Greetings,

Andres Freund
Re: Some performance degradation in REL_16 vs REL_15
On Thu, Oct 12, 2023 at 09:20:36PM +1300, David Rowley wrote:
> It would be interesting to know what's to blame here and if you can
> attribute it to a certain commit.

+1.

--
Michael
Re: Some performance degradation in REL_16 vs REL_15
On Thu, 12 Oct 2023 at 21:01, Anton A. Melnikov wrote:
>
> Greetings!
>
> Found that simple test pgbench -c20 -T20 -j8 gives approximately
> for REL_15_STABLE at 5143f76: 336+-1 TPS
> and
> for REL_16_STABLE at 4ac7635f: 324+-1 TPS
>
> And is it worth spending time bisecting for the commit where this
> degradation may have occurred?

It would be interesting to know what's to blame here and if you can attribute
it to a certain commit.

David