------------------------------------------------------------------ 发件人:段坤仁(刻韧) <[email protected]> 发送时间:2025年11月14日(周五) 10:17 收件人:"pgsql-hackers"<[email protected]> 主 题:autovacuum cost delay improvement thread Hi all, Recently I noticed an interesting thread about table scheduling in autovacuum in [0], which contained some constructive discussions. After testing nathan's patch, I attempted to make improvements on some issues mentioned in the discussion, mainly related to cost delay. Cost delay was originally introduced to prevent autovacuum workers from consuming excessive resources and impacting normal workloads. However, this value tends to be too low when system load is intense and too high when system is idle. The initial idea is straightforward: we can automatically adjust cost delay based on system load. Currently, I am primarily considering IO resources (Maybe CPU and other resources in the future) and have implemented the following mechanisms: 1.IO Load Monitoring: Tracks IO wait events across all backend processes with weighted contributions (IO-sensitive backends contribute more weight than others). 2.Dynamic Mode Selection: Introduce three modes based on IO wait thresholds: FAST mode: When IO load is low (few of processes in IO wait) SLOW mode: When IO load is high (lots of processes in IO wait) NORMAL mode: Otherwise 3.Adaptive Delay Adjustment: Modifies vacuum delay at vacuum_delay_point(), Normal Mode use guc configured delay, skip delays in FAST mode and add extra delays in SLOW mode based on detected IO contention. I tried using Sami's scripts in[0] to perform some tests. I added a debug GUC parameter 'debug_autovacuum_adaptive_cost_delay'for testing, with the results displayed in result-analysis.txt. My PostgreSQL configuration is as follows: 、、、 max_connections=1000 autovacuum_naptime = '10s' shared_buffers = '4GB' autovacuum_max_workers = 6 autovacuum_vacuum_cost_delay =2ms autovacuum_vacuum_cost_limit =200 、、、 ################################################## ## ADAPTIVE OFF ################################################## relname | reltuples | n_dead_tup | avg_av_time | avg_aa_time ------------------+-----------+------------+-------------+------------- table_batch_2 | 38100700 | 0 | 63426.75 | 2788.63 table_batch_3 | 34528300 | 0 | 43569.38 | 2752.13 table_batch_1 | 21774900 | 0 | 33058.09 | 2675.13 table_13_4096000 | 4094220 | 1663211 | 0.00 | 12047.00 table_12_2048000 | 2045990 | 1202655 | 688704.00 | 10698.33 table_11_1024000 | 1023520 | 446236 | 337606.00 | 17622.00 table_10_512000 | 510631 | 523053 | 138559.00 | 13200.89 table_9_256000 | 253966 | 338711 | 54128.55 | 9159.77 table_8_128000 | 127072 | 304428 | 17124.50 | 3508.30 table_7_64000 | 63934 | 384090 | 4450.55 | 834.38 table_6_32000 | 31999 | 397000 | 1690.46 | 453.33 table_5_16000 | 16000 | 403389 | 1030.89 | 282.55 table_4_8000 | 8000 | 408596 | 846.87 | 221.53 table_3_4000 | 4000 | 413417 | 881.73 | 224.77 table_16_4000 | 3998 | 424019 | 956.04 | 199.35 table_2_2000 | 1996 | 433390 | 829.55 | 207.11 table_15_2000 | 1988 | 429912 | 868.61 | 216.27 table_14_1000 | 991 | 443569 | 856.38 | 149.44 table_1_1000 | 986 | 438926 | 795.70 | 161.82 ############################################ ## ADAPTIVE ON ############################################ relname | reltuples | n_dead_tup | avg_av_time | avg_aa_time ------------------+-----------+------------+-------------+------------- table_batch_3 | 57988400 | 0 | 711.95 | 212.73 table_batch_2 | 51700100 | 0 | 549.80 | 203.18 table_batch_1 | 49688200 | 0 | 574.58 | 206.67 table_13_4096000 | 4096250 | 22439 | 9937.00 | 400.60 table_12_2048000 | 2048270 | 2506 | 4867.50 | 375.00 table_11_1024000 | 1023880 | 211596 | 2351.57 | 389.25 table_10_512000 | 512056 | 23575 | 1024.93 | 352.03 table_9_256000 | 256011 | 42427 | 429.28 | 309.75 table_8_128000 | 127992 | 11739 | 167.56 | 263.62 table_7_64000 | 63998 | 15171 | 69.52 | 221.04 table_6_32000 | 31998 | 5787 | 32.94 | 192.52 table_5_16000 | 16000 | 6769 | 18.11 | 92.08 table_4_8000 | 8000 | 6811 | 10.40 | 41.37 table_3_4000 | 4000 | 7058 | 6.40 | 18.58 table_16_4000 | 4000 | 6786 | 6.32 | 18.55 table_15_2000 | 2000 | 6783 | 4.45 | 8.93 table_2_2000 | 2000 | 6936 | 4.58 | 8.89 table_1_1000 | 1000 | 6798 | 3.66 | 5.50 table_14_1000 | 1000 | 6700 | 3.66 | 5.50 Result shows that av_time decreased significantly, and there is no noticeable degradation in system performance. So far, this is just an exploratory patch with some areas that can be improved. Any constructive feedback and suggestions are welcome. Thank you all -- Duan Alibaba Cloud Computing [0] https://www.postgresql.org/message-id/flat/aOaAuXREwnPZVISO%40nathan <https://www.postgresql.org/message-id/flat/aOaAuXREwnPZVISO%40nathan >
v1-0001-autovacuum-cost-based-delay-improvements.patch
Description: Binary data
result-analysis.txt
Description: Binary data
