vacuum_cost_page_miss has a default value of 10, while vacuum_cost_page_dirty has a default value of 20. This has been the case since cost-based delays were introduced by commit f425b605f4e back in 2004. The obvious implication is that dirtying a page is on average only twice as expensive as a single shared_buffers miss (that doesn't dirty the page). While that might have made sense back in 2004, when magnetic disks were the norm, it seems questionable now.
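To make the arithmetic concrete, the relevant settings can be pulled from pg_settings, and the throttling math under the stock defaults works out roughly as sketched below (the 2ms autovacuum_vacuum_cost_delay default assumes a recent release, it was 20ms before v12, and the nap logic is simplified here):

    -- Cost-based delay settings and their current values:
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name LIKE '%vacuum_cost%'
    ORDER BY name;

    -- With vacuum_cost_limit = 200 and autovacuum_vacuum_cost_delay = 2ms, an
    -- autovacuum worker naps for ~2ms each time it accumulates 200 cost points.
    -- Counting only shared_buffers misses (vacuum_cost_page_miss = 10):
    --   200 / 10 = 20 misses per nap  => ~10,000 misses/sec  => ~78MiB/sec read
    -- Counting only newly dirtied pages (vacuum_cost_page_dirty = 20):
    --   200 / 20 = 10 dirties per nap => ~5,000 dirties/sec  => ~39MiB/sec dirtied
    -- (8KiB pages, ignoring the time the reads/writes themselves take.)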
The trend these days is that the total number of dirty pages is the limiting factor for OLTP workloads. This is a broad trend among all disk-based RDBMSs with an ARIES style design. It is more or less a result of long term trends in main memory size scaling and flash storage. It's rare for OLTP workloads to be truly I/O bound, and yet the backpressure from page cleaning/checkpointing becomes a bottleneck. In short, writes are way more expensive than reads -- the *relative* cost of writes has increased significantly (our use of the OS FS cache makes this even worse).

I suspect that this trend will become even more important for Postgres in the coming years, but that's not what I want to talk about right now. I just want to talk about vacuum_cost_page_miss on this thread.

Simply decreasing vacuum_cost_page_miss seems like a low risk way of making the VACUUM costing more useful within autovacuum workers. Halving vacuum_cost_page_miss to 5 would be a good start, though I think that a value as low as 2 would be better. That would make it only 2x vacuum_cost_page_hit's default (i.e. 2x the cost of processing a page that is in shared_buffers but did not need to be dirtied), which seems sensible to me when considered in the context in which the value is actually applied (and not some abstract theoretical context).
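For anyone who wants to try the proposed value on an existing cluster, no patch is needed -- it's an ordinary GUC (a sketch, not a recommendation to change production settings blindly):

    -- Cluster-wide, no restart required:
    ALTER SYSTEM SET vacuum_cost_page_miss = 2;
    SELECT pg_reload_conf();

    -- Or only for the current session, e.g. for a manual VACUUM run with a
    -- nonzero vacuum_cost_delay (manual VACUUM is unthrottled by default):
    SET vacuum_cost_page_miss = 2;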
There are a few reasons why this seems like a good idea now:

* Throttling/delaying VACUUM is only useful as a way of smoothing the impact on production queries, which is an important goal, but currently we don't do a very good job of discriminating against the cost that we really should keep under control (dirtying new pages within VACUUM). This is due to the aforementioned trends, the use of a strategy ring buffer by VACUUM, the fact that indexes are usually vacuumed in sequential physical order these days, and many other things that were not a factor in 2004.

* There is a real downside to throttling VACUUM unnecessarily, and the effects are *non-linear*. On a large table, the oldest xmin cutoff may become very old by the time we're only (say) half way through the initial table scan in lazy_scan_heap(). The VACUUM may then accomplish relatively little, because many of the dead tuples it encounters won't be before its oldest xmin cutoff by that time (VACUUM just cannot keep up). Excessive throttling for simple page misses may actually *increase* the amount of I/O that VACUUM has to do over time -- we should try to get to the pages that actually need to be vacuumed quickly, which are probably already dirty anyway (and thus are likely to contribute little toward the cost limit in practice). Everything is connected to everything else.

* vacuum_cost_page_miss is very much not like random_page_cost, and the similar names confuse the issue -- this is not an optimization problem. Thinking about VACUUM as unrelated to the workload itself is obviously wrong. Changing the default is also an opportunity to clear that up. Even if I am wrong to suggest that a miss within VACUUM should only be thought of as 2x as expensive as a hit in some *general* sense, I am concerned about *specific* consequences. There is no question about picking the best access path here -- we're still going to have to access the same blocks in the same way sooner or later. In general I think that we should move in the direction of more frequent, cheaper VACUUM operations [1], though we've already done a lot of work in that direction (e.g. the freeze map work).

* Some impact from VACUUM on query performance may in fact be a good thing. Deferring the cost of vacuuming can only make sense if we'll eventually be able to catch up because we're experiencing a surge in demand, which seems kind of optimistic -- it seems more likely that the garbage collection debt will just grow and grow. Why should the DBA not expect to experience some kind of impact, which could be viewed as a natural kind of backpressure? The most important thing is consistent performance.

* Other recent work, such as the vacuum_cleanup_index_scale_factor patch, has increased the relative cost of index vacuuming in some important cases: we don't have a visibility/freeze map for indexes, but index vacuuming that doesn't dirty any pages and has a TID kill list that's concentrated at the end of the heap/table is pretty cheap (the TID binary search is cache efficient/cheap). This change will help these workloads by better reflecting the way in which index vacuuming can be cheap for append-only tables with a small amount of garbage for recently inserted tuples that also got updated/deleted.

* Lowering vacuum_cost_page_miss's default (as opposed to changing something else) is a simple and less disruptive way of achieving these goals. This approach seems unlikely to break existing VACUUM-related custom settings from current versions that get reused on upgrade. I expect little impact on small installations.

[1] https://postgr.es/m/cad21aod0ske11fmw4jd4renawbmcw1wasvnwpjvw3tvqpoq...@mail.gmail.com

--
Peter Geoghegan