Thank you for engaging into the conversation and sharing your thoughts Sami and Robert.
On Thu, May 8, 2025 at 2:57 PM Sami Imseih <samims...@gmail.com> wrote: > > actually determine you are in this situation, or how bad the situation > > was, in order to know that setting truncate off would help? To that > > That's definitely a sound idea and the way I was able to replicate and learn myself was by doing a PATCH that would just print the # of attempts it was doing. That said, just to add more color - I was coming at this from a point of view where truncation would be on a "best effort" basis and the trade off here would be less availability risk at the cost of missed truncations. My first instinct was that, having vacuum_truncate set to ON by default on busy systems that have a lot of churn can catch a lot of developers and PostgreSQL by surprise (sure did to me :D), so the best effort and "fail fast" behavior could help, however at the same time I agree that it doesn't solve either of the problems really well. Another idea I had around this section was to expose an attribute/guc/setting like VACUUM_TRUNCATE_INTERRUPTION_MAX_RETRIES with a default of `1` or something, that users could optionally configure, perhaps that could be a middleground, but I am myself not fully sold on the idea either (?). > Adding counters for this area is not a bad idea in general, as this hits > customers particularly hard on hot standbys when the truncate does actually > occur on the primary. > > What about adding cumulative counters ( per table and pg_stat_database > ) such as: > > pages_vac_truncated - # of pages truncated by vacuum > vac_truncate_conflicts - # of time truncate was skipped due to conflict > vac_truncate_suspended - # of times the truncate was suspended. > > The difference between conflict and suspended is conflict causes vacuum > to skip the truncate phase while suspended causes vacuum to retry the > phase continually. > Would vac_truncate_retries or something similar be more immediately clear? If so, maybe something like - vac_truncate_interruptions - # of time truncate was skipped due to conflict and not retried - vac_truncate_retries - # of times the truncate was suspended and retried > The argument against adding these counters is that the views, > pg_stat_all_tables/ > pg_stat_all_database are becoming super-wide, so maybe we need to think > about inventing a new view for vacuum related counter metrics. This seems > like > a good discussion for v19. > > +1 for this. I'd even be happy to even just start showing the attempts as part of INFO or VERBOSE on conflicts & suspensions, without keeping track, but it is useful statistical data. Thanks Shayon