To be honest, I haven’t submitted an extension for PostgreSQL [0] to
hackers before, nor have I encountered this situation in general. Just in
case, I created a public repository on GitHub with the code and added a
description in the README.
[0] https://github.com/Alena0704/vacuum_statistics#
On 04.09.2025 18:49, Alena Rybakina wrote:
Hi, all!
On 02.06.2025 19:50, Alena Rybakina wrote:
On 02.06.2025 19:25, Alexander Korotkov wrote:
On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
<[email protected]> wrote:
On 12.05.2025 08:30, Amit Kapila wrote:
On Fri, May 9, 2025 at 5:34 PM Alena Rybakina
<[email protected]> wrote:
I did a rebase and finished the part with storing statistics
separately from the relation statistics - now it is possible to
disable the collection of statistics for relations using GUCs, which
solves the problem of memory consumption.
I think this patch is trying to collect data similar to what we do
for
pg_stat_statements for SQL statements. So, can't we follow a similar
idea such that these additional statistics will be collected once
some
external module like pg_stat_statements is enabled? That module
should
be responsible for accumulating and resetting the data, so we won't
have this memory consumption issue.
The idea is good. It will require one hook for the
pgstat_report_vacuum function; the extvac_stats_start and
extvac_stats_end functions can be run only if the extension is loaded,
so as not to add more hooks.
+1
Nice idea of a hook. Given the volume of the patch, it might be a
good idea to keep this as an extension.
Okay, I'll implement it and update the patch)
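Roughly, what I have in mind is something like the sketch below. Such a
hook does not exist in core yet, so the hook name and its arguments are
only placeholders for what the patch would have to add:

/*
 * Sketch only: a hook that core would call at the end of
 * pgstat_report_vacuum(), so that an external module can accumulate
 * per-relation vacuum statistics. Name and signature are placeholders.
 */
typedef void (*vacuum_stats_hook_type) (Oid dboid, Oid reloid);
extern vacuum_stats_hook_type vacuum_stats_hook;

/* ... and inside pgstat_report_vacuum(), after the existing reporting: */
if (vacuum_stats_hook)
    vacuum_stats_hook(MyDatabaseId, tableoid);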
But I see a problem here with tracking dropped objects whose
statistics are no longer needed. There are two solutions to this, and
to be honest I don't like either of them.
The first is to add a background process that periodically scans the
table with the saved statistics, checks whether each relation or
database still exists, and if not,
deletes its vacuum statistics. This may be
resource-intensive. The second is to add hooks to the code paths that
drop databases and relations (the dropdb, index_drop, and
heap_drop_with_catalog functions).
Can we work around this with object_access_hook?
I think this could fix the problem. For the OAT_DROP access type, we
can call a function that resets the vacuum statistics for relations
that are about to be dropped.
At the moment, I don’t see any limitations of this approach.
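For illustration, a minimal sketch of that approach (extvac_stats_forget_rel
and extvac_stats_forget_db are placeholders for the cleanup functions, not
final names):

#include "postgres.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_class.h"
#include "catalog/pg_database.h"
#include "miscadmin.h"

static object_access_hook_type prev_object_access_hook = NULL;

static void
extvac_object_access(ObjectAccessType access, Oid classId, Oid objectId,
                     int subId, void *arg)
{
    if (prev_object_access_hook)
        prev_object_access_hook(access, classId, objectId, subId, arg);

    if (access != OAT_DROP)
        return;

    if (classId == RelationRelationId && subId == 0)
        extvac_stats_forget_rel(MyDatabaseId, objectId);   /* placeholder */
    else if (classId == DatabaseRelationId)
        extvac_stats_forget_db(objectId);                  /* placeholder */
}

/* in _PG_init():                                       */
/*     prev_object_access_hook = object_access_hook;    */
/*     object_access_hook = extvac_object_access;       */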
I’ve prepared the first working version of the extension.
I haven’t yet implemented writing the statistics to a file and
reloading them into the hash table in shared memory at instance
startup, and I haven’t implemented proper output for database-level
statistics yet.
I structured the extension as follows: statistics are stored in a hash
table keyed by a composite key - database OID, relation OID, and
object type (index, table, or database). When VACUUM or a worker
processes a table or index, an exclusive lock is taken to update the
corresponding record; a shared lock is taken when reading the
statistics. For database-level output, I plan to compute the totals by
summing table and index statistics on demand.
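To make the layout concrete, here is a rough sketch of the key and entry
structures and the update path; the names (ExtVacKey, ExtVacEntry,
extvac_hash, extvac_lock) and the counters are placeholders rather than the
final ones:

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/hsearch.h"

typedef enum ExtVacObjType
{
    EXTVAC_TABLE,
    EXTVAC_INDEX,
    EXTVAC_DB
} ExtVacObjType;

typedef struct ExtVacKey
{
    Oid         dboid;          /* database OID */
    Oid         reloid;         /* relation OID, InvalidOid for database rows */
    ExtVacObjType objtype;      /* table, index, or database */
} ExtVacKey;

typedef struct ExtVacEntry
{
    ExtVacKey   key;            /* hash key; must be the first field */
    int64       pages_scanned;  /* example counters, not the final set */
    int64       tuples_removed;
} ExtVacEntry;

/* Update one object's counters under an exclusive lock. */
static void
extvac_stats_update(HTAB *extvac_hash, LWLock *extvac_lock,
                    Oid dboid, Oid reloid, ExtVacObjType objtype,
                    int64 pages_scanned, int64 tuples_removed)
{
    ExtVacKey   key;
    ExtVacEntry *entry;
    bool        found;

    memset(&key, 0, sizeof(key));
    key.dboid = dboid;
    key.reloid = reloid;
    key.objtype = objtype;

    LWLockAcquire(extvac_lock, LW_EXCLUSIVE);
    entry = (ExtVacEntry *) hash_search(extvac_hash, &key, HASH_ENTER, &found);
    if (!found)
    {
        entry->pages_scanned = 0;
        entry->tuples_removed = 0;
    }
    entry->pages_scanned += pages_scanned;
    entry->tuples_removed += tuples_removed;
    LWLockRelease(extvac_lock);
}

Readers take the same lock in LW_SHARED mode, as described above.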
To optimize that, I plan to keep entries in the hash table ordered by
database OID. When accessing the first element by the partial key
(database OID), I’ll scan forward and aggregate until the database OID
part of the key changes.
Right now this requires adding the extension to
`shared_preload_libraries`. I haven’t found a way to avoid that
because of shared-memory setup, and I’m not sure it’s even possible.
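For reference, the part that forces shared_preload_libraries is the
shared-memory registration done at postmaster startup; roughly (assuming
the PostgreSQL 15+ shmem_request_hook, and reusing the placeholder
ExtVacKey/ExtVacEntry structs from the sketch above):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

#define EXTVAC_MAX_ENTRIES 10000        /* arbitrary cap for the sketch */

static shmem_request_hook_type prev_shmem_request_hook = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
static HTAB *extvac_hash = NULL;

/* Reserve space for the hash table in the main shared memory segment. */
static void
extvac_shmem_request(void)
{
    if (prev_shmem_request_hook)
        prev_shmem_request_hook();
    RequestAddinShmemSpace(hash_estimate_size(EXTVAC_MAX_ENTRIES,
                                              sizeof(ExtVacEntry)));
}

/* Create (or attach to) the shared hash table at startup. */
static void
extvac_shmem_startup(void)
{
    HASHCTL     ctl;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(ExtVacKey);
    ctl.entrysize = sizeof(ExtVacEntry);

    LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
    extvac_hash = ShmemInitHash("extvac stats hash",
                                EXTVAC_MAX_ENTRIES, EXTVAC_MAX_ENTRIES,
                                &ctl, HASH_ELEM | HASH_BLOBS);
    LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
    /* This only works when the library is in shared_preload_libraries. */
    if (!process_shared_preload_libraries_in_progress)
        return;

    prev_shmem_request_hook = shmem_request_hook;
    shmem_request_hook = extvac_shmem_request;
    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = extvac_shmem_startup;
}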
I’m also unsure whether it’s better to store the statistics in the
cumulative statistics system (as done here) or entirely inside the
extension. Note that the code added to the core to support the
extension executes regardless of whether the extension is enabled.