To be honest, I haven’t submitted an extension for PostgreSQL [0] to
hackers before, nor have I encountered this situation in general. Just in
case, I created a public repository on GitHub with the code and added a
description in the README.
[0] https://github.com/Alena0704/vacuum_statistics#
On 04.09.2025 18:49, Alena Rybakina wrote:
Hi, all!
On 02.06.2025 19:50, Alena Rybakina wrote:
On 02.06.2025 19:25, Alexander Korotkov wrote:
On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
<[email protected]> wrote:
On 12.05.2025 08:30, Amit Kapila wrote:
On Fri, May 9, 2025 at 5:34 PM Alena Rybakina
<[email protected]> wrote:
I did a rebase and finished the part with storing statistics
separately from the relation statistics - now it is possible to
disable the collection of statistics for relations using GUCs, which
solves the problem of memory consumption.
I think this patch is trying to collect data similar to what we do
for
pg_stat_statements for SQL statements. So, can't we follow a similar
idea such that these additional statistics will be collected once
some
external module like pg_stat_statements is enabled? That module
should
be responsible for accumulating and resetting the data, so we won't
have this memory consumption issue.
The idea is good. It will require one hook for the
pgstat_report_vacuum function; the extvac_stats_start and
extvac_stats_end functions can be run only if the extension is loaded,
so as not to add more hooks.
+1
Nice idea of a hook. Given the volume of the patch, it might be a
good idea to keep this as an extension.
Okay, I'll implement it and update the patch)
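Roughly, what I have in mind is something like the sketch below. Such a
hook does not exist in core yet, so the hook name and its arguments are
only placeholders for what the patch would have to add:

/*
 * Sketch only: a hook that core would call at the end of
 * pgstat_report_vacuum(), so that an external module can accumulate
 * per-relation vacuum statistics. Name and signature are placeholders.
 */
typedef void (*vacuum_stats_hook_type) (Oid dboid, Oid reloid);
extern vacuum_stats_hook_type vacuum_stats_hook;

/* ... and inside pgstat_report_vacuum(), after the existing reporting: */
if (vacuum_stats_hook)
    vacuum_stats_hook(MyDatabaseId, tableoid);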
But I see a problem here with tracking dropped objects whose
statistics are no longer needed. There are two solutions to this, and
to be honest I don't like either of them.
The first is to add a background process that periodically scans the
table with the saved statistics, checks whether each relation or
database still exists, and if not,
deletes its vacuum statistics. This may be
resource-intensive. The second is to add hooks to the code paths that
drop databases and relations (the dropdb, index_drop, and
heap_drop_with_catalog functions).
Can we work around this with object_access_hook?
I think this could fix the problem. For the OAT_DROP access type, we
can call a function that resets the vacuum statistics for relations
that are about to be dropped.
At the moment, I don’t see any limitations of this approach.
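For illustration, a minimal sketch of that approach (extvac_stats_forget_rel
and extvac_stats_forget_db are placeholders for the cleanup functions, not
final names):

#include "postgres.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_class.h"
#include "catalog/pg_database.h"
#include "miscadmin.h"

static object_access_hook_type prev_object_access_hook = NULL;

static void
extvac_object_access(ObjectAccessType access, Oid classId, Oid objectId,
                     int subId, void *arg)
{
    if (prev_object_access_hook)
        prev_object_access_hook(access, classId, objectId, subId, arg);

    if (access != OAT_DROP)
        return;

    if (classId == RelationRelationId && subId == 0)
        extvac_stats_forget_rel(MyDatabaseId, objectId);   /* placeholder */
    else if (classId == DatabaseRelationId)
        extvac_stats_forget_db(objectId);                  /* placeholder */
}

/* in _PG_init():                                       */
/*     prev_object_access_hook = object_access_hook;    */
/*     object_access_hook = extvac_object_access;       */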
I’ve prepared the first working version of the extension.
I haven’t yet implemented writing the statistics to a file and
reloading them into the hash table in shared memory at instance
startup, and I haven’t implemented proper output for database-level
statistics yet.
I structured the extension as follows: statistics are stored in a hash
table keyed by a composite key - database OID, relation OID, and
object type (index, table, or database). When VACUUM or a worker
processes a table or index, an exclusive lock is taken to update the
corresponding record; a shared lock is taken when reading the
statistics. For database-level output, I plan to compute the totals by
summing table and index statistics on demand.
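To make the layout concrete, here is a rough sketch of the key and entry
structures and the update path; the names (ExtVacKey, ExtVacEntry,
extvac_hash, extvac_lock) and the counters are placeholders rather than the
final ones:

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/hsearch.h"

typedef enum ExtVacObjType
{
    EXTVAC_TABLE,
    EXTVAC_INDEX,
    EXTVAC_DB
} ExtVacObjType;

typedef struct ExtVacKey
{
    Oid         dboid;          /* database OID */
    Oid         reloid;         /* relation OID, InvalidOid for database rows */
    ExtVacObjType objtype;      /* table, index, or database */
} ExtVacKey;

typedef struct ExtVacEntry
{
    ExtVacKey   key;            /* hash key; must be the first field */
    int64       pages_scanned;  /* example counters, not the final set */
    int64       tuples_removed;
} ExtVacEntry;

/* Update one object's counters under an exclusive lock. */
static void
extvac_stats_update(HTAB *extvac_hash, LWLock *extvac_lock,
                    Oid dboid, Oid reloid, ExtVacObjType objtype,
                    int64 pages_scanned, int64 tuples_removed)
{
    ExtVacKey   key;
    ExtVacEntry *entry;
    bool        found;

    memset(&key, 0, sizeof(key));
    key.dboid = dboid;
    key.reloid = reloid;
    key.objtype = objtype;

    LWLockAcquire(extvac_lock, LW_EXCLUSIVE);
    entry = (ExtVacEntry *) hash_search(extvac_hash, &key, HASH_ENTER, &found);
    if (!found)
    {
        entry->pages_scanned = 0;
        entry->tuples_removed = 0;
    }
    entry->pages_scanned += pages_scanned;
    entry->tuples_removed += tuples_removed;
    LWLockRelease(extvac_lock);
}

Readers take the same lock in LW_SHARED mode, as described above.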
To optimize that, I plan to keep entries in the hash table ordered by
database OID. When accessing the first element by the partial key
(database OID), I’ll scan forward and aggregate until the database OID
part of the key changes.
Right now this requires adding the extension to
`shared_preload_libraries`. I haven’t found a way to avoid that
because of shared-memory setup, and I’m not sure it’s even possible.
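For reference, the part that forces shared_preload_libraries is the
shared-memory registration done at postmaster startup; roughly (assuming
the PostgreSQL 15+ shmem_request_hook, and reusing the placeholder
ExtVacKey/ExtVacEntry structs from the sketch above):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

#define EXTVAC_MAX_ENTRIES 10000        /* arbitrary cap for the sketch */

static shmem_request_hook_type prev_shmem_request_hook = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
static HTAB *extvac_hash = NULL;

/* Reserve space for the hash table in the main shared memory segment. */
static void
extvac_shmem_request(void)
{
    if (prev_shmem_request_hook)
        prev_shmem_request_hook();
    RequestAddinShmemSpace(hash_estimate_size(EXTVAC_MAX_ENTRIES,
                                              sizeof(ExtVacEntry)));
}

/* Create (or attach to) the shared hash table at startup. */
static void
extvac_shmem_startup(void)
{
    HASHCTL     ctl;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(ExtVacKey);
    ctl.entrysize = sizeof(ExtVacEntry);

    LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
    extvac_hash = ShmemInitHash("extvac stats hash",
                                EXTVAC_MAX_ENTRIES, EXTVAC_MAX_ENTRIES,
                                &ctl, HASH_ELEM | HASH_BLOBS);
    LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
    /* This only works when the library is in shared_preload_libraries. */
    if (!process_shared_preload_libraries_in_progress)
        return;

    prev_shmem_request_hook = shmem_request_hook;
    shmem_request_hook = extvac_shmem_request;
    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = extvac_shmem_startup;
}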
I’m also unsure whether it’s better to store the statistics in the
cumulative statistics system (as done here) or entirely inside the
extension. Note that the code added to the core to support the
extension executes regardless of whether the extension is enabled.