On 11/10/14, 7:52 PM, Tom Lane wrote:
On the whole, I'm +1 for just logging the events and seeing what we learn that way. That seems like an appropriate amount of effort for finding out whether there is really an issue.
Attached is a patch that does this.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
From a8e824900d7c68e2c242b28c9c06c854f01b770a Mon Sep 17 00:00:00 2001
From: Jim Nasby <jim.na...@bluetreble.com>
Date: Sun, 30 Nov 2014 20:43:47 -0600
Subject: [PATCH] Log cleanup lock acquisition failures in vacuum

---
Notes:
    Count how many times we fail to grab the page cleanup lock on the first try,
    logging it with different wording depending on whether scan_all is true.

 doc/src/sgml/ref/vacuum.sgml      | 1 +
 src/backend/commands/vacuumlazy.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 450c94f..1272c1c 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -252,6 +252,7 @@ DETAIL: CPU 0.01s/0.06u sec elapsed 0.07 sec.
 INFO: "onek": found 3000 removable, 1000 nonremovable tuples in 143 pages
 DETAIL: 0 dead tuples cannot be removed yet.
 There were 0 unused item pointers.
+Could not acquire cleanup lock on 0 pages.
 0 pages are entirely empty.
 CPU 0.07s/0.39u sec elapsed 1.56 sec.
 INFO: analyzing "public.onek"
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6db6c5c..8f22ed2 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -105,6 +105,8 @@ typedef struct LVRelStats
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* number of pages we examined */
+	/* number of pages we could not initially get lock on */
+	BlockNumber nolock;
 	double		scanned_tuples; /* counts only tuples on scanned pages */
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
@@ -346,6 +348,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 			ereport(LOG,
 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
 							"pages: %d removed, %d remain\n"
+							"%s cleanup lock on %u pages.\n"
 							"tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable\n"
 							"buffer usage: %d hits, %d misses, %d dirtied\n"
 					  "avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"
@@ -356,6 +359,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 							vacrelstats->num_index_scans,
 							vacrelstats->pages_removed,
 							vacrelstats->rel_pages,
+							scan_all ? "Waited for" : "Could not acquire", vacrelstats->nolock,
 							vacrelstats->tuples_deleted,
 							vacrelstats->new_rel_tuples,
 							vacrelstats->new_dead_tuples,
@@ -611,6 +615,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		/* We need buffer cleanup lock so that we can prune HOT chains. */
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
+			vacrelstats->nolock++;
+
 			/*
 			 * If we're not scanning the whole relation to guard against XID
 			 * wraparound, it's OK to skip vacuuming a page. The next vacuum
@@ -1101,10 +1107,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 					vacrelstats->scanned_pages, nblocks),
 			 errdetail("%.0f dead row versions cannot be removed yet.\n"
 					   "There were %.0f unused item pointers.\n"
+					   "%s cleanup lock on %u pages.\n"
 					   "%u pages are entirely empty.\n"
 					   "%s.",
 					   nkeep,
 					   nunused,
+					   scan_all ? "Waited for" : "Could not acquire", vacrelstats->nolock,
 					   empty_pages,
 					   pg_rusage_show(&ru0))));
 }
-- 
2.1.2
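
For anyone who wants to see the counting logic without building PostgreSQL, here is a minimal standalone sketch of the same idea. Everything in it (DemoPage, DemoStats, demo_try_cleanup_lock, demo_scan, demo_format_report) is hypothetical and only stands in for the real buffer-manager calls. The only things taken from the patch are the nolock counter name and the two message prefixes. It also assumes, as a simplification, that under scan_all a failed first attempt is always followed by a successful wait; the real scan_all path may behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a heap page; "pinned" models another
 * backend holding a pin that blocks the cleanup lock. */
typedef struct
{
	bool		pinned;
} DemoPage;

/* Hypothetical, cut-down counterpart of LVRelStats. */
typedef struct
{
	unsigned	scanned_pages;	/* pages we actually processed */
	unsigned	nolock;			/* pages where the first lock attempt failed */
} DemoStats;

/* Stand-in for a conditional (non-blocking) cleanup lock attempt. */
bool
demo_try_cleanup_lock(const DemoPage *page)
{
	return !page->pinned;
}

/* Walk the pages, counting every failed first attempt. Without scan_all
 * the page is skipped; with scan_all the demo pretends we waited and
 * then processed it (an assumption made for this sketch). */
void
demo_scan(const DemoPage *pages, size_t npages, bool scan_all,
		  DemoStats *stats)
{
	assert(stats != NULL);
	stats->scanned_pages = 0;
	stats->nolock = 0;

	for (size_t i = 0; i < npages; i++)
	{
		if (!demo_try_cleanup_lock(&pages[i]))
		{
			stats->nolock++;
			if (!scan_all)
				continue;		/* skip this page */
		}
		stats->scanned_pages++;
	}
}

/* Build the log line, choosing the wording from scan_all. */
int
demo_format_report(char *buf, size_t buflen, bool scan_all,
				   const DemoStats *stats)
{
	return snprintf(buf, buflen, "%s cleanup lock on %u pages.",
					scan_all ? "Waited for" : "Could not acquire",
					stats->nolock);
}
```

The counter is bumped in one place, right after the failed conditional attempt, so both wordings report the same number; only the prefix depends on scan_all.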