And Just to reiterate my own understanding of this...
autovacuum priority is less than a user-initiated request, so issuing a
manual vacuum (user-initiated request) will not result in being cancelled.
Regards,
Michael Vitale
Jeff Janes wrote on 12/6/2019 12:47 PM:
On Fri, Dec 6, 2019 at 10:55 AM Mike Schanne <mscha...@kns.com
<mailto:mscha...@kns.com>> wrote:
The error is not actually showing up very often (I have 8
occurrences from 11/29 and none since then). So maybe I should
not be concerned about it. I suspect we have an I/O bottleneck
from other logs (i.e. long checkpoint sync times), so this error
may be a symptom rather than the cause.
I think that at the point it is getting cancelled, it has done all the
work except the truncation of the empty pages, and reporting the
results (for example, updating n_live_tup and n_dead_tup). If this
happens every single time (neither last_autovacuum nor last_vacuum
ever advances) it will eventually cause problems. So this is mostly a
symptom, but not entirely. Simply running a manual vacuum should fix
the reporting problem. It is not subject to cancelling, so it will
detect it is blocking someone and gracefully bow. Meaning it will
suspend the truncation, but will still report its results as normal.
Reading the table backwards in order to truncate it might be
contributing to the IO problems as well as being a victim of those
problems. Upgrading to v10 might help with this, as it implemented a
prefetch where it reads the table forward in 128kB chunks, and then
jumps backwards one chunk at a time. Rather than just reading
backwards 8kB at a time.
Cheers,
Jeff