The error is not actually showing up very often (I have 8 occurrences from 
11/29 and none since then), so maybe I should not be concerned about it.  Based 
on other log entries (e.g. long checkpoint sync times), I suspect we have an 
I/O bottleneck, so this error may be a symptom rather than the cause.
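
For what it's worth, a rough way I could confirm that (assuming the standard 
pg_stat_bgwriter view in 9.6) would be to compare the cumulative checkpoint 
write and sync times:

    SELECT checkpoints_timed,
           checkpoints_req,
           checkpoint_write_time,   -- total ms spent writing checkpoint data
           checkpoint_sync_time,    -- total ms spent in fsync during checkpoints
           buffers_backend_fsync    -- times backends had to do their own fsync
    FROM pg_stat_bgwriter;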

From: Jeff Janes [mailto:jeff.ja...@gmail.com]
Sent: Thursday, December 05, 2019 6:55 PM
To: Mike Schanne
Cc: pgsql-performa...@postgresql.org
Subject: Re: autovacuum locking question

On Thu, Dec 5, 2019 at 5:26 PM Mike Schanne <mscha...@kns.com> wrote:
Hi,
I am investigating a performance problem in our application and am seeing 
something unexpected in the postgres logs regarding the autovacuum.

2019-12-01 13:05:39.029 
UTC,"wb","postgres",6966,"127.0.0.1:53976<http://127.0.0.1:53976>",5ddbd990.1b36,17099,"INSERT
 waiting",2019-11-25 13:39:28 UTC,12/1884256,12615023,LOG,00000,"process 6966 
still waiting for RowExclusiveLock on relation 32938 of database 32768 after 
1000.085 ms","Process holding the lock: 6045. Wait queue: 6966.",,,,,"INSERT 
INTO myschema.mytable (...) VALUES (...) RETURNING 
process.mytable.mytable_id",13,,""
2019-12-01 13:05:39.458 UTC,,,6045,,5de3b800.179d,1,,2019-12-01 12:54:24 
UTC,10/417900,0,ERROR,57014,"canceling autovacuum task",,,,,"automatic vacuum 
of table ""postgres.myschema.mytable""",,,,""

My understanding from reading the documentation was that a vacuum can run 
concurrently with table inserts/updates, but from reading the logs it appears 
they are conflicting over a row lock.  This particular table gets very frequent 
inserts/updates (10-100 inserts / sec) so I am concerned that if the autovacuum 
is constantly canceled, then the table never gets cleaned and its performance 
will continue to degrade over time.  Is it expected for the vacuum to be 
canceled by an insert in this way?

We are using postgres 9.6.10.
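
In case it is useful, a query along these lines against pg_stat_user_tables 
(with our real table substituted for the mytable placeholder) should show 
whether autovacuum ever completes on the table and how many dead tuples are 
accumulating:

    SELECT relname,
           n_live_tup,
           n_dead_tup,          -- dead tuples waiting to be cleaned up
           last_autovacuum,     -- last time an autovacuum finished on the table
           autovacuum_count
    FROM pg_stat_user_tables
    WHERE relname = 'mytable';  -- placeholder name, as in the log above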

If the vacuum finds a lot of empty pages at the end of the table, it will try 
to truncate them, and it takes a strong lock to do so.  It is supposed to check 
every 20ms to see if anyone else is blocked on that lock, at which point it 
stops doing the truncation and releases the lock.  So it should never get 
"caught" holding the lock long enough to be cancelled.  Is your setting for 
deadlock_timeout much lower than usual?  Also, if the truncation is bogged down 
in very slow IO, perhaps it doesn't actually get around to checking every 20ms 
despite its intentions.
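
Something like the following should show whether it has been changed from the 
1 second default and where the value is set; deadlock_timeout is also the 
interval after which those "still waiting for ... lock" messages get logged 
when log_lock_waits is on:

    SELECT name, setting, unit, source
    FROM pg_settings
    WHERE name = 'deadlock_timeout';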

How often have you seen it in the logs?

Cheers,

Jeff
