Re: autovacuum locking question
On Thu, Dec 5, 2019 at 5:26 PM Mike Schanne wrote:
> Hi,
>
> I am investigating a performance problem in our application and am seeing
> something unexpected in the postgres logs regarding the autovacuum.
>
> 2019-12-01 13:05:39.029 UTC,"wb","postgres",6966,"127.0.0.1:53976",5ddbd990.1b36,17099,"INSERT
> waiting",2019-11-25 13:39:28 UTC,12/1884256,12615023,LOG,0,"process
> 6966 still waiting for RowExclusiveLock on relation 32938 of database 32768
> after 1000.085 ms","Process holding the lock: 6045. Wait queue:
> 6966.","INSERT INTO myschema.mytable (...) VALUES (...) RETURNING
> process.mytable.mytable_id",13,,""
>
> 2019-12-01 13:05:39.458 UTC,,,6045,,5de3b800.179d,1,,2019-12-01 12:54:24
> UTC,10/417900,0,ERROR,57014,"canceling autovacuum task","automatic
> vacuum of table ""postgres.myschema.mytable"""""
>
> My understanding from reading the documentation was that a vacuum can run
> concurrently with table inserts/updates, but from reading the logs it
> appears they are conflicting over a row lock. This particular table gets
> very frequent inserts/updates (10-100 inserts/sec), so I am concerned that
> if the autovacuum is constantly canceled, the table will never get cleaned
> and its performance will continue to degrade over time. Is it expected for
> the vacuum to be canceled by an insert in this way?
>
> We are using postgres 9.6.10.

If the vacuum finds a lot of empty pages at the end of the table, it will
try to truncate them, and it takes a strong lock to do so. It is supposed
to check every 20 ms to see if anyone else is blocked on that lock, at
which point it stops doing the truncation and releases the lock. So it
should never get "caught" holding the lock long enough to be cancelled. Is
your setting for deadlock_timeout much lower than usual? Also, if the
truncation is bogged down in very slow I/O, perhaps it doesn't actually get
around to checking every 20 ms despite its intentions. How often have you
seen this in the logs?

Cheers,

Jeff
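The "still waiting ... after 1000.085 ms" message is emitted by
log_lock_waits, which fires after deadlock_timeout elapses (1 s by
default), so the settings Jeff asks about can be checked directly in a
psql session:

```sql
-- The ~1000 ms in the log matches the default deadlock_timeout of 1 s;
-- a much lower value would make lock-wait logging (and autovacuum
-- cancellation pressure) appear far more often.
SHOW deadlock_timeout;
SHOW log_lock_waits;
```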
Re: autovacuum locking question
Mike Schanne writes:
> I am investigating a performance problem in our application and am seeing
> something unexpected in the postgres logs regarding the autovacuum.
>
> 2019-12-01 13:05:39.029 UTC,"wb","postgres",6966,"127.0.0.1:53976",5ddbd990.1b36,17099,"INSERT
> waiting",2019-11-25 13:39:28 UTC,12/1884256,12615023,LOG,0,"process 6966
> still waiting for RowExclusiveLock on relation 32938 of database 32768 after
> 1000.085 ms","Process holding the lock: 6045. Wait queue: 6966.","INSERT
> INTO myschema.mytable (...) VALUES (...) RETURNING
> process.mytable.mytable_id",13,,""
>
> 2019-12-01 13:05:39.458 UTC,,,6045,,5de3b800.179d,1,,2019-12-01 12:54:24
> UTC,10/417900,0,ERROR,57014,"canceling autovacuum task","automatic vacuum
> of table ""postgres.myschema.mytable"""""
>
> My understanding from reading the documentation was that a vacuum can run
> concurrently with table inserts/updates, but from reading the logs it
> appears they are conflicting over a row lock. This particular table gets
> very frequent inserts/updates (10-100 inserts/sec), so I am concerned that
> if the autovacuum is constantly canceled, the table will never get cleaned
> and its performance will continue to degrade over time. Is it expected for
> the vacuum to be canceled by an insert in this way?

The main part of an autovacuum operation should go through OK. The only
part that would get canceled in response to somebody taking a non-exclusive
lock is the last step, which is truncation of unused blocks at the end of
the table; that requires an exclusive lock. Normally, skipping that step
isn't terribly problematic.

> We are using postgres 9.6.10.

IIRC, we've made improvements in this area since 9.6, to allow a partial
truncation to be done if someone wants the lock, rather than just failing
entirely.

regards, tom lane
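To verify that only the truncation step is being skipped (i.e. that dead
space is still being reclaimed), the pgstattuple contrib module can report
dead-tuple and free-space ratios directly; a sketch, with the table name
taken from the log excerpt:

```sql
-- pgstattuple scans the whole table, so run it off-peak on a busy table.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT table_len,
       dead_tuple_percent,  -- high and growing => vacuum is not keeping up
       free_percent         -- reusable space inside the table
FROM pgstattuple('myschema.mytable');
```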
Re: autovacuum locking question
On Thu, Dec 5, 2019 at 3:26 PM Mike Schanne wrote:
> I am concerned that if the autovacuum is constantly canceled, then the
> table never gets cleaned and its performance will continue to degrade
> over time. Is it expected for the vacuum to be canceled by an insert in
> this way?
>
> We are using postgres 9.6.10.

Have you checked when the table was last autovacuumed in
pg_stat_user_tables? If the autovacuum count is high and the timestamp of
the last run is relatively current, then there is no reason for concern as
far as I can figure. Have you already configured non-default values for the
autovacuum options, either for your system or for this table?
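The check suggested above could look like this (schema and table name taken
from the log excerpt in the original post):

```sql
-- When and how often autovacuum has actually completed on this table,
-- plus the current dead-tuple backlog.
SELECT relname,
       last_autovacuum,
       autovacuum_count,
       n_dead_tup,
       n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'myschema'
  AND relname = 'mytable';

-- Any per-table autovacuum overrides show up in reloptions.
SELECT reloptions
FROM pg_class
WHERE oid = 'myschema.mytable'::regclass;
```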
Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud
Thanks, I'll check it out.

On Thu, Dec 5, 2019 at 12:51 PM Craig James wrote:
> For your 24-hour-restore requirement, there's an additional feature you
> might consider: incremental restore, or what you might call "recovery in
> place"; that is, the ability to keep a more-or-less up-to-date copy, and
> then in an emergency only restore the diffs on the file system.
> [...]

--
Craig
Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud
On Thu, Dec 5, 2019 at 9:48 AM Craig Jackson wrote:
> Hi,
>
> We are in the process of migrating an Oracle database to Postgres in
> Google Cloud and are investigating backup/recovery tools. The database is
> > 20 TB in size. We have an SLA that requires us to be able to complete a
> full restore of the database within 24 hours. We have been testing
> pgbackrest, barman, and GCP snapshots but wanted to see if there are any
> other recommendations we should consider.
>
> *Desirable features*
> - Parallel backup/recovery
> - Incremental backups
> - Backup directly to a GCP bucket
> - Deduplication/Compression

For your 24-hour-restore requirement, there's an additional feature you
might consider: incremental restore, or what you might call "recovery in
place"; that is, the ability to keep a more-or-less up-to-date copy, and
then in an emergency only restore the diffs on the file system. pgbackrest
uses a built-in rsync-like feature, plus a client-server architecture, that
allows it to quickly determine which disk blocks need to be updated.
Checksums are computed on each side, and data are only transferred if the
checksums differ. It's very efficient. I assume that a 20 TB database is
mostly static, with only a small fraction of the data updated in any month.
I believe the checksums are precomputed and stored in the pgbackrest
repository, so you can even do this from an Amazon S3 (or whatever Google
Cloud's equivalent is for low-cost storage) backup with just modest
bandwidth usage.

In a cloud environment, you can do this on modestly priced hardware (a few
CPUs, modest memory). In the event of a failover, unmount your backup disk,
spin up a big server, mount the database, do the incremental restore, and
you're in business.

Craig (James)

> Any suggestions would be appreciated.
>
> Craig Jackson
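The "restore only the diffs" behavior described above corresponds to
pgbackrest's --delta restore mode; a sketch of the failover-time command
(the stanza name "main" is a placeholder):

```
# With --delta, pgbackrest compares checksums against the files already
# present in the data directory and rewrites only those that differ,
# instead of copying the full 20 TB from the repository.
pgbackrest --stanza=main --delta restore
```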
Postgres backup tool recommendations for multi-terabyte database in Google Cloud
Hi,

We are in the process of migrating an Oracle database to Postgres in Google
Cloud and are investigating backup/recovery tools. The database is > 20 TB
in size. We have an SLA that requires us to be able to complete a full
restore of the database within 24 hours. We have been testing pgbackrest,
barman, and GCP snapshots but wanted to see if there are any other
recommendations we should consider.

*Desirable features*
- Parallel backup/recovery
- Incremental backups
- Backup directly to a GCP bucket
- Deduplication/Compression

Any suggestions would be appreciated.

Craig Jackson
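For reference, pgbackrest covers all four desired features; a hedged sketch
of a pgbackrest.conf (option names from the pgbackrest docs; the bucket
name and key path are placeholders, and the native GCS repository type
requires a reasonably recent pgbackrest release):

```
[global]
# Back up directly to a GCS bucket
repo1-type=gcs
repo1-path=/pgbackrest
repo1-gcs-bucket=my-backup-bucket
repo1-gcs-key=/etc/pgbackrest/gcs-key.json

# Parallel backup/restore workers and compression
process-max=8
compress-type=lz4

[main]
pg1-path=/var/lib/postgresql/data
```

Incremental backups are then a matter of `pgbackrest --stanza=main
--type=incr backup`; deduplication in the rsync sense comes from its
checksum-based delta restore rather than block-level dedup in the
repository.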
How to run in parallel in Postgres
Hi,

I have a function that prepares data so that the big job can be run in
parallel. Today I have solved this by using GNU Parallel, like this:

psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4 psql testdb -c /tmp/run_cmd.sql" 2>> /tmp/analyze.log;

The problem here is that I depend on external code which may not be
installed. Since Postgres now supports parallelism, I was wondering if it's
easy to trigger dynamically created SQL calls in parallel. If you look at
https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql
you see that find_overlap_gap_make_run_cmd generates a set of 28 SQL calls.

So is it possible, in a simple way, to use Postgres's parallel
functionality to call these 28 functions in parallel, so that I don't
depend on externally installed programs? When these 28 SQL calls are done,
find_overlap_gap_make_run_cmd may continue to the next step of the work. So
the function that triggers the parallel calls should wait for them to
complete and then may start the next step of the work.

Thanks.

Lars
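One way to do this without external programs is the asynchronous API in the
dblink contrib module (available in 9.6): each generated statement runs on
its own backend connection, and the caller blocks until all results are in.
A minimal sketch, assuming 4 workers; the connection string and the
per-worker statement are placeholders for the statements
find_overlap_gap_make_run_cmd generates:

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

DO $$
DECLARE
    i    int;
    conn text;
BEGIN
    -- Open one connection per worker and fire the queries without waiting.
    FOR i IN 1..4 LOOP
        conn := 'worker_' || i;
        PERFORM dblink_connect(conn, 'dbname=testdb');
        PERFORM dblink_send_query(conn,
            format('SELECT my_overlap_gap_job(%s)', i));  -- hypothetical job
    END LOOP;

    -- Block until every worker has finished, then clean up. (dblink_get_result
    -- should be called until it returns an empty set on each connection.)
    FOR i IN 1..4 LOOP
        conn := 'worker_' || i;
        PERFORM * FROM dblink_get_result(conn) AS t(res text);
        PERFORM dblink_disconnect(conn);
    END LOOP;
END $$;
```

The DO block itself doesn't return until every dblink_get_result has come
back, which gives the "wait for all 28, then continue" behavior asked for;
note that each dblink connection runs in its own transaction, so the
per-worker statements can't see each other's uncommitted work.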