Re: autovacuum locking question
On Thu, Dec 5, 2019 at 5:26 PM Mike Schanne wrote:
> Hi,
>
> I am investigating a performance problem in our application and am seeing
> something unexpected in the postgres logs regarding the autovacuum.
>
> 2019-12-01 13:05:39.029 UTC,"wb","postgres",6966,"127.0.0.1:53976",5ddbd990.1b36,17099,"INSERT
> waiting",2019-11-25 13:39:28 UTC,12/1884256,12615023,LOG,0,"process
> 6966 still waiting for RowExclusiveLock on relation 32938 of database 32768
> after 1000.085 ms","Process holding the lock: 6045. Wait queue:
> 6966.","INSERT INTO myschema.mytable (...) VALUES (...) RETURNING
> process.mytable.mytable_id",13,,""
>
> 2019-12-01 13:05:39.458 UTC,,,6045,,5de3b800.179d,1,,2019-12-01 12:54:24
> UTC,10/417900,0,ERROR,57014,"canceling autovacuum task","automatic
> vacuum of table ""postgres.myschema.mytable"""""
>
> My understanding from reading the documentation was that a vacuum can run
> concurrently with table inserts/updates, but from reading the logs it
> appears they are conflicting over a row lock. This particular table gets
> very frequent inserts/updates (10-100 inserts/sec), so I am concerned that
> if the autovacuum is constantly canceled, the table will never get cleaned
> and its performance will continue to degrade over time. Is it expected for
> the vacuum to be canceled by an insert in this way?
>
> We are using postgres 9.6.10.

If the vacuum finds a lot of empty pages at the end of the table, it will
try to truncate them, and it takes a strong lock to do so. It is supposed
to check every 20 ms to see if anyone else is blocked on that lock, at
which point it stops doing the truncation and releases the lock. So it
should never get "caught" holding the lock long enough to be cancelled. Is
your setting for deadlock_timeout much lower than usual? Also, if the
truncation is bogged down in very slow I/O, perhaps it doesn't actually get
around to checking every 20 ms despite its intentions. How often have you
seen this in the logs?

Cheers,

Jeff
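The "still waiting ... after 1000.085 ms" message is emitted by
log_lock_waits, which fires after deadlock_timeout elapses (1 s by
default), so the settings Jeff asks about can be checked directly in a
psql session:

```sql
-- The ~1000 ms in the log matches the default deadlock_timeout of 1 s;
-- a much lower value would make lock-wait logging (and autovacuum
-- cancellation pressure) appear far more often.
SHOW deadlock_timeout;
SHOW log_lock_waits;
```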
Re: autovacuum locking question
Mike Schanne writes:
> I am investigating a performance problem in our application and am seeing
> something unexpected in the postgres logs regarding the autovacuum.
>
> 2019-12-01 13:05:39.029 UTC,"wb","postgres",6966,"127.0.0.1:53976",5ddbd990.1b36,17099,"INSERT
> waiting",2019-11-25 13:39:28 UTC,12/1884256,12615023,LOG,0,"process 6966
> still waiting for RowExclusiveLock on relation 32938 of database 32768 after
> 1000.085 ms","Process holding the lock: 6045. Wait queue: 6966.","INSERT
> INTO myschema.mytable (...) VALUES (...) RETURNING
> process.mytable.mytable_id",13,,""
>
> 2019-12-01 13:05:39.458 UTC,,,6045,,5de3b800.179d,1,,2019-12-01 12:54:24
> UTC,10/417900,0,ERROR,57014,"canceling autovacuum task","automatic vacuum
> of table ""postgres.myschema.mytable"""""
>
> My understanding from reading the documentation was that a vacuum can run
> concurrently with table inserts/updates, but from reading the logs it
> appears they are conflicting over a row lock. This particular table gets
> very frequent inserts/updates (10-100 inserts/sec), so I am concerned that
> if the autovacuum is constantly canceled, the table will never get cleaned
> and its performance will continue to degrade over time. Is it expected for
> the vacuum to be canceled by an insert in this way?

The main part of an autovacuum operation should go through OK. The only
part that would get canceled in response to somebody taking a non-exclusive
lock is the last step, which is truncation of unused blocks at the end of
the table; that requires an exclusive lock. Normally, skipping that step
isn't terribly problematic.

> We are using postgres 9.6.10.

IIRC, we've made improvements in this area since 9.6, to allow a partial
truncation to be done if someone wants the lock, rather than just failing
entirely.

regards, tom lane
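To verify that only the truncation step is being skipped (i.e. that dead
space is still being reclaimed), the pgstattuple contrib module can report
dead-tuple and free-space ratios directly; a sketch, with the table name
taken from the log excerpt:

```sql
-- pgstattuple scans the whole table, so run it off-peak on a busy table.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT table_len,
       dead_tuple_percent,  -- high and growing => vacuum is not keeping up
       free_percent         -- reusable space inside the table
FROM pgstattuple('myschema.mytable');
```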
Re: autovacuum locking question
On Thu, Dec 5, 2019 at 3:26 PM Mike Schanne wrote:
> I am concerned that if the autovacuum is constantly canceled, then the
> table never gets cleaned and its performance will continue to degrade
> over time. Is it expected for the vacuum to be canceled by an insert in
> this way?
>
> We are using postgres 9.6.10.

Have you checked when the table was last autovacuumed in
pg_stat_user_tables? If the autovacuum count is high and the timestamp of
the last run is relatively current, then there is no reason for concern as
far as I can figure. Have you already configured non-default values for the
autovacuum options, either for your system or for this table?
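The check suggested above could look like this (schema and table name taken
from the log excerpt in the original post):

```sql
-- When and how often autovacuum has actually completed on this table,
-- plus the current dead-tuple backlog.
SELECT relname,
       last_autovacuum,
       autovacuum_count,
       n_dead_tup,
       n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'myschema'
  AND relname = 'mytable';

-- Any per-table autovacuum overrides show up in reloptions.
SELECT reloptions
FROM pg_class
WHERE oid = 'myschema.mytable'::regclass;
```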
Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud
Thanks, I'll check it out.

On Thu, Dec 5, 2019 at 12:51 PM Craig James wrote:
> For your 24-hour-restore requirement, there's an additional feature you
> might consider: incremental restore, or what you might call "recovery in
> place"; that is, the ability to keep a more-or-less up-to-date copy, and
> then in an emergency only restore the diffs on the file system.
> [...]

--
Craig
Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud
On Thu, Dec 5, 2019 at 9:48 AM Craig Jackson wrote:
> Hi,
>
> We are in the process of migrating an Oracle database to Postgres in
> Google Cloud and are investigating backup/recovery tools. The database is
> > 20 TB in size. We have an SLA that requires us to be able to complete a
> full restore of the database within 24 hours. We have been testing
> pgbackrest, barman, and GCP snapshots but wanted to see if there are any
> other recommendations we should consider.
>
> *Desirable features*
> - Parallel backup/recovery
> - Incremental backups
> - Backup directly to a GCP bucket
> - Deduplication/Compression

For your 24-hour-restore requirement, there's an additional feature you
might consider: incremental restore, or what you might call "recovery in
place"; that is, the ability to keep a more-or-less up-to-date copy, and
then in an emergency only restore the diffs on the file system. pgbackrest
uses a built-in rsync-like feature, plus a client-server architecture, that
allows it to quickly determine which disk blocks need to be updated.
Checksums are computed on each side, and data are only transferred if the
checksums differ. It's very efficient. I assume that a 20 TB database is
mostly static, with only a small fraction of the data updated in any month.
I believe the checksums are precomputed and stored in the pgbackrest
repository, so you can even do this from an Amazon S3 (or whatever Google
Cloud's equivalent is for low-cost storage) backup with just modest
bandwidth usage.

In a cloud environment, you can do this on modestly priced hardware (a few
CPUs, modest memory). In the event of a failover, unmount your backup disk,
spin up a big server, mount the database, do the incremental restore, and
you're in business.

Craig (James)

> Any suggestions would be appreciated.
>
> Craig Jackson
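The "restore only the diffs" behavior described above corresponds to
pgbackrest's --delta restore mode; a sketch of the failover-time command
(the stanza name "main" is a placeholder):

```
# With --delta, pgbackrest compares checksums against the files already
# present in the data directory and rewrites only those that differ,
# instead of copying the full 20 TB from the repository.
pgbackrest --stanza=main --delta restore
```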
Postgres backup tool recommendations for multi-terabyte database in Google Cloud
Hi,

We are in the process of migrating an Oracle database to Postgres in Google
Cloud and are investigating backup/recovery tools. The database is > 20 TB
in size. We have an SLA that requires us to be able to complete a full
restore of the database within 24 hours. We have been testing pgbackrest,
barman, and GCP snapshots but wanted to see if there are any other
recommendations we should consider.

*Desirable features*
- Parallel backup/recovery
- Incremental backups
- Backup directly to a GCP bucket
- Deduplication/Compression

Any suggestions would be appreciated.

Craig Jackson
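For reference, pgbackrest covers all four desired features; a hedged sketch
of a pgbackrest.conf (option names from the pgbackrest docs; the bucket
name and key path are placeholders, and the native GCS repository type
requires a reasonably recent pgbackrest release):

```
[global]
# Back up directly to a GCS bucket
repo1-type=gcs
repo1-path=/pgbackrest
repo1-gcs-bucket=my-backup-bucket
repo1-gcs-key=/etc/pgbackrest/gcs-key.json

# Parallel backup/restore workers and compression
process-max=8
compress-type=lz4

[main]
pg1-path=/var/lib/postgresql/data
```

Incremental backups are then a matter of `pgbackrest --stanza=main
--type=incr backup`; deduplication in the rsync sense comes from its
checksum-based delta restore rather than block-level dedup in the
repository.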
How to run in parallel in Postgres
Hi,

I have a function that prepares data so that the big job can be run in
parallel. Today I have solved this by using GNU Parallel, like this:

psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4 psql testdb -c /tmp/run_cmd.sql" 2>> /tmp/analyze.log;

The problem here is that I depend on external code which may not be
installed. Since Postgres now supports parallelism, I was wondering if it's
easy to trigger dynamically created SQL calls in parallel. If you look at
https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql
you see that find_overlap_gap_make_run_cmd generates a set of 28 SQL calls.

So is it possible, in a simple way, to use Postgres's parallel
functionality to call these 28 functions in parallel, so that I don't
depend on externally installed programs? When these 28 SQL calls are done,
find_overlap_gap_make_run_cmd may continue to the next step of the work. So
the function that triggers the parallel calls should wait for them to
complete and then may start the next step of the work.

Thanks.

Lars
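One way to do this without external programs is the asynchronous API in the
dblink contrib module (available in 9.6): each generated statement runs on
its own backend connection, and the caller blocks until all results are in.
A minimal sketch, assuming 4 workers; the connection string and the
per-worker statement are placeholders for the statements
find_overlap_gap_make_run_cmd generates:

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

DO $$
DECLARE
    i    int;
    conn text;
BEGIN
    -- Open one connection per worker and fire the queries without waiting.
    FOR i IN 1..4 LOOP
        conn := 'worker_' || i;
        PERFORM dblink_connect(conn, 'dbname=testdb');
        PERFORM dblink_send_query(conn,
            format('SELECT my_overlap_gap_job(%s)', i));  -- hypothetical job
    END LOOP;

    -- Block until every worker has finished, then clean up. (dblink_get_result
    -- should be called until it returns an empty set on each connection.)
    FOR i IN 1..4 LOOP
        conn := 'worker_' || i;
        PERFORM * FROM dblink_get_result(conn) AS t(res text);
        PERFORM dblink_disconnect(conn);
    END LOOP;
END $$;
```

The DO block itself doesn't return until every dblink_get_result has come
back, which gives the "wait for all 28, then continue" behavior asked for;
note that each dblink connection runs in its own transaction, so the
per-worker statements can't see each other's uncommitted work.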