Re: autovacuum big table taking hours and sometimes seconds

2019-02-07 Thread Jeff Janes
On Thu, Feb 7, 2019 at 6:55 AM Mariel Cherkassky <
mariel.cherkas...@gmail.com> wrote:

> I have 3 questions:
> 1) To what value do you recommend increasing the vacuum cost_limit? Does 2000
> seem reasonable? Or is it better to leave it at the default and assign a
> specific value for big tables?
>

That depends on your IO hardware, and your workload.  You wouldn't want
background vacuum to use so much of your available IO that it starves your
other processes.
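
If you go the per-table route, the storage parameters affect only that table.
A minimal sketch (using the orig_table name from this thread; 2000 is just the
number floated above, so tune it to your hardware):

  alter table orig_table set (autovacuum_vacuum_cost_limit = 2000);
  -- the TOAST table has its own setting
  alter table orig_table set (toast.autovacuum_vacuum_cost_limit = 2000);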



> 2) When the autovacuum reaches the cost_limit while trying to vacuum a
> specific table, does it wait nap_time seconds and then continue to work on
> the same table?
>

No, it waits for autovacuum_vacuum_cost_delay before resuming within the
same table. During this delay, the worker still has the table open, still
holds its lock on it, still holds the transaction open, etc.  Naptime is
entirely different: it controls how often the autovacuum scheduler checks to
see which tables need to be vacuumed again.
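
For reference, the two settings being distinguished here can be checked
directly (the defaults noted in the comments are the stock ones):

  show autovacuum_vacuum_cost_delay;  -- pause within a table once the cost limit is hit (default 20ms)
  show autovacuum_naptime;            -- how often the launcher checks for tables needing vacuum (default 1min)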



> 3) So in the case of a table that keeps growing (not fast, because I set the
> vacuum_scale_factor to 0 and the autovacuum_vacuum_threshold to 1): if
> the table keeps growing, it means I should try to increase the cost limit, right?
> Do you see any other option?
>

 You can use pg_freespacemap to see if the free space is spread evenly
throughout the table, or clustered together.  That might help figure out
what is going on.  And, is it the table itself that is growing, or the
index on it?
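
For example, something along these lines (a sketch; assumes the
pg_freespacemap extension is installed and uses a placeholder table name):

  create extension if not exists pg_freespacemap;

  -- free space per block: evenly spread gaps and one big empty tail at the
  -- end of the table tell different stories
  select blkno, avail from pg_freespace('my_big_table') order by blkno;

  -- and to see whether the table (plus its TOAST table) or the indexes are growing
  select pg_size_pretty(pg_table_size('my_big_table'))   as table_and_toast,
         pg_size_pretty(pg_indexes_size('my_big_table')) as indexes;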

Cheers,

Jeff


Managing High Availability in PostgreSQL – Part I

2019-02-07 Thread Kristi Anderson
Managing high availability in your PostgreSQL hosting is very important to
ensuring your clusters maintain exceptional uptime and strong operational
performance so your data is always available to your application. In an
earlier blog post, we introduced you to configuring high availability for
PostgreSQL using streaming replication, and now we’re going to show you how
to best manage PostgreSQL high availability.

There are multiple tools available for managing the high availability of
your PostgreSQL clusters using streaming replication. These solutions offer
automatic failover capabilities, monitoring, replication, and other useful
administrative tasks. Some of the prominent open source solutions include:

1. PostgreSQL Automatic Failover by ClusterLabs
2. Replication Manager for PostgreSQL Clusters (repmgr) by 2ndQuadrant
3. Patroni by Zalando

Each of these tools provides its own way of managing the clusters. In our
three-part series of posts on high availability for PostgreSQL, we’ll share
an overview, the prerequisites, and the working and test results for each of
these three tools. Here in Part 1, we’ll deep dive into the PostgreSQL
Automatic Failover (PAF) solution by ClusterLabs.

PostgreSQL Automatic Failover

PostgreSQL Automatic Failover (PAF) is a high availability management
solution for PostgreSQL by ClusterLabs. PAF makes use of the popular,
industry-standard Pacemaker and Corosync stack. With Pacemaker and Corosync
together, you’ll be able to detect failures in the system and act
accordingly.

Pacemaker is capable of managing many resources, and does so with the help
of resource agents. Each resource agent is responsible for handling a
specific resource, deciding how it should behave, and informing Pacemaker of
its results.

Your resource agent implementation must comply with the Open Cluster
Framework (OCF) specification. This specification defines resource agents’
behavior and their implementation of methods like stop, start, promote, and
demote, as well as their interaction with Pacemaker.

PAF is an OCF resource agent for PostgreSQL written in Perl. Once your
PostgreSQL cluster is built using internal streaming replication, PAF is
able to expose to Pacemaker the current status of the PostgreSQL instance on
each node: master, slave, stopped, catching up, etc.
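
For example, once the resource is up, one quick way to see what PAF reports
from any node is the standard Pacemaker tooling (a one-shot status check;
the exact output layout varies by Pacemaker version):

crm_mon -A -1

The -A flag includes node attributes in the output, which is where PAF
publishes per-node details alongside the resource roles (master, slave,
stopped, etc.).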

How it Works

PAF communicates with Pacemaker regarding the cluster status and monitors
the functioning of PostgreSQL. In the event of a failure, it informs
Pacemaker, and if there’s no chance of the current master being recovered,
it triggers an election among the current standby servers. With the robust
Pacemaker stack in place, PAF performs management actions like start, stop,
monitor, and failover on all the PostgreSQL nodes.

Are There Any Setup Requirements?

- PAF supports PostgreSQL version 9.3 and higher.
- PAF is not responsible for PostgreSQL master/standby creation or its setup;
you must create and set up streaming replication before using PAF.
- PAF doesn’t edit any configuration of Postgres. However, it requires users
to follow a few prerequisites:
- The slave must be configured as a hot standby.
- A recovery template file (default: $PGDATA/recovery.conf.pcmk) has to be
provided with the following parameters (a sample template is sketched at the
end of this list):
- standby_mode = on
- recovery_target_timeline = 'latest'
- primary_conninfo must have the application_name parameter defined and set
to the local node name as in Pacemaker.
- PAF exposes multiple parameters related to the management of a PostgreSQL
resource. These can be configured to suit your requirements. The parameters
are:
- bindir: location of the PostgreSQL binaries (default: /usr/bin)
- pgdata: location of the PGDATA of your instance (default:
/var/lib/pgsql/data)
- datadir: path to the directory set in data_directory in your
postgresql.conf file
- pghost: the socket directory or IP address to use to connect to the local
instance (default: /tmp)
- pgport: the port to connect to the local instance (default: 5432)
- recovery_template: the local template that will be copied as the
PGDATA/recovery.conf file. This template file must exist on all nodes
(default: $PGDATA/recovery.conf.pcmk)
- start_opts: additional arguments given to the Postgres process on startup.
See "postgres --help" for available options. Useful when the postgresql.conf
file is not in the data directory (PGDATA), e.g.: -c
config_file=/etc/postgresql/9.3/main/postgresql.conf
- system_user: the system owner of 
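
To make the prerequisites and parameters above concrete, here is a minimal
sketch. The host, node, and path names are made up for the example, and the
exact pcs syntax varies between versions, so treat this as an illustration
rather than a recipe; the PAF documentation has the full recommended setup,
including operation timeouts.

A recovery template ($PGDATA/recovery.conf.pcmk) on a node named srv1 could
look like:

standby_mode = on
recovery_target_timeline = 'latest'
primary_conninfo = 'host=192.168.1.50 port=5432 application_name=srv1'

A Pacemaker resource wired up with the parameters listed above could then be
created along these lines (PAF ships the pgsqlms resource agent):

pcs resource create pgsqld ocf:heartbeat:pgsqlms \
  bindir=/usr/bin pgdata=/var/lib/pgsql/data \
  recovery_template=/var/lib/pgsql/data/recovery.conf.pcmk \
  op monitor interval=15s \
  --master notify=true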

Re: autovacuum big table taking hours and sometimes seconds

2019-02-07 Thread Mariel Cherkassky
Just to make sure that I understood:
- By increasing the cost_limit or decreasing the page costs, we can decrease
the time it takes the autovacuum process to vacuum a specific table.
- The vacuum threshold/scale are used to decide how often the table will be
vacuumed, not how long it should take.

I have 3 questions:
1) To what value do you recommend increasing the vacuum cost_limit? Does 2000
seem reasonable? Or is it better to leave it at the default and assign a
specific value for big tables?
2) When the autovacuum reaches the cost_limit while trying to vacuum a
specific table, does it wait nap_time seconds and then continue to work on
the same table?
3) So in the case of a table that keeps growing (not fast, because I set the
vacuum_scale_factor to 0 and the autovacuum_vacuum_threshold to 1): if
the table keeps growing, it means I should try to increase the cost limit, right?
Do you see any other option? The table represents sessions of my system, so
basically from my point of view I should have almost the same amount of
sessions every day and the table shouldn't grow dramatically, but before
changing the vacuum threshold/factor it did. As I mentioned in my first
comment there is a bytea column, and therefore the toasted table is the
problematic one here.

On Thu, Feb 7, 2019 at 0:34, David Rowley <david.row...@2ndquadrant.com> wrote:

> On Thu, 7 Feb 2019 at 02:34, Mariel Cherkassky wrote:
> > As I said, I set the next settings for the toasted table :
> >
> >  alter table orig_table set (toast.autovacuum_vacuum_scale_factor = 0);
> >
> >   alter table orig_table set (toast.autovacuum_vacuum_threshold  =1);
>
> These settings don't control how fast auto-vacuum runs, just when it
> should run.
>
> > Can you explain a little bit more why you decided that the autovacuum
> > spent its time sleeping?
>
> Yeah, if you look at the following settings.
>
>  vacuum_cost_limit   | 200
>  vacuum_cost_page_dirty  | 20
>  vacuum_cost_page_hit| 1
>  vacuum_cost_page_miss   | 10
>  autovacuum_vacuum_cost_delay| 20ms
>
> I've tagged on the default setting for each of these.  Both vacuum and
> auto-vacuum keep score of how many points they've accumulated while
> running: 20 points for dirtying a page, 10 for a read that's not found
> in shared_buffers, and 1 for reading a buffer from shared buffers.
> When vacuum_cost_limit points are reached (or
> autovacuum_vacuum_cost_limit if it's not -1), auto-vacuum sleeps for
> autovacuum_vacuum_cost_delay; normal manual vacuums sleep for
> vacuum_cost_delay.
>
> In one of the log entries you saw:
>
> > buffer usage: 4808221 hits, 6404148 misses, 6152603 dirtied
> > avg read rate: 2.617 MiB/s, avg write rate: 2.514 MiB/s
> > system usage: CPU 148.65s/70.06u sec elapsed 19119.55 sec
>
> Doing a bit of maths to see how much time that vacuum should have slept
> for:
>
> postgres=# select (4808221 * 1 + 6404148 * 10 + 6152603 * 20) / 200.0
> * 20 / 1000;
>   ?column?
> 
>  19190.1761
>
> That's remarkably close to the actual time of 19119.55 sec. If you do
> the same for the other 3 vacuums then you'll see the same close match.
>
> > I see the autovacuum statistics in the logs; how can I check whether the
> > workers are busy very often?
>
> It would be nice if there was something better, but periodically doing:
>
> SELECT count(*) FROM pg_stat_activity where query like 'autovacuum%';
>
> will work.
>
> > My vacuum limit is 200 right now. Basically, if vacuum runs on my toasted
> > table and reaches 200 but didn't finish cleaning all the dead tuples,
> > after the nap should it continue cleaning it, or wait until the
> > vacuum_threshold is hit again?
>
> You're confusing nap time with something else; maybe you're confusing
> it with the speed of vacuum?  Napping is just the time auto-vacuum will
> wait between checking for new tables to work on.  Having the
> auto-vacuum run so slowly is a probable cause of still having dead
> tuples after the vacuum... likely because they became dead after the
> vacuum started.
>
> I'd recommend reading the manual or Tomas Vondra's blog about vacuum
> costs. It's not overly complex, once you understand what each of the
> vacuum settings does.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: Transaction size and Wal2Json

2019-02-07 Thread Peter Eisentraut
On 07/02/2019 11:38, Mai Peng wrote:
> We use the plugin Wal2Json in order to catch every modification on the database.
> We’ve got an issue: WAL was growing very fast, the state in
> pg_stat_replication was stuck on ‘catchup’, and an error: pg_recvlogical:
> unexpected termination of replication stream: ERROR: out of memory DETAIL:
> Cannot enlarge string buffer. It seems Wal2Json cannot handle very big
> transactions (more than 1 GB).
> How could we measure the size of a transaction?
> Could we increase this limitation?

You should send a bug report to wal2json.

It's plausible that some naive coding would run into the limitation that
you describe, but a bit of effort can probably solve it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Transaction size and Wal2Json

2019-02-07 Thread Mai Peng
Hello,

We use the plugin Wal2Json in order to catch every modification on the database.
We’ve got an issue: WAL was growing very fast, the state in
pg_stat_replication was stuck on ‘catchup’, and an error: pg_recvlogical:
unexpected termination of replication stream: ERROR: out of memory DETAIL:
Cannot enlarge string buffer. It seems Wal2Json cannot handle very big
transactions (more than 1 GB).
How could we measure the size of a transaction?
Could we increase this limitation?
Thank you 
Mai