Re: Postgresql BUG / Help Needed

2024-02-01 Thread Jehan-Guillaume de Rorthais
On Thu, 1 Feb 2024 08:28:45 -0500
Johnathan Tiamoh  wrote:

> Thank you Laurenz!
> 
> Is there a way of preventing this from happening ?

This could come either from a hardware issue or, quite easily, from your own
actions or procedures.

Did you have any difficulties with your instance lately? An exceptional
intervention? A backup restoration? A failover?




Re: PAF with Pacemaker

2023-12-01 Thread Jehan-Guillaume de Rorthais
Hi,

On Thu, 30 Nov 2023 19:07:34 +
Vijaykumar Patil  wrote:

> I have two postgres server one is primary and other one replica, I have setup
> replication and configured pacemaker and corosync.
> 
> But still I'm facing issue while creating resource. It is showing invalid
> parameters.
> 
> [root@scrbtrheldbaas001 heartbeat]# pcs status
> Cluster name: pg_cluster
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: scrbtrheldbaas001 (version 2.1.6-8.el8-6fdc9deea29) -
> partition with quorum
>   * Last updated: Thu Nov 30 19:04:29 2023 on scrbtrheldbaas001
>   * Last change:  Thu Nov 30 13:41:53 2023 by root via cibadmin on
> scrbtrheldbaas002
>   * 2 nodes configured
>   * 2 resource instances configured
> 
> Node List:
>   * Online: [ scrbtrheldbaas001 scrbtrheldbaas002 ]
> 
> Full List of Resources:
>   * Clone Set: pgsqld-clone [pgsqld] (promotable):
> * Stopped (invalid parameter): [ scrbtrheldbaas001 scrbtrheldbaas002 ]

Side note: make sure to set up fencing and/or a watchdog.

> My postgres version is 15.3 but still  is searching recover.conf . please
> find below log.

PAF does not search for recovery.conf with v15. In fact, if you set up a
recovery.conf with PostgreSQL v15, PAF errors out immediately with the
appropriate error message:

https://github.com/ClusterLabs/PAF/blob/master/script/pgsqlms#L1350

> ...
> Nov 30 13:43:32 scrbtrheldbaas002.crb.apmoller.net
> pacemaker-controld[1114133]:  notice: Result of probe operation for pgsqld on
> scrbtrheldbaas002: invalid parameter (Recovery template file must contain
> "standby_mode = on")

This is the real error. But this check only applies to v11 and earlier. So now
I wonder which version of PAF you are actually using. Is it up to date, or a
very old one?

Or maybe the agent failed to correctly parse your actual version from
$PGDATA/PG_VERSION?

https://github.com/ClusterLabs/PAF/blob/master/script/pgsqlms#L637

++




Re: Postgresql HA cluster

2023-10-18 Thread Jehan-Guillaume de Rorthais
Hi Jason,

On Tue, 17 Oct 2023 19:59:00 +
Jason Grammenos  wrote:

[...]
> If you have 2 PostgreSQL nodes hooked up to a Load balancer (haproxy), and
> you move take node1 out of load balancing, you now have connections on node1
> and connections on node2, as the Load balancer drains the connections off
> node1 and over to node2. How does PostgreSQL handle this scenario when there
> are writes happening on both nodes?

As you write about "writes happening on both nodes", you are describing a
multi-primary architecture, which PostgreSQL does not support.

A standby is in "recovery mode", receiving and applying WALs directly from
the primary (or archives).

A standby only accepts read-only (RO) queries (provided you set
"hot_standby=on"), as it cannot produce WAL itself. Any write attempt raises
an ERROR to the client. This is a really strict state, not an advisory one. You
just can't write on a standby, even as a superuser.
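For example, here is a sketch of what a write attempt on a hot standby looks
like (the table name is hypothetical):

```sql
-- On the standby:
SELECT pg_is_in_recovery();
--  pg_is_in_recovery
-- -------------------
--  t

INSERT INTO some_table VALUES (1);
-- ERROR:  cannot execute INSERT in a read-only transaction
```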

So, in the situation you describe, as long as the standby is not promoted to
primary, any write (RW) query on it will just fail hard.

Obviously, you should definitely NOT promote a standby while another primary is
still alive in the cluster. You would end up with a bad split brain scenario.

About haproxy, you should use a tcp-check (eg. on a systemd socket) or an
http-check (eg. on the Patroni API) to determine which server hosts the
primary. This avoids the transition period where both nodes receive RW
connections.
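As a sketch, assuming Patroni with its REST API on the default port 8008 (node
names and addresses are placeholders), an haproxy backend that routes RW
traffic only to the node answering 200 on the /primary health endpoint:

```
# haproxy.cfg fragment (hypothetical addresses)
backend postgres_rw
    # Only the node whose Patroni API answers 200 on /primary gets traffic
    option httpchk GET /primary
    http-check expect status 200
    # Kill existing sessions as soon as a node is marked down (demoted)
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg1 10.0.0.1:5432 check port 8008
    server pg2 10.0.0.2:5432 check port 8008
```

The "on-marked-down shutdown-sessions" part is what closes the window where
stale RW sessions could linger on a demoted node.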

> If instead you have 2 PostgreSQL nodes behind pacemaker (controlling a
> floating ip), what happens when you initiate a failover and move the floating
> ip? You want the connections to drain off node1 and move to node2. Again in
> this scenario, both nodes would be sustaining writes at some point in time.

No, both nodes would not be sustaining writes at the same time. Two primaries
cannot live in the same cluster at the same time; that would be a split-brain
scenario. If it happens, you have to pick one and rewind or restore it from
PITR, losing the transactions it wrote in the meantime. Of course, you can also
compare both nodes and try to recover the missing transactions before restoring
one of them, but it takes a lot of time... That's why clusters (not just for
PostgreSQL) try hard to avoid split-brain scenarios using fencing, quorum,
watchdog, etc.

Before moving the IP address, you have to demote your primary to a standby.
Unfortunately, this cannot be achieved without restarting the primary
PostgreSQL instance as a standby on its node. And because demote == restart as
standby, you effectively end all client sessions. At this point, all your
nodes will be in standby mode, with no primary to feed them.

So, during a switchover Pacemaker will:

* demote your current primary (ending all sessions)
* promote the other standby as primary
* move the IP address to the new primary (colocation rule in Pacemaker between
  the IP and the promoted role).

As soon as a primary appears in the cluster and the IP address (or haproxy)
points at it, all the other standby nodes will start replicating from it.

> How does the write only replica get changed out of write mode during this
> failover?

"write only"? Do you speak about the primary in RW mode or about a standby in
RO mode? 

For a primary: a PostgreSQL restart.
For a standby: promotion happens live; all existing sessions are kept and can
then issue write queries.
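For the record, promotion itself is a single call; a sketch (the data
directory path is a placeholder):

```shell
# On the standby node to promote:
pg_ctl promote -D /var/lib/pgsql/data

# Or, from a SQL session, since PostgreSQL 12:
#   SELECT pg_promote();
```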

> How does the primary node get switched to read only after the
> connection drain is complete?

A PostgreSQL restart.

> Overall, I am trying to understand what it looks like operationally to run a
> 2 node postgresql "cluster" and how patching both nodes would work (and
> mentally comparing it to what I currently do with mysql).
> 
> You recommended that primary-standby could be sufficient and is much simpler,
> so I am simply trying to wrap my head around what exactly running it would
> look like. If primary standby is simple enough to failover, patch , reboot,
> maintain, etc. Then you could be correct that master->master may not be
> needed.

And if you have strong, safe and highly available storage, a shared-storage
cluster for the primary is even simpler IMHO (because you move all the
complexity off your hands and into the storage team's).

For details, I recommend reading this chapter of the documentation:
https://www.postgresql.org/docs/current/high-availability.html

Regards,




Re: Postgresql HA cluster

2023-10-16 Thread Jehan-Guillaume de Rorthais
On Fri, 13 Oct 2023 19:21:46 +
Laura Smith  wrote:

> --- Original Message ---
> On Friday, October 13th, 2023 at 14:10, Jehan-Guillaume de Rorthais
>  wrote:
> 
> > But really, double check first why a simple primary-standby architecture
> > doesn't meet your needs. The simpler the architecture is, the better. Even
> > from the application point of view.
> >   
> 
> 
> From my perspective I do wonder why Postgres doesn't have an equivalent to
> MySQL Group Replication.
> 
> Although you can run MySQL GR as multi-primary, most people run it as
> primary-standby.
> 
> However the difference with Postgres is that MySQL Group does leader
> election. Whilst Postgres failover/failback is a highly manual affair.

PostgreSQL core only cares about primary-standby replication.

Auto-failover must involve various components well outside the scope of
PostgreSQL itself: the system, the network, sometimes the storage, a quorum
mechanism, sometimes some fencing, etc.

There are various automatic, non-manual failover solutions in the PostgreSQL
ecosystem; they just all live outside of core.

Regards,




Re: Postgresql HA cluster

2023-10-13 Thread Jehan-Guillaume de Rorthais
On Fri, 13 Oct 2023 12:02:53 +
Jason Grammenos  wrote:

> Thank you for the feedback,
> 
> I have used pacemaker for other purposes previously so am a little familiar
> with it. 

So you might be familiar with shared-storage clusters, which are the simplest
ones you could deploy (provided you have reliable HA storage available). But
that's not a multi-primary cluster.

> It appears that in this scenario pacemaker is being used to manage a
> floating ip as well as deal with split brain scenarios.

There are also two different resource agents dealing with PostgreSQL itself:
pgsql and PAF. Both handle the multi-state resource differently from the
administration point of view.

> What isn’t clear is how effective master-> master replication is being 
> accomplished.

There's no master-master in PostgreSQL core. There are a few external
solutions out there, but double-check your real needs, whether a
primary-standby setup can answer them, and the various constraints
multi-master implies before rushing there.

> Postgresql streaming replication to the best of my limited knowledge only
> replicates in one direction, from the active to the standby servers. The
> issue this presents to me is that once you failover from the active to the
> standby (or one of the standby’s depending on how many you have) none of the
> data written on the standby is replicated back to the formerly active server.

It depends whether this is a "controlled failover" (aka. "switchover") or a
real failover triggered by some failure. If it is a controlled failover, you
can hook your old primary back in as a standby with no trouble. PAF even
handles this for you.

Moreover, even in a failure scenario, there are solutions around to quickly
fix your old primary's data and get it back in production as a standby
(pg_rewind, PITR/pgbackrest, etc).
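A sketch of the pg_rewind path (the connection string and data directory are
placeholders):

```shell
# On the failed old primary, once PostgreSQL is stopped:
pg_rewind --target-pgdata=/var/lib/pgsql/data \
          --source-server='host=new-primary user=postgres dbname=postgres'

# Then configure the node as a standby and start it again.
```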

You just have to plan for failure and write your procedures accordingly, to
get the cluster back on its feet quickly after a failover.

> Let us say that I have only 2 postgresql servers (absolute minimum number)
> and I want to patch server A. Ideally, I would use a load balancer (or other
> failover mechanism like pacemaker) and repoint the floating ip to server B.
> Now traffic would “drain” down off server A, and slowly (or rapidly) move to
> B. During the move some clients would still be writing to A and some clients
> would be writing to B.

This doesn't exist, as PostgreSQL has no multi-primary solution in core. You
can do rolling upgrades, but you'll have to pause production during the
switchover between the primary and the standby.

> In the above scenario, I do not understand how streaming replication would
> handle the part of the scenario when there are clients writing to A and B.

It will not.

> It would seem that something like `pgpool-ii` or `pgEdge` would be required, 
> but
> with my limited knowledge it is unclear if or which would be appropriate.

External multi-primary solutions exist: pgpool-II, Bucardo, BDR, etc. But
you'll have to evaluate these third-party solutions yourself.

But really, double check first why a simple primary-standby architecture doesn't
meet your needs. The simpler the architecture is, the better. Even from the
application point of view.

Regards,




Re: postgreSQL UPPER Method is converting the character "µ" into "M"

2023-09-06 Thread Jehan-Guillaume de Rorthais
On Wed, 6 Sep 2023 19:34:40 +0530
Sai Teja  wrote:

> In my local it is windows OS and locale is English_united_states.1252 and
> in local it is converting as expected ( testµ into TESTµ)

I'm not familiar with PostgreSQL under Windows. Maybe a collation for the
locale "English_united_states.1252" is available in PostgreSQL? Could you try
creating a database/table/column using this collation?

Or just look at available collations using:

select collname from pg_collation;
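If the locale shows up there, a sketch of creating and testing such a
collation (the collation name is hypothetical, and I can't verify this on
Windows myself):

```sql
-- Hypothetical: assumes the Windows libc locale is available on the server
CREATE COLLATION win1252 (provider = libc,
                          locale = 'English_United_States.1252');

SELECT upper('testµ' COLLATE "win1252");
```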

++




Re: Postresql HA 2 nodes

2023-05-17 Thread Jehan-Guillaume de Rorthais
On Tue, 16 May 2023 23:30:42 +0200
Marcello Lorenzi  wrote:

> Hi everyone,
> we're looking for the best solution for a 2-nodes cluster in HA with
> Postegresql 15. after some checks we are noticed about pgpool for the
> management of balancing and automatic failover. Can it be considered a
> viable and manageable solution for production?

Have a look at this recent discussion, which covers various solutions with
their pros and cons:

https://www.postgresql.org/message-id/flat/AM9P251MB033074BFEBB9AA40955B5B7598929%40AM9P251MB0330.EURP251.PROD.OUTLOOK.COM

Regards,




Re: Patroni vs pgpool II

2023-04-07 Thread Jehan-Guillaume de Rorthais
On Fri, 07 Apr 2023 21:16:04 +0900 (JST)
Tatsuo Ishii  wrote:

> >> If node 1 hangs and once it is recognized as "down" by other nodes, it will
> >> not be used without manual intervention. Thus the disaster described above
> >> will not happen in pgpool.  
> > 
> > Ok, so I suppose **all** connections, scripts, softwares, backups,
> > maintenances and admins must go through Pgpool to be sure to hit the
> > correct primary.
> > 
> > This might be acceptable in some situation, but I wouldn't call that an
> > anti-split-brain solution. It's some kind of «software hiding the rogue node
> > behind a curtain and pretend it doesn't exist anymore»  
> 
> You can call Pgpool-II whatever you like.

I didn't mean to be rude here. Please, accept my apologies if my words offended
you.

I consider "proxy-based" fencing architectures fragile because you just don't
know what is happening on your rogue node until some meatware comes along to
deal with it. Moreover, you must trust your scripts, configurations,
procedures, admins, applications, users, replication, network, Pgpool, etc. to
not fail on you in the meantime...

In the Pacemaker world, where everything MUST be **predictable**, the only way
to predict the state of a rogue node is to fence it from the cluster: either
cut it from the network, shut it down, or set up the watchdog so it resets
itself if needed. In the end, you know your old primary is off, or idle, or
screaming into the void with no one to hear it. It can't harm your other
nodes, data or apps anymore, no matter what.

> Important thing for me (and probably for users) is, if it can solve user's
> problem or not.

In my humble (and biased) opinion, Patroni, PAF or shared-storage clusters are
solving users' problems with regard to HA. All with pros and cons. All rely on
strong, safe, well-known and well-developed clustering concepts.

Some consider they are complex pieces of software to deploy and maintain, but
this is because HA is complex. No miracle here.

Solutions like Pgpool or repmgr try hard to re-implement HA concepts but
leave most of the complexity and safety to the user's discretion.
Unfortunately, it is not the user's role to deal with such things. This kind
of architecture probably answers a need, a gray zone, where it is good enough.
I've seen a similar approach in the past with pgbouncer + bash scripting
calling itself a "fencing" solution [1]. I'm fine with it as long as people
are clear about the limitations.

Kind regards,

[1] eg.
https://www.postgresql.eu/events/pgconfeu2016/sessions/session/1348-ha-with-repmgr-barman-and-pgbouncer/




Re: Patroni vs pgpool II

2023-04-07 Thread Jehan-Guillaume de Rorthais
On Fri, 07 Apr 2023 18:04:05 +0900 (JST)
Tatsuo Ishii  wrote:

> > And I believe that's part of what Cen was complaining about:
> > 
> > «
> >   It is basically a daemon glued together with scripts for which you are 
> >   entirely responsible for. Any small mistake in failover scripts and 
> >   cluster enters  a broken state.
> > »
> > 
> > If you want to build something clean, including fencing, you'll have to
> > handle/dev it by yourself in scripts  
> 
> That's a design decision. This gives maximum flexibility to users.

Sure, no problem with that. But people have to realize that the downside is
that it leaves the whole complexity and reliability of the cluster in the
hands of the administrator. And these tasks are much more complicated and racy
than a simple node promotion.

Even dealing with a simple vIP can become a nightmare if not done correctly.

> Please note that we provide step-by-step installation/configuration
> documents which has been used by production systems.
> 
> https://www.pgpool.net/docs/44/en/html/example-cluster.html

These scripts rely on SSH, which is really bad. What if you have an SSH
failure in the mix?

Moreover, even if SSH weren't a weakness by itself, the script doesn't even
try to shut down the old node or stop the old primary.

You can add to the mix that both Pgpool and SSH rely on TCP for availability
checks and actions. You'd better have very low TCP timeouts/retries...

When a service loses quorum on a resource, it is supposed to shut down as fast
as possible... or even fence itself using a watchdog device if the shutdown
action doesn't return fast enough.

> >> However I am not sure STONITH is always mandatory.  
> > 
> > Sure, it really depend on how much risky you can go and how much complexity
> > you can afford. Some cluster can leave with a 10 minute split brain where
> > some other can not survive a 5s split brain.
> >   
> >> I think that depends what you want to avoid using fencing. If the purpose
> >> is to avoid having two primary servers at the same time, Pgpool-II achieve
> >> that as described above.  
> > 
> > How could you be so sure?
> > 
> > See https://www.alteeve.com/w/The_2-Node_Myth
> > 
> > «
> >   * Quorum is a tool for when things are working predictably
> >   * Fencing is a tool for when things go wrong  
> 
> I think the article does not apply to Pgpool-II.

It is a simple example using NFS. The point here is that when things get
unpredictable, quorum alone is just not enough. So yes, it does apply to
Pgpool.

Quorum is nice when nodes can communicate with each other, and when they have
enough time and/or a low enough load to complete actions correctly.

My point is that a proper cluster with an anti-split-brain solution needs
both quorum and fencing.

> [...]
> > Later, node 1 recovers from its hang.  
> 
> Pgpool-II does not allow an automatic recover. 

Neither does this example. There's no automatic recovery. It just states that
node 1 was unable to answer in a timely fashion, just long enough for a new
quorum to form and elect a new primary. But node 1 was not dead, and when
node 1 is able to answer again: boom.

A service being mute for some period of time is really common. There are
various articles and conference talks about clusters failing over wrongly
because of eg. a high load on the primary... The last one was at FOSDEM, IIRC.

> If node 1 hangs and once it is recognized as "down" by other nodes, it will
> not be used without manual intervention. Thus the disaster described above
> will not happen in pgpool.

Ok, so I suppose **all** connections, scripts, software, backups, maintenance
tasks and admins must go through Pgpool to be sure to hit the correct primary.

This might be acceptable in some situations, but I wouldn't call it an
anti-split-brain solution. It's some kind of «software hiding the rogue node
behind a curtain and pretending it doesn't exist anymore».

Regards,




Re: Patroni vs pgpool II

2023-04-07 Thread Jehan-Guillaume de Rorthais
On Fri, 07 Apr 2023 13:16:59 +0900 (JST)
Tatsuo Ishii  wrote:

> >> > But, I heard PgPool is still affected by Split brain syndrome.
> >> 
> >> Can you elaborate more? If more than 3 pgpool watchdog nodes (the
> >> number of nodes must be odd) are configured, a split brain can be
> >> avoided.  
> > 
> > Split brain is a hard situation to avoid. I suppose OP is talking about
> > PostgreSQL split brain situation. I'm not sure how PgPool's watchdog would
> > avoid that.  
> 
> Ok, "split brain" means here that there are two or more PostgreSQL
> primary serves exist.
> 
> Pgpool-II's watchdog has a feature called "quorum failover" to avoid
> the situation. To make this work, you need to configure 3 or more
> Pgpool-II nodes. Suppose they are w0, w1 and w2. Also suppose there
> are two PostgreSQL servers pg0 (primary) and pg1 (standby). The goal
> is to avoid that both pg0 and pg1 become primary servers.
> 
> Pgpool-II periodically monitors PostgreSQL healthiness by checking
> whether it can reach to the PostgreSQL servers. Suppose w0 and w1
> detect that pg0 is healthy but pg1 is not, while w2 thinks oppositely,
> i.e. pg0 is unhealthy but pg1 is healthy (this could happen if w0, w1,
> pg0 are in a network A, but w2 and pg1 in different network B. A and B
> cannot reach each other).
> 
> In this situation if w2 promotes pg1 because w0 seems to be down, then
> the system ends up with two primary servers: split brain.
> 
> With quorum failover is enabled, w0, w1, and w2 communicate each other
> to vote who is correct (if it cannot communicate, it regards other
> watchdog is down). In the case above w0 and w1 are majority and will
> win. Thus w0 and w1 just detach pg1 and keep on using pg0 as the
> primary. On the other hand, since wg2 looses, and it gives up
> promoting pg1, thus the split brain is avoided.
> 
> Note that in the configuration above, clients access the cluster via
> VIP. VIP is always controlled by majority watchdog, clients will not
> access pg1 because it is set to down status by w0 and w1.
> 
> > To avoid split brain, you need to implement a combinaison of quorum and
> > (self-)fencing.
> > 
> > Patroni quorum is in the DCS's hands. Patroni's self-fencing can be achieved
> > with the (hardware) watchdog. You can also implement node fencing through
> > the "pre_promote" script to fence the old primary node before promoting the
> > new one.
> > 
> > If you need HA with a high level of anti-split-brain security, you'll not be
> > able to avoid some sort of fencing, no matter what.
> > 
> > Good luck.  
> 
> Well, if you define fencing as STONITH (Shoot The Other Node in the
> Head), Pgpool-II does not have the feature.

And I believe that's part of what Cen was complaining about:

«
  It is basically a daemon glued together with scripts for which you are 
  entirely responsible for. Any small mistake in failover scripts and 
  cluster enters  a broken state.
»

If you want to build something clean, including fencing, you'll have to
handle/develop it yourself in scripts.

> However I am not sure STONITH is always mandatory.

Sure, it really depends on how much risk you can accept and how much
complexity you can afford. Some clusters can live with a 10-minute split brain
where others cannot survive a 5-second one.

> I think that depends what you want to avoid using fencing. If the purpose is
> to avoid having two primary servers at the same time, Pgpool-II achieve that
> as described above.

How could you be so sure?

See https://www.alteeve.com/w/The_2-Node_Myth

«
  * Quorum is a tool for when things are working predictably
  * Fencing is a tool for when things go wrong
»

Regards,




Re: Patroni vs pgpool II

2023-04-06 Thread Jehan-Guillaume de Rorthais
On Wed, 05 Apr 2023 16:50:15 +0900 (JST)
Tatsuo Ishii  wrote:

> > But, I heard PgPool is still affected by Split brain syndrome.  
> 
> Can you elaborate more? If more than 3 pgpool watchdog nodes (the
> number of nodes must be odd) are configured, a split brain can be
> avoided.

Split brain is a hard situation to avoid. I suppose the OP is talking about a
PostgreSQL split-brain situation. I'm not sure how Pgpool's watchdog would
avoid that.

To avoid split brain, you need to implement a combination of quorum and
(self-)fencing.

Patroni's quorum is in the DCS's hands. Patroni's self-fencing can be achieved
with the (hardware) watchdog. You can also implement node fencing through the
"pre_promote" script, to fence the old primary node before promoting the new
one.

If you need HA with a high level of anti-split-brain security, you'll not be
able to avoid some sort of fencing, no matter what.

Good luck.




Re: Patroni vs pgpool II

2023-04-04 Thread Jehan-Guillaume de Rorthais
On Mon, 3 Apr 2023 06:33:46 +
Inzamam Shafiq  wrote:
[...]
> Can someone please suggest what is one (Patroni vs PGPool II) is best for
> achieving HA/Auto failover, Load balancing for DB servers. Along with this,
> can you please share the company/client names using these tools for large PG
> databases?

Load balancing is best achieved from the application side.

The most popular auto failover solution is Patroni.

Other solutions are involving Pacemaker to either:

* build a shared-storage cluster with a standalone instance moving from node
  to node (but this can include standbys)
* build a cluster with a promotable resource, using eg. the PAF resource
  agent, which will decide where to start the standbys and which one to
  promote.

No matter the solution you pick, be prepared to learn and train. A lot.




Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-09 Thread Jehan-Guillaume de Rorthais
On Sun, 5 Feb 2023 17:14:44 -0800
Peter Geoghegan  wrote:

...
> The OP should see the Postgres ICU docs for hints on how to use these
> facilities to make a custom collation that matches whatever their
> requirements are:
> 
> https://www.postgresql.org/docs/current/collation.html#COLLATION-MANAGING

Since you are talking about ICU customization to match whatever requirements
we want, we were wondering whether it would really be that easy to build such
a custom and odd collation with ICU (and not just move numbers after Latin
characters). Even ordering letter by letter?

For the record, I helped on an issue last week sorting data in EBCDIC order.
ICU was just a pain, especially with that annoying bug when sorting numbers
after letters [1], and the fact that it takes whatever we feed it without
complaining about badly formed or impossible collations. We just gave up.

One of our recommendations was to write a glibc collation file, build/install
it using localedef, and "import" it into PostgreSQL using
pg_import_system_collations(). The customer actually did it and it works like
a charm.
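For reference, a sketch of that procedure (the locale definition file and
locale name are hypothetical):

```shell
# Compile and install a custom glibc locale from a definition file
localedef -i ./my_ebcdic_order -f UTF-8 en_US.UTF-8@ebcdic

# Then let PostgreSQL discover it as a collation
psql -c "SELECT pg_import_system_collations('pg_catalog');"
```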

Regards,

[1] remember ?
https://www.postgresql.org/message-id/flat/20200903105727.064665ce%40firost#0a6e89e58eec7679391c829231a7b3ea




Re: prevent WAL replication to fill filesystem

2021-09-08 Thread Jehan-Guillaume de Rorthais



On 8 September 2021 06:07:15 GMT+02:00, Ninad Shah  wrote:
>These suggestions are appropriate. However, if you are ready to lose your
>replica to keep production running, there are a couple of methods.
>
>1) Set archive_command to "/bin/true"
>2) rename .ready files in archive_status to .done.

You do not need to rename files in archive_status, especially after setting
the command to /bin/true. Normal production activity, if running, will clean
everything up soon enough.

Good advice: never write in pg_wal yourself (mv, rm, edit, etc).




Re: prevent WAL replication to fill filesystem

2021-08-31 Thread Jehan-Guillaume de Rorthais
On Tue, 31 Aug 2021 10:53:45 +0200
Laurenz Albe  wrote:

> On Tue, 2021-08-31 at 10:36 +0200, basti wrote:
> >  have a old PG 9.6 with WAL replication.
> > for some reason the rsync was failed and the filesystem was filled by
> > pg_xlog files.
> > As result PG stops working.
> > 
> > Is there a way to prevent WAL replication to fill filesystem by xlog
> > files in case of failure?  

No. 

You have to size your partition big enough to handle a few days of normal WAL
production, monitor archiving success, and react quickly to fix things when it
fails.
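Monitoring archiving success can rely on the pg_stat_archiver view (available
since 9.4), for example:

```sql
-- Alert when failed_count grows, or when last_failed_time becomes more
-- recent than last_archived_time
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;
```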

> Upgrade to v13 and set "max_slot_wal_keep_size".

How would this prevent the failure if the OP doesn't (want to) use slots? They
mention rsync, so it might be a pure archiving error, not streaming.

Regards,




Re: Cluster fencing tool/software for PostgreSQL cluster with streaming replication

2021-08-25 Thread Jehan-Guillaume de Rorthais
On Mon, 16 Aug 2021 11:31:23 +0100
Vikas Sharma  wrote:

> Hi,
> 
> I am planning for an enterprise grade PostgreSQL cluster and so looking for
> the tool/softwares which will do the cluster management or fencing to avoid
> split brain.

Look at Pacemaker, with professional support and development by both Red Hat
and SUSE.

Depending on the stack you want to build, you might want to use either the
official pgsql resource agent or PAF (https://clusterlabs.github.io/PAF/).
The latter might not be supported by RH/SUSE though.

Regards,




Re: PostgreSQL reference coffee mug

2021-08-07 Thread Jehan-Guillaume de Rorthais
On Sat, 7 Aug 2021 20:44:41 +0200
Matthias Apitz  wrote:

> El día sábado, agosto 07, 2021 a las 08:06:14p. m. +0200, Karsten Hilbert
> escribió:
> 
> > Am Fri, Aug 06, 2021 at 08:09:03PM +0200 schrieb Matthias Apitz:
> >   
> > > The prototype is ready.  
> > 
> > Nice. Now the elephant needs to fade into the background.  
> 
> It is already in the background (done with libreoffice calc). See the
> attached PDF. Only the black lines of the Elephant logo are a problem.
> Don't know how to solve this.

Agree with Karsten here: the elephant needs to fade, not just be in the
background. Reading black on a dark-ish blue background is quite uncomfortable.

Or maybe make it small, below the last line?

Regards,




Re: query issue

2021-06-15 Thread Jehan-Guillaume de Rorthais
On Tue, 15 Jun 2021 19:16:41 +0530
Atul Kumar  wrote:

> hi,
> 
> I have an RDS instance with 2GB of RAM, 1 CPU, instance class - t2.small.
> 
> If you need any more info please let me know.
> 
> and as you shared I need to tweak
> random_page_cost/seq_page_cost/effective_cache_size So please suggest
> which parameter value I need to increase or decrease as I am known
> well with these parameters.

First, did you test with "SELECT _id" instead of "SELECT *" ?

About random_page_cost/effective_cache_size, the fine manual already gives
some explanations and tips:
https://www.postgresql.org/docs/current/runtime-config-query.html

With such a small setup, I'm not sure what you can expect though. What is the
concurrency? How many rows in total? The table size?




Re: query issue

2021-06-15 Thread Jehan-Guillaume de Rorthais
On Tue, 15 Jun 2021 16:12:11 +0530
Atul Kumar  wrote:

> Hi,
> 
> I have postgres 10 running on RDS instance.
> 
> I have query below:
[...]
> 
> So my doubt is initially when I run this query it takes around 42
> seconds to complete but later after few minutes it completes in 2-3
> seconds.
> 
> I tried to create indexes on table for columns score & "updatedAt"
> DESC seperately but found no proper satisfied solution.
> 
> So please help me telling what I am exactly missing here ?

The worst part of your plan is the Bitmap Heap Scan, where the plan actually
fetches the rows from the table. The bitmap index scan and the sort are fast;
there's not much to do about them.

This query needs to fetch 3882 rows from your table. So either the fetching
part of the plan is really, really slow (IO/CPU bound), or the simple filter,
on only ~4k rows, is really slow (CPU bound).

You might want to avoid "SELECT *" and specify only the fields you really
need. Try first with only "SELECT _id", just to compare. You have an average
row size of 1.3 kB that the executor needs to fetch and carry all the way to
the result set. This can cost a lot of useless IO and CPU.
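Something like this would show the difference (the table name and query shape
are hypothetical, reusing the columns you mentioned):

```sql
-- Compare row width impact: same plan shape, much less data carried
EXPLAIN (ANALYZE, BUFFERS)
  SELECT *   FROM mytable ORDER BY score, "updatedAt" DESC LIMIT 100;

EXPLAIN (ANALYZE, BUFFERS)
  SELECT _id FROM mytable ORDER BY score, "updatedAt" DESC LIMIT 100;
```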

You might want to tweak random_page_cost/seq_page_cost/effective_cache_size
to find out if an index scan would do a better job, but I'm unsure because I
lack information about your data and system.

Regards,




Re: bottom / top posting

2021-06-12 Thread Jehan-Guillaume de Rorthais



On 11 June 2021 19:47:09 GMT+02:00, Nikolay Samokhvalov  wrote:
>My thoughts:
>https://twitter.com/samokhvalov/status/1403408281389789189.
>Apologies for top-posting.

So now, we can have a thread split over two different places and tools...




Re: How to pass a parameter in a query to postgreSQL 12

2021-06-11 Thread Jehan-Guillaume de Rorthais
On Wed, 9 Jun 2021 14:51:46 -0500
Hassan Camacho Cadre  wrote:

> Hello
> 
> 
> I recently installed a postgreSQL v12, in previous version 8.3 in all my
> queries I pass parameters using the character :
> 
> SELECT
> 
>   public.tabla.id
> 
> FROM
> 
>   public.tabla
> 
> WHERE
> 
>   public.tabla.id = :a
> 
> In the new version when I try to make this query it sends me an error
> 
> ERROR syntax error at or near ":"
> [...]
> I am executing this query on the query editor of pgadmin 4


The :varname syntax is only understood by psql, which parses it and replaces
it with the value BEFORE sending the query to postgres. pgAdmin doesn't know
this syntax.

If you need to parameterize a query in pure SQL, use PREPARE/EXECUTE.
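For illustration, here is a minimal PREPARE/EXECUTE sketch reusing the table
from your query (the parameter value 42 is made up):

```sql
-- Prepare a named statement with one typed parameter ($1).
PREPARE get_by_id (integer) AS
  SELECT public.tabla.id
  FROM public.tabla
  WHERE public.tabla.id = $1;

-- Run it with an actual value, then discard it.
EXECUTE get_by_id(42);
DEALLOCATE get_by_id;
```

Note that prepared statements only live for the duration of the session.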

Regards,




Re: strange behavior of WAL files

2021-06-04 Thread Jehan-Guillaume de Rorthais
On Fri, 4 Jun 2021 15:39:30 +0530
Atul Kumar  wrote:

> HI,
> 
> We have a centos 6 enviornment where postgres 9.6 is running on it.
> 
> We have strange behavior of WAL files of pg_xlog directory
> 
> As we have set archive_command to archive WAL files at different
> location and the archive_command is working fine.
> 
> So strange behavior is :
> 
> We have a WAL file say for example "000136CD00E2" of
> 01.06.2021 (1st June 2021) that is getting archive successfully at the
> archive location and once it is archived, this file with same name
> (000136CD00E2) is getting generated with the latest
> timestamp (as today is 04.06.2021).and all old WAL files are behaving
> in same manner.

What is your archive_command?

I'm not sure I understand correctly, but keep in mind your archive_command
must be "read only": do not remove the WAL file after archiving it.
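For reference, a "read only" archive_command just copies the segment out and
fails if the target already exists; the archive path below is only an
illustration (this mirrors the example from the PostgreSQL documentation):

```
# postgresql.conf
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```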

Regards,




Re: looking for a installation package to Using GSSAPI with Postgres12 for windows

2021-04-13 Thread Jehan-Guillaume de Rorthais
Hello,

On Tue, 13 Apr 2021 08:10:06 +
"LE MENTEC, SANDRINE"  wrote:

> Dear postgres community,
> 
> I am currently working on Postgres 12.5 on a windows server 2016. I need to
> use Kerberos token for the authentication on the database.
> 
> To do so, I am looking for an installation executable file for windows with
> the GSSAPI included. The basic one provided by EDB does not seem to include
> this features.

I had the same problem and fixed it by using SSPI.

If you want to authenticate against a Microsoft domain, try replacing "gss"
with "sspi", and maybe set "compat_realm=0" if needed. It will try to use
Kerberos, or fall back on NTLM.

See: https://www.postgresql.org/docs/12/sspi-auth.html
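As a rough sketch, the pg_hba.conf entry could look like the following; the
address range is an assumption, and compat_realm=0 is only needed if you want
the Kerberos realm rather than SAM-compatible names:

```
# pg_hba.conf -- SSPI negotiates Kerberos first, falls back to NTLM
host  all  all  192.168.0.0/24  sspi  compat_realm=0
```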

Regards,




Re: MultiXactId wraparound and last aggressive vacuum time

2021-04-06 Thread Jehan-Guillaume de Rorthais
On Mon, 5 Apr 2021 17:28:06 -0400
Michael Schanne  wrote:

> After a closer reading of the documentation, it appears the database should
> stop accepting new transactions before a wraparound would ever occur.  If
> so, then the only possible explanations for this multixactid wraparound
> error would be data corruption, or a bug in postgresql.  The exact version
> I'm using is 9.6.10, which is quite a few versions behind the latest
> (9.6.21), but I skimmed through the release notes of the later versions and
> did not see any bugfixes in this area.  That would leave data corruption as
> the only explanation.  Is my reasoning correct here?

I didn't check the changelog, and you should definitely run 9.6.21, but I
believe your reasoning is correct anyway. A bug might be possible, but I would
bet a coin on corruption.

You might want to compare this number with the value reported by: 

  pg_controldata $PGDATA|grep NextMultiXactId

Backup your cluster, then try to isolate the table(s) and block(s) where the
corruption occurs and inspect them using e.g. pageinspect.

> I'm willing to upgrade but I would need to justify it somehow, so if I am
> missing something please let me know.

You can justify the upgrade using this load of reasons:
https://why-upgrade.depesz.com/show?from=9.6.10&to=9.6.21

Regards,




Re: questions about wraparound

2021-04-06 Thread Jehan-Guillaume de Rorthais
On Tue, 06 Apr 2021 10:46:08 +0200
Laurenz Albe  wrote:

> On Sat, 2021-04-03 at 15:22 +0200, Luca Ferrari wrote:
> > why having a TransactionId that is 32 bits
> > in depth while it is exposed (thru txid_current()) as a 64 bits value?
> > I mean, having 64 bits would reduce the need for anti-wrap arpund
> > vacuum. I suspect the usage of 32 bits is both for compatibility and
> > tuple header size, but I'm just guessing.  
> 
> Because there are two of these transaction IDs stored on each tuple
> (xmin and xmax) to determine its visibility.  The overhead of 8 bytes
> per tuples for visibility is already pretty high.
> 
> Another downside is that changing this would prevent the use of
> pg_upgrade for upgrading, as the on-disk format changes.

Indeed. Compatibility and size.

About the txid_current() format: it shows the 64-bit format used
internally. It is split into two parts:

* the lower part is the classical 32-bit XID, stored in tuple headers
* the higher part is the "epoch" of the XID space, i.e. the number of times the
  XID space wrapped around. This is not stored in tuples

Look for macros EpochFromFullTransactionId and XidFromFullTransactionId in
"include/access/transam.h".

Given txid_current() returning e.g. 10000000000:

  =# select txid_current();
   txid_current 
  --------------
    10000000000

The epoch would be 2:

  $ echo $((10000000000 >> 32))
  2

The 32-bit XID stored in tuple headers would be 1410065408:

  $ echo $((10000000000 % 2**32))
  1410065408

When looking at the NextXID in the controldata file, you would find:

  $ pg_controldata $PGDATA|grep NextXID
  Latest checkpoint's NextXID:  2:1410065408
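The two computations above can be combined into the epoch:xid form printed by
pg_controldata; this is plain shell arithmetic on the example full transaction
id 10000000000:

```shell
# Split a 64-bit full transaction id into the epoch (high 32 bits)
# and the 32-bit XID stored in tuple headers (low 32 bits).
full_xid=10000000000
epoch=$(( full_xid >> 32 ))
xid=$(( full_xid & 0xFFFFFFFF ))
echo "$epoch:$xid"   # same format as pg_controldata's NextXID
```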

Regards,




Re: questions about wraparound

2021-04-02 Thread Jehan-Guillaume de Rorthais
On Thu, 18 Mar 2021 09:56:16 +0100
Luca Ferrari  wrote:
[...]
> Therefore my question is: shouldn't autovacuum be able to freeze other
> tables/databases? I mean, the wraparound problem in this scenario will
> cause problems, but I was expecting different numbers for different
> tables/databases.

In fact, when an autovacuum worker is spawned, here is how it chooses which
database to process:

1. look for any database needing a vacuum to prevent a wraparound.
2. same with multi-transaction
3. other autovacuum considerations

So as long as there's a database in desperate need of a vacuum to prevent a
wraparound, a worker will try to process it first, again and again.

Because of your long-running transaction, the xid horizon forbids updating the
rel/datfrozenxid. So the next autovacuum round will keep trying to process the
same database, ignoring the others.

Look at the comment in function "do_start_worker()" in autovacuum.c for more
details:
https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/postmaster/autovacuum.c#n1207

When looping over the database list, as soon as "for_xid_wrap" is true, any
other database is ignored. Then a new worker is popped from freeWorkers,
initialized with the database to freeze and started. So as far as I understand
the code (I might easily be wrong), all the workers will keep trying to process
the same database again and again without considering other ones. All because of
your really long-living xact.

Regards,




Re: questions about wraparound

2021-04-01 Thread Jehan-Guillaume de Rorthais
Hi Luca,

On Mon, 22 Mar 2021 08:56:46 +0100
Luca Ferrari  wrote:

> I can confirm that freezing a template database is done by means of setting
> it age to zero.
[...]
> and here it is the situation after a restart:
> 
> testdb=> select datname, age( datfrozenxid ) from pg_database;  
>   datname  |   age
> ---+--
>  postgres  | 1234
>  backupdb  | 5000
>  template1 |0
>  template0 |0
>  testdb|0
>  pgbench   | 5000

The difference between a "vacuum" and a "vacuum freeze" is whether the vacuum
process must scan non-frozen blocks as well, according to the visibility map.
The latter is called "aggressive" vacuum because it scans all blocks, even the
clean ones, as long as they are not already marked as frozen in the visibility
map.

Whatever vacuum you are launching, if the process finds a row older than
vacuum_freeze_min_age, it freezes it. Aggressive vacuum is no different and
respects vacuum_freeze_min_age as well. That's why your oldest row in each
database is 5000 after a "vacuum freeze" (aka. aggressive vacuum, aka.
"vacuum to avoid wraparound").

Try to temporarily set vacuum_freeze_min_age=4500 and vacuum_freeze_table_age=0,
then run a simple vacuum on your database. This will effectively freeze your
database and set its age to the oldest row: your new vacuum_freeze_min_age, 4500.
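A minimal sketch of that experiment, using session-level settings (to be run
while connected to the database in question; the values come from the
discussion above):

```sql
-- Lower the freeze horizon for this session and force an aggressive
-- scan of all blocks, then vacuum: rows older than 4500 xids get frozen.
SET vacuum_freeze_min_age = 4500;
SET vacuum_freeze_table_age = 0;
VACUUM;
```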

Considering the template databases, the default vacuum_freeze_min_age is forced
to 0 in the source code. That's why you find a different age between template
databases and the others after a freeze.

Regarding the database ages moving altogether, even when only one of them
receives writes: the XID space is shared between all the databases. In other
words, a transaction id cannot be used in two different databases, unless it
applies to shared relations (those in the pg_global tablespace) and probably
cloned ones from templates.
So if one database alone consumes XIDs, all the others get older and older
and will eventually need a vacuum.

> I'm not able to find this behavior in the documentation however,

I don't know if it is explained somewhere in the doc; I couldn't find it either.
But you can find this information in function "do_autovacuum()" in
src/backend/postmaster/autovacuum.c:

  /*
   * Find the pg_database entry and select the default freeze ages. We use
   * zero in template and nonconnectable databases, else the system-wide
   * default.
   */
  tuple = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
  if (!HeapTupleIsValid(tuple))
  elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
  dbForm = (Form_pg_database) GETSTRUCT(tuple);
  
  if (dbForm->datistemplate || !dbForm->datallowconn)
  {
  default_freeze_min_age = 0;
  default_freeze_table_age = 0;
  default_multixact_freeze_min_age = 0;
  default_multixact_freeze_table_age = 0;
  }
  else
  {
  default_freeze_min_age = vacuum_freeze_min_age;
  default_freeze_table_age = vacuum_freeze_table_age;
  default_multixact_freeze_min_age = vacuum_multixact_freeze_min_age;
  default_multixact_freeze_table_age = vacuum_multixact_freeze_table_age;
  }

> and still don't understand why a template database should have a different
> behavior (at least, I can imagine only to reduce the future workload of
> vacuuming a template database).

I suspect that it helps creating databases with already-frozen blocks, from a
fully frozen template.

Regards,




Re: SELECT is faster on SQL Server

2021-03-19 Thread Jehan-Guillaume de Rorthais
On Fri, 19 Mar 2021 14:28:27 +0200
Frank Millman  wrote:

> [...]
> Execution of my main query has improved from 50ms to 33ms. Sql Server 
> takes 25ms, but this is much better than it was.
> 
> [...]
> 
> Here is the new EXPLAIN ANALYSE -
> 
>QUERY PLAN
> 
>   Merge Left Join  (...) (actual time=1.566..1.581 rows=5 loops=1)

1.581ms to output the very last row of this plan. This is in contradiction with
the 33ms you reference above.

What do I miss here? Maybe your 33ms comes from yet another set of data? Could
you share an EXPLAIN ANALYZE actually showing this 33ms total execution time?




Re: Replication sequence

2021-02-16 Thread Jehan-Guillaume de Rorthais
On Tue, 16 Feb 2021 13:10:54 +0100
Paolo Saudin  wrote:

> Il giorno mar 16 feb 2021 alle ore 10:51 Jehan-Guillaume de Rorthais <
>  [...]  
> 
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
> 
> Thank you very much!
> So in case the primary server crashes, and the backup one gets live, it
> will eventually have different sequences.

No. You will just have a gap in the sequence, on both sides.

And if you decide to promote your standby to production, you'll have to resync
the old primary anyway.

> I suppose there is no way to
> avoid this, isnt'it?

You can't avoid gaps.




Re: Replication sequence

2021-02-16 Thread Jehan-Guillaume de Rorthais
On Mon, 15 Feb 2021 18:55:14 +0100
Paolo Saudin  wrote:

> Hi all,
> I have two servers, a primary and a secondary one with a streaming replica
> setup.
> Today I noticed that some sequences are not lined-up, the replica ones are
> well ahead, while the records number is the same. How is it possible?

This is because sequences are not WAL-logged on every nextval() call, but in
batches of cache+32 values. As standbys are fed with the WAL stream from the
primary, their sequences appear to jump, then stall, then jump, etc.

See:
  
https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/commands/sequence.c?id=f672df5fdd22dac14c98d0a0bf5bbaa6ab17f8a5#n52

  
https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/commands/sequence.c?id=f672df5fdd22dac14c98d0a0bf5bbaa6ab17f8a5#n661

I'm not sure if this is documented somewhere... ?




Re: PostgreSQL HA

2020-12-28 Thread Jehan-Guillaume de Rorthais
On Sat, 26 Dec 2020 05:51:22 +0530
venkata786 k  wrote:

> Hi Team,
> 
> Could you please share postgresql's best active(R/W)-active(R/W) (multi
> master replication) solutions.
> My Team is thinking about implementing active-active replacing master-slave.

Symmetric replication brings a lot of complexity at various layers:
application, architecture, backups.

Why do you need symmetric replication? Do you have any issues with asymmetric
replication?




Re: BUG? Slave don't reconnect to the master

2020-09-29 Thread Jehan-Guillaume de Rorthais
On Tue, 29 Sep 2020 16:22:18 +0300
Олег Самойлов  wrote:
[...]
> > In regards with keepalive parameters, I am a bit surprised. According to the
> > source code, parameters defaults are:
> > 
> >  keepalives=1
> >  keepalives_idle=1
> >  keepalives_interval=1
> >  keepalives_count=1
> > 
> > But I just had a quick look there, so I probably miss something.  

I did miss something. See below.

> According to the official documentation, if keepalive parameters are not
> specified, then used default value from the OS.
> https://www.postgresql.org/docs/12/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS

These settings are related to the server side.

The ones you are setting in primary_conninfo are related to the client side:

https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS

> Cite:  A value of 0 (the default) selects the operating system's default.
> 
> I don't know what is the default values for the CentOS 7.

sysctl -ar net.ipv4.tcp_keepalive_

> I can only assert that adding keepalive is solved issue with Postgres-STOP
> test and looked like problems with ForkBomb too. And keep in mind, I still
> use PostgreSQL 11. May be 12 or 13 something changed.

Sorry, there was some misunderstanding of the source code on my side. The
"keepalives" parameter is enabled by default on the client side, but if you
don't set keepalives_idle, keepalives_interval and keepalives_count, they fall
back to the system defaults, which are 7200, 75 and 9 (on Debian and CentOS).
So more than 2 hours.

Regards,




Re: BUG? Slave don't reconnect to the master

2020-09-29 Thread Jehan-Guillaume de Rorthais
On Thu, 24 Sep 2020 15:22:46 +0300
Олег Самойлов  wrote:

> Hi, Jehan.
> 
> > On 9 Sep 2020, at 18:19, Jehan-Guillaume de Rorthais 
> > wrote:
> > 
> > On Mon, 7 Sep 2020 23:46:17 +0300
> > Олег Самойлов  wrote:
> >   
> >>> [...]  
> >>>>>> 10:30:55.965 FATAL:  terminating walreceiver process dpue to
> >>>>>> administrator cmd 10:30:55.966 LOG:  redo done at 0/1600C4B0
> >>>>>> 10:30:55.966 LOG:  last completed transaction was at log time
> >>>>>> 10:25:38.76429 10:30:55.968 LOG:  selected new timeline ID: 4
> >>>>>> 10:30:56.001 LOG:  archive recovery complete
> >>>>>> 10:30:56.005 LOG:  database system is ready to accept connections  
> >>>>>   
> >>>>>> The slave with didn't reconnected replication, tuchanka3c. Also I
> >>>>>> separated logs copied from the old master by a blank line:
> >>>>>> 
> >>>>>> [...]
> >>>>>> 
> >>>>>> 10:20:25.168 LOG:  database system was interrupted; last known up at
> >>>>>> 10:20:19 10:20:25.180 LOG:  entering standby mode
> >>>>>> 10:20:25.181 LOG:  redo starts at 0/1198
> >>>>>> 10:20:25.183 LOG:  consistent recovery state reached at 0/11000A68
> >>>>>> 10:20:25.183 LOG:  database system is ready to accept read only
> >>>>>> connections 10:20:25.193 LOG:  started streaming WAL from primary at
> >>>>>> 0/1200 on tl 3 10:25:05.370 LOG:  could not send data to client:
> >>>>>> Connection reset by peer 10:26:38.655 FATAL:  terminating walreceiver
> >>>>>> due to timeout 10:26:38.655 LOG:  record with incorrect prev-link
> >>>>>> 0/1200C4B0 at 0/1600C4D8  
> >>>>> 
> >>>>> This message appear before the effective promotion of tuchanka3b. Do you
> >>>>> have logs about what happen *after* the promotion?  
> >>>> 
> >>>> This is end of the slave log. Nothing. Just absent replication.
> >>> 
> >>> This is unusual. Could you log some more details about replication
> >>> tryouts to your PostgreSQL logs? Set log_replication_commands and lower
> >>> log_min_messages to debug ?
> >> 
> >> Sure, this is PostgreSQL logs for the cluster tuchanka3.
> >> Tuchanka3a is an old (failed) master.  
> > 
> > According to your logs:
> > 
> > 20:29:41 tuchanka3a: freeze
> > 20:30:39 tuchanka3c: wal receiver timeout (default 60s timeout)
> > 20:30:39 tuchanka3c: switched to archives, and error'ed (expected)
> > 20:30:39 tuchanka3c: switched to stream again (expected)
> > no more news from this new wal receiver 
> > 20:34:21 tuchanka3b: promoted
> > 
> > I'm not sure where your floating IP is located at 20:30:39, but I suppose it
> > is still on tuchanka3a as the wal receiver don't hit any connection error
> > and tuchanka3b is not promoted yet.  
> 
> I think so.
> 
> > 
> > So at this point, I suppose the wal receiver is stuck in libpqrcv_connect
> > waiting for frozen tuchanka3a to answer, with no connection timeout. You
> > might track tcp sockets on tuchanka3a to confirm this.  
> 
> I don't know how to do this.

Use ss, see its manual page. Here is an example, using the standard 5432 pgsql port:

  ss -tapn 'dport = 5432 or sport = 5432'

Look at the Local and Peer addresses and their status.

> > To avoid such a wait, try to add eg. connect_timeout=2 to your
> > primary_conninfo parameter. See:
> > https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
> >   
> 
> Nope, this was not enough. But I went further and I added tcp keepalive
> options too. So now paf file, for instance in tuchanka3c, is:
> 
> # recovery.conf for krogan3, pgsqlms pacemaker module
> primary_conninfo = 'host=krogan3 user=replicant application_name=tuchanka3c
> connect_timeout=5 keepalives=1 keepalives_idle=1 keepalives_interval=3
> keepalives_count=3' recovery_target_timeline = 'latest' standby_mode = 'on'
> 
> And now the problem with PostgreSQL-STOP is solved. But I surprised, why this
> was needed? I though that wal_receiver_timeout must be enough for this case.

Because wal_receiver_timeout applies to already established, streaming
connections, when the sending end of the stream becomes silent.

The timeout you hit happened during the connection establishment, where
connect_timeout takes effect.

Regarding the keepalive parameters, I am a bit surprised. According to the
source code, the parameter defaults are:
  keepalives=1
  keepalives_idle=1
  keepalives_interval=1
  keepalives_count=1

But I just had a quick look there, so I probably missed something.

Regards,




Re: BUG? Slave don't reconnect to the master

2020-09-09 Thread Jehan-Guillaume de Rorthais
On Mon, 7 Sep 2020 23:46:17 +0300
Олег Самойлов  wrote:

> [...]  
> >>> why did you add "monitor interval=15"? No harm, but it is redundant with
> >>> "monitor interval=16 role=Master" and "monitor interval=17
> >>> role=Slave".
> >> 
> >> I can't remember clearly. :) Look what happens without it.
> >> 
> >> + pcs -f configured_cib.xml resource create krogan2DB ocf:heartbeat:pgsqlms
> >> bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/krogan2
> >> recovery_template=/var/lib/pgsql/krogan2.paf meta master notify=true
> >> resource-stickiness=10
> >> Warning: changing a monitor operation interval from 15 to 16 to make the
> >> operation unique
> >> Warning: changing a monitor operation interval from 16 to 17 to make the
> >> operation unique  
> > 
> > Something fishy here. This command lack op monitor settings. Pacemaker don't
> > add any default monitor operation with default interval if you don't give
> > one at resource creation.
> > 
> > If you create such a resource with no monitoring, the cluster will
> > start/stop it when needed, but will NOT check for its health. See: 
> > 
> > https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-resource-monitoring.html
> >   
> 
> May be. But keep in mind, that I uses `pcs`, I do not edit the xml file
> directly. And I use too old pacemaker, the default package of CentOS 7 is
> pacemaker-1.1.21-4.el7.x86_64, while you link of documentation is for
> Pacemaker 2.0.

It's the same behavior in both 2.0 and 1.1, but... (see below)

> >> So trivial monitor always exists by default with interval 15.  
> > 
> > nope.  
> 
> This is not true for CentOS 7. I removed my monitor options, for this example.
> 
> pcs cluster cib original_cib.xml
> cp original_cib.xml configured_cib.xml
> pcs -f configured_cib.xml resource create krogan3DB ocf:heartbeat:pgsqlms
> bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/krogan3
> recovery_template=/var/lib/pgsql/krogan3.paf meta master notify=true
> resource-stickiness=10

I tried your command, and indeed, pcs creates the missing monitor operation
with a default interval of 15. This is surprising; it's the first time I have
come across these warning messages. Thanks for this information, I wasn't aware
of this pcs behavior.

But anyway, it's not recommended to create your resources without specifying
an interval and timeout for each operation. See the PAF docs. Just create the
two monitor operations related to both roles and you'll not have these warnings.

> > [...]
> > OK, I understand now. If you want to edit an existing resource, use "pcs
> > resource update". Make sure read the pcs manual about how to use it to
> > edit/remove/add operations on a resource.  
> 
> This is not so easy. To edit existed resource I must to know the "interval"
> of this resource, but in this case I am not sure what the interval will be
> for the monitor operation of the master role. :) Because
> >> 
> >> Warning: changing a monitor operation interval from 15 to 16 to make the
> >> operation unique
> >> Warning: changing a monitor operation interval from 16 to 17 to make the
> >> operation unique  
> 
> I am not sure in what order and what it will be. Thats why I configured as I
> configured. This just works.

Now that we know where these warnings come from, you have a solution (set both
of them explicitly).

> >> Looked like the default timeout 10 was not enough for the "master".  
> > 
> > It's written in PAF doc. See:
> > https://clusterlabs.github.io/PAF/configuration.html#resource-agent-actions
> > 
> > Do not hesitate to report or submit some enhancements to the doc if
> > needed.  
> 
> May be the documentation was improved. Thanks that you have pointed me on
> that. After moving to CentOS 8 I will check with recommended parameters
> according to the documentation. 

You can do it right now with CentOS 7. They are the same.

> > [...]
>  10:30:55.965 FATAL:  terminating walreceiver process dpue to
>  administrator cmd 10:30:55.966 LOG:  redo done at 0/1600C4B0
>  10:30:55.966 LOG:  last completed transaction was at log time
>  10:25:38.76429 10:30:55.968 LOG:  selected new timeline ID: 4
>  10:30:56.001 LOG:  archive recovery complete
>  10:30:56.005 LOG:  database system is ready to accept connections
> >>>   
>  The slave with didn't reconnected replication, tuchanka3c. Also I
>  separated logs copied from the old master by a blank line:
>  
>  [...]
>  
>  10:20:25.168 LOG:  database system was interrupted; last known up at
>  10:20:19 10:20:25.180 LOG:  entering standby mode
>  10:20:25.181 LOG:  redo starts at 0/1198
>  10:20:25.183 LOG:  consistent recovery state reached at 0/11000A68
>  10:20:25.183 LOG:  database system is ready to accept read only
>  connections 10:20:25.193 LOG:  started streaming WAL from primary at
>  0/1200 on tl 3 10:25:05.370 LOG:  could not send data to client:
>  Connection reset by peer 10:26:38.655 FATAL:  

Re: Tuchanka

2020-09-03 Thread Jehan-Guillaume de Rorthais
On Wed, 2 Sep 2020 20:33:09 +0300
Олег Самойлов  wrote:

> The software is rather outdated. It works with PostgreSQL 11 and CentOS 7.
> The next step will be upgrading to CentOS 8 and PostgreSQL 12. Please tell
> me, is it useful and worth to continue? Where is better announce it? May be
> somewhere exists special mailing list for such things.

I answered on the Pacemaker mailing list with a couple of links and questions
to discuss before answering whether it is useful to continue.

Thanks!




Re: BUG? Slave don't reconnect to the master

2020-08-21 Thread Jehan-Guillaume de Rorthais
On Thu, 20 Aug 2020 15:16:10 +0300
Олег Самойлов  wrote:
[...]
> >> Almost works fine, but sometimes, rather rare, I detected that a slave
> >> don't reconnect to the new master after a failure. First case is
> >> PostgreSQL-STOP, when I `kill` by STOP signal postgres on the master to
> >> simulate freeze. The slave don't reconnect to the new master with errors
> >> in log:
> >> 
> >> 18:02:56.236 [3154] FATAL:  terminating walreceiver due to timeout
> >> 18:02:56.237 [1421] LOG:  record with incorrect prev-link 0/1600DDE8 at
> >> 0/1A00DE10  
> > 
> > Do you have more logs from both side of the replication?  
> 
> There was nothing special. Just errors from previous tests.

OK

> > How do you build your standbys?  
> Okey, I'll show you all configs. Linux Centos 7, PostgreSQL 11 from the
> official postgres repository. # rpm -q corosync
> corosync-2.4.5-4.el7.x86_64
> # rpm -q pacemaker
> pacemaker-1.1.21-4.el7.x86_64
> # rpm -q sbd
> sbd-1.4.0-15.el7.x86_64
> 
> The cluster designed for three datacenter with good connection between in one
> city , so it must survive a crush of one datacenter. So stonith is not used,
> instead I use quorum and sbd based watchdog. Name of nodes: tuchanka3a,
> tuchanka3b, tuchanka3c. Name for float ip: krogan3 for the master; krogan3s1,
> krogan3s2 for slaves.
> 
> postgresql common conf (not default options):
> 
> ident_file = '/var/lib/pgsql/pg_ident.conf'
> hba_file = '/var/lib/pgsql/pg_hba.conf'
> listen_addresses = '*'
> log_filename = 'postgresql.%F.log'
> wal_keep_segments = 1
> restart_after_crash = off 
> wal_receiver_timeout=0 # in case PostgreSQL-STOP wal_receiver_timeout is
> default 60s, of cause shared_buffers = 32MB
> max_wal_size=80MB

Based on the per-node setup, you can probably add
'synchronous_commit=remote_write' to the common conf.

> [...]
> pacemaker config, specific for this cluster:
> [...]

why did you add "monitor interval=15"? No harm, but it is redundant with
"monitor interval=16 role=Master" and "monitor interval=17 role=Slave".

By the way, nice catch to exclude RO IP when master score is
negative!

> Test PostgreSQL-STOP (wal_receiver_timeout is default 60s):
> killall -STOP postgres
> 
> PostgreSQL logs:
> From old master, tuchanka 3a:
> [...]

OK

> All errors are from former test, which was applied to slaves. (The host for
> tests choosen randomly).
> 
> New master tuchanka3b, some logs are copied from the old master on database
> restoration after previous test by pg_basebackup. I separated them by blank
> line.

You should probably consider putting your logs outside of your PGDATA, or
even better: send your PostgreSQL/Pacemaker logs to a dedicated syslog node
outside the cluster. Adding the hostname to the prefix of each log line would
probably help a lot.

> [...]
> 10:24:55.906 LOG:  entering standby mode
> 10:24:55.908 LOG:  redo starts at 0/1528
> 10:24:55.909 LOG:  consistent recovery state reached at 0/15002300
> 10:24:55.910 LOG:  database system is ready to accept read only connections
> 10:24:55.928 LOG:  started streaming WAL from primary at 0/1600 on tl 3
> 10:26:37.308 FATAL:  terminating walreceiver due to timeout

Timeout because of SIGSTOP on primary here.

> 10:26:37.308 LOG:  invalid record length at 0/1600C4D8: wanted 24, got 0
> 10:30:55.965 LOG:  received promote request

Promotion from Pacemaker here.

What happened during more than 4 minutes between the timeout and the promotion?

> 10:30:55.965 FATAL:  terminating walreceiver process due to administrator cmd
> 10:30:55.966 LOG:  redo done at 0/1600C4B0
> 10:30:55.966 LOG:  last completed transaction was at log time 10:25:38.76429
> 10:30:55.968 LOG:  selected new timeline ID: 4
> 10:30:56.001 LOG:  archive recovery complete
> 10:30:56.005 LOG:  database system is ready to accept connections

> The slave with didn't reconnected replication, tuchanka3c. Also I separated
> logs copied from the old master by a blank line:
> 
> [...]
> 
> 10:20:25.168 LOG:  database system was interrupted; last known up at 10:20:19
> 10:20:25.180 LOG:  entering standby mode
> 10:20:25.181 LOG:  redo starts at 0/1198
> 10:20:25.183 LOG:  consistent recovery state reached at 0/11000A68
> 10:20:25.183 LOG:  database system is ready to accept read only connections
> 10:20:25.193 LOG:  started streaming WAL from primary at 0/1200 on tl 3
> 10:25:05.370 LOG:  could not send data to client: Connection reset by peer
> 10:26:38.655 FATAL:  terminating walreceiver due to timeout
> 10:26:38.655 LOG:  record with incorrect prev-link 0/1200C4B0 at 0/1600C4D8

This message appear before the effective promotion of tuchanka3b. Do you have
logs about what happen *after* the promotion?

Reading this error, it seems the record at 0/1600C4D8 references the
previous one in WAL 0/1200. So the file referenced as 0/16 either has
corrupted data, or was 0/12 being recycled but not zeroed correctly, which v11
always does no matter what (no wal_init_zero there).

That's why I'm wondering how you 

Re: BUG? Slave don't reconnect to the master

2020-08-19 Thread Jehan-Guillaume de Rorthais
On Tue, 18 Aug 2020 13:48:41 +0300
Олег Самойлов  wrote:

> Hi all.
> 
> I found some strange behaviour of postgres, which I recognise as a bug. First
> of all, let me explain situation.
> 
> I created a "test bed" (not sure how to call it right), to test high
> availability clusters based on Pacemaker and PostgreSQL. The test bed consist
> of 12 virtual machines (on VirtualBox) runing on a MacBook Pro and formed 4
> HA clusters with different structure. And all 4 HA cluster constantly tested
> in loop: simulated failures with different nature, waited for rising
> fall-over, fixing, and so on. For simplicity I'll explain only one HA
> cluster.
> This is 3 virtual machines, with master on one, and sync and async
> slaves on other. The PostgreSQL service is provided by float IPs pointed to
> working master and slaves. Slaves are connected to the master float IP too.
> When the pacemaker detects a failure, for instance, on the master, it promote
> a master on other node with lowest latency WAL and switches float IPs, so the
> third node keeping be a sync slave. My company decided to open this project
> as an open source, now I am finishing formality.

As the maintainer of PAF[1], I'm looking forward to discovering it :)
Do not hesitate to ping me off-list as well in regard to Pacemaker and
resource agents.

> Almost works fine, but sometimes, rather rare, I detected that a slave don't
> reconnect to the new master after a failure. First case is PostgreSQL-STOP,
> when I `kill` by STOP signal postgres on the master to simulate freeze. The
> slave don't reconnect to the new master with errors in log:
> 
> 18:02:56.236 [3154] FATAL:  terminating walreceiver due to timeout
> 18:02:56.237 [1421] LOG:  record with incorrect prev-link 0/1600DDE8 at
> 0/1A00DE10

Do you have more logs from both side of the replication?
How do you build your standbys?

> What is strange that error about incorrect WAL is risen  after the
> termination of connection.

This is because the first message comes from the walreceiver itself (3154),
which receives and writes WAL, and the second one comes from the startup
process (1421), which waits for and replays WAL.

> Well, this can be workarouned by turning off wal
> receiver timeout. Now PostgreSQL-STOP works fine, but the problem is still
> exists with other test. ForkBomb simulates an out of memory situation. In
> this case a slave sometimes don't reconnect to the new master too, with
> errors in log:
> 
> 10:09:43.99 [1417] FATAL:  could not receive data from WAL stream: server
> closed the connection unexpectedly This probably means the server terminated
> abnormally before or while processing the request.
> 10:09:43.992 [1413] LOG:  invalid record length at 0/D8014278: wanted 24, got
> 0

I suspect the problem is somewhere else. The first message here is probably
related to your primary being fenced; the second one is normal. After your
IP moved to the recently promoted primary, your standbys are supposed to
reconnect with no problem.

> The last error message (last row in log) was observed different, btw.
> 
> What I expect as right behaviour. The PostgreSQL slave must reconnect to the
> master IP (float IP) after the wal_retrieve_retry_interval.

In my own experience with PAF, it just works as you describe.
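For reference, the two timeouts discussed above are plain GUCs. A minimal
sketch of the settings involved (the values shown are the defaults, given
for illustration, not as recommendations):

```
# postgresql.conf -- illustrative values (these are the defaults)
wal_receiver_timeout = 60s        # source of "terminating walreceiver due to
                                  # timeout"; 0 disables the timeout entirely
wal_retrieve_retry_interval = 5s  # how long the startup process waits before
                                  # retrying WAL retrieval (stream or archive)
```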


Regards,

[1] https://clusterlabs.github.io/PAF/




Re: Clustering solution ?

2020-07-16 Thread Jehan-Guillaume de Rorthais
On Wed, 15 Jul 2020 15:54:03 + (UTC)
Laurent FAILLIE  wrote:

> Hello all,
> thanks all for your replies : I started to learn Patroni.Anyway, any
> alternative with my customer want's also LoadBalancing ? Thanks
> Laurent

Same comment as my previous one here. Climb one step at a time.

First, create relevant benchmarks as close as possible to the target
application. Optimize your queries, then your setup, and lastly your hardware.
If your architecture still does not keep up with the load, then you can
consider load balancing. And really, load balancing is hard to achieve for a
DBMS. It often requires some changes on the application side... or even
dealing with it on the application side altogether.

I've seen too many complex architectures, with too many layers, hard to
administer... for nothing. Keep it simple as long as you can.

Good luck.

Regards,

-- 
Jehan-Guillaume de Rorthais
Dalibo




Re: Clustering solution ?

2020-07-13 Thread Jehan-Guillaume de Rorthais
On Thu, 9 Jul 2020 13:53:46 + (UTC)
Laurent FAILLIE  wrote:

> Hello,
> I've been asked by one of my customer to know which are the possible
> clustering solution for PostgreSQL ? Active/passive ?Active/active ?
> if possible free or not too expensive.

Start first by setting up safe and indestructible backups. Look at PITR
backups, and at pgBackRest to help.

As soon as you have correct backups, you have DR.

Dig some more, optimize, and your PITR backup will be quite fast to restore.

If your RTO is not met with that, dig some more and then, we can talk about
HA, patroni and friends.

Regards,




Re: Safe switchover

2020-07-13 Thread Jehan-Guillaume de Rorthais
On Fri, 10 Jul 2020 10:05:52 +1000
James Sewell  wrote:

> Hi all,
> 
> I’m trying to work out a procedure for a safe zero data loss switchover
> under (high) load, which allows the old master to be reconnected without
> the use of pgrewind.
> 
> Would the following be sane?
> 
> - open connection to database
> - smart shutdown master
> - terminate all other connections
> - wait for shutdown (archiving will finish)
> - Identify last archived WAL file name (ensure it’s on backup server.)
> - wait till a standby has applied this WAL
> - promote that standby
> - attach old master to new master

During a graceful shutdown (smart or fast), the primary waits for the standbys
to catch up with replication before fully stopping. But nothing will save you
if the designated standby disconnects by itself during the shutdown procedure.

I usually use:

- shutdown the primary
- use pg_controldata to check its last redo checkpoint 
  and/or use pg_waldump to find its shutdown checkpoint
- check logs on standby and replication status and make sure it received the
  shutdown checkpoint (pg_waldump on the standby and/or last received lsn)
- promote the standby
- setup the old primary as standby
- start the old primary as new standby.
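The pg_controldata comparison step above can be scripted. A minimal sketch,
assuming pg_controldata's "Latest checkpoint location" label (stable across
recent releases) and v10+ function names; the commented commands are
illustrative, not a complete fail-safe procedure:

```shell
# Extract the last checkpoint LSN from pg_controldata output fed on stdin.
last_checkpoint_lsn() {
  awk -F': *' '/^Latest checkpoint location/ { print $2 }'
}

# On the stopped primary:
#   pg_controldata "$PGDATA" | last_checkpoint_lsn
# On the standby, before promoting, compare with:
#   psql -AtXc "SELECT pg_last_wal_receive_lsn()"   # v10+ name
```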

Regards,




Re: Replication: slave server has 3x size of production server?

2020-02-25 Thread Jehan-Guillaume de Rorthais
On Sat, 22 Feb 2020 19:23:05 +
Edson Richter  wrote:
[...]
> Actually, standby server is sending wals to a backup (barman) server:
> 
> archive_mode = always   # enables archiving; off, on, or always
> (change requires restart) archive_command = 'rsync -e "ssh -2 -C -p 2022" -az
> %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
> 
> 
> The files are about 7 months old.

Did you check the return code of your archive_command? 

Did you check the log produced by your archive_command and postmaster?

How many files with ".ready" extension in "$PGDATA/pg_xlog/archive_status/"?

Can you confirm there's no missing WAL between the oldest one and
the newest one in "$PGDATA/pg_xlog", in alphanum order?
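A small helper can verify that a sorted WAL listing has no holes. This is a
sketch assuming default 16MB segments (256 segments per "log" value) and
PostgreSQL 9.3+ naming, where the 000000FF segment exists; it requires bash:

```shell
# Compute the expected successor of a 24-hex-digit WAL file name.
next_wal() {
  local tli=${1:0:8}
  local n=$(( 16#${1:8:8} * 256 + 16#${1:16:8} + 1 ))
  printf '%s%08X%08X' "$tli" $(( n / 256 )) $(( n % 256 ))
}

# Read one WAL file name per line (already sorted) and report the first gap.
check_wal_sequence() {
  local prev='' cur
  while read -r cur; do
    if [ -n "$prev" ] && [ "$cur" != "$(next_wal "$prev")" ]; then
      echo "gap after $prev (expected $(next_wal "$prev"), got $cur)"
      return 1
    fi
    prev=$cur
  done
  return 0
}

# usage:
#   ls "$PGDATA/pg_xlog" | grep -E '^[0-9A-F]{24}$' | sort | check_wal_sequence
```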




Re: what to do after a failover

2020-01-09 Thread Jehan-Guillaume de Rorthais
On Thu, 9 Jan 2020 06:55:18 -0500
Rita  wrote:

> Thanks for the response.
> I am using Postgresql 11.
> I want something simple and I have a strong preference toward using stock
> tools. After the promotion and the original master comes online, I was
> thinking of doing a pg_basebackup to sync. Any thoughts about that? 

If you can afford that, this is the cleanest and easiest procedure you could
find.

Note that pg_basebackup needs an empty PGDATA, so it will have to transfer the
whole instance from the newly promoted primary to the original one.
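As a sketch of that resync (host names, user, and the guard are illustrative;
with v11, -R writes the recovery settings so the rebuilt node comes back as a
standby):

```shell
# pg_basebackup refuses a non-empty target directory, hence this guard:
# true when the directory exists and has no entries at all.
pgdata_is_empty() {
  [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# pg_ctl -D "$PGDATA" -m fast stop           # make sure the old primary is down
# rm -rf "$PGDATA" && mkdir -m 700 "$PGDATA"
# pgdata_is_empty "$PGDATA" || exit 1
# pg_basebackup -h new-primary -U replicator -D "$PGDATA" -R -X stream -P
# pg_ctl -D "$PGDATA" start                  # comes back as a standby (-R)
```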

Regards,




Re: logical replication - negative bitmapset member not allowed

2019-11-09 Thread Jehan-Guillaume de Rorthais
On Sat, 9 Nov 2019 09:18:21 +0100
Peter Eisentraut  wrote:

> On 2019-11-07 16:18, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 7 Nov 2019 16:02:21 +0100
> > Peter Eisentraut  wrote:
> >   
> >> On 2019-11-05 17:05, Jehan-Guillaume de Rorthais wrote:  
> >>>> I have simplified your reproduction steps from the previous message to a
> >>>> test case, and I can confirm that your proposed fix addresses the
> >>>> issue.  
> >>>
> >>> Thanks for the feedback and the test case. I wonder if ALTER SUBSCRIPTION
> >>> DISABLE/ENABLE is useful in the test case?  
> >>
> >> Turns out it's not necessary.  Attached is an updated patch that
> >> simplifies the test even further and moves it into the
> >> 008_diff_schema.pl file.  
> > 
> > OK. No further comments on my side.  
> 
> Committed and backpatched.  Thanks!

I'm glad to help!

Thanks,




Re: logical replication - negative bitmapset member not allowed

2019-11-07 Thread Jehan-Guillaume de Rorthais
On Thu, 7 Nov 2019 16:02:21 +0100
Peter Eisentraut  wrote:

> On 2019-11-05 17:05, Jehan-Guillaume de Rorthais wrote:
> >> I have simplified your reproduction steps from the previous message to a
> >> test case, and I can confirm that your proposed fix addresses the issue.  
> > 
> > Thanks for the feedback and the test case. I wonder if ALTER SUBSCRIPTION
> > DISABLE/ENABLE is useful in the test case?  
> 
> Turns out it's not necessary.  Attached is an updated patch that 
> simplifies the test even further and moves it into the 
> 008_diff_schema.pl file.

OK. No further comments on my side.

Thanks,




Re: logical replication - negative bitmapset member not allowed

2019-11-05 Thread Jehan-Guillaume de Rorthais
On Tue, 5 Nov 2019 16:02:51 +0100
Peter Eisentraut  wrote:

> On 2019-10-25 17:38, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 10 Oct 2019 15:15:46 +0200
> > Jehan-Guillaume de Rorthais  wrote:
> > 
> > [...]
> >> Here is a script to reproduce it under version 10, 11 and 12:
> > 
> > I investigated this bug while coming back from pgconf.eu. Below is what I
> > found so far.
> 
> I have simplified your reproduction steps from the previous message to a 
> test case, and I can confirm that your proposed fix addresses the issue. 

Thanks for the feedback and the test case. I wonder if ALTER SUBSCRIPTION
DISABLE/ENABLE is useful in the test case?

Is it something recommended during DDL on logically replicated relation? If
yes, I suppose we should update the first point of the restriction chapter in
documentation:
https://www.postgresql.org/docs/11/logical-replication-restrictions

Regards,




Re: logical replication - negative bitmapset member not allowed

2019-10-25 Thread Jehan-Guillaume de Rorthais
On Thu, 10 Oct 2019 15:15:46 +0200
Jehan-Guillaume de Rorthais  wrote:

[...]
> Here is a script to reproduce it under version 10, 11 and 12:

I investigated this bug while coming back from pgconf.eu. Below is what I
found so far.

The message "negative bitmapset member not allowed" comes from
logicalrep_rel_open().

Every field that is unknown, dropped, or generated is mapped to remote attnum
-1. See backend/replication/logical/relation.c:

if (attr->attisdropped || attr->attgenerated)
{
entry->attrmap[i] = -1;
continue;
}

attnum = logicalrep_rel_att_by_name(remoterel, NameStr(attr->attname));

Note that logicalrep_rel_att_by_name returns -1 on unknown fields.

Later in the same function, we check whether fields belonging to some PK or
unique index appear in the remote keys as well:

while ((i = bms_next_member(idkey, i)) >= 0)
{
[...]
if (!bms_is_member(entry->attrmap[attnum], remoterel->attkeys))
{
entry->updatable = false;
break;
}
}

However, before checking whether the local attribute belongs to the remote
keys, it should check whether it is actually mapped to a remote one. In other
words, I suppose we should check that entry->attrmap[attnum] is not negative
before calling bms_is_member().

The trivial patch would be:

-if (!bms_is_member(entry->attrmap[attnum], remoterel->attkeys))
+if (entry->attrmap[attnum] < 0 ||
+!bms_is_member(entry->attrmap[attnum], remoterel->attkeys))
 {
 entry->updatable = false;
 break;
 }

I tested with the attached scenario and it seems to work correctly.

Note that while trying to fix this bug, I found a segmentation fault while
compiling with asserts. You might want to review/test without
--enable-cassert. I will report it in another thread as it seems unrelated to
this bug or fix.


negative_bitmap_error-fix.bash
Description: Binary data


Re: logical replication - negative bitmapset member not allowed

2019-10-10 Thread Jehan-Guillaume de Rorthais
Hello,

On Thu, 4 Apr 2019 23:37:04 +0200
Peter Eisentraut  wrote:

> On 2019-04-01 23:43, Alvaro Herrera wrote:
> > Maybe the replica identity of a table got set to a unique index on oid?
> > Or something else involving system columns?  (If replication is
> > otherwise working, the I suppose there's a separate publication that's
> > having the error; the first thing to isolate would be to see what tables
> > are involved in that publication).  
> 
> Looking through the code, the bms_add_member() call in
> logicalrep_read_attrs() does not use the usual
> FirstLowInvalidHeapAttributeNumber offset, so that seems like a possible
> problem.
> 
> However, I can't quite reproduce this.  There are various other checks
> that prevent this scenario, but it's plausible that with a bit of
> whacking around you could hit this error message.

Here is a script to reproduce it under version 10, 11 and 12:


# env
PUB=/tmp/pub
SUB=/tmp/sub
unset PGPORT PGHOST PGDATABASE PGDATA
export PGUSER=postgres

# cleanup
kill %1
pg_ctl -w -s -D "$PUB" -m immediate stop; echo $?
pg_ctl -w -s -D "$SUB" -m immediate stop; echo $?
rm -r "$PUB" "$SUB"

# cluster
initdb -U postgres -N "$PUB" &>/dev/null; echo $?
initdb -U postgres -N "$SUB" &>/dev/null; echo $?
echo "wal_level=logical" >> "$PUB"/postgresql.conf
echo "port=5433" >> "$SUB"/postgresql.conf
pg_ctl -w -s -D $PUB -l "$PUB"-"$(date +%FT%T)".log start; echo $?
pg_ctl -w -s -D $SUB -l "$SUB"-"$(date +%FT%T)".log start; echo $?
pgbench -p 5432 -qi 
pg_dump -p 5432 -s | psql -qXp 5433

# fake activity
pgbench -p 5432 -T 300 -c 2 &

# replication setup
psql -p 5432 -Xc "CREATE PUBLICATION prov FOR ALL TABLES"
psql -p 5433 -Xc "CREATE SUBSCRIPTION sub
  CONNECTION 'port=5432'
  PUBLICATION prov"

# wait for the streaming
unset V;
while [ "$V" != "streaming" ]; do sleep 1
V=$(psql -AtXc "SELECT 'streaming'
FROM pg_stat_replication WHERE state='streaming'")
done

# trigger the error message
psql -p 5433 -Xc "ALTER SUBSCRIPTION sub DISABLE"
psql -p 5433 -Xc "ALTER TABLE pgbench_history ADD id SERIAL PRIMARY KEY"
psql -p 5432 -Xc "ALTER TABLE pgbench_history ADD id SERIAL PRIMARY KEY"
psql -p 5433 -Xc "ALTER SUBSCRIPTION sub ENABLE"


Regards,