Re: Synchronous commit behavior during network outage

2021-07-12 Thread Jeff Davis
On Fri, 2021-07-09 at 23:10 +0500, Andrey Borodin wrote: > In my experience SIGTERM coped fine so far. OK. I don't think ignoring SIGTERM in the way my patch does it is a great solution, and it's not getting much support, so I think I'll back away from that idea. I had a separate discussion with

Re: Synchronous commit behavior during network outage

2021-07-09 Thread Andrey Borodin
> On 3 July 2021, at 23:44, Jeff Davis wrote: > > On Sat, 2021-07-03 at 14:06 +0500, Andrey Borodin wrote: >>> But until you've disabled sync rep, the primary will essentially be >>> down for writes whether using this new feature or not. Even if you >>> can >>> terminate some backends to

Re: Synchronous commit behavior during network outage

2021-07-03 Thread Jeff Davis
On Sat, 2021-07-03 at 14:06 +0500, Andrey Borodin wrote: > > But until you've disabled sync rep, the primary will essentially be > > down for writes whether using this new feature or not. Even if you > > can > > terminate some backends to try to free space, the application will > > just > > make

Re: Synchronous commit behavior during network outage

2021-07-03 Thread Andrey Borodin
> On 3 July 2021, at 01:15, Jeff Davis wrote: > > On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote: >> If the failover happens due to unresponsive node we cannot just turn >> off sync rep. We need to have some spare connections for that (number >> of stuck backends will skyrocket

Re: Synchronous commit behavior during network outage

2021-07-02 Thread Jeff Davis
On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote: > If the failover happens due to unresponsive node we cannot just turn > off sync rep. We need to have some spare connections for that (number > of stuck backends will skyrocket during network partitioning). We > need available descriptors

Re: Synchronous commit behavior during network outage

2021-07-02 Thread Andrey Borodin
> On 2 July 2021, at 10:59, Jeff Davis wrote: > > On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote: >>> My patch also covers the backend termination case. Is there a >>> reason >>> you left that case out? >> >> Yes, backend termination is used by HA tool before rewinding the >>

Re: Synchronous commit behavior during network outage

2021-07-02 Thread Jeff Davis
On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote: > > My patch also covers the backend termination case. Is there a > > reason > > you left that case out? > > Yes, backend termination is used by HA tool before rewinding the > node. Can't you just disable sync rep first (using ALTER SYSTEM

Re: Synchronous commit behavior during network outage

2021-06-30 Thread Andrey Borodin
> On 29 June 2021, at 23:35, Jeff Davis wrote: > > On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote: >>> On 29 June 2021, at 03:56, Jeff Davis >>> wrote: >>> >>> The patch may be somewhat controversial, so I'll wait for feedback >>> before documenting it properly. >> >> The

Re: Synchronous commit behavior during network outage

2021-06-29 Thread Jeff Davis
On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote: > > On 29 June 2021, at 03:56, Jeff Davis > > wrote: > > > > The patch may be somewhat controversial, so I'll wait for feedback > > before documenting it properly. > > The patch seems similar to [0]. But I like your wording :) > I'd

Re: Synchronous commit behavior during network outage

2021-06-29 Thread Andrey Borodin
> On 29 June 2021, at 03:56, Jeff Davis wrote: > > The patch may be somewhat controversial, so I'll wait for feedback > before documenting it properly. The patch seems similar to [0]. But I like your wording :) I'd be happy if we go with any version of this idea. Best regards, Andrey

Re: Synchronous commit behavior during network outage

2021-06-28 Thread Jeff Davis
On Tue, 2021-04-20 at 14:19 -0700, SATYANARAYANA NARLAPURAM wrote: > One idea here is to make the backend ignore query > cancellation/backend termination while waiting for the synchronous > commit ACK. This way client never reads the data that was never > flushed remotely. The problem with this

Re: Synchronous commit behavior during network outage

2021-05-20 Thread Ondřej Žižka
On 06/05/2021 06:09, Andrey Borodin wrote: I could not understand your reasoning about 2 and 4 nodes. Can you please clarify a bit how a 4 node setup can help prevent visibility of committed-locally-but-canceled transactions? Hello Andrey, The initial request (for us) was to have a geo cluster

Re: Synchronous commit behavior during network outage

2021-05-05 Thread Andrey Borodin
Thanks for reviewing, Ondřej! > On 26 Apr 2021, at 22:01, Ondřej Žižka wrote: > > Hello Andrey, > > I went through the thread for your patch and it seems to me an acceptable > solution... > > > The only case patch does not handle is sudden backend crash - Postgres will > > recover

Re: Synchronous commit behavior during network outage

2021-04-26 Thread Ondřej Žižka
Hello Andrey, I went through the thread for your patch and it seems to me an acceptable solution... > The only case patch does not handle is sudden backend crash - Postgres will recover without a restart. We also use a HA tool (Patroni). If the whole machine fails, it will find a new

Re: Synchronous commit behavior during network outage

2021-04-21 Thread Andrey Borodin
Hi Ondrej! > On 19 Apr 2021, at 22:19, Ondřej Žižka wrote: > > Do you think it would be possible to implement a process that would solve > this use case? > Thank you > Ondrej > Feel free to review the patch fixing this at [0]. It's classified as "Server Features", but I'm sure it's a

Re: Synchronous commit behavior during network outage

2021-04-21 Thread Ondřej Žižka
Hello, > You can monitor the pg_stat_activity for the SYNC_REP_WAIT_FLUSH wait types to detect this. I tried this and saw wait_event_type Client or IPC and wait_event Client_Read or SyncRep. In which situation can I see the SYNC_REP_WAIT_FLUSH value? > You should consider these as in
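For the monitoring question in the message above, here is a minimal sketch of how a script might pick out backends that are stuck waiting for the synchronous standby. It assumes the wait_event_type / wait_event pair IPC / SyncRep mentioned in the message is what such backends report in pg_stat_activity. The names SYNC_REP_WAITERS_SQL and sync_rep_waiters, and the sample rows, are hypothetical and made up for illustration; no database connection is made here.

```python
# Hypothetical monitoring helper; not taken from the thread or from any HA tool.

# Query a monitoring job could run against the primary (assumed usage).
SYNC_REP_WAITERS_SQL = """
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'IPC'
  AND wait_event = 'SyncRep'
"""


def sync_rep_waiters(rows):
    """Return the rows (dicts) that describe backends waiting on sync replication."""
    return [
        row
        for row in rows
        if row.get("wait_event_type") == "IPC" and row.get("wait_event") == "SyncRep"
    ]


# Made-up rows shaped like pg_stat_activity output, for demonstration only.
sample_rows = [
    {"pid": 101, "wait_event_type": "IPC", "wait_event": "SyncRep",
     "query": "insert into a values (2);"},
    {"pid": 102, "wait_event_type": "Client", "wait_event": "ClientRead",
     "query": "select * from a;"},
]

if __name__ == "__main__":
    for row in sync_rep_waiters(sample_rows):
        print(row["pid"], row["query"])
```

The same filter can be applied either in SQL (the query string) or client-side (the function); which one fits depends on how the monitoring job already collects pg_stat_activity.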

Re: Synchronous commit behavior during network outage

2021-04-21 Thread SATYANARAYANA NARLAPURAM
> > This can be an option for us in our case. But there also needs to be a > process for detecting these "stuck commits" and for invalidating/removing > them, because in reality, if the app/user does not see the change in the > database, it/he/she will try to insert/delete it again. If it just

Re: Synchronous commit behavior during network outage

2021-04-21 Thread Pavel Stehule
On Wed, 21 Apr 2021 at 9:51, Laurenz Albe wrote: > On Tue, 2021-04-20 at 18:49 +0100, Ondřej Žižka wrote: > > tecmint=# select * from a; --> LAN on sync replica is OK > > id > > > >1 > > (1 row) > > > > tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN and > >

Re: Synchronous commit behavior during network outage

2021-04-21 Thread Laurenz Albe
On Tue, 2021-04-20 at 18:49 +0100, Ondřej Žižka wrote: > tecmint=# select * from a; --> LAN on sync replica is OK > id > >1 > (1 row) > > tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN and > insert is waiting. During this time kill the background process on the

Re: Synchronous commit behavior during network outage

2021-04-21 Thread Aleksander Alekseev
Hi Tomas, > > Thanks for the report. It seems to be a clear violation of what is > > promised in the docs. Although it's unlikely that someone implemented > > an application which deals with important data and "pressed Ctrl+C" as > > it's done in psql. So this might not be such a critical issue

Re: Synchronous commit behavior during network outage

2021-04-21 Thread Ondřej Žižka
Hello Satyanarayana, This can be an option for us in our case. But there also needs to be a process for detecting these "stuck commits" and for invalidating/removing them, because in reality, if the app/user does not see the change in the database, it/he/she will try to insert/delete it

Re: Synchronous commit behavior during network outage

2021-04-20 Thread SATYANARAYANA NARLAPURAM
One idea here is to make the backend ignore query cancellation/backend termination while waiting for the synchronous commit ACK. This way client never reads the data that was never flushed remotely. The problem with this approach is that your backends get stuck until your commit log record is
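The trade-off described in the message above can be illustrated with a toy model. This is not PostgreSQL code and not any of the patches discussed in this thread; ToyPrimary, its fields, and the returned strings are hypothetical names chosen for the sketch. It only models two assumptions taken from the thread: a cancelled wait leaves the locally committed row visible, and ignoring the cancel keeps the backend stuck instead.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrimary:
    """Toy model of a primary with one synchronous replica (illustration only)."""

    replica_reachable: bool = True
    ignore_cancel: bool = False                   # the idea proposed in the message
    visible: list = field(default_factory=list)   # rows other sessions can read
    stuck: list = field(default_factory=list)     # commits still waiting for the ACK

    def commit(self, row, cancel_requested=False):
        # The commit is already durable locally before the wait starts.
        if self.replica_reachable:
            # ACK arrives, so the wait ends normally.
            self.visible.append(row)
            return "committed"
        if cancel_requested and not self.ignore_cancel:
            # The wait is interrupted; the row shows up without a replica ACK.
            self.visible.append(row)
            return "visible without replica ACK"
        # Either nobody cancelled, or cancellation is ignored: the backend stays stuck.
        self.stuck.append(row)
        return "waiting"


if __name__ == "__main__":
    cancellable = ToyPrimary(replica_reachable=False)
    print(cancellable.commit(2, cancel_requested=True), cancellable.visible)

    uninterruptible = ToyPrimary(replica_reachable=False, ignore_cancel=True)
    print(uninterruptible.commit(2, cancel_requested=True), uninterruptible.stuck)
```

In the first case the client can read a row that was never flushed remotely; in the second the row stays hidden, at the price of a backend that cannot be freed until the replica answers.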

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Ondřej Žižka
I am sorry, I forgot to mention that in the second situation I added a primary key to the table. Ondrej On 20/04/2021 18:49, Ondřej Žižka wrote: Hello Aleksander, Thank you for the reaction. This was tested on version 13.2. There are also other possible situations with the same setup and

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Ondřej Žižka
Hello Maksim, I know your post [1]. That thread is why we performed more tests (see my other email in this thread). We are trying to implement an RPO=0 solution using PostgreSQL. Knowing this, would it be possible to build an RPO=0 solution with PostgreSQL? Ondrej On 20/04/2021

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Ondřej Žižka
Hello Aleksander, Thank you for the reaction. This was tested on version 13.2. There are also other possible situations with the same setup and similar issue: - When the background process on server fails On postgresql1: tecmint=# select * from a; --> LAN on sync replica

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Maksim Milyutin
On 20.04.2021 19:38, Tomas Vondra wrote: On 4/20/21 6:23 PM, Aleksander Alekseev wrote: Hi Ondřej, Thanks for the report. It seems to be a clear violation of what is promised in the docs. Although it's unlikely that someone implemented an application which deals with important data and

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Maksim Milyutin
Hi! This is a known issue with synchronous replication [1]. You could inject a dummy modification into an otherwise unmodified operation to work around the downsides of such partial commits without patching the source code. On 20.04.2021 19:23, Aleksander Alekseev wrote: Although it's unlikely

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Tomas Vondra
On 4/20/21 6:23 PM, Aleksander Alekseev wrote: > Hi Ondřej, > > Thanks for the report. It seems to be a clear violation of what is > promised in the docs. Although it's unlikely that someone implemented > an application which deals with important data and "pressed Ctrl+C" as > it's done in

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Aleksander Alekseev
Hi Ondřej, Thanks for the report. It seems to be a clear violation of what is promised in the docs. Although it's unlikely that someone implemented an application which deals with important data and "pressed Ctrl+C" as it's done in psql. So this might not be such a critical issue after all. BTW

Synchronous commit behavior during network outage

2021-04-19 Thread Ondřej Žižka
Hello all, I would like to know your opinion on the following behaviour I see for a PostgreSQL setup with synchronous replication. This behaviour happens in a special use case. In this use case, there are 2 synchronous replicas with the following config (truncated): - 2 nodes -