Re: Synchronous commit behavior during network outage

2021-07-12 Thread Jeff Davis
On Fri, 2021-07-09 at 23:10 +0500, Andrey Borodin wrote:
> In my experience SIGTERM coped fine so far.

OK. I don't think ignoring SIGTERM in the way my patch does it is a
great solution, and it's not getting much support, so I think I'll back
away from that idea.

I had a separate discussion with Andres, and he made a distinction
between explicit vs. implicit actions. For instance, an explicit
SIGTERM or SIGINT should not be ignored (or the functions that cause
those to happen); but if we are waiting for sync rep then it might be
OK to ignore a cancel caused by statement_timeout or a termination due
to a network disconnect.
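
As a concrete illustration of an "implicit" cancel source (a sketch; the exact GUC semantics proposed in this thread are not final):

```sql
-- A statement_timeout expiry raises an implicit cancel. Under the
-- proposal, such a cancel could be ignored while the backend waits in
-- SyncRepWaitForLSN(), whereas an explicit SIGINT from the user or
-- pg_cancel_backend() would still be honored.
SET statement_timeout = '5s';
```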

Separately, I'm taking a vacation. Since there are two versions of the
patch floating around, I will withdraw mine.

Regards,
Jeff Davis






Re: Synchronous commit behavior during network outage

2021-07-09 Thread Andrey Borodin



> On 3 July 2021, at 23:44, Jeff Davis wrote:
> 
> On Sat, 2021-07-03 at 14:06 +0500, Andrey Borodin wrote:
>>> But until you've disabled sync rep, the primary will essentially be
>>> down for writes whether using this new feature or not. Even if you
>>> can
>>> terminate some backends to try to free space, the application will
>>> just
>>> make new connections that will get stuck the same way.
>> 
>> Surely I'm talking about terminating the postmaster, not individual
>> backends. But the postmaster will need to terminate each running query.
>> We surely need to have a way to stop the whole instance without issuing
>> a single query. And I do not like kill -9 for this purpose.
> 
> kill -6 would suffice.
SIGABRT is expected to generate a core dump, isn't it? Node failover is a
somewhat expected state in an HA system.

> 
> I see the point that you don't want this to interfere with an
> administrative shutdown. But it seems like most shutdowns will need to
> escalate to SIGABRT for cases where things are going badly wrong (low
> memory, etc.) anyway. I don't see a better solution here.
In my experience SIGTERM coped fine so far.

> I don't fully understand why you'd be concerned about cancellation but
> not concerned about similar problems with termination, but if you think
> two GUCs are important I can do that.
I think two GUCs are a better solution than one GUC disabling both cancellation
and termination.
It would be great if some other HA tool developers would chime in.

Thanks!

Best regards, Andrey Borodin.



Re: Synchronous commit behavior during network outage

2021-07-03 Thread Jeff Davis
On Sat, 2021-07-03 at 14:06 +0500, Andrey Borodin wrote:
> > But until you've disabled sync rep, the primary will essentially be
> > down for writes whether using this new feature or not. Even if you
> > can
> > terminate some backends to try to free space, the application will
> > just
> > make new connections that will get stuck the same way.
> 
> Surely I'm talking about terminating the postmaster, not individual
> backends. But the postmaster will need to terminate each running query.
> We surely need to have a way to stop the whole instance without issuing
> a single query. And I do not like kill -9 for this purpose.

kill -6 would suffice.

I see the point that you don't want this to interfere with an
administrative shutdown. But it seems like most shutdowns will need to
escalate to SIGABRT for cases where things are going badly wrong (low
memory, etc.) anyway. I don't see a better solution here.

I don't fully understand why you'd be concerned about cancellation but
not concerned about similar problems with termination, but if you think
two GUCs are important I can do that.

Regards,
Jeff Davis







Re: Synchronous commit behavior during network outage

2021-07-03 Thread Andrey Borodin



> On 3 July 2021, at 01:15, Jeff Davis wrote:
> 
> On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote:
>> If the failover happens due to an unresponsive node, we cannot just turn
>> off sync rep. We need to have some spare connections for that (the number
>> of stuck backends will skyrocket during network partitioning). We need
>> available descriptors and some memory to fork a new backend. We will need
>> to re-read the config. We need time to try, after all.
>> In some failure modes we may lack some of these.
> 
> I think it's a good point that, when things start to go wrong, they can
> go very wrong very quickly.
> 
> But until you've disabled sync rep, the primary will essentially be
> down for writes whether using this new feature or not. Even if you can
> terminate some backends to try to free space, the application will just
> make new connections that will get stuck the same way.
Surely I'm talking about terminating the postmaster, not individual backends.
But the postmaster will need to terminate each running query.
We surely need to have a way to stop the whole instance without issuing a
single query. And I do not like kill -9 for this purpose.

> 
> You can avoid the "fork backend" problem by keeping a connection always
> open from the HA tool, or by editing the conf to disable sync rep and
> issuing SIGHUP instead. Granted, that still takes some memory.
> 
>> Partial degradation is already a hard task. Without the ability to easily
>> terminate a running Postgres, the HA tool will often resort to SIGKILL.
> 
> When the system is really wedged as you describe (waiting on sync rep,
> tons of connections, and low memory), what information do you expect
> the HA tool to be able to collect, and what actions do you expect it to
> take?
The HA tool is not going to collect anything. It just calls pg_ctl stop [0] or
its equivalent.

> 
> Presumably, you'd want it to disable sync rep at some point to get back
> online. Where does SIGTERM fit into the picture?

The HA tool is going to terminate the running instance, rewind it, switch to
the new timeline, and enroll in the cluster again as a standby.

> 
>>> If you don't handle the termination case, then there's still a
>>> chance
>>> for the transaction to become visible to other clients before it's
>>> replicated.
>> 
>> Termination is an admin command; admins know what they are doing.
>> Cancellation is part of the user protocol.
> 
> From the pg_terminate_backend() docs: "This is also allowed if the
> calling role is a member of the role whose backend is being terminated
> or the calling role has been granted pg_signal_backend", so it's not
> really an admin command. Even for an admin, it might be hard to
> understand why terminating a backend could result in losing a visible
> transaction.
OK, I see that backend termination is not described as an admin command.
We cannot prevent users from doing stupid things; they are able to delete
their data anyway.

> I'm not really seeing two use cases here for two GUCs. Are you sure you
> want to disable only cancels but allow termination to proceed?

Yes, I'm sure. I ran production with termination disabled for some weeks.
Cluster repair was much slower: for some reason, kill -9'ed instances were
successfully rewound much less often. But maybe I did something wrong.

If we can stop the whole instance the same way as we did before activating the
proposed GUC, there is no problem at all.

Thanks!

Best regards, Andrey Borodin.

[0] 
https://github.com/zalando/patroni/blob/master/patroni/postgresql/postmaster.py#L155



Re: Synchronous commit behavior during network outage

2021-07-02 Thread Jeff Davis
On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote:
> If the failover happens due to unresponsive node we cannot just turn
> off sync rep. We need to have some spare connections for that (number
> of stuck backends will skyrocket during network partitioning). We
> need available descriptors and some memory to fork new backend. We
> will need to re-read config. We need time to try after all.
> At some failures we may lack some of these.

I think it's a good point that, when things start to go wrong, they can
go very wrong very quickly.

But until you've disabled sync rep, the primary will essentially be
down for writes whether using this new feature or not. Even if you can
terminate some backends to try to free space, the application will just
make new connections that will get stuck the same way.

You can avoid the "fork backend" problem by keeping a connection always
open from the HA tool, or by editing the conf to disable sync rep and
issuing SIGHUP instead. Granted, that still takes some memory.

> Partial degradation is already a hard task. Without the ability to easily
> terminate a running Postgres, the HA tool will often resort to SIGKILL.

When the system is really wedged as you describe (waiting on sync rep,
tons of connections, and low memory), what information do you expect
the HA tool to be able to collect, and what actions do you expect it to
take?

Presumably, you'd want it to disable sync rep at some point to get back
online. Where does SIGTERM fit into the picture?

> > If you don't handle the termination case, then there's still a
> > chance
> > for the transaction to become visible to other clients before it's
> > replicated.
> 
> Termination is an admin command; admins know what they are doing.
> Cancellation is part of the user protocol.

From the pg_terminate_backend() docs: "This is also allowed if the
calling role is a member of the role whose backend is being terminated
or the calling role has been granted pg_signal_backend", so it's not
really an admin command. Even for an admin, it might be hard to
understand why terminating a backend could result in losing a visible
transaction.
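
As a sketch of the docs passage above (the role name here is hypothetical):

```sql
-- pg_signal_backend is the built-in role the docs refer to;
-- 'app_monitor' is a hypothetical non-superuser role.
GRANT pg_signal_backend TO app_monitor;

-- app_monitor can now terminate other backends, including one that is
-- stuck waiting for synchronous replication:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE wait_event = 'SyncRep';
```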

I'm not really seeing two use cases here for two GUCs. Are you sure you
want to disable only cancels but allow termination to proceed?

Regards,
Jeff Davis






Re: Synchronous commit behavior during network outage

2021-07-02 Thread Andrey Borodin



> On 2 July 2021, at 10:59, Jeff Davis wrote:
> 
> On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote:
>>> My patch also covers the backend termination case. Is there a
>>> reason
>>> you left that case out?
>> 
>> Yes, backend termination is used by HA tool before rewinding the
>> node.
> 
> Can't you just disable sync rep first (using ALTER SYSTEM SET
> synchronous_standby_names=''), which will unstick the backend, and then
> terminate it?
If the failover happens due to an unresponsive node, we cannot just turn off
sync rep. We need to have some spare connections for that (the number of stuck
backends will skyrocket during network partitioning). We need available
descriptors and some memory to fork a new backend. We will need to re-read the
config. We need time to try, after all.
In some failure modes we may lack some of these.

Partial degradation is already a hard task. Without the ability to easily
terminate a running Postgres, the HA tool will often resort to SIGKILL.

> 
> If you don't handle the termination case, then there's still a chance
> for the transaction to become visible to other clients before it's
> replicated.
Termination is an admin command; admins know what they are doing.
Cancellation is part of the user protocol.

BTW, can we have two GUCs? So that HA tool developers can decide on their own
which guarantees they provide?

> 
>> There is one more caveat we need to fix: we should prevent instant
>> recovery from happening.
> 
> That can already be done with the restart_after_crash GUC.

Oh, I didn't know about it; we will use it. Thanks!


Best regards, Andrey Borodin.



Re: Synchronous commit behavior during network outage

2021-07-02 Thread Jeff Davis
On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote:
> > My patch also covers the backend termination case. Is there a
> > reason
> > you left that case out?
> 
> Yes, backend termination is used by HA tool before rewinding the
> node.

Can't you just disable sync rep first (using ALTER SYSTEM SET
synchronous_standby_names=''), which will unstick the backend, and then
terminate it?

If you don't handle the termination case, then there's still a chance
for the transaction to become visible to other clients before it's
replicated.

> There is one more caveat we need to fix: we should prevent instant
> recovery from happening.

That can already be done with the restart_after_crash GUC.
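
For reference, a minimal sketch of using that setting (it defaults to on; with it off, the postmaster exits after a backend crash instead of reinitializing, leaving the decision to the HA tool):

```sql
ALTER SYSTEM SET restart_after_crash = off;
SELECT pg_reload_conf();
```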

Regards,
Jeff Davis






Re: Synchronous commit behavior during network outage

2021-06-30 Thread Andrey Borodin



> On 29 June 2021, at 23:35, Jeff Davis wrote:
> 
> On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote:
>>> On 29 June 2021, at 03:56, Jeff Davis wrote:
>>> 
>>> The patch may be somewhat controversial, so I'll wait for feedback
>>> before documenting it properly.
>> 
>> The patch seems similar to [0]. But I like your wording :)
>> I'd be happy if we go with any version of this idea.
> 
> Thank you, somehow I missed that one, we should combine the CF entries.
> 
> My patch also covers the backend termination case. Is there a reason
> you left that case out?
Yes, backend termination is used by the HA tool before rewinding the node.
Initially I was treating termination as PANIC and got a ton of core dumps
during failovers on drills.

There is one more caveat we need to fix: we should prevent instant recovery
from happening. The HA tool must know that our process was restarted.
Consider the following scenario:
1. Node A is primary with sync rep.
2. A goes through a network partition; somewhere, node B is promoted.
3. All backends of A are stuck in sync rep until the HA tool discovers that A
is the failed node.
4. One backend crashes with a segfault in some buggy extension, or OOM, or
whatever.
5. The Postgres server performs restartless crash recovery, making
local-but-not-replicated data visible.

We should prevent step 5 just as we prevent cancels. The HA tool will discover
the postmaster failure and recheck in the coordination system whether it can
bring Postgres up locally.

Thanks!

Best regards, Andrey Borodin.



Re: Synchronous commit behavior during network outage

2021-06-29 Thread Jeff Davis
On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote:
> > On 29 June 2021, at 03:56, Jeff Davis wrote:
> > 
> > The patch may be somewhat controversial, so I'll wait for feedback
> > before documenting it properly.
> 
> The patch seems similar to [0]. But I like your wording :)
> I'd be happy if we go with any version of this idea.

Thank you, somehow I missed that one, we should combine the CF entries.

My patch also covers the backend termination case. Is there a reason
you left that case out?

Regards,
Jeff Davis






Re: Synchronous commit behavior during network outage

2021-06-29 Thread Andrey Borodin



> On 29 June 2021, at 03:56, Jeff Davis wrote:
> 
> The patch may be somewhat controversial, so I'll wait for feedback
> before documenting it properly.

The patch seems similar to [0]. But I like your wording :)
I'd be happy if we go with any version of this idea.

Best regards, Andrey Borodin.


[0] https://commitfest.postgresql.org/33/2402/





Re: Synchronous commit behavior during network outage

2021-06-28 Thread Jeff Davis
On Tue, 2021-04-20 at 14:19 -0700, SATYANARAYANA NARLAPURAM wrote:
> One idea here is to make the backend ignore query
> cancellation/backend termination while waiting for the synchronous
> commit ACK. This way the client never reads data that was never
> flushed remotely. The problem with this approach is that your
> backends get stuck until your commit log record is flushed on the
> remote side. Also, the client can see the data not flushed remotely
> if the server crashes and comes back online. You can prevent the
> latter case by making a SyncRepWaitForLSN before opening up the
> connections to the non-superusers. I have a working prototype of this
> logic, if there is enough interest I can post the patch.

I didn't see a patch here yet, so I wrote a simple one for
consideration (attached).

The problem exists for both cancellation and termination requests. The
patch adds a GUC that makes SyncRepWaitForLSN keep waiting. It does not
ignore the requests; for instance, a termination request will still be
honored when it's done waiting for sync rep.

The idea of this GUC is not to wait forever (obviously), but to allow
the administrator (or an automated network agent) to be in control of
the logic:

If the primary is non-responsive, the administrator can decide to fail
over, knowing that all visible transactions on the primary are durable
on the standby (because any transaction that didn't make it to the
standby also didn't release locks yet). If the standby is non-
responsive, the administrator can intervene with something like:

ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();

which will disable sync rep, allowing the primary to complete the query
and continue on without the standby; but in that case the admin must be
sure not to fail over until there's a new standby fully caught-up.

The patch may be somewhat controversial, so I'll wait for feedback
before documenting it properly.

Regards,
Jeff Davis

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index bdbc9ef844b..4ca96e59bb6 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -88,6 +88,7 @@
 
 /* User-settable parameters for sync rep */
 char	   *SyncRepStandbyNames;
+bool		synchronous_replication_interrupt = true;
 
 #define SyncStandbysDefined() \
 	(SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
@@ -150,6 +151,8 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	char	   *new_status = NULL;
 	const char *old_status;
 	int			mode;
+	bool		termination_warning_emitted = false;
+	bool		cancel_warning_emitted = false;
 
 	/*
 	 * This should be called while holding interrupts during a transaction
@@ -263,13 +266,24 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		 */
 		if (ProcDiePending)
 		{
-			ereport(WARNING,
-	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
-	 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
-			whereToSendOutput = DestNone;
-			SyncRepCancelWait();
-			break;
+			if (synchronous_replication_interrupt) {
+ereport(WARNING,
+		(errcode(ERRCODE_ADMIN_SHUTDOWN),
+		 errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
+		 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+whereToSendOutput = DestNone;
+SyncRepCancelWait();
+break;
+			}
+			else if (!termination_warning_emitted)
+			{
+ereport(WARNING,
+		(errcode(ERRCODE_ADMIN_SHUTDOWN),
+		 errmsg("received request to terminate connection while waiting for synchronous replication"),
+		 errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+		 errhint("Set synchronous_standby_names='' and reload the configuration to allow termination to proceed.")));
+termination_warning_emitted = true;
+			}
 		}
 
 		/*
@@ -280,12 +294,24 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		 */
 		if (QueryCancelPending)
 		{
-			QueryCancelPending = false;
-			ereport(WARNING,
-	(errmsg("canceling wait for synchronous replication due to user request"),
-	 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
-			SyncRepCancelWait();
-			break;
+			if (synchronous_replication_interrupt)
+			{
+QueryCancelPending = false;
+ereport(WARNING,
+		(errmsg("canceling wait for synchronous replication due to user request"),
+		 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+SyncRepCancelWait();
+break;
+			}
+			else if (!cancel_warning_emitted)
+			{
+ereport(WARNING,
+		

Re: Synchronous commit behavior during network outage

2021-05-20 Thread Ondřej Žižka

On 06/05/2021 06:09, Andrey Borodin wrote:

I could not understand your reasoning about 2 and 4 nodes. Can you please
clarify a bit how a 4-node setup can help prevent visibility of
committed-locally-but-canceled transactions?

Hello Andrey,

The initial request (for us) was to have a geo cluster with 2 locations where
it would be possible to have 2 sync replicas even in case of failure of one
location. This means having 2 nodes in every location (4 together). If one
location fails completely (broken network connection), Patroni will choose the
working location (a 5-node etcd across 3 locations ensures this).


In the initial state, there is 1 sync replica in each location and one async
replica in each location, using the sync replica in its location as its source.
location.

Let's have the following initial situation:
1) Nodes pg11 and pg12 are in one location; nodes pg21 and pg22 are in
another location.

2) Nodes pg11 and pg21 are in sync replication
3) Node pg12 is an async replica from pg11
4) Node pg22 is an async replica from pg21
5) Master is pg11.

When the committed-locally-but-canceled situation happens and there is a
problem only with node pg21 (not with the network between nodes), the async
replica pg12 will receive the local commit from pg11 just after the local
commit on pg11, even if the cancellation happens. So there will be a situation
where the commit is present on both pg11 and pg12. If pg11 fails, the
transaction already exists on pg12 and this node will be selected as the new
leader (latest LSN).


There is a period, between the time a transaction is committed and the time it
has been sent to the async replica, when we can lose data, but I expect this
to be milliseconds (maybe less).


It will not prevent visibility, but it will ensure that the data is not lost:
in that case, data can be visible on the leader even if it is not yet present
on the sync replica, because continuity of data persistence is ensured by the
async replica.


I hope I explained it understandably.

Regards
Ondrej





Re: Synchronous commit behavior during network outage

2021-05-05 Thread Andrey Borodin
Thanks for reviewing Ondřej!

> On 26 April 2021, at 22:01, Ondřej Žižka wrote:
> 
> Hello Andrey,
> 
> I went through the thread for your patch and it seems to me an acceptable
> solution...
> 
> > The only case the patch does not handle is a sudden backend crash - Postgres
> > will recover without a restart.
> 
> We also use an HA tool (Patroni). If the whole machine fails, it will find a
> new master and it should be OK. We use a 4-node setup (2 sync replicas and 1
> async from every replica). If there is an issue just with the sync replica
> (async operating normally) and the master fails completely in this situation,
> it will be solved by Patroni (the async replica becomes another sync); but if
> it is just the backend process, the master will not fail over and the changes
> will still be visible...
> 
> If the sync replica outage is temporary, it will resolve itself when the node
> establishes a replication slot again... If the outage is "long", Patroni
> will remove the "old" sync replica from the cluster, and the async replica
> reading from the master will become the new sync. So yes... In a 2-node
> setup, this can be an issue, but in a 4-node setup, this seems to me like a
> solution.
> The only situation I can imagine is a situation when the client connections 
> use a different network than the replication network and the replication 
> network would be down completely, but the client network will be up. In that 
> case, the master can be an "isolated island" and if it fails, we can lose the 
> changed data.
It is, in fact, a very common type of network partition.

> Is this situation also covered in your model: "transaction effects should not 
> be observable on primary until requirements of synchronous_commit are 
> satisfied."
Yes. If synchronous_commit_cancelation = off, no backend crash occurs, and the
HA tool does not start the PostgreSQL service when in doubt whether another
primary may exist.

> Do you agree with my thoughts?
I could not understand your reasoning about 2 and 4 nodes. Can you please
clarify a bit how a 4-node setup can help prevent visibility of
committed-locally-but-canceled transactions?

I do not think we can classify network partitions as "temporary" and "long".
Due to the distributed nature of the system, network partitions are eternal
and momentary, simultaneously. And if node A can access node B and node C,
this neither implies that B can access C, nor that B can access A.

> Maybe it would be possible to implement this in PostgreSQL, with a note in
> the documentation that a multinode (>=3 nodes) cluster is necessary.
PostgreSQL does not provide any fault detection or automatic failover.
Documenting anything wrt failover is the responsibility of the HA tool.

Thanks!

Best regards, Andrey Borodin.






Re: Synchronous commit behavior during network outage

2021-04-26 Thread Ondřej Žižka

Hello Andrey,

I went through the thread for your patch and it seems to me an
acceptable solution...


> The only case the patch does not handle is a sudden backend crash -
> Postgres will recover without a restart.


We also use an HA tool (Patroni). If the whole machine fails, it will
find a new master and it should be OK. We use a 4-node setup (2 sync
replicas and 1 async from every replica). If there is an issue just with
the sync replica (async operating normally) and the master fails completely
in this situation, it will be solved by Patroni (the async replica
becomes another sync); but if it is just the backend process, the master
will not fail over and the changes will still be visible...


If the sync replica outage is temporary, it will resolve itself when the
node establishes a replication slot again... If the outage is "long",
Patroni will remove the "old" sync replica from the cluster, and the
async replica reading from the master will become the new sync. So yes... In
a 2-node setup, this can be an issue, but in a 4-node setup, this seems to me
like a solution.
The only situation I can imagine is one where the client
connections use a different network than the replication network, and the
replication network goes down completely while the client network
stays up. In that case, the master can be an "isolated island", and if
it fails, we can lose the changed data.
Is this situation also covered by your model: "transaction effects
should not be observable on primary until requirements of
synchronous_commit are satisfied"?


Do you agree with my thoughts?

Maybe it would be possible to implement this in PostgreSQL, with a note in
the documentation that a multinode (>=3 nodes) cluster is necessary.


Regards
Ondrej

On 22/04/2021 05:55, Andrey Borodin wrote:


Hi Ondrej!


On 19 April 2021, at 22:19, Ondřej Žižka wrote:

Do you think it would be possible to implement a process that would solve
this use case?
Thank you
Ondrej


Feel free to review the patch fixing this at [0]. It's classified as "Server
Features", but I'm sure it's a bug fix.

Yandex.Cloud PG runs with this patch for more than half a year, because we
cannot afford losing data in HA clusters.

It's a somewhat incomplete solution, because PG restart or crash recovery will
make waiting transactions visible. But we protect against this on the HA
tool's side.

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/33/2402/





Re: Synchronous commit behavior during network outage

2021-04-21 Thread Andrey Borodin
Hi Ondrej!

> On 19 April 2021, at 22:19, Ondřej Žižka wrote:
> 
> Do you think it would be possible to implement a process that would solve
> this use case?
> Thank you
> Ondrej
> 

Feel free to review the patch fixing this at [0]. It's classified as "Server
Features", but I'm sure it's a bug fix.

Yandex.Cloud PG runs with this patch for more than half a year, because we
cannot afford losing data in HA clusters.

It's a somewhat incomplete solution, because PG restart or crash recovery will
make waiting transactions visible. But we protect against this on the HA
tool's side.

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/33/2402/



Re: Synchronous commit behavior during network outage

2021-04-21 Thread Ondřej Žižka

Hello,

> You can monitor the pg_stat_activity for the SYNC_REP_WAIT_FLUSH wait
> types to detect this.


I tried to watch this: the wait_event_type is Client or IPC and the
wait_event is ClientRead or SyncRep. In which situation can I see the
SYNC_REP_WAIT_FLUSH value?
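
For what it's worth, SYNC_REP_WAIT_FLUSH is an internal constant; at the SQL level the wait surfaces as wait_event 'SyncRep' (with wait_event_type 'IPC'), so a monitoring query might look like:

```sql
-- Backends stuck waiting for a synchronous standby to confirm a commit.
SELECT pid, usename, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event = 'SyncRep';
```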


> You should consider these as in doubt transactions and the client
> should retry. Again, this can happen in a normal server crash case too.
> For example, a transaction committed on the server and before sending
> the acknowledgement crashed.  *The client should know how to handle
> these cases.*


I have just light knowledge of in-doubt transactions and need to study
them more, but in the real world the client is mostly 'stupid' and
expects only COMMIT or ROLLBACK, nothing in between.


> There is a third problem that I didn't talk about in this thread where
> the async clients (including logical decoding and replication clients)
> can get ahead of the new primary and there is no easier way to undo
> those changes. For this problem, we need to implement some protocol
> in the WAL sender where it sends the log to the consumer only up to the
> flush LSN of the standby/quorum replicas. This is something I am working
> on right now.


We set up an architecture with 4 nodes and Patroni as a cluster
manager. Two nodes are sync, and each sync node has 1 async. In case
something like this happens (e.g. the network to the sync replica fails and
the user presses CTRL+C), the async replica receives the transaction and
applies it. If the outage is longer than some time (30s by default),
the management software checks the LSN and creates a new sync replica from
the async replica.
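
The LSN check mentioned above might be sketched as follows (a guess at the kind of query an HA tool runs on each standby; Patroni's actual implementation differs):

```sql
-- On each standby, report how far WAL has been received and replayed;
-- the HA tool can then pick the most advanced replica as the new sync.
SELECT pg_last_wal_receive_lsn() AS received,
       pg_last_wal_replay_lsn()  AS replayed;
```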


Ondrej


On 21/04/2021 09:20, SATYANARAYANA NARLAPURAM wrote:


This can be an option for us in our case. But there also needs to
be a process how to detect these "stuck commits" and how to
invalidate/remove them, because in reality, if the app/user would
not see the change in the database, it/he/she will try to
insert/delete it again. If it just stuck without management, it
will create a queue which can cause, that in the queue there will
be 2 similar inserts/deletes which can again cause issues (like
with the primary key I mentioned before).


 This shouldn't be a problem as the previous transaction is still 
holding the locks and the new transaction is blocked behind this. 
Outside of the sync replication, this can happen today too with 
glitches/timeouts/ retries between the client and the server. Am I 
missing something?



So the process should be in this case:

- DBA receives information, that write operations stuck (DBA in
coordination with the infrastructure team disconnects all clients
and prevent new ones to create a new connection).

You can monitor the pg_stat_activity for the SYNC_REP_WAIT_FLUSH wait 
types to detect this.


- DBA will recognize, that there is an issue in communication
between the primary and the sync replica (caused the issue with
the propagation of commits)
- DBA will see that there are some commits that are in the "stuck
state"
- DBA removes these stuck commits. Note: Because the client never
received a confirmation about the successful commit -> changes in
the DB client tried to perform can't be considered as successful.


You should consider these as in doubt transactions and the client 
should retry. Again, this can happen in a normal server crash case 
too. For example, a transaction committed on the server and before 
sending the acknowledgement crashed.  The client should know how to 
handle these cases.


- DBA and infrastructure team restore the communication between
server nodes to be able to propagate commits from the primary node
to sync replica.
- DBA and infrastructure team allows new connections to the database

This approach would require external monitoring and alerting, but
I would say, that this is an acceptable solution. Would your patch
be able to perform that?

My patch handles ignoring the cancel events. I ended up keeping the 
other logic (blocking super user connections in the 
client_authentication_hook.


There is a third problem that I didn't talk about in this thread where 
the async clients (including logical decoding and replication clients) 
can get ahead of the new primary and there is no easier way to undo 
those changes. For this problem, we need to implement some protocol in 
the WAL sender where it sends the log to the consumer only up to the 
flush LSN of the standby/quorum replicas. This is something I am 
working on right now.




Re: Synchronous commit behavior during network outage

2021-04-21 Thread SATYANARAYANA NARLAPURAM
>
> This can be an option for us in our case. But there also needs to be a
> process for how to detect these "stuck commits" and how to invalidate/remove
> them, because in reality, if the app/user does not see the change in the
> database, it/he/she will try to insert/delete it again. If it just gets stuck
> without management, it will create a queue, which can mean that in the
> queue there will be 2 similar inserts/deletes, which can again cause issues
> (like with the primary key I mentioned before).
>

 This shouldn't be a problem as the previous transaction is still holding
the locks and the new transaction is blocked behind this. Outside of the
sync replication, this can happen today too with glitches/timeouts/retries
between the client and the server. Am I missing something?


So the process should be in this case:
>
> - DBA receives information that write operations are stuck (DBA, in
> coordination with the infrastructure team, disconnects all clients and
> prevents new ones from creating a new connection).
>
You can monitor the pg_stat_activity for the SYNC_REP_WAIT_FLUSH wait types
to detect this.
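Such monitoring can be sketched with a query along these lines. Note: in stock PostgreSQL the wait event surfaced in pg_stat_activity for this state is named `SyncRep`; `SYNC_REP_WAIT_FLUSH` is an internal wait-kind constant, so adjust the filter to whatever your build actually exposes.

```sql
-- Backends stuck waiting for a synchronous-replication acknowledgement.
SELECT pid, usename, state, wait_event_type, wait_event,
       now() - xact_start AS xact_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event = 'SyncRep'
ORDER BY xact_age DESC;
```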


> - DBA will recognize that there is an issue in the communication between the
> primary and the sync replica (which caused the issue with the propagation of
> commits)
> - DBA will see that there are some commits that are in the "stuck state"
> - DBA removes these stuck commits. Note: Because the client never received
> a confirmation of the successful commit, the changes the client
> tried to perform can't be considered successful.
>

You should consider these as in-doubt transactions, and the client should
retry. Again, this can happen in a normal server crash case too. For
example, a transaction committed on the server and before sending the
acknowledgement crashed.  The client should know how to handle these cases.
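The retry discipline described above can be made concrete. A hedged sketch of the client-side logic (Python; `send_commit` and `lookup` are illustrative stand-ins, not a real driver API): any commit whose acknowledgement is lost is in doubt and must be verified before being retried.

```python
from enum import Enum

class Outcome(Enum):
    COMMITTED = "committed"
    ROLLED_BACK = "rolled_back"
    IN_DOUBT = "in_doubt"       # ack lost: server may or may not have committed

def classify_commit(send_commit):
    """Run a commit attempt and classify its result.

    send_commit() returns True on an explicit server ack, raises
    ConnectionError when the link drops before an ack arrives, and
    raises RuntimeError on an explicit server-side error.
    """
    try:
        acked = send_commit()
    except ConnectionError:
        return Outcome.IN_DOUBT      # cannot tell; must verify before retrying
    except RuntimeError:
        return Outcome.ROLLED_BACK   # server explicitly reported failure
    return Outcome.COMMITTED if acked else Outcome.IN_DOUBT

def resolve_in_doubt(txn_id, lookup):
    """Resolve an in-doubt transaction by re-reading its marker row.

    lookup(txn_id) returns True if the transaction's effects are visible.
    Only if they are not should the client retry the insert/delete.
    """
    return Outcome.COMMITTED if lookup(txn_id) else Outcome.ROLLED_BACK
```

This is exactly the "in-doubt" handling a client already needs for a plain server crash between commit and acknowledgement.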

- DBA and infrastructure team restore the communication between server
> nodes to be able to propagate commits from the primary node to sync replica.
> - DBA and infrastructure team allows new connections to the database
>
> This approach would require external monitoring and alerting, but I would
> say, that this is an acceptable solution. Would your patch be able to
> perform that?
>
My patch handles ignoring the cancel events. I ended up keeping the other
logic (blocking superuser connections in the client_authentication_hook).

There is a third problem that I didn't talk about in this thread where the
async clients (including logical decoding and replication clients) can get
ahead of the new primary and there is no easy way to undo those changes.
For this problem, we need to implement some protocol in the WAL sender
where it sends the log to the consumer only up to the flush LSN of the
standby/quorum replicas. This is something I am working on right now.
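That protocol amounts to capping the position streamed to async consumers at the quorum's flush LSN. A hedged sketch of the computation (Python, with LSNs as plain integers; the real walsender works in C on XLogRecPtr values):

```python
def quorum_flush_lsn(replica_flush_lsns, k):
    """Highest LSN durably flushed on at least k replicas.

    Under a quorum setup (e.g. synchronous_standby_names = 'ANY k (...)'),
    a commit is safe once k replicas have flushed it, so the k-th largest
    reported flush LSN is the safe upper bound for what may be streamed
    to async consumers.
    """
    if k < 1 or k > len(replica_flush_lsns):
        raise ValueError("quorum size out of range")
    return sorted(replica_flush_lsns, reverse=True)[k - 1]

def send_limit(local_flush_lsn, replica_flush_lsns, k):
    # Never send past what the local server itself has flushed either.
    return min(local_flush_lsn, quorum_flush_lsn(replica_flush_lsns, k))
```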


Re: Synchronous commit behavior during network outage

2021-04-21 Thread Pavel Stehule
On Wed, Apr 21, 2021 at 9:51 AM, Laurenz Albe wrote:

> On Tue, 2021-04-20 at 18:49 +0100, Ondřej Žižka wrote:
> > tecmint=# select * from a; --> LAN on sync replica is OK
> >   id
> > 
> >1
> > (1 row)
> >
> > tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN and
> > insert is waiting. During this time kill the background process on the
> > PostgreSQL server for this session
> > WARNING:  canceling the wait for synchronous replication and terminating
> > connection due to administrator command
> > DETAIL:  The transaction has already committed locally, but might not
> > have been replicated to the standby.
> > server closed the connection unexpectedly
> >  This probably means the server terminated abnormally
> >  before or while processing the request.
> > The connection to the server was lost. Attempting reset: Succeeded.
> >
> > tecmint=# select * from a;
> >   id
> > 
> >1
> >2
> > (2 rows)
>
> It is well known that synchronous replication is subject to that problem,
> since it doesn't use the two-phase commit protocol.
>
> What surprises me is that this is a warning.
> In my opinion it should be an error.
>

Yes, an error makes more sense.

Regards

Pavel


> Yours,
> Laurenz Albe
>
>
>
>


Re: Synchronous commit behavior during network outage

2021-04-21 Thread Laurenz Albe
On Tue, 2021-04-20 at 18:49 +0100, Ondřej Žižka wrote:
> tecmint=# select * from a; --> LAN on sync replica is OK
>   id
> 
>1
> (1 row)
> 
> tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN and 
> insert is waiting. During this time kill the background process on the 
> PostgreSQL server for this session
> WARNING:  canceling the wait for synchronous replication and terminating 
> connection due to administrator command
> DETAIL:  The transaction has already committed locally, but might not 
> have been replicated to the standby.
> server closed the connection unexpectedly
>  This probably means the server terminated abnormally
>  before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
> 
> tecmint=# select * from a;
>   id
> 
>1
>2
> (2 rows)

It is well known that synchronous replication is subject to that problem,
since it doesn't use the two-phase commit protocol.

What surprises me is that this is a warning.
In my opinion it should be an error.

Yours,
Laurenz Albe





Re: Synchronous commit behavior during network outage

2021-04-21 Thread Aleksander Alekseev
Hi Tomas,

> > Thanks for the report. It seems to be a clear violation of what is
> > promised in the docs. Although it's unlikely that someone implemented
> > an application which deals with important data and "pressed Ctrl+C" as
> > it's done in psql. So this might be not such a critical issue after
> > all. BTW what version of PostgreSQL are you using?
> >
>
> Which part of the docs does this contradict?

The documentation to synchronous_commit = remote_apply explicitly states [1]:

"""
When set to remote_apply, commits will wait until replies from the
current synchronous standby(s) indicate they have received the commit
record of the transaction and applied it, so that it has become
visible to queries on the standby(s), and also written to durable
storage on the standbys.
"""

Here the commit on the master happened before receiving replies from the standby(s).

[1]: 
https://www.postgresql.org/docs/13/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

-- 
Best regards,
Aleksander Alekseev




Re: Synchronous commit behavior during network outage

2021-04-21 Thread Ondřej Žižka

Hello Satyanarayana,

This can be an option for us in our case. But there also needs to be a 
process for how to detect these "stuck commits" and how to invalidate/remove 
them, because in reality, if the app/user does not see the change in 
the database, it/he/she will try to insert/delete it again. If it just 
gets stuck without management, it will create a queue, which can mean that 
in the queue there will be 2 similar inserts/deletes, which can again 
cause issues (like with the primary key I mentioned before).


So the process should be in this case:

- DBA receives information that write operations are stuck (DBA, in 
coordination with the infrastructure team, disconnects all clients and 
prevents new ones from creating a new connection).
- DBA will recognize that there is an issue in the communication between 
the primary and the sync replica (which caused the issue with the propagation 
of commits)

- DBA will see that there are some commits that are in the "stuck state"
- DBA removes these stuck commits. Note: Because the client never 
received a confirmation of the successful commit, the changes the 
client tried to perform can't be considered successful.
- DBA and infrastructure team restore the communication between server 
nodes to be able to propagate commits from the primary node to sync replica.

- DBA and infrastructure team allows new connections to the database

This approach would require external monitoring and alerting, but I 
would say that this is an acceptable solution. Would your patch be able 
to perform that?


Thank you
Ondrej


On 20/04/2021 22:19, SATYANARAYANA NARLAPURAM wrote:
One idea here is to make the backend ignore query cancellation/backend 
termination while waiting for the synchronous commit ACK. This way 
client never reads the data that was never flushed remotely. The 
problem with this approach is that your backends get stuck until your 
commit log record is flushed on the remote side. Also, the client can 
see the data not flushed remotely if the server crashes and comes back 
online. You can prevent the latter case by making a SyncRepWaitForLSN 
before opening up the connections to the non-superusers. I have a 
working prototype of this logic, if there is enough interest I can 
post the patch.






On Tue, Apr 20, 2021 at 11:25 AM Ondřej Žižka wrote:


I am sorry, I forgot to mention that in the second situation I added a
primary key to the table.

Ondrej


On 20/04/2021 18:49, Ondřej Žižka wrote:
> Hello Aleksander,
>
> Thank you for the reaction. This was tested on version 13.2.
>
> There are also other possible situations with the same setup and
> similar issue:
>
> -
> When the background process on server fails
>
> On postgresql1:
> tecmint=# select * from a; --> LAN on sync replica is OK
>  id
> 
>   1
> (1 row)
>
> tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN
> and insert is waiting. During this time kill the background process on
> the PostgreSQL server for this session
> WARNING:  canceling the wait for synchronous replication and
> terminating connection due to administrator command
> DETAIL:  The transaction has already committed locally, but might not
> have been replicated to the standby.
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
> tecmint=# select * from a;
>  id
> 
>   1
>   2
> (2 rows)
>
> tecmint=# ---> LAN on sync replica is still DOWN
>
> The postgres session will be restored after the background process failed.
> When you run a select on the master, it still looks OK. But data is still
> not replicated on the sync replica. If we lost the master now, we
> would lose this data as well.
>
> **
> Another case
> **
>
> Kill the client process.
>
> tecmint=# select * from a;
>  id
> 
>   1
>   2
>   3
> (3 rows)
> tecmint=#    --> Disconnect the sync replica now. LAN on
> replica is DOWN
> tecmint=# insert into a values (4); --> Kill the client process
> Terminated
> xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432 -d tecmint
> Password for user postgres:
> psql (13.2 (Debian 13.2-1.pgdg100+1))
> Type "help" for help.
>
> tecmint=# select * from a;
>  id
> 
>   1
>   2
>   3
> (3 rows)
>
> tecmint=# --> Number 4 is not there. Now switch the LAN on sync
> replica ON.
>
> --
>
> Result from sync replica after the LAN is again UP:
> tecmint=# select * from a;
>  id
> 
>   1
>  

Re: Synchronous commit behavior during network outage

2021-04-20 Thread SATYANARAYANA NARLAPURAM
One idea here is to make the backend ignore query cancellation/backend
termination while waiting for the synchronous commit ACK. This way client
never reads the data that was never flushed remotely. The problem with this
approach is that your backends get stuck until your commit log record is
flushed on the remote side. Also, the client can see the data not flushed
remotely if the server crashes and comes back online. You can prevent the
latter case by making a SyncRepWaitForLSN before opening up the connections
to the non-superusers. I have a working prototype of this logic, if there
is enough interest I can post the patch.
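The second half of that idea (a SyncRepWaitForLSN gate before admitting non-superusers after a restart) reduces to a simple admission check. A hedged sketch of the logic only, in Python rather than the server's C (names are illustrative, not PostgreSQL's API):

```python
def may_admit(is_superuser: bool,
              last_local_commit_lsn: int,
              quorum_flush_lsn: int) -> bool:
    """Decide whether a new connection may be admitted.

    After a crash restart, commits may exist locally that the synchronous
    quorum never acknowledged. Until the quorum's flush position catches
    up to the last local commit, only superusers are let in, so ordinary
    clients cannot read data that could still vanish on failover.
    """
    if is_superuser:
        return True          # the DBA must always be able to connect
    return quorum_flush_lsn >= last_local_commit_lsn
```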





On Tue, Apr 20, 2021 at 11:25 AM Ondřej Žižka 
wrote:

> I am sorry, I forgot to mention that in the second situation I added a
> primary key to the table.
>
> Ondrej
>
>
> On 20/04/2021 18:49, Ondřej Žižka wrote:
> > Hello Aleksander,
> >
> > Thank you for the reaction. This was tested on version 13.2.
> >
> > There are also other possible situations with the same setup and
> > similar issue:
> >
> > -
> > When the background process on server fails
> >
> > On postgresql1:
> > tecmint=# select * from a; --> LAN on sync replica is OK
> >  id
> > 
> >   1
> > (1 row)
> >
> > tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN
> > and insert is waiting. During this time kill the background process on
> > the PostgreSQL server for this session
> > WARNING:  canceling the wait for synchronous replication and
> > terminating connection due to administrator command
> > DETAIL:  The transaction has already committed locally, but might not
> > have been replicated to the standby.
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Succeeded.
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> > (2 rows)
> >
> > tecmint=# ---> LAN on sync replica is still DOWN
> >
> > The postgres session will be restored after the background process failed.
> > When you run a select on the master, it still looks OK. But data is still
> > not replicated on the sync replica. If we lost the master now, we
> > would lose this data as well.
> >
> > **
> > Another case
> > **
> >
> > Kill the client process.
> >
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> >   3
> > (3 rows)
> > tecmint=#--> Disconnect the sync replica now. LAN on
> > replica is DOWN
> > tecmint=# insert into a values (4); --> Kill the client process
> > Terminated
> > xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432 -d
> tecmint
> > Password for user postgres:
> > psql (13.2 (Debian 13.2-1.pgdg100+1))
> > Type "help" for help.
> >
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> >   3
> > (3 rows)
> >
> > tecmint=# --> Number 4 is not there. Now switch the LAN on sync
> > replica ON.
> >
> > --
> >
> > Result from sync replica after the LAN is again UP:
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> >   3
> >   4
> > (4 rows)
> >
> >
> > In this situation, try to insert the number 4 again to the table.
> >
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> >   3
> > (3 rows)
> >
> > tecmint=# insert into a values (4);
> > ERROR:  duplicate key value violates unique constraint "a_pkey"
> > DETAIL:  Key (id)=(4) already exists.
> > tecmint=#
> >
> > This is really strange... The application can be confused: it is not
> > possible to insert a record that is not there, but some systems that
> > use the sync node as a read replica may have already read that record
> > from the sync replica database and taken steps that can cause
> > issues and be hard to track.
> >
> > Even if I granted that it would be hard to send Ctrl+C to the database
> > from the client, I have to say that the 2 situations I described here
> > can happen in real life.
> >
> > What do you think?
> >
> > Thank you and regards
> > Ondrej
> >
> > On 20/04/2021 17:23, Aleksander Alekseev wrote:
> >> Hi Ondřej,
> >>
> >> Thanks for the report. It seems to be a clear violation of what is
> >> promised in the docs. Although it's unlikely that someone implemented
> >> an application which deals with important data and "pressed Ctrl+C" as
> >> it's done in psql. So this might be not such a critical issue after
> >> all. BTW what version of PostgreSQL are you using?
> >>
> >>
> >> On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka
> >>  wrote:
> >>> Hello all,
> >>> I would like to know your opinion on the following behaviour I see
> >>> for PostgreSQL setup with synchronous replication.
> >>>
> >>> This behaviour happens in a special use case. In this use case,
> >>> there are 2 synchronous replicas with the following config (truncated):
> >>>
> >>> - 2 nodes
> >>> - synchronous_standby_names='*'
> >>> - synchronous_commit=remote_apply
> >>>
> >>>
> >>> With 

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Ondřej Žižka
I am sorry, I forgot to mention that in the second situation I added a
primary key to the table.


Ondrej


On 20/04/2021 18:49, Ondřej Žižka wrote:

Hello Aleksander,

Thank you for the reaction. This was tested on version 13.2.

There are also other possible situations with the same setup and 
similar issue:


-
When the background process on server fails

On postgresql1:
tecmint=# select * from a; --> LAN on sync replica is OK
 id

  1
(1 row)

tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN 
and insert is waiting. During this time kill the background process on 
the PostgreSQL server for this session
WARNING:  canceling the wait for synchronous replication and 
terminating connection due to administrator command
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
tecmint=# select * from a;
 id

  1
  2
(2 rows)

tecmint=# ---> LAN on sync replica is still DOWN

The postgres session will be restored after the background process failed. 
When you run a select on the master, it still looks OK. But data is still 
not replicated on the sync replica. If we lost the master now, we 
would lose this data as well.


**
Another case
**

Kill the client process.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)
tecmint=#    --> Disconnect the sync replica now. LAN on 
replica is DOWN

tecmint=# insert into a values (4); --> Kill the client process
Terminated
xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432 -d tecmint
Password for user postgres:
psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)

tecmint=# --> Number 4 is not there. Now switch the LAN on sync 
replica ON.


--

Result from sync replica after the LAN is again UP:
tecmint=# select * from a;
 id

  1
  2
  3
  4
(4 rows)


In this situation, try to insert the number 4 again to the table.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)

tecmint=# insert into a values (4);
ERROR:  duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (id)=(4) already exists.
tecmint=#

This is really strange... The application can be confused: it is not 
possible to insert a record that is not there, but some systems that 
use the sync node as a read replica may have already read that record 
from the sync replica database and taken steps that can cause 
issues and be hard to track.


Even if I granted that it would be hard to send Ctrl+C to the database 
from the client, I have to say that the 2 situations I described here 
can happen in real life.


What do you think?

Thank you and regards
Ondrej

On 20/04/2021 17:23, Aleksander Alekseev wrote:

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka wrote:

Hello all,
I would like to know your opinion on the following behaviour I see 
for PostgreSQL setup with synchronous replication.


This behaviour happens in a special use case. In this use case, 
there are 2 synchronous replicas with the following config (truncated):


- 2 nodes
- synchronous_standby_names='*'
- synchronous_commit=remote_apply


With this setup run the following steps (LAN down - LAN between 
master and replica):

-
postgres=# truncate table a;
TRUNCATE TABLE
postgres=# insert into a values (1); -- LAN up, insert has been 
applied to replica.

INSERT 0 1
Now I turn off the LAN on the standby server:
postgres=# insert into a values (2); --LAN down, waiting for a 
confirmation from sync replica. In this situation cancel it (press 
CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user 
request
DETAIL:  The transaction has already committed locally, but might 
not have been replicated to the standby.

INSERT 0 1
There will be a warning that the commit was performed only locally:
2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for 
synchronous replication due to user request
2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has 
already committed locally, but might not have been replicated to the 
standby.


postgres=# insert into a values (2); --LAN down, waiting for a 
confirmation from sync replica. In this situation cancel it (press 
CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user 
request
DETAIL:  The transaction has already committed locally, but might 

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Ondřej Žižka

Hello Maksim,

I know your post [1]. That thread is why we performed more tests 
(see my other email in this thread). We are trying to implement an 
RPO=0 solution using PostgreSQL. Knowing this... would it be possible 
to build an RPO=0 solution with PostgreSQL?


Ondrej

On 20/04/2021 18:51, Maksim Milyutin wrote:

Hi!


This is a known issue with synchronous replication [1]. You might 
inject into unmodified operation some dummy modification to overcome 
the negative sides of such partially committing without source code 
patching.



On 20.04.2021 19:23, Aleksander Alekseev wrote:

Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql.



Some client libraries have a feature to cancel a session, with an effect 
similar to "Ctrl+C" in psql, after a client-specified deadline 
expires [2]. Hence, this case might occur quite often when an application 
interacts with a database.



On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka wrote:


 From the synchronous_commit=remote_write level and "higher", I would 
expect that when the remote action (flush, write or apply, it doesn't 
matter) has not happened, I would not receive a confirmation of the 
commit (even with a warning). Something like: if there is no commit 
on the sync replica, there is no commit on the primary, and if someone 
performs the steps above, the whole transaction will not send a 
confirmation.



The warning has to be accounted for here, and the performed commit must 
not be treated as *successful*.



1. 
https://www.postgresql.org/message-id/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru


2. 
https://www.postgresql.org/message-id/CANtu0ogbu%2By6Py963p-zKJ535b8zm5AOq7zkX7wW-tryPYi1DA%40mail.gmail.com








Re: Synchronous commit behavior during network outage

2021-04-20 Thread Ondřej Žižka

Hello Aleksander,

Thank you for the reaction. This was tested on version 13.2.

There are also other possible situations with the same setup and similar 
issue:


-
When the background process on server fails

On postgresql1:
tecmint=# select * from a; --> LAN on sync replica is OK
 id

  1
(1 row)

tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN and 
insert is waiting. During this time kill the background process on the 
PostgreSQL server for this session
WARNING:  canceling the wait for synchronous replication and terminating 
connection due to administrator command
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
tecmint=# select * from a;
 id

  1
  2
(2 rows)

tecmint=# ---> LAN on sync replica is still DOWN

The postgres session will be restored after the background process failed. 
When you run a select on the master, it still looks OK. But data is still not 
replicated on the sync replica. If we lost the master now, we would lose 
this data as well.


**
Another case
**

Kill the client process.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)
tecmint=#    --> Disconnect the sync replica now. LAN on 
replica is DOWN

tecmint=# insert into a values (4); --> Kill the client process
Terminated
xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432 -d tecmint
Password for user postgres:
psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)

tecmint=# --> Number 4 is not there. Now switch the LAN on sync replica ON.

--

Result from sync replica after the LAN is again UP:
tecmint=# select * from a;
 id

  1
  2
  3
  4
(4 rows)


In this situation, try to insert the number 4 again to the table.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)

tecmint=# insert into a values (4);
ERROR:  duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (id)=(4) already exists.
tecmint=#

This is really strange... The application can be confused: it is not 
possible to insert a record that is not there, but some systems that 
use the sync node as a read replica may have already read that record from 
the sync replica database and taken steps that can cause issues and 
can be hard to track.


Even if I granted that it would be hard to send Ctrl+C to the database from 
the client, I have to say that the 2 situations I described here can 
happen in real life.


What do you think?

Thank you and regards
Ondrej

On 20/04/2021 17:23, Aleksander Alekseev wrote:

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka  wrote:

Hello all,
I would like to know your opinion on the following behaviour I see for 
PostgreSQL setup with synchronous replication.

This behaviour happens in a special use case. In this use case, there are 2 
synchronous replicas with the following config (truncated):

- 2 nodes
- synchronous_standby_names='*'
- synchronous_commit=remote_apply


With this setup run the following steps (LAN down - LAN between master and 
replica):
-
postgres=# truncate table a;
TRUNCATE TABLE
postgres=# insert into a values (1); -- LAN up, insert has been applied to 
replica.
INSERT 0 1
Now I turn off the LAN on the standby server:
postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
from sync replica. In this situation cancel it (press CTRL+C)
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been 
replicated to the standby.
INSERT 0 1
There will be a warning that the commit was performed only locally:
2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for synchronous 
replication due to user request
2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has already 
committed locally, but might not have been replicated to the standby.

postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
from sync replica. In this situation cancel it (press CTRL+C)
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been 
replicated to the standby.
INSERT 0 1
postgres=# insert into a values (2); --LAN down, waiting for sync replica, 
second attempt, cancel it as well 

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Maksim Milyutin



On 20.04.2021 19:38, Tomas Vondra wrote:


On 4/20/21 6:23 PM, Aleksander Alekseev wrote:

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


Which part of the docs does this contradict?



I think, Aleksandr refers to the following phrase in docs:

"The guarantee we offer is that the application will not receive 
explicit acknowledgment of the successful commit of a transaction until 
the WAL data is known to be safely received by all the synchronous 
standbys." [1]


And IMO the confusion here concerns the notion of a `successful commit`. 
Does a warning attached to the received commit message make it not 
*successful*? I think we have to explicitly mention the cancellation and 
session-termination cases in the docs to avoid ambiguity in 
understanding the phrase above.



1. 
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA


--
Regards,
Maksim Milyutin





Re: Synchronous commit behavior during network outage

2021-04-20 Thread Maksim Milyutin

Hi!


This is a known issue with synchronous replication [1]. You might inject 
into unmodified operation some dummy modification to overcome the 
negative sides of such partially committing without source code patching.



On 20.04.2021 19:23, Aleksander Alekseev wrote:

Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql.



Some client libraries have feature to cancel session that has similar 
effect to "Ctrl+C" from psql after specified by client deadline 
expiration [2]. Hence, this case might be quite often when application 
interacts with database.




On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka  wrote:

 From the synchronous_commit=remote_write level and "higher", I would expect 
that when the remote action (flush, write or apply, it doesn't matter) has not 
happened, I would not receive a confirmation of the commit (even with a warning). 
Something like: if there is no commit on the sync replica, there is no commit on the 
primary, and if someone performs the steps above, the whole transaction will not send a 
confirmation.



The warning has to be accounted for here, and the performed commit must 
not be treated as *successful*.



1. 
https://www.postgresql.org/message-id/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru


2. 
https://www.postgresql.org/message-id/CANtu0ogbu%2By6Py963p-zKJ535b8zm5AOq7zkX7wW-tryPYi1DA%40mail.gmail.com



--
Regards,
Maksim Milyutin





Re: Synchronous commit behavior during network outage

2021-04-20 Thread Tomas Vondra



On 4/20/21 6:23 PM, Aleksander Alekseev wrote:
> Hi Ondřej,
> 
> Thanks for the report. It seems to be a clear violation of what is
> promised in the docs. Although it's unlikely that someone implemented
> an application which deals with important data and "pressed Ctrl+C" as
> it's done in psql. So this might be not such a critical issue after
> all. BTW what version of PostgreSQL are you using?
> 

Which part of the docs does this contradict?

With Ctrl+C the application *did not* receive confirmation - the commit
was interrupted before fully completing. In a way, it's about the same
situation as if a regular commit was interrupted randomly. It might have
happened before the commit log got updated, or maybe right after it,
which determines the outcome.

What I find a bit strange is that this inserts 1, 2, 2, 2 locally, and
yet we end up with just two rows with 2 (before the update). I don't see
why a network outage should have such a consequence.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Synchronous commit behavior during network outage

2021-04-20 Thread Aleksander Alekseev
Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka  wrote:
>
> Hello all,
> I would like to know your opinion on the following behaviour I see for 
> a PostgreSQL setup with synchronous replication.
>
> This behaviour happens in a special use case. In this use case, there are 2 
> synchronous replicas with the following config (truncated):
>
> - 2 nodes
> - synchronous_standby_names='*'
> - synchronous_commit=remote_apply
>
>
> With this setup, run the following steps ("LAN down" means the LAN between 
> the master and the replica is down):
> -
> postgres=# truncate table a;
> TRUNCATE TABLE
> postgres=# insert into a values (1); -- LAN up, insert has been applied to 
> replica.
> INSERT 0 1
> Now I turn off the LAN on the server with the standby:
> postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
> from sync replica. In this situation cancel it (press CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> There will be a warning that the commit was performed only locally:
> 2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for synchronous 
> replication due to user request
> 2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has already 
> committed locally, but might not have been replicated to the standby.
>
> postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
> from sync replica. In this situation cancel it (press CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> postgres=# insert into a values (2); --LAN down, waiting for sync replica, 
> second attempt, cancel it as well (CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> postgres=# update a set n=3 where n=2; --LAN down, waiting for sync replica, 
> cancel it (CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> UPDATE 2
> postgres=# update a set n=3 where n=2; -- run the same update; because data 
> from the previous attempt was committed on the master, it is successful, but no 
> changes
> UPDATE 0
> postgres=# select * from a;
>  n
> ---
>  1
>  3
>  3
> (3 rows)
> postgres=#
> 
>
> Now, only the value 1 is in the sync replica's table (no other values); the 
> data is not in sync. This is expected: after the LAN is restored, the data will 
> get back in sync. But if the primary node fails and we fail over to the replica 
> before the LAN is back up, or if this node's storage is destroyed before the 
> data syncs to the replica, we will lose data even though the client received a 
> successful commit (with a warning).
> From the synchronous_commit=remote_write level and higher, I would expect 
> that when the remote action (no matter whether flush, write, or apply) has not 
> completed, I would not receive a confirmation of the commit (not even with a 
> warning). In other words: if there is no commit on the sync replica, there is 
> no commit on the primary, and if someone performs the steps above, the whole 
> transaction will not return a confirmation.
>
> This can cause issues if the application receives a confirmation of success 
> and performs some follow-up steps, e.g. it creates a user account and sends a 
> request to the mail system to create an account, or it creates a VPN account. 
> If the scenario above happens, a VPN account can exist that has no record in 
> the central database, which can be a security issue.
>
> I hope I explained it sufficiently. :-)
>
> Do you think it would be possible to implement a mechanism that would solve 
> this use case?
>
> Thank you
> Ondrej



-- 
Best regards,
Aleksander Alekseev




Synchronous commit behavior during network outage

2021-04-19 Thread Ondřej Žižka

Hello all,
I would like to know your opinion on the following behaviour I see for 
a PostgreSQL setup with synchronous replication.


This behaviour happens in a special use case. In this use case, there 
are 2 synchronous replicas with the following config (truncated):


- 2 nodes
- synchronous_standby_names='*'
- synchronous_commit=remote_apply


With this setup, run the following steps ("LAN down" means the LAN between 
the master and the replica is down):

-
postgres=# truncate table a;
TRUNCATE TABLE
postgres=# insert into a values (1); -- LAN up, insert has been applied 
to replica.

INSERT 0 1
Now I turn off the LAN on the server with the standby:
postgres=# insert into a values (2); --LAN down, waiting for a 
confirmation from sync replica. In this situation cancel it (press CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

INSERT 0 1
There will be a warning that the commit was performed only locally:
2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for 
synchronous replication due to user request
2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has 
already committed locally, but might not have been replicated to the 
standby.


postgres=# insert into a values (2); --LAN down, waiting for a 
confirmation from sync replica. In this situation cancel it (press CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

INSERT 0 1
postgres=# insert into a values (2); --LAN down, waiting for sync 
replica, second attempt, cancel it as well (CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

INSERT 0 1
postgres=# update a set n=3 where n=2; --LAN down, waiting for sync 
replica, cancel it (CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

UPDATE 2
postgres=# update a set n=3 where n=2; -- run the same update; because 
data from the previous attempt was committed on the master, it is 
successful, but no changes

UPDATE 0
postgres=# select * from a;
 n
---
 1
 3
 3
(3 rows)
postgres=#


Now, only the value 1 is in the sync replica's table (no other values); 
the data is not in sync. This is expected: after the LAN is restored, the 
data will get back in sync. But if the primary node fails and we fail over 
to the replica before the LAN is back up, or if this node's storage is 
destroyed before the data syncs to the replica, we will lose data even 
though the client received a successful commit (with a warning).
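Queried on the standby while the LAN is still down, the table shows only 
the row that replicated before the outage (session reconstructed from the 
description above):

```sql
-- On the standby (LAN still down): only the first insert ever replicated.
SELECT * FROM a;
--  n
-- ---
--  1
-- (1 row)
```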
From the synchronous_commit=remote_write level and higher, I would 
expect that when the remote action (no matter whether flush, write, or 
apply) has not completed, I would not receive a confirmation of the 
commit (not even with a warning). In other words: if there is no commit 
on the sync replica, there is no commit on the primary, and if someone 
performs the steps above, the whole transaction will not return a 
confirmation.


This can cause issues if the application receives a confirmation of 
success and performs some follow-up steps, e.g. it creates a user account 
and sends a request to the mail system to create an account, or it 
creates a VPN account. If the scenario above happens, a VPN account can 
exist that has no record in the central database, which can be a security 
issue.


I hope I explained it sufficiently. :-)

Do you think it would be possible to implement a mechanism that would 
solve this use case?


Thank you
Ondrej