Re: [HACKERS] FATAL: could not send end-of-streaming message to primary: no COPY in progress
At Wed, 20 Apr 2016 16:16:40 +0900, Fujii Masao wrote:
> On Thu, Mar 31, 2016 at 9:15 AM, Thomas Munro wrote:
> > Hi hackers,
> >
> > If you shut down a primary server, a standby that is streaming from it
> > says:
> >
> > LOG: replication terminated by primary server
> > DETAIL: End of WAL reached on timeline 1 at 0/14F4B68.
> > FATAL: could not send end-of-streaming message to primary: no COPY in
> > progress
> >
> > Isn't that FATAL ereport a bug?
>
> ISTM that the cause is that walsender exits and replication connection is
> closed just after "COPY 0" is sent. That is, then after receiving "COPY 0",
> walreceiver tries to send an end-of-copy message to the primary, but fails
> because the connection has been already closed.

Though the message is followed by repetitions of other FATAL
messages, the message above itself seems a bit alarming.

> > How is clean server shutdown supposed to work?
>
> One option is to make walsender wait for end-of-copy message from walreceiver
> before it closes the connection and exits, after sending "COPY 0" message.
> But one question is; how should walsender behave when walreceiver gets stuck
> and cannot reply an end-of-copy message to walsender? Probably we need
> the timeout (maybe we can use wal_sender_timeout here but not sure yet
> if it's appropriate or not).

-1. It is totally useless other than to avoid the FATAL message.

> Another option is to prevent walreceiver from sending an end-of-copy message.
> If "COPY 0" always means the exit of walsender and the termination of
> the connection, there seems to be no need to send back an end-of-copy message.
> I've not checked yet how this interferes with other replication logics,
> though.

Looking into walsender.c, walsender thinks "COPY 0" is a signal of
its death coming just after, that is, proc_exit(0).

On the other hand, the comment at the beginning of walreceiver.c
says that,

 * If the primary server ends streaming, but doesn't disconnect, walreceiver
 * goes into "waiting" mode, and waits for the startup process to give new
 * instructions. The startup process will treat that the same as
 * disconnection, and will rescan the archive/pg_xlog directory. But when the
 * startup process wants to try streaming replication again, it will just
 * nudge the existing walreceiver process that's waiting, instead of launching
 * a new one.

If we assume this is a useful behavior and want to keep it, a
termination after an end of XLOG streaming is just the same as
that for psql.

| FATAL: terminating connection due to administrator command
| server closed the connection unexpectedly
| This probably means the server terminated abnormally
| before or while processing the request.

Or, we should provide another command to inform a termination.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: could not send end-of-streaming message to primary: no COPY in progress
On Thu, Mar 31, 2016 at 9:15 AM, Thomas Munro wrote:
> Hi hackers,
>
> If you shut down a primary server, a standby that is streaming from it says:
>
> LOG: replication terminated by primary server
> DETAIL: End of WAL reached on timeline 1 at 0/14F4B68.
> FATAL: could not send end-of-streaming message to primary: no COPY in
> progress
>
> Isn't that FATAL ereport a bug?

ISTM that the cause is that walsender exits and replication connection is
closed just after "COPY 0" is sent. That is, then after receiving "COPY 0",
walreceiver tries to send an end-of-copy message to the primary, but fails
because the connection has been already closed.

> How is clean server shutdown supposed to work?

One option is to make walsender wait for end-of-copy message from walreceiver
before it closes the connection and exits, after sending "COPY 0" message.
But one question is; how should walsender behave when walreceiver gets stuck
and cannot reply an end-of-copy message to walsender? Probably we need
the timeout (maybe we can use wal_sender_timeout here but not sure yet
if it's appropriate or not).

Another option is to prevent walreceiver from sending an end-of-copy message.
If "COPY 0" always means the exit of walsender and the termination of
the connection, there seems to be no need to send back an end-of-copy message.
I've not checked yet how this interferes with other replication logics,
though.

Regards,

--
Fujii Masao
[HACKERS] FATAL: could not send end-of-streaming message to primary: no COPY in progress
Hi hackers,

If you shut down a primary server, a standby that is streaming from it says:

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/14F4B68.
FATAL: could not send end-of-streaming message to primary: no COPY in
progress

Isn't that FATAL ereport a bug?

I haven't worked out the root cause but the immediate problem seems to
be libpqrcv_endstreaming calls PQputCopyEnd which doesn't like the
state that the libpq connection is in, namely PGASYNC_BUSY. That state
seems to have been established by the call to walrcv_receive that
returned -1 (end of copy). It doesn't happen in the similar case of
promotion of the remote server.

How is clean server shutdown supposed to work? It looks like walsender
sends COPY 0 and then just hangs up. Meanwhile, walreceiver has to
distinguish between that case and the new timeline case which involves
a further exchange of messages. Is an explicit message at the end of
the copy stream saying either "goodbye" or "but wait, there's more"
lacking here? Or is there some other way that walreceiver could
distinguish between clean shutdown of remote server (no error
necessary), unclean shutdown of remote server, and timeline
negotiation?

--
Thomas Munro
http://www.enterprisedb.com
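
The failure described in the original report can be illustrated with a toy
state machine. The sketch below is not libpq code: every Toy* name is
hypothetical, and the transition rules are an assumption drawn from the report
(a receive call that returns -1 leaves the connection in a busy state, after
which an attempt to send end-of-copy is rejected). The branch where an explicit
end-of-copy message keeps the sending direction open is likewise assumed, to
model the timeline-switch path that reportedly does not fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; these are NOT libpq's real types or functions. */
typedef enum {
    TOY_COPY_BOTH,  /* streaming in both directions */
    TOY_COPY_IN,    /* peer has stopped sending; we may still send */
    TOY_BUSY        /* copy mode is over; a command result is pending */
} ToyAsyncState;

typedef enum {
    TOY_MSG_COPY_DATA,        /* a chunk of WAL */
    TOY_MSG_COPY_DONE,        /* explicit end-of-copy from the peer */
    TOY_MSG_COMMAND_COMPLETE  /* command completion, e.g. a "COPY 0" tag */
} ToyMessage;

typedef struct {
    ToyAsyncState state;
    const char   *error;      /* NULL until an operation fails */
} ToyConn;

/*
 * Process one incoming message.
 * Returns 1 if data was received, -1 if the incoming copy stream ended.
 */
int toy_receive(ToyConn *conn, ToyMessage msg)
{
    assert(conn != NULL);

    switch (msg) {
    case TOY_MSG_COPY_DATA:
        return 1;
    case TOY_MSG_COPY_DONE:
        /* Assumed: an explicit end-of-copy keeps our sending side open. */
        conn->state = (conn->state == TOY_COPY_BOTH) ? TOY_COPY_IN : TOY_BUSY;
        return -1;
    case TOY_MSG_COMMAND_COMPLETE:
        /* Assumed: completion without end-of-copy ends copy mode outright. */
        conn->state = TOY_BUSY;
        return -1;
    }
    return -1;
}

/*
 * Try to send our own end-of-copy.
 * Returns 1 on success, -1 if the connection is no longer in copy mode.
 */
int toy_put_copy_end(ToyConn *conn)
{
    assert(conn != NULL);

    if (conn->state != TOY_COPY_IN && conn->state != TOY_COPY_BOTH) {
        conn->error = "no COPY in progress";
        return -1;
    }
    conn->state = TOY_BUSY;
    return 1;
}
```

Starting from TOY_COPY_BOTH, feeding TOY_MSG_COMMAND_COMPLETE to toy_receive
and then calling toy_put_copy_end returns -1 with the error text set, which
mirrors the FATAL above. Feeding TOY_MSG_COPY_DONE first lets toy_put_copy_end
succeed. Under these assumptions, a receiver could tell the two cases apart by
checking the state before deciding whether to send end-of-copy at all.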