Re: archive_command / pg_stat_archiver & documentation
On Wed, Mar 03, 2021 at 09:13:09PM +0800, Julien Rouhaud wrote:
> I think that we should consider this as committed.

It should, so done now.
--
Michael
Re: archive_command / pg_stat_archiver & documentation
On Wed, Mar 03, 2021 at 07:37:02AM -0500, David Steele wrote:
> On 3/1/21 8:29 PM, Michael Paquier wrote:
> > On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> > > Maybe this can be better addressed than with a link in the
> > > documentation. The final outcome is that it can be difficult to
> > > monitor the archiver state in such case. That's orthogonal to this
> > > patch but maybe we can add a new "archiver_start" timestamptz column
> > > in pg_stat_archiver, so monitoring tools can detect a problem if it's
> > > too far away from pg_postmaster_start_time() for instance?
> >
> > There may be other solutions as well. I have applied the doc patch
> > for now.
>
> This was applied (except for a small part). Should we now consider this
> committed?

I think that we should consider this as committed.
Re: archive_command / pg_stat_archiver & documentation
On 3/1/21 8:29 PM, Michael Paquier wrote:
> On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> > Maybe this can be better addressed than with a link in the
> > documentation. The final outcome is that it can be difficult to
> > monitor the archiver state in such case. That's orthogonal to this
> > patch but maybe we can add a new "archiver_start" timestamptz column
> > in pg_stat_archiver, so monitoring tools can detect a problem if it's
> > too far away from pg_postmaster_start_time() for instance?
>
> There may be other solutions as well. I have applied the doc patch
> for now.

This was applied (except for a small part). Should we now consider this
committed? If not, can we get a new patch for the remaining changes?

Regards,
--
-David
da...@pgmasters.net
Re: archive_command / pg_stat_archiver & documentation
Thanks!

On Tue, Mar 2, 2021 at 04:10, Julien Rouhaud wrote:
> On Tue, Mar 2, 2021 at 9:29 AM Michael Paquier wrote:
> >
> > On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> > > Maybe this can be better addressed than with a link in the
> > > documentation. The final outcome is that it can be difficult to
> > > monitor the archiver state in such case. That's orthogonal to this
> > > patch but maybe we can add a new "archiver_start" timestamptz column
> > > in pg_stat_archiver, so monitoring tools can detect a problem if it's
> > > too far away from pg_postmaster_start_time() for instance?
> >
> > There may be other solutions as well. I have applied the doc patch
> > for now.
>
> Thanks!
Re: archive_command / pg_stat_archiver & documentation
On Tue, Mar 2, 2021 at 9:29 AM Michael Paquier wrote:
>
> On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> > Maybe this can be better addressed than with a link in the
> > documentation. The final outcome is that it can be difficult to
> > monitor the archiver state in such case. That's orthogonal to this
> > patch but maybe we can add a new "archiver_start" timestamptz column
> > in pg_stat_archiver, so monitoring tools can detect a problem if it's
> > too far away from pg_postmaster_start_time() for instance?
>
> There may be other solutions as well. I have applied the doc patch
> for now.

Thanks!
Re: archive_command / pg_stat_archiver & documentation
On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> Maybe this can be better addressed than with a link in the
> documentation. The final outcome is that it can be difficult to
> monitor the archiver state in such case. That's orthogonal to this
> patch but maybe we can add a new "archiver_start" timestamptz column
> in pg_stat_archiver, so monitoring tools can detect a problem if it's
> too far away from pg_postmaster_start_time() for instance?

There may be other solutions as well. I have applied the doc patch
for now.
--
Michael
Re: archive_command / pg_stat_archiver & documentation
On Mon, Mar 1, 2021 at 5:24 PM Benoit Lobréau wrote:
>
> I like the idea!
>
> If it's not too complicated, I'd like to take a stab at it.

Great! And it shouldn't be too complicated. Note that unfortunately
this will likely not be included in pg14 as the last commitfest should
begin today.
Re: archive_command / pg_stat_archiver & documentation
I like the idea!

If it's not too complicated, I'd like to take a stab at it.

On Mon, Mar 1, 2021 at 10:16, Julien Rouhaud wrote:
> On Mon, Mar 1, 2021 at 4:33 PM Benoit Lobréau wrote:
> >
> > On Mon, Mar 1, 2021 at 08:36, Michael Paquier wrote:
> >>
> >> On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
> >> > Done here: https://commitfest.postgresql.org/32/3012/
> >>
> >> Documenting that properly for the archive command, as already done for
> >> restore_command, sounds good to me. I am not sure that there is much
> >> point in doing a cross-reference to the archiving section for one
> >> specific field of pg_stat_archiver.
> >
> > I wanted to add a warning, in the view documentation itself, that using
> > pg_stat_archiver to monitor the good health of the archiver comes with
> > a caveat. But I couldn't find a concise way to do it, so I added a link.
> >
> > If you think it's unnecessary, that's ok.
>
> Maybe this can be better addressed than with a link in the
> documentation. The final outcome is that it can be difficult to
> monitor the archiver state in such case. That's orthogonal to this
> patch but maybe we can add a new "archiver_start" timestamptz column
> in pg_stat_archiver, so monitoring tools can detect a problem if it's
> too far away from pg_postmaster_start_time() for instance?
Re: archive_command / pg_stat_archiver & documentation
On Mon, Mar 1, 2021 at 4:33 PM Benoit Lobréau wrote:
>
> On Mon, Mar 1, 2021 at 08:36, Michael Paquier wrote:
>>
>> On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
>> > Done here: https://commitfest.postgresql.org/32/3012/
>>
>> Documenting that properly for the archive command, as already done for
>> restore_command, sounds good to me. I am not sure that there is much
>> point in doing a cross-reference to the archiving section for one
>> specific field of pg_stat_archiver.
>
> I wanted to add a warning, in the view documentation itself, that using
> pg_stat_archiver to monitor the good health of the archiver comes with
> a caveat. But I couldn't find a concise way to do it, so I added a link.
>
> If you think it's unnecessary, that's ok.

Maybe this can be better addressed than with a link in the
documentation. The final outcome is that it can be difficult to
monitor the archiver state in such case. That's orthogonal to this
patch but maybe we can add a new "archiver_start" timestamptz column
in pg_stat_archiver, so monitoring tools can detect a problem if it's
too far away from pg_postmaster_start_time() for instance?
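
The check proposed in this message could be sketched as follows. This is only an illustration of the idea: the "archiver_start" column is the proposal itself and does not exist at the time of this thread, and the function name and the one-minute tolerance are hypothetical choices. The two timestamps are assumed to have been fetched already (for example from pg_postmaster_start_time() and the proposed column).

```python
from datetime import datetime, timedelta, timezone


def archiver_restarted_since_boot(
    postmaster_start: datetime,
    archiver_start: datetime,
    tolerance: timedelta = timedelta(minutes=1),
) -> bool:
    """Return True if the archiver started noticeably later than the postmaster.

    archiver_start stands in for the *proposed* pg_stat_archiver column;
    tolerance is an arbitrary allowance for normal startup ordering.
    """
    return archiver_start - postmaster_start > tolerance


if __name__ == "__main__":
    boot = datetime(2021, 3, 1, 9, 0, tzinfo=timezone.utc)
    # Archiver started with the server: nothing suspicious.
    print(archiver_restarted_since_boot(boot, boot + timedelta(seconds=5)))
    # Archiver started hours later: it was probably restarted by the postmaster.
    print(archiver_restarted_since_boot(boot, boot + timedelta(hours=2)))
```

A single restart is not necessarily a problem, so a monitoring tool would probably want to alert only when the archiver start time keeps moving forward between checks.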
Re: archive_command / pg_stat_archiver & documentation
On Mon, Mar 1, 2021 at 08:36, Michael Paquier wrote:
> On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
> > Done here: https://commitfest.postgresql.org/32/3012/
>
> Documenting that properly for the archive command, as already done for
> restore_command, sounds good to me. I am not sure that there is much
> point in doing a cross-reference to the archiving section for one
> specific field of pg_stat_archiver.

I wanted to add a warning, in the view documentation itself, that using
pg_stat_archiver to monitor the good health of the archiver comes with
a caveat. But I couldn't find a concise way to do it, so I added a link.

If you think it's unnecessary, that's ok.

> For the second paragraph, I would recommend to move that to a
> different paragraph to outline this special case, leading to the
> attached.

Good.
Re: archive_command / pg_stat_archiver & documentation
On Mon, Mar 1, 2021 at 3:36 PM Michael Paquier wrote:
>
> On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
> > Done here: https://commitfest.postgresql.org/32/3012/
>
> Documenting that properly for the archive command, as already done for
> restore_command, sounds good to me. I am not sure that there is much
> point in doing a cross-reference to the archiving section for one
> specific field of pg_stat_archiver.

Agreed.

> For the second paragraph, I would recommend to move that to a
> different paragraph to outline this special case, leading to the
> attached.

+1

> What do you think?

LGTM!
Re: archive_command / pg_stat_archiver & documentation
On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
> Done here: https://commitfest.postgresql.org/32/3012/

Documenting that properly for the archive command, as already done for
restore_command, sounds good to me. I am not sure that there is much
point in doing a cross-reference to the archiving section for one
specific field of pg_stat_archiver.

For the second paragraph, I would recommend to move that to a
different paragraph to outline this special case, leading to the
attached.

What do you think?
--
Michael

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 21094c6a9d..c5557d5444 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -639,6 +639,15 @@ test ! -f /mnt/server/archivedir/000100A90065 cp pg_wal/0
 it will try again periodically until it succeeds.

+
+When the archive command is terminated by a signal (other than
+SIGTERM that is used as part of a server
+shutdown) or an error by the shell with an exit status greater than
+125 (such as command not found), the archiver process aborts and gets
+restarted by the postmaster. In such cases, the failure is
+not reported in .
+
+
 The archive command should generally be designed to refuse to overwrite
 any pre-existing archive file. This is an important safety feature to
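
Since the paragraph added by this patch says these failures are not reported in pg_stat_archiver, a monitoring check that looks only at failed_count would miss them. A minimal sketch of a check that also looks at how old the last successful archive is, assuming the values were already read from the view's failed_count and last_archived_time columns; the function name, the returned labels and the max_age threshold are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def archiver_health(
    prev_failed_count: int,
    failed_count: int,
    last_archived_time: Optional[datetime],
    now: datetime,
    max_age: timedelta,
) -> str:
    """Combine the failure counter with a staleness check.

    Returns "failing" if failed_count grew since the previous sample,
    "stalled" if nothing was archived within max_age (this is the case
    the counter alone cannot see), and "ok" otherwise.
    """
    if failed_count > prev_failed_count:
        return "failing"
    if last_archived_time is None or now - last_archived_time > max_age:
        return "stalled"
    return "ok"


if __name__ == "__main__":
    now = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
    print(archiver_health(3, 3, now - timedelta(hours=6), now, timedelta(hours=1)))
```

The staleness test only makes sense when WAL segments are produced regularly (for instance with archive_timeout set); on an idle server an old last_archived_time is normal, so max_age has to be chosen accordingly.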
Re: archive_command / pg_stat_archiver & documentation
Done here: https://commitfest.postgresql.org/32/3012/

On Thu, Feb 25, 2021 at 15:34, Julien Rouhaud wrote:
> On Thu, Feb 25, 2021 at 7:25 PM Benoit Lobréau wrote:
> >
> > On Wed, Feb 24, 2021 at 14:52, Julien Rouhaud wrote:
> >>
> >> I thought that this behavior was documented, especially for the lack
> >> of update of pg_stat_archiver. If it's not the case then we should
> >> definitely fix that!
> >
> > I tried to do it in the attached patch.
> > Building the doc worked fine on my computer.
>
> Great, thanks! Can you register it in the next commitfest to make
> sure it won't be forgotten?
Re: archive_command / pg_stat_archiver & documentation
On Thu, Feb 25, 2021 at 7:25 PM Benoit Lobréau wrote:
>
> On Wed, Feb 24, 2021 at 14:52, Julien Rouhaud wrote:
>>
>> I thought that this behavior was documented, especially for the lack
>> of update of pg_stat_archiver. If it's not the case then we should
>> definitely fix that!
>
> I tried to do it in the attached patch.
> Building the doc worked fine on my computer.

Great, thanks! Can you register it in the next commitfest to make
sure it won't be forgotten?
Re: archive_command / pg_stat_archiver & documentation
On Wed, Feb 24, 2021 at 14:52, Julien Rouhaud wrote:
> Hi,
>
> On Wed, Feb 24, 2021 at 8:21 PM talk to ben wrote:
> >
> > The documentation describes how a return code > 125 on the
> > restore_command would prevent the server from starting [1]:
> >
> > "
> > It is important that the command return nonzero exit status on failure.
> > The command will be called requesting files that are not present in the
> > archive; it must return nonzero when so asked. This is not an error
> > condition. An exception is that if the command was terminated by a signal
> > (other than SIGTERM, which is used as part of a database server shutdown)
> > or an error by the shell (such as command not found), then recovery will
> > abort and the server will not start up.
> > "
> >
> > But I don't see such a note on the archive_command side of things. [2]
> >
> > It could happen in case the archive command is not checked beforehand
> > or if the archive command becomes unavailable while PostgreSQL is running.
> > rsync can also return 255 in some cases (bad ssh configuration or typos).
> > In this case a fatal error is emitted, the archiver stops and is restarted
> > by the postmaster.
> >
> > The view pg_stat_archiver is also not updated in this case. Is it on
> > purpose? It could be problematic if someone uses it to check the archiver
> > process health.
>
> That's on purpose, see for instance that discussion:
> https://www.postgresql.org/message-id/flat/55731BB8.1050605%40dalibo.com

Thanks for pointing that out, I should have checked.

> > Should we document this? (I can make a patch)
>
> I thought that this behavior was documented, especially for the lack
> of update of pg_stat_archiver. If it's not the case then we should
> definitely fix that!

I tried to do it in the attached patch.
Building the doc worked fine on my computer.
From 350cd7c47d09754ae21f30f260a86e187054257f Mon Sep 17 00:00:00 2001
From: benoit
Date: Thu, 25 Feb 2021 12:08:03 +0100
Subject: [PATCH] Document archive_command failures in more details

Document that, if the command was terminated by a signal (other than
SIGTERM, which is used as part of a database server shutdown) or an
error by the shell with an exit status greater than 125 (such as command
not found), then the archiver process will abort and the postmaster will
restart it. In such cases, the failure will not be reported in
pg_stat_archiver.
---
 doc/src/sgml/backup.sgml     | 8 +++-
 doc/src/sgml/monitoring.sgml | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 3c8aaed0b6..94d5dcbdf0 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -636,7 +636,13 @@ test ! -f /mnt/server/archivedir/000100A90065 cp pg_wal/0
 PostgreSQL will assume that the file has been successfully archived, and
 will remove or recycle it. However, a nonzero status tells
 PostgreSQL that the file was not archived;
-it will try again periodically until it succeeds.
+it will try again periodically until it succeeds.
+An exception is that if the command was terminated by
+a signal (other than SIGTERM, which is used as
+part of a database server shutdown) or an error by the shell with an exit
+status greater than 125 (such as command not found), then the archiver
+process will abort and the postmaster will restart it. In such cases,
+the failure will not be reported in .

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..391df3055b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3251,7 +3251,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 failed_count bigint
- Number of failed attempts for archiving WAL files
+ Number of failed attempts for archiving WAL files (See )
--
2.25.4
Re: archive_command / pg_stat_archiver & documentation
Hi,

On Wed, Feb 24, 2021 at 8:21 PM talk to ben wrote:
>
> The documentation describes how a return code > 125 on the restore_command
> would prevent the server from starting [1]:
>
> "
> It is important that the command return nonzero exit status on failure. The
> command will be called requesting files that are not present in the archive;
> it must return nonzero when so asked. This is not an error condition. An
> exception is that if the command was terminated by a signal (other than
> SIGTERM, which is used as part of a database server shutdown) or an error by
> the shell (such as command not found), then recovery will abort and the
> server will not start up.
> "
>
> But I don't see such a note on the archive_command side of things. [2]
>
> It could happen in case the archive command is not checked beforehand or if
> the archive command becomes unavailable while PostgreSQL is running. rsync
> can also return 255 in some cases (bad ssh configuration or typos). In this
> case a fatal error is emitted, the archiver stops and is restarted by the
> postmaster.
>
> The view pg_stat_archiver is also not updated in this case. Is it on purpose?
> It could be problematic if someone uses it to check the archiver process
> health.

That's on purpose, see for instance that discussion:
https://www.postgresql.org/message-id/flat/55731BB8.1050605%40dalibo.com

> Should we document this? (I can make a patch)

I thought that this behavior was documented, especially for the lack
of update of pg_stat_archiver. If it's not the case then we should
definitely fix that!
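
To make the rule discussed in this message concrete, here is a small Python sketch that classifies a command's result the way the thread describes it. It is not PostgreSQL's code: the function name and the returned labels are made up for the illustration, and it assumes the return code follows the convention of Python's subprocess module, where a negative value -N means the child was killed by signal N.

```python
import signal


def classify_archive_result(returncode: int) -> str:
    """Classify a subprocess-style return code per the rule quoted in the thread.

    "archived"         : status 0, the file is considered archived
    "retry"            : ordinary nonzero status (1..125), retried later
    "shutdown"         : killed by SIGTERM, treated as part of a server shutdown
    "archiver-restart" : any other signal, or exit status > 125 (for example
                         127 for command not found, or rsync returning 255);
                         per the thread, not reported in pg_stat_archiver
    """
    if returncode == 0:
        return "archived"
    if returncode == -signal.SIGTERM:
        return "shutdown"
    if returncode < 0 or returncode > 125:
        return "archiver-restart"
    return "retry"


if __name__ == "__main__":
    # 127 is what a shell returns for "command not found"; -9 stands for SIGKILL.
    for code in (0, 1, 125, 127, 255, -9, -signal.SIGTERM):
        print(code, classify_archive_result(code))
```

The point of the sketch is the boundary at 125: a wrapper script around the archive command that wants its failures counted and retried should keep its own exit statuses at or below that value.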