On Wed, Dec 20, 2017 at 6:11 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Tue, Dec 19, 2017 at 11:50 PM, David Steele <da...@pgmasters.net>
> wrote:
> > On 12/19/17 4:56 AM, Magnus Hagander wrote:
> >> AFAICT, base backups running on the replication protocol are always
> >> reported as "idle" in pg_stat_activity. This seems to have been an
> >> oversight in the "include walsender backends in pg_stat_activity" in 10,
> >> which does include it for walsenders in general, just not for the ones
> >> sending base backups. (and was then improved on later with the "include
> >> all non-standard backends" patch).
> >>
> >> Unlike the regular walsender it also has to set it back to IDLE, since
> >> you can actually finish a base backup without disconnecting.
> >>
> >> PFA a patch that fixes this. I think this is bugfix-for-backpatch, I
> >> don't think it has a large risk of breaking things. Thoughts?
> >
> > +1 for this being a bug, albeit a minor one.
>
> +1 for adding calls to pgstat_report_activity() in WAL senders and
> tracking the activity of those processes.  Now I don't think that this
> is the correct location as what matters is tracking if replication
> commands are running or not, and not only BASE_BACKUP. So those calls
> should be instead in exec_replication_command().
>

Hmm. That does make sense. I guess I got stuck looking at the old code,
which is then *also* in the wrong place.

What about the attached?

Also, I noticed that the docs for exec_replication_command() says " *
Returns true if the cmd_string was recognized as WalSender command, false
 * if not.". But it doesn't actually do that, it ereport(ERROR):s if it's
not. There is one "return false", but it's after an ERROR.

So all other exit points of the procedure will be ERROR, in which case we
should set back to idle automatically.

Independent of this change, that function should perhaps be made to return
void.


>> Also, in setting this, there is no real way to differentiate between a
> >> regular walsender and a basebackup walsender, other than looking at the
> >> wait events. They're both listed as walsenders. Should there be?  (That
> >> might not be as easily backpatchable, so keeping that as a separate one)
> >
> > Maybe something like "walsender [backup]" or just "basebackup" since
> > walsender is pretty misleading?  It think it would be nice to be able to
> > tell them apart, though I don't think it should be back-patched.  People
> > might be relying on the name in the current versions.
>
> You can already do a join with pg_stat_replication.pid and look for
> the WAL sender state which is marked as "backup" in this case. So I am
> -1 for making the current code more complicated. Please note as well
> that pg_stat_activity.backend_type is set depending on the process
> type, not based on what the process is doing so you would need to
> invent a new logic.
>

Good point. I'm not sure why I didn't think of that.

It's still quite a bit weird that we call this process "walsender" when it
does other things as well. But the ship sailed on that many years ago,
changing that completely now would not be worth the breakage.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ <http://www.hagander.net/>
 Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 6a252fcf45..ba39d07232 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1504,6 +1504,9 @@ exec_replication_command(const char *cmd_string)
 	initStringInfo(&reply_message);
 	initStringInfo(&tmpbuf);
 
+	/* Report to pgstat that this process is running */
+	pgstat_report_activity(STATE_RUNNING, NULL);
+
 	switch (cmd_node->type)
 	{
 		case T_IdentifySystemCmd:
@@ -1570,6 +1573,9 @@ exec_replication_command(const char *cmd_string)
 	/* Send CommandComplete message */
 	EndCommand("SELECT", DestRemote);
 
+	/* Report to pgstat that this process is now idle */
+	pgstat_report_activity(STATE_IDLE, NULL);
+
 	return true;
 }
 
@@ -2089,9 +2095,6 @@ WalSndLoop(WalSndSendDataCallback send_data)
 	last_reply_timestamp = GetCurrentTimestamp();
 	waiting_for_ping_response = false;
 
-	/* Report to pgstat that this process is running */
-	pgstat_report_activity(STATE_RUNNING, NULL);
-
 	/*
 	 * Loop until we reach the end of this timeline or the client requests to
 	 * stop streaming.

Reply via email to