At Thu, 7 Sep 2023 10:53:41 +0000, "Hayato Kuroda (Fujitsu)" 
<kuroda.hay...@fujitsu.com> wrote in 
> My first idea is that to move the checking part to above, but this may not 
> handle
> the case the postmaster is still alive (now sure this is real issue). Do we 
> have to
> add a new indicator which ensures the identity of processes for windows?
> Please tell me how you feel.

It doesn't seem to work as expected. We still lose the relationship
between the PID file and the launched postmaster.

> > Now, I
> > also recall that the processes spawned by pg_ctl on Windows make the
> > status handling rather tricky to reason about..
> 
> Did you say about the below comment? Currently I have no idea to make
> codes more proper, sorry.
> 
> ```
>                * On Windows, we may be checking the postmaster's parent 
> shell, but
>                * that's fine for this purpose.
> ```

Ditching cmd.exe seems like a big hassle. So, on the flip side, I
tried to identify the postmaster PID using the shell's PID, and it
seem to work. The APIs used are avaiable from XP/2003 onwards.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..d54cc6ab54 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -32,6 +32,9 @@
 #ifdef WIN32					/* on Unix, we don't need libpq */
 #include "pqexpbuffer.h"
 #endif
+#ifdef WIN32
+#include <tlhelp32.h>
+#endif
 
 
 typedef enum
@@ -425,6 +428,91 @@ free_readfile(char **optlines)
  * start/test/stop routines
  */
 
+#ifdef WIN32
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ */
+DWORD
+win32_find_postmaster_pid(int shell_pid)
+{
+	HANDLE hSnapshot;
+	DWORD flags;
+	DWORD p_pid = shell_pid;
+	PROCESSENTRY32 ppe;
+	DWORD pm_pid = 0;
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* Create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, shell_pid);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+					 progname);
+		return;
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08x\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/* iterate over the snapshot */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		if (ppe.th32ParentProcessID == shell_pid)
+		{
+			if (pm_pid > 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08x\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: lancher shell executed multiple processes\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* Check if the process is gone for any reason */
+	if (!shell_exists)
+	{
+		/* Report the condition as server start failure */
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
+#endif
+
 /*
  * Start the postmaster and return its PID.
  *
@@ -609,7 +697,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = win32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,21 +711,15 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
 				 * status line (this assumes a v10 or later server).
 				 */
 				char	   *pmstatus = optlines[LOCK_FILE_LINE_PM_STATUS - 1];
-
+				
 				if (strcmp(pmstatus, PM_STATUS_READY) == 0 ||
 					strcmp(pmstatus, PM_STATUS_STANDBY) == 0)
 				{

Reply via email to