On 17/03/17 16:07, Tom Lane wrote:
> Andrew Dunstan <andrew.duns...@2ndquadrant.com> writes:
>> I have confirmed on jacana Petr's observation that adding a timeout to
>> the WaitLatchOrSocket cures the problem.
> Does anyone have a theory as to why that cures the problem?

I now have theory and even PoC patch for it.

The long wait of WaitLatchOrSocket happens after connection state
switches from CONNECTION_STARTED to CONNECTION_MADE in both cases the
return value of PQconnectPoll is PGRES_POLLING_WRITING so we wait for

Now the documentation for WSAEventSelect says "The FD_WRITE network
event is handled slightly differently. An FD_WRITE network event is
recorded when a socket is first connected with a call to the connect,
ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
when a socket is accepted with accept, AcceptEx, or WSAAccept function
and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
available. Therefore, an application can assume that sends are possible
starting from the first FD_WRITE network event setting and lasting until
a send returns WSAEWOULDBLOCK. After such a failure the application will
find out that sends are again possible when an FD_WRITE network event is
recorded and the associated event object is set."

But while PQconnectPoll does connect() before setting connection status
eventually happens, it does not do any writes in the code block that
FD_WRITE event is never recorded as per quoted documentation.

Then what remains is why it works in libpq. If you look at
pgwin32_select which is eventually called by libpq code, it actually
handles the situation by trying empty WSASend on any socket that is
supposed to wait for FD_WRITE and only calling the
WaitForMultipleObjectsEx when WSASend failed with WSAEWOULDBLOCK, when
the WSASend succeeds it immediately returns ok.

So I did something similar in attached for WaitEventSetWaitBlock() and
it indeed solves the issue my windows test machine. Now the
coding/placement probably isn't the best (and there are no comments),
but maybe somebody will find proper place for this now that we know the

  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index ea7f930..300b866 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1401,6 +1401,37 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			WaitEventAdjustWin32(set, cur_event);
 			cur_event->reset = false;
+		if (cur_event->events & WL_SOCKET_WRITEABLE)
+		{
+			char		c;
+			WSABUF		buf;
+			DWORD		sent;
+			int			r;
+			buf.buf = &c;
+			buf.len = 0;
+			r = WSASend(cur_event->fd, &buf, 1, &sent, 0, NULL, NULL);
+			if (r == 0)
+			{
+				occurred_events->pos = cur_event->pos;
+				occurred_events->user_data = cur_event->user_data;
+				occurred_events->events = WL_SOCKET_WRITEABLE;
+				occurred_events->fd = cur_event->fd;
+				return 1;
+			}
+			else
+			{
+				if (WSAGetLastError() != WSAEWOULDBLOCK)
+					/*
+					 * Not completed, and not just "would block", so an error
+					 * occurred
+					 */
+					elog(ERROR, "failed writability check on socket: error code %u",
+						 WSAGetLastError());
+			}
+		}
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to