Analyzing the hung state: the lock-up occurs when a backend wants to send data
to the stat collector. So the state is:
the backend waits for an FD_WRITE event, the stat collector waits for FD_READ.
I suspect the following sequence of events in the backend:
0. Let us work with only one socket, and that socket is associated with the
statically defined event object in pgwin32_waitforsinglesocket().
1. pgwin32_send(): WSASend fails with WSAEWOULDBLOCK (or its equivalent).
2. Socket s becomes writable and Windows signals the event defined statically
in pgwin32_waitforsinglesocket().
3. pgwin32_waitforsinglesocket(): ResetEvent resets the event.
4. pgwin32_waitforsinglesocket(): WaitForMultipleObjectsEx waits indefinitely...
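To make the suspected ordering problem concrete, here is a minimal sketch in portable C. It is only a toy model: ModelEvent and the model_* functions are hypothetical names invented for illustration, nothing here calls Winsock, and a "wait" simply reports whether it would be satisfied immediately. It assumes, as in the sequence above, that the writable notification is delivered exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a Win32 event object: a flag, no real blocking. */
typedef struct { bool signaled; } ModelEvent;

void model_set(ModelEvent *e)   { assert(e != NULL); e->signaled = true; }
void model_reset(ModelEvent *e) { assert(e != NULL); e->signaled = false; }

/*
 * Stand-in for the wait call. Returns true if the wait would be satisfied,
 * false if it would block. In this model nothing signals the event a second
 * time, so "would block" means "would block forever".
 */
bool model_wait(const ModelEvent *e)
{
    assert(e != NULL);
    return e->signaled;
}

/* Steps 1-4 as suspected: signal arrives, then reset, then wait. */
bool suspected_order_wakes_up(void)
{
    ModelEvent ev = { false };

    /* step 1: the send failed with would-block (nothing to model here) */
    model_set(&ev);          /* step 2: socket becomes writable */
    model_reset(&ev);        /* step 3: the notification is thrown away */
    return model_wait(&ev);  /* step 4: nothing left to wake us up */
}

/* Alternative order: wait first, reset only after the wait has returned. */
bool reordered_wakes_up(void)
{
    ModelEvent ev = { false };
    bool woke;

    model_set(&ev);          /* socket becomes writable */
    woke = model_wait(&ev);  /* the notification is still pending */
    model_reset(&ev);        /* clear it for the next call */
    return woke;
}
```

In this model suspected_order_wakes_up() returns false (the wait never completes) and reordered_wakes_up() returns true. Whether the real code hits exactly this interleaving is still only a suspicion.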
If I'm right, ResetEvent needs to be moved after WaitForMultipleObjectsEx. But
the comment in pgwin32_select() says that we should send something before
testing a socket for FD_WRITE. pgwin32_send() calls WSASend before
pgwin32_waitforsinglesocket(), but there is also a call of
pgwin32_waitforsinglesocket() in libpq/be-secure.c. So the attached patch adds
a call of WSASend with an empty (zero-length) buffer.
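The idea behind the zero-length send can be sketched the same way. Again this is a toy model in portable C, not Winsock code: ModelSocket, the enums and the model_* functions are hypothetical names, and the model assumes (per my reading of the pgwin32_select() comment) that an FD_WRITE notification is only guaranteed after a send has failed with would-block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy socket state, assumed for illustration only. */
typedef struct {
    bool buffer_has_room;    /* would a send complete right now? */
    bool write_event_armed;  /* has a previous send failed with would-block? */
} ModelSocket;

typedef enum { SEND_OK, SEND_WOULD_BLOCK } SendResult;
typedef enum { WAIT_READY_NOW, WAIT_WILL_BE_WOKEN, WAIT_MAY_HANG } WaitOutcome;

/* Zero-length probe: moves no data, but a failure arms the notification. */
SendResult model_probe_send(ModelSocket *s)
{
    assert(s != NULL);
    if (s->buffer_has_room)
        return SEND_OK;
    s->write_event_armed = true;
    return SEND_WOULD_BLOCK;
}

/* Waiting for write-readiness with no send issued beforehand. */
WaitOutcome model_wait_without_probe(const ModelSocket *s)
{
    assert(s != NULL);
    return s->write_event_armed ? WAIT_WILL_BE_WOKEN : WAIT_MAY_HANG;
}

/* Same shape as the patch: probe first, wait only if the probe would block. */
WaitOutcome model_wait_with_probe(ModelSocket *s)
{
    if (model_probe_send(s) == SEND_OK)
        return WAIT_READY_NOW;           /* the patch returns 1 at this point */
    return model_wait_without_probe(s);  /* notification is armed now */
}
```

For a socket that is already writable, model_wait_without_probe() reports WAIT_MAY_HANG while model_wait_with_probe() reports WAIT_READY_NOW. For a full socket the probe fails and the wait is then backed by an armed notification. This covers callers such as be-secure.c that, in the model's terms, wait without having sent first.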
Unfortunately, the lock-up occurs only on an SMP box and takes several
hours to reproduce, so we are still testing.
Any opinions?
PS. Backtraces:
backend:
ntdll.dll!KiFastSystemCallRet
postgres.exe!pgwin32_waitforsinglesocket+0x197
postgres.exe!pgwin32_send+0xaf
postgres.exe!pgstat_report_waiting+0x1bd
postgres.exe!pgstat_report_tabstat+0xda
postgres.exe!PostgresMain+0x1040
postgres.exe!ClosePostmasterPorts+0x1bce
postgres.exe!SubPostmasterMain+0x1be
postgres.exe!main+0x22b
postgres.exe+0x1237
postgres.exe+0x1288
kernel32.dll!RegisterWaitForInputIdle+0x49
logger:
ntdll.dll!KiFastSystemCallRet
kernel32.dll!WaitForSingleObject+0x12
postgres.exe!pg_usleep+0x54
postgres.exe!SysLoggerMain+0x422
postgres.exe!SubPostmasterMain+0x370
postgres.exe!main+0x22b
postgres.exe+0x1237
postgres.exe+0x1288
kernel32.dll!RegisterWaitForInputIdle+0x49
bgwriter:
ntdll.dll!KiFastSystemCallRet
kernel32.dll!WaitForSingleObject+0x12
postgres.exe!pg_usleep+0x54
postgres.exe!BackgroundWriterMain+0x63a
postgres.exe!BootstrapMain+0x61f
postgres.exe!SubPostmasterMain+0x22c
postgres.exe!main+0x22b
postgres.exe+0x1237
postgres.exe+0x1288
kernel32.dll!RegisterWaitForInputIdle+0x49
stat collector:
ntdll.dll!KiFastSystemCallRet
postgres.exe!pgwin32_select+0x4f3
postgres.exe!PgstatCollectorMain+0x32f
postgres.exe!SubPostmasterMain+0x32a
postgres.exe!main+0x22b
postgres.exe+0x1237
postgres.exe+0x1288
kernel32.dll!RegisterWaitForInputIdle+0x49
--
Teodor Sigaev E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
*** ./src/backend/port/win32/socket.c.orig Mon Oct 9 10:39:53 2006
--- ./src/backend/port/win32/socket.c Mon Oct 9 15:44:24 2006
***************
*** 132,137 ****
--- 132,159 ----
current_socket = s;
+ /*
+ * See comments about FD_WRITE and WSAEventSelect
+ * in pgwin32_select()
+ */
+ if ( (what & FD_WRITE) != 0 ) {
+ char c;
+ WSABUF buf;
+ DWORD sent;
+
+ buf.buf = &c;
+ buf.len = 0;
+ r = WSASend(s, &buf, 1, &sent, 0, NULL, NULL);
+
+ if (r == 0) /* Completed - means things are fine! */
+ return 1;
+ else if ( WSAGetLastError() != WSAEWOULDBLOCK )
+ {
+ TranslateSocketError();
+ return 0;
+ }
+ }
+
if (WSAEventSelect(s, waitevent, what) == SOCKET_ERROR)
{
TranslateSocketError();