Analyzing the locked state: the lock occurs when a backend wants to send data to the stat collector. The state is then:
the backend waits for an FD_WRITE event, while the stat collector waits for FD_READ.

I suspect the following sequence of events in the backend:
0. Let us consider only one socket, associated with the statically
   defined event object in pgwin32_waitforsinglesocket.
1. pgwin32_send: WSASend fails with WSAEWOULDBLOCK (or its equivalent).
2. Socket s becomes writable and Windows signals the event defined statically
   in pgwin32_waitforsinglesocket.
3. pgwin32_waitforsinglesocket(): ResetEvent resets the event.
4. pgwin32_waitforsinglesocket(): WaitForMultipleObjectsEx waits indefinitely...
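
The sequence above can be replayed as a toy model in plain C. This is only a sketch of the suspected lost wakeup, assuming the event behaves like a manual-reset flag; ToyEvent and the toy_* helpers are hypothetical stand-ins invented for illustration and are not Win32 or PostgreSQL calls.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a manual-reset event object (not a real HANDLE). */
typedef struct
{
    bool signaled;
} ToyEvent;

static void toy_set(ToyEvent *e)   { e->signaled = true; }
static void toy_reset(ToyEvent *e) { e->signaled = false; }

/* true: a wait on the event would return at once; false: it would block. */
static bool toy_wait_would_return(const ToyEvent *e)
{
    return e->signaled;
}

/*
 * Replays steps 1-4 of the suspected sequence.
 * Returns true if the final wait would return, false if it would block.
 */
static bool replay_suspected_sequence(void)
{
    ToyEvent waitevent = { false };

    /* step 1: the send fails with "would block"; the event is untouched */
    toy_set(&waitevent);    /* step 2: socket becomes writable, event signaled */
    toy_reset(&waitevent);  /* step 3: the reset discards that signal */
    return toy_wait_would_return(&waitevent);   /* step 4: the wait */
}
```

In this model replay_suspected_sequence() returns false, i.e. the wait in step 4 never sees the signal from step 2, because the model raises no further signal after the reset.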


If I'm right, ResetEvent needs to be moved after WaitForMultipleObjectsEx. But the comment in pgwin32_select() says that we should send something before testing the socket for FD_WRITE. pgwin32_send calls WSASend before pgwin32_waitforsinglesocket(), but there is also a call of pgwin32_waitforsinglesocket in libpq/be-secure.c. So the attached patch adds a WSASend call with an empty (zero-length) buffer.
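
The control flow the patch adds ahead of the wait can be summarized as a small decision function. This is a sketch only: ProbeResult, PreWaitAction and decide_before_wait are hypothetical names for illustration, and the probe outcome is passed in rather than obtained from a real zero-length WSASend.

```c
#include <stdbool.h>

/* Hypothetical outcome of the zero-length send probe. */
typedef enum
{
    PROBE_SENT,         /* send completed: socket is writable now */
    PROBE_WOULDBLOCK,   /* send would block: an FD_WRITE event should follow */
    PROBE_ERROR         /* any other socket error */
} ProbeResult;

/* What the wait routine should do after the probe. */
typedef enum
{
    ACTION_RETURN_READY,    /* report success without waiting */
    ACTION_RETURN_ERROR,    /* report failure without waiting */
    ACTION_ENTER_WAIT       /* go on to select the event and wait */
} PreWaitAction;

/* Mirrors the branch structure of the patch; the probe runs only for writes. */
static PreWaitAction decide_before_wait(bool wants_write, ProbeResult probe)
{
    if (!wants_write)
        return ACTION_ENTER_WAIT;

    switch (probe)
    {
        case PROBE_SENT:
            return ACTION_RETURN_READY;
        case PROBE_WOULDBLOCK:
            return ACTION_ENTER_WAIT;
        default:
            return ACTION_RETURN_ERROR;
    }
}
```

The point of the probe is that the wait is entered only after a send has just reported "would block", which is the condition the pgwin32_select() comment asks for.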

Unfortunately, the locking problem occurs only on an SMP box and takes several hours to reproduce, so we are still testing.

What are your opinions?

PS: Backtraces
backend:

ntdll.dll!KiFastSystemCallRet
postgres.exe!pgwin32_waitforsinglesocket+0x197
postgres.exe!pgwin32_send+0xaf
postgres.exe!pgstat_report_waiting+0x1bd
postgres.exe!pgstat_report_tabstat+0xda
postgres.exe!PostgresMain+0x1040
postgres.exe!ClosePostmasterPorts+0x1bce
postgres.exe!SubPostmasterMain+0x1be
postgres.exe!main+0x22b
postgres.exe+0x1237
postgres.exe+0x1288
kernel32.dll!RegisterWaitForInputIdle+0x49


logger:

ntdll.dll!KiFastSystemCallRet
kernel32.dll!WaitForSingleObject+0x12
postgres.exe!pg_usleep+0x54
postgres.exe!SysLoggerMain+0x422
postgres.exe!SubPostmasterMain+0x370
postgres.exe!main+0x22b
postgres.exe+0x1237
postgres.exe+0x1288
kernel32.dll!RegisterWaitForInputIdle+0x49


bgwriter:


ntdll.dll!KiFastSystemCallRet
kernel32.dll!WaitForSingleObject+0x12
postgres.exe!pg_usleep+0x54
postgres.exe!BackgroundWriterMain+0x63a
postgres.exe!BootstrapMain+0x61f
postgres.exe!SubPostmasterMain+0x22c
postgres.exe!main+0x22b
postgres.exe+0x1237
postgres.exe+0x1288
kernel32.dll!RegisterWaitForInputIdle+0x49


stat collector:

ntdll.dll!KiFastSystemCallRet
postgres.exe!pgwin32_select+0x4f3
postgres.exe!PgstatCollectorMain+0x32f
postgres.exe!SubPostmasterMain+0x32a
postgres.exe!main+0x22b
postgres.exe+0x1237
postgres.exe+0x1288
kernel32.dll!RegisterWaitForInputIdle+0x49


--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/
*** ./src/backend/port/win32/socket.c.orig      Mon Oct  9 10:39:53 2006
--- ./src/backend/port/win32/socket.c   Mon Oct  9 15:44:24 2006
***************
*** 132,137 ****
--- 132,159 ----
  
        current_socket = s;
  
+       /*
+        * See comments about FD_WRITE and WSAEventSelect
+        * in pgwin32_select()
+        */
+       if ( (what & FD_WRITE) != 0 ) {
+               char    c;
+               WSABUF  buf;
+               DWORD   sent;
+ 
+               buf.buf = &c;
+               buf.len = 0;
+               r = WSASend(s, &buf, 1, &sent, 0, NULL, NULL);
+ 
+               if (r == 0)         /* Completed - means things are fine! */
+                       return 1;
+               else if ( WSAGetLastError() != WSAEWOULDBLOCK )
+               {
+                       TranslateSocketError();
+                       return 0;
+               }
+       }
+ 
        if (WSAEventSelect(s, waitevent, what) == SOCKET_ERROR)
        {
                TranslateSocketError();