I wrote:
> Last night I changed the stats collector process to use
> WaitLatchOrSocket instead of a periodic forced wakeup to see whether
> the postmaster has died.  This morning I observe that several Windows
> buildfarm members are showing regression test failures caused by
> unexpected "pgstat wait timeout" warnings.  Everybody else is fine.
> This suggests that there is something broken in the Windows
> implementation of WaitLatchOrSocket.  I wonder whether it also
> tells us something we did not know about the underlying cause of
> those messages.  Not sure what though.  Ideas?  Can anyone who
> knows Windows take another look at WaitLatchOrSocket?

Anybody have any clues about that?  If not, I think I'll have to revert
the pgstat changes for beta1, which isn't really forward progress.

I spent some time staring at the Windows WaitLatchOrSocket code myself.
The only thing I could find that seemed wrong is that in the event
array, we list the latch's event before pgwin32_signal_event.  The
Microsoft documentation I looked at says that if more than one event is
ready, WaitForMultipleObjects reports the first such array member.  This
means that if the latch is already set when control gets here, signal
handlers will not be serviced.  That doesn't match what would happen on
a Unix machine, so it seems like at least a violation of the POLA.
Hence I think we oughta swap the order of those two array elements.
(Same issue in PGSemaphoreLock, btw, and I'm suspicious of
pgwin32_select.)

I do not however see a way that that would explain the pgstat failures,
because the stats collector's latch really shouldn't ever get set during
normal regression test runs.

			regards, tom lane
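
PS: to make the array-ordering point concrete, here is a toy model in
portable C.  It does not call the Windows API at all: first_ready() is a
made-up stand-in that only mimics the documented "lowest ready index is
reported" rule, and the enum and array names are hypothetical, not the
identifiers used in the actual source.  It assumes the latch stays set
for the whole run and that one signal is pending from the start.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event identities; not names from the PostgreSQL source. */
enum { EV_LATCH, EV_SIGNAL, EV_COUNT };

/* The two candidate array orders being compared. */
static const int latch_first[EV_COUNT]  = { EV_LATCH, EV_SIGNAL };
static const int signal_first[EV_COUNT] = { EV_SIGNAL, EV_LATCH };

/*
 * Toy stand-in for a wait-any call: return the position in order[] of the
 * first ready event, or -1 if none is ready.  Lowest position always wins.
 */
static int
first_ready(const int *order, int n, const bool *ready)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
    {
        if (ready[order[i]])
            return i;
    }
    return -1;
}

/*
 * Make `rounds` wait calls with the latch permanently set and one signal
 * pending; return how many times the signal event was the one reported.
 */
static int
signal_wakeups(const int *order, int rounds)
{
    bool ready[EV_COUNT] = { [EV_LATCH] = true, [EV_SIGNAL] = true };
    int  hits = 0;

    for (int r = 0; r < rounds; r++)
    {
        int pos = first_ready(order, EV_COUNT, ready);

        if (pos >= 0 && order[pos] == EV_SIGNAL)
        {
            hits++;
            ready[EV_SIGNAL] = false;   /* pretend the handler ran */
        }
        /* The latch is never reset here, so it remains ready. */
    }
    return hits;
}
```

In this model, signal_wakeups(latch_first, 10) never reports the signal
event, while signal_wakeups(signal_first, 10) reports it on the first
call and then falls through to the latch.  That is only the starvation
argument in miniature; it says nothing about the pgstat failures.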