On Sun, Jul 29, 2018 at 6:14 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> As a way of poking this thread, here are some more thoughts.

I am keen to move this forward, not only because it is something we need to get fixed, but also because I have some other pending patches in this area and I want this sorted out first.

Here are some small fix-up patches for Andres's patchset:

1. Use FD_CLOEXEC instead of the non-portable Linuxism SOCK_CLOEXEC.

2. Fix the self-deadlock hazard reported by Dmitry Dolgov. Instead of the checkpoint trying to send itself a CKPT_REQUEST_SYN message through the socket (whose buffer may be full), I included the ckpt_started counter in all messages. When AbsorbAllFsyncRequests() drains the socket, it stops at messages with the current ckpt_started value.

3. Handle postmaster death while waiting.

4. I discovered that macOS would occasionally return EMSGSIZE for sendmsg(), but treating that just like EAGAIN seems to work the next time around. I couldn't make that happen on FreeBSD (I mention that because the implementation is somehow related). So handle that weird case on macOS only for now. Testing on other Unixoid systems would be useful. The case that produced occasional EMSGSIZE on macOS was: shared_buffers=1MB, max_files_per_process=32, installcheck-parallel. Based on man pages that seems to imply an error in the client code but I don't see it.

(I also tried to use SOCK_SEQPACKET instead of SOCK_STREAM, but it's not supported on macOS. I also tried to use SOCK_DGRAM, but that produced occasional ENOBUFS errors and retrying didn't immediately succeed leading to busy syscall churn. This is all rather unsatisfying, since SOCK_STREAM is not guaranteed by any standard to be atomic, and we're writing messages from many backends into the socket so we're assuming atomicity. I don't have a better idea that is portable.)

There are a couple of FIXMEs remaining, and I am aware of three more problems:

* Andres mentioned to me off-list that there may be a deadlock risk where the checkpointer gets stuck waiting for an IO lock. I'm going to look into that.

* Windows. Patch soon.

* The ordering problem that I mentioned earlier: the patchset wants to keep the *oldest* fd, but it's really the oldest it has received. An idea Andres and I discussed is to use a shared atomic counter to assign a number to all file descriptors just before their first write, and send that along with it to the checkpointer. Patch soon.

-- 
Thomas Munro
http://www.enterprisedb.com

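For readers without the attachments to hand, item 1 above amounts to setting the close-on-exec flag with fcntl() after the descriptor exists, rather than passing SOCK_CLOEXEC when it is created. The following is only a minimal sketch of that idea; set_cloexec is a hypothetical helper name and is not taken from the patch.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Mark an already-open descriptor close-on-exec using fcntl(), which is
 * specified by POSIX, instead of relying on SOCK_CLOEXEC at creation time.
 * Returns 0 on success, -1 on failure with errno set by fcntl().
 *
 * NOTE: set_cloexec is a hypothetical name used for illustration only.
 */
static int
set_cloexec(int fd)
{
	int			flags;

	assert(fd >= 0);

	/* Read the current descriptor flags so that other bits are preserved. */
	flags = fcntl(fd, F_GETFD);
	if (flags < 0)
		return -1;

	/* Add FD_CLOEXEC to whatever was already set. */
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

Unlike SOCK_CLOEXEC, this takes two steps, so there is a short window between creating the descriptor and flagging it; the sketch does not try to address that.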
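The errno handling described in item 4 above could be sketched roughly as follows: EAGAIN (and EWOULDBLOCK) from sendmsg() mean "try again later", and EMSGSIZE is treated the same way on macOS only. This is an illustration under those stated assumptions, not the code from the patch; send_error_is_retryable is a hypothetical name, and using the __APPLE__ macro to detect macOS is also an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a failed sendmsg() should simply be retried later.
 *
 * NOTE: hypothetical helper for illustration; the name and the use of
 * __APPLE__ as the platform test are assumptions, not taken from the patch.
 */
static bool
send_error_is_retryable(int err)
{
	/* The socket buffer is full: the caller should wait and try again. */
	if (err == EAGAIN || err == EWOULDBLOCK)
		return true;

#ifdef __APPLE__
	/* Occasionally seen on macOS; retrying appears to work next time. */
	if (err == EMSGSIZE)
		return true;
#endif

	/* Anything else is treated as a real error. */
	return false;
}
```

A caller would check this after sendmsg() returns -1 and either wait for the socket to become writable again or report the error.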
0001-Use-portable-close-on-exec-syscalls.patch
Description: Binary data
0002-Fix-deadlock-in-AbsorbAllFsyncRequests.patch
Description: Binary data
0003-Handle-postmaster-death-CFI-improve-error-messages-a.patch
Description: Binary data
0004-Handle-EMSGSIZE-on-macOS.-Fix-misleading-error-messa.patch
Description: Binary data