On Wed, Mar 6, 2019 at 6:07 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.mu...@gmail.com> writes:
> > You can see that poll() already knew the other end had closed the
> > socket. Since this is clearly timing... let's see, yeah, I can make
> > it fail every time by adding sleep(1) before the comment "Send the
> > startup packet.". I assume that'll work on any Linux machine?
>
> Great idea, but no cigar --- doesn't do anything for me except make
> the ssl test really slow. (I tried it on RHEL6 and Fedora 28 and, just
> for luck, current macOS.) What this seems to prove is that the thing
> that's different about eelpout is the particular kernel it's running,
> and that that kernel has some weird timing behavior in this situation.
>
> I've also been experimenting with reducing libpq's SO_SNDBUF setting
> on the socket, with more or less the same idea of making the sending
> of the startup packet slower. No joy there either.
>
> Annoying. I'd be happier about writing code to fix this if I could
> reproduce it :-(

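For anyone who wants to poke at the timing outside of libpq, here is a rough standalone sketch of the situation described above: the peer sends an error and closes, the client waits a bit (standing in for the sleep(1) trick), and then asks poll() what it knows before sending anything. This is not libpq code. The function name peer_closes_early, the message text and the delay value are made up for illustration, and the optional sndbuf argument only mimics the SO_SNDBUF experiment. It assumes a Unix-like system where select.poll is available.

```python
import select
import socket
import threading
import time


def peer_closes_early(delay=0.2, sndbuf=None):
    """Return (revents, data) seen by a client that waits before sending.

    Hypothetical helper for illustration only; not part of libpq.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)

    def server():
        # Accept, report an error, and close without reading anything.
        conn, _ = listener.accept()
        conn.sendall(b"FATAL: connection rejected")
        conn.close()

    thread = threading.Thread(target=server)
    thread.start()

    client = socket.create_connection(listener.getsockname(), timeout=5)
    try:
        if sndbuf is not None:
            # Mimics shrinking the send buffer; the kernel may round this up.
            client.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)

        thread.join()      # the server has already sent its message and closed
        time.sleep(delay)  # stand-in for sleep(1) before the startup packet

        poller = select.poll()
        poller.register(client, select.POLLIN | select.POLLOUT)
        events = poller.poll(1000)
        revents = events[0][1] if events else 0

        # Nothing has been written by the client, so the pending message
        # can still be read in full, followed by EOF.
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
        return revents, b"".join(chunks)
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    revents, data = peer_closes_early()
    print("readable before any send:", bool(revents & select.POLLIN))
    print("pending data from peer:", data)
```

The sketch reads before writing, so it only shows that the close is already visible to poll() at that point. What happens when the client writes first is the kernel-dependent part discussed in this thread, and the sketch makes no claim about it.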
Hmm. Note that eelpout only started doing it with OpenSSL 1.1.1. But I
just tried the sleep(1) trick on an x86 box running the same version of
Debian, OpenSSL etc and it didn't work. So eelpout (a super cheap
virtualised 4-core ARMv8 system rented from scaleway.com running Debian
Buster with a kernel identifying itself as 4.9.23-std-1 and libc6
2.28-7) is indeed starting to look pretty weird. Let me know if you
want to log in and experiment on that machine.

-- 
Thomas Munro
https://enterprisedb.com