Re: Rare SSL failures on eelpout

2019-03-19 Thread Thomas Munro
On Wed, Mar 20, 2019 at 8:31 AM Tom Lane wrote: > So I'm inclined to (1) commit the patch as-proposed in HEAD, and > (2) hack the ssl test cases in v11 as you suggested. If we see field > complaints about this, we can consider reverting (2) in favor of > a back-patch once v12 beta is over. This

Re: Rare SSL failures on eelpout

2019-03-19 Thread Tom Lane
Thomas Munro writes: > On Tue, Mar 19, 2019 at 9:11 AM Tom Lane wrote: >>> One thing that isn't real clear to me is how much timing sensitivity >>> there is in "when no more input data is available". Can we assume that >>> if we've gotten ECONNRESET or an allied error from a write, then any >>>

Re: Rare SSL failures on eelpout

2019-03-18 Thread Tom Lane
Thomas Munro writes: > Yeah, now that we understand this properly I agree this is unlikely to > bother anyone in real life. I just want to make the build farm green. > I wondered about ssl_max_protocol_version = 'TLSv1.2', but that GUC's > too new. Yeah; also, forcing that would reduce our test

Re: Rare SSL failures on eelpout

2019-03-18 Thread Thomas Munro
On Tue, Mar 19, 2019 at 12:44 PM Tom Lane wrote: > Thomas Munro writes: > > Shouldn't we also back-patch the one-line change adding > > pqHandleSendFailure()? > > As I said before, I don't like that patch: at best it's an abuse of > pqHandleSendFailure, because that function is only meant to be

Re: Rare SSL failures on eelpout

2019-03-18 Thread Thomas Munro
On Tue, Mar 19, 2019 at 12:25 PM Thomas Munro wrote: > 2. Linux, FreeBSD and Darwin gave slightly different error sequences > when writing after the remote connection was closed (though I suspect > they'd behave much the same way for a connection to a remote host), > but all allowed the

Re: Rare SSL failures on eelpout

2019-03-18 Thread Tom Lane
Thomas Munro writes: > On Tue, Mar 19, 2019 at 9:11 AM Tom Lane wrote: >> My current feeling is that this is OK to put in HEAD but I think the >> risk-reward ratio isn't very good for the back branches. Even with >> an OpenSSL version where this makes a difference, the problematic >> behavior

Re: Rare SSL failures on eelpout

2019-03-18 Thread Thomas Munro
On Tue, Mar 19, 2019 at 9:11 AM Tom Lane wrote: > I wrote: > > ... I don't like pqHandleSendFailure all that much: it has strong > > constraints on what state libpq has to be in, as a consequence of which > > it's called from a bunch of ad-hoc places, and can't be called from > > some others.

Re: Rare SSL failures on eelpout

2019-03-18 Thread Tom Lane
I wrote: > ... I don't like pqHandleSendFailure all that much: it has strong > constraints on what state libpq has to be in, as a consequence of which > it's called from a bunch of ad-hoc places, and can't be called from > some others. It's kind of accidental that it will work here. > I was

Re: Rare SSL failures on eelpout

2019-03-17 Thread Tom Lane
I wrote: > Thomas Munro writes: >> This was an intentional change in TLS1.3, reducing round trips by >> verifying the client certificate later. > Ugh. So probably we can reproduce it elsewhere if we use cutting-edge > OpenSSL versions. I installed OpenSSL 1.1.1a on my Mac laptop. I got

Re: Rare SSL failures on eelpout

2019-03-17 Thread Tom Lane
Thomas Munro writes: > On Sun, Mar 17, 2019 at 2:00 AM Thomas Munro wrote: >> I opened a bug report with OpenSSL, let's see what they say: >> https://github.com/openssl/openssl/issues/8500 > This was an intentional change in TLS1.3, reducing round trips by > verifying the client certificate

Re: Rare SSL failures on eelpout

2019-03-17 Thread Thomas Munro
On Sun, Mar 17, 2019 at 2:00 AM Thomas Munro wrote: > On Wed, Mar 6, 2019 at 9:21 AM Tom Lane wrote: > > Yeah, I've still been unable to reproduce even with the sleep idea, > > so eelpout is definitely looking like a special snowflake from here. > > In any case, there seems little doubt that

Re: Rare SSL failures on eelpout

2019-03-16 Thread Thomas Munro
On Wed, Mar 6, 2019 at 9:21 AM Tom Lane wrote: > Thomas Munro writes: > > Bleugh. I think this OpenSSL package might just be buggy on ARM. On > > x86, apparently the same version of OpenSSL and all other details of > > the test the same, I can see that SSL_connect() returns <= 0 > > (failure),

Re: Rare SSL failures on eelpout

2019-03-05 Thread Thomas Munro
On Wed, Mar 6, 2019 at 4:07 PM Shawn Debnath wrote: > On Wed, Mar 06, 2019 at 11:13:31AM +1300, Thomas Munro wrote: > > So... can anyone tell us what happens on Windows? > C:\Users\Shawn Debnath\Desktop>c:\Python27\python.exe tmunro-ssl-test.py > --client > Sending A... > 2 > Sending B... >

Re: Rare SSL failures on eelpout

2019-03-05 Thread Thomas Munro
On Wed, Mar 6, 2019 at 9:21 AM Tom Lane wrote: > The bug #15598 report is more troublesome, as we don't have a strong > reason to believe it's not common on Windows. However, I wonder whether > we can really do anything at all about that one. If I understand what > Andrew was hypothesizing in

Re: Rare SSL failures on eelpout

2019-03-05 Thread Tom Lane
Thomas Munro writes: > Bleugh. I think this OpenSSL package might just be buggy on ARM. On > x86, apparently the same version of OpenSSL and all other details of > the test the same, I can see that SSL_connect() returns <= 0 > (failure), and then we ask for that cert revoked message directly

Re: Rare SSL failures on eelpout

2019-03-05 Thread Thomas Munro
On Wed, Mar 6, 2019 at 7:05 AM Thomas Munro wrote: > On Wed, Mar 6, 2019 at 6:07 AM Tom Lane wrote: > > Annoying. I'd be happier about writing code to fix this if I could > > reproduce it :-( > > Hmm. Note that eelpout only started doing it with OpenSSL 1.1.1. Bleugh. I think this OpenSSL

Re: Rare SSL failures on eelpout

2019-03-05 Thread Thomas Munro
On Wed, Mar 6, 2019 at 6:07 AM Tom Lane wrote: > Thomas Munro writes: > > You can see that poll() already knew the other end had closed the > > socket. Since this is clearly timing... let's see, yeah, I can make > > it fail every time by adding sleep(1) before the comment "Send the > > startup

Re: Rare SSL failures on eelpout

2019-03-05 Thread Thomas Munro
On Wed, Mar 6, 2019 at 3:33 AM Tom Lane wrote: > Thomas Munro writes: > > Disappointingly, that turned out to be just because 10 and earlier > > didn't care what the error message said. > > That is, you can reproduce the failure on old branches? That lets > out a half-theory I'd had, which was

Re: Rare SSL failures on eelpout

2019-03-05 Thread Tom Lane
Thomas Munro writes: > BTW, I went looking for other failures on the buildfarm I noticed that > even for eelpout it's only happening on master and REL_11_STABLE: Yeah, I'd noticed that. > Disappointingly, that turned out to be just because 10 and earlier > didn't care what the error message

Re: Rare SSL failures on eelpout

2019-03-05 Thread Thomas Munro
On Tue, Mar 5, 2019 at 11:35 AM Tom Lane wrote: > True. I've spent some time today running the ssl tests on various > machines here, without any luck reproducing. BTW, I went looking for other failures on the buildfarm I noticed that even for eelpout it's only happening on master and

Re: Rare SSL failures on eelpout

2019-03-04 Thread Thomas Munro
On Tue, Mar 5, 2019 at 11:35 AM Tom Lane wrote: > Thomas Munro writes: > > OK, here's something. I can reproduce it quite easily on this > > machine, and I can fix it like this: > > >libpq_gettext("could not send startup packet: %s\n"), > >SOCK_STRERROR(SOCK_ERRNO, sebuf,

Re: Rare SSL failures on eelpout

2019-03-04 Thread Tom Lane
Thomas Munro writes: > On Tue, Mar 5, 2019 at 10:08 AM Tom Lane wrote: >> It's all very confusing, but I think there's a nontrivial chance >> that this is an OpenSSL bug, especially since we haven't been able >> to replicate it elsewhere. > Hmm. Yes, it is strange that we haven't seen it

Re: Rare SSL failures on eelpout

2019-03-04 Thread Thomas Munro
On Tue, Mar 5, 2019 at 10:08 AM Tom Lane wrote: > I wrote: > > Thomas Munro writes: > >> That suggests that we could perhaps handle ECONNRESET both at startup > >> packet send time (for certificate rejection, eelpout's case) and at > >> initial query send (for idle timeout, bug #15598's case) by

Re: Rare SSL failures on eelpout

2019-03-04 Thread Tom Lane
I wrote: > Thomas Munro writes: >> That suggests that we could perhaps handle ECONNRESET both at startup >> packet send time (for certificate rejection, eelpout's case) and at >> initial query send (for idle timeout, bug #15598's case) by attempting >> to read. Does that make sense? > Hmm ...
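
For readers following along, the "attempt to read after a failed write" idea quoted above can be sketched in a few lines. This is not libpq code and not the test program used in the thread. It is a minimal illustration that assumes a Unix-like system and a local socket pair; the function name and the error text are hypothetical. The peer queues a message and closes, the client's write then fails, and the client reads anyway to pick up what the peer said before it went away.

```python
import errno
import socket


def write_then_read_after_close(message=b"FATAL: hypothetical server error\n"):
    """Return (errno name of the failed write or None, bytes still readable)."""
    # Assumption: a local socket pair stands in for the client/server connection.
    client, server = socket.socketpair()
    try:
        # The "server" reports its error and goes away before the client writes.
        server.sendall(message)
        server.close()

        write_error = None
        try:
            client.sendall(b"startup packet")
        except OSError as exc:
            # Typically EPIPE or ECONNRESET, depending on platform and timing.
            write_error = errno.errorcode.get(exc.errno, str(exc.errno))

        # Even though the write failed, the queued message can still be read.
        pending = client.recv(4096)
        return write_error, pending
    finally:
        client.close()
        server.close()  # already closed above; a second close is harmless


if __name__ == "__main__":
    err, data = write_then_read_after_close()
    print("write failed with:", err)
    print("still readable:", data)
```

The exact errno reported for the write varies by operating system and by whether the peer had unread data when it closed, which is the timing question raised in the thread. A real TCP connection to a remote host may behave differently from this local pair.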

Re: Rare SSL failures on eelpout

2019-03-03 Thread Tom Lane
Thomas Munro writes: > With a simple socket test program I can see that if you send a single > packet after the remote end has closed and after it had already read > everything you sent it up to now, you get EPIPE. If there was some > outstanding data from a previous send that it hadn't read yet

Re: Rare SSL failures on eelpout

2019-03-03 Thread Thomas Munro
On Wed, Jan 23, 2019 at 11:23 AM Thomas Munro wrote: > On Wed, Jan 23, 2019 at 4:07 AM Tom Lane wrote: > > The whole thing reminds me of the recent bug #15598: > > > > https://www.postgresql.org/message-id/87k1iy44fd.fsf%40news-spur.riddles.org.uk > > Yeah, if errors get moved to later exchanges

Re: Rare SSL failures on eelpout

2019-01-22 Thread Thomas Munro
On Wed, Jan 23, 2019 at 4:07 AM Tom Lane wrote: > Thomas Munro writes: > > Hmm. Why is psql doing two sendto() calls without reading a response > > in between, when it's possible for the server to exit after the first, > > anyway? Seems like a protocol violation somewhere? > > Keep in mind

Re: Rare SSL failures on eelpout

2019-01-22 Thread Tom Lane
Thomas Munro writes: > Hmm. Why is psql doing two sendto() calls without reading a response > in between, when it's possible for the server to exit after the first, > anyway? Seems like a protocol violation somewhere? Keep in mind this is all down inside the SSL handshake, so if any protocol

Rare SSL failures on eelpout

2019-01-22 Thread Thomas Munro
(Moving this discussion from the pgsql-committers thread "pgsql: Update ssl test certificates and keys", which is innocent.) On Wed, Jan 16, 2019 at 10:37 AM Thomas Munro wrote: > On Fri, Jan 4, 2019 at 10:08 AM Thomas Munro > wrote: > > Thanks. FWIW I've just updated eelpout (a Debian testing