Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Mon, 13 May 2019 01:02:45 +0200, Guilhem Moulin wrote:
> Thanks for your analysis, Steffen. Dropping the Debian-specific patch
> is definitely the way to go for libwww/LWP.

Thanks for the confirmation, Guilhem, and for all your work on this issue, and thanks a lot to Steffen for tracking down the real culprit of the troubles!

I've now uploaded libwww-perl without the problematic patch (and this upload will close the bug).

Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D 85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   NP: Tori Amos: Yes, Anastasia
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 14 May 2019 at 03:57:46 +0200, Steffen Ullrich wrote:
>> Ah I see, thanks for the clarification. I thought you meant it could
>> yield a deadlock. Aren't temporary failures also possible on plain
>> sockets (though of course the extra SSL layer makes them strictly more
>> likely to happen)? IIRC that happens if the checksum of an incoming
>> packet mismatches, which causes the read() call to block until the
>> packet is retransmitted.
>
> select only shows an fd ready if data are available for read in the socket
> buffer. Data with wrong checksum are discarded by the kernel before they are
> put into the socket buffer and thus don't cause select to show it ready for
> read.
>
> select(2) explicitly says:
>
>   A file descriptor is considered ready if it is possible to perform a
>   corresponding I/O operation (e.g., read(2) without blocking ...

Hmm, however its “Bugs” section says that this is in fact not always the case, and that non-blocking I/O should be used to avoid such temporary failures:

    Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.

>>> And while using blocking I/O with polling sockets might be fine with plain
>>> sockets where each byte is part of application data it is not fine with SSL
>>> since the unit in SSL is not a byte but a record and some records might
>>> contain application data and some not.
>>
>> Why is that not fine, if the SSL_read() caller is ready for that (documented)
>> outcome, and doesn't assume that the call will always block until some
>> application data is received?
>
> IO::Socket::SSL is intended as an abstraction which behaves as much as possible
> like the other IO::Socket classes. It is not intended that the developer has to be
> familiar with the exact semantics of SSL_read (which also changed over
> time, especially with OpenSSL 1.1.0 and again with OpenSSL 1.1.1). While it
> is impossible to behave the same in all cases, sysread is usually not expected
> to return a temporary error on a blocking socket.
> […]
> Yes, the intention was to reflect as much as possible what is expected from
> IO::Socket::sysread and not what the SSL_read documentation says.

Fair enough. Then it sounds like you'd want to set SSL_MODE_AUTO_RETRY explicitly and not rely on the old or new OpenSSL defaults :-) (Or loop in Perl if support for OpenSSL versions that are two decades old is desired.)

For what it's worth, I interpreted

    Also, calls to sysread might fail, because it must first finish an SSL handshake. To understand these behaviors is essential, if you write applications which use event loops and/or non-blocking sockets.

from the sysread() documentation as an invitation to read the low-level documentation and see what SSL_read() may return, also with blocking I/O :-) (After all, since sysread() will never block if there is some unprocessed data left in the current SSL frame, that's already a hint that this sysread() has some peculiarities that are not found in the version for plain sockets.)

The new documentation clarifies the expectation a bit, thanks! But I guess it would be clearer if the paragraphs I quoted above were explicitly said to apply only to non-blocking I/O.

Also, isn't the workaround you implemented earlier, “deal with OpenSSL 1.1.1 switching on SSL_AUTO_RETRY by default by disabling it when non-blocking”

    https://github.com/noxxi/p5-io-socket-ssl/commit/09bc6a3203bc7bc89078317da42a3e96cdbf94fc

a no-op? AFAICT setting SSL_MODE_AUTO_RETRY avoids SSL_read() returning SSL_ERROR_WANT_READ when a non-application data record is received, and instead makes it block until application data is received. For non-blocking I/O however, SSL_read() will of course never block and may, regardless of whether SSL_MODE_AUTO_RETRY is set, return SSL_ERROR_WANT_{READ,WRITE}.

-- 
Guilhem.
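The select(2) advice quoted above can be illustrated with plain sockets. The following is a minimal sketch in core Perl only, assuming a Unix-domain socketpair as a stand-in for the TCP connection (no TLS involved). Once O_NONBLOCK is set via IO::Handle's `blocking(0)`, a read that finds nothing in the socket buffer fails immediately with EAGAIN instead of blocking, so a spurious "ready" report from select() costs one extra loop iteration rather than a hang. The helper `read_nonblocking` is hypothetical.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Handle;    # provides ->blocking() on socket handles
use IO::Select;
use Errno ();      # loads %! for the EAGAIN/EWOULDBLOCK checks

# Hypothetical helper: one read attempt on a non-blocking handle.
# Returns ('data', $bytes), ('eof') or ('again'); dies on any other error.
sub read_nonblocking {
    my ($fh) = @_;
    my $n = sysread($fh, my $buf, 4096);
    return ('data', $buf) if $n;
    return ('eof')        if defined $n;
    return ('again')      if $!{EAGAIN} || $!{EWOULDBLOCK};
    die "read failed: $!\n";
}

# A connected pair of sockets stands in for the client/server connection.
socketpair(my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!\n";
$reader->blocking(0);    # set O_NONBLOCK on the reading end

# Nothing has been sent yet: the read returns at once instead of hanging.
my ($first) = read_nonblocking($reader);

# After the peer writes, select() reports the fd readable and data arrives.
syswrite($writer, "hello") or die "write: $!\n";
IO::Select->new($reader)->can_read(5) or die "timed out waiting for data\n";
my ($second, $data) = read_nonblocking($reader);

print "$first $second $data\n";
```

Note that this only covers the plain-socket layer. With TLS on top, the same idea is what IO::Socket::SSL's non-blocking mode relies on.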
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
> Ah I see, thanks for the clarification. I thought you meant it could
> yield a deadlock. Aren't temporary failures also possible on plain
> sockets (though of course the extra SSL layer makes them strictly more
> likely to happen)? IIRC that happens if the checksum of an incoming
> packet mismatches, which causes the read() call to block until the
> packet is retransmitted.

select only shows an fd ready if data are available for read in the socket buffer. Data with wrong checksum are discarded by the kernel before they are put into the socket buffer and thus don't cause select to show it ready for read. select(2) explicitly says:

    A file descriptor is considered ready if it is possible to perform a corresponding I/O operation (e.g., read(2) without blocking ...

> > And while using blocking I/O with polling sockets might be fine with plain
> > sockets where each byte is part of application data it is not fine with SSL
> > since the unit in SSL is not a byte but a record and some records might
> > contain application data and some not.
>
> Why is that not fine, if the SSL_read() caller is ready for that (documented)
> outcome, and doesn't assume that the call will always block until some
> application data is received?

IO::Socket::SSL is intended as an abstraction which behaves as much as possible like the other IO::Socket classes. It is not intended that the developer has to be familiar with the exact semantics of SSL_read (which also changed over time, especially with OpenSSL 1.1.0 and again with OpenSSL 1.1.1). While it is impossible to behave the same in all cases, sysread is usually not expected to return a temporary error on a blocking socket.

> Then I don't get it. AFAICT the current documentation reflects what
> happens with blocking sockets and OpenSSL <1.1.1a, namely that
> IO::Socket::SSL::sysread() returns undef when the TLS ≤1.2 session is
> renegotiated. Are you saying that the intention was to keep retrying
> (either Perl-side, or within SSL_read() via SSL_MODE_AUTO_RETRY with
> OpenSSL ≥0.9.6) until application data is received or a fatal error is
> encountered?

Yes, the intention was to reflect as much as possible what is expected from IO::Socket::sysread and not what the SSL_read documentation says.

> Not anything using IO::Socket::SSL, I'm afraid. I'm personally more
> used to the lower-level APIs like libssl and Net::SSLeay. For libssl in
> particular, a quick codesearch returns a few prominent programs that
> reverted the new OpenSSL default.

For applications using lower-level libraries it is perfectly fine to follow changes in the behavior of these lower-level libraries. The idea behind higher-level abstractions is that this does not need to be done, i.e. they should hide lower-level implementation details and just continue to work. It is not always possible, but at least one can try.

> Anyway this bug can probably be closed now, maybe we should follow up
> elsewhere. Thanks for the discussion & analysis!

Thank you too.

Regards,
Steffen
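If the behaviour Steffen describes (keep retrying until application data arrives) had to be emulated Perl-side on an OpenSSL where SSL_MODE_AUTO_RETRY is not set, the loop could look roughly like the sketch below. It is not IO::Socket::SSL code. `sysread_retrying` is a hypothetical helper, and the "is this failure retryable?" check is passed in as a code reference so that the example runs without IO::Socket::SSL installed.

```perl
use strict;
use warnings;

# Hypothetical helper: call $sock->sysread until it returns application data
# or EOF. Failures that $is_retryable->() classifies as temporary (for example
# a renegotiation record carrying no application payload) are retried; any
# other failure is fatal. On a blocking socket each retry simply blocks again
# inside the next read, which is roughly what SSL_MODE_AUTO_RETRY does inside
# OpenSSL.
sub sysread_retrying {
    my ($sock, $len, $is_retryable, $max_tries) = @_;
    $max_tries //= 100;    # arbitrary guard for this sketch
    for (1 .. $max_tries) {
        my $buf = '';
        my $n = $sock->sysread($buf, $len);
        return ($n, $buf) if defined $n;    # $n == 0 means EOF
        next if $is_retryable->();          # temporary: try again
        die "sysread failed: $!\n";         # permanent error
    }
    die "no application data after $max_tries attempts\n";
}
```

With IO::Socket::SSL the predicate would typically be something like `sub { $SSL_ERROR == SSL_WANT_READ || $SSL_ERROR == SSL_WANT_WRITE }`, using the `$SSL_ERROR` variable and constants that module exports.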
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Mon, 13 May 2019 at 22:24:55 +0200, Steffen Ullrich wrote:
> On Mon, May 13, 2019 at 03:18:14PM +0200, Guilhem Moulin wrote:
>> Uh, what? “Before” meaning with ≤TLSv1.2, or with OpenSSL <1.1.1a's
>> default flags? libssl mentions no such thing beside the new default
>> mode. And in fact the s_client() program, *from OpenSSL upstream
>> itself*, does precisely that: a select loop in blocking I/O mode (unless
>> the ‘-nbio’ flag is set).
>
> "Before" meaning already with previous OpenSSL versions. But the effect was
> likely hard to notice: an incomplete SSL record would not cause a permanent
> hang but only a blocking read until the rest of the data was received, i.e.
> only some slowdown in some situations.

Ah I see, thanks for the clarification. I thought you meant it could yield a deadlock. Aren't temporary failures also possible on plain sockets (though of course the extra SSL layer makes them strictly more likely to happen)? IIRC that happens if the checksum of an incoming packet mismatches, which causes the read() call to block until the packet is retransmitted.

> And while using blocking I/O with polling sockets might be fine with plain
> sockets where each byte is part of application data it is not fine with SSL
> since the unit in SSL is not a byte but a record and some records might
> contain application data and some not.

Why is that not fine, if the SSL_read() caller is ready for that (documented) outcome, and doesn't assume that the call will always block until some application data is received?

>> The documentation (libssl's SSL_read()'s [0], Net::SSLeay::read() [1],
>> as well as IO::Socket::SSL::sysread() [2]) all warn that reading from an
>> SSL socket has a different behavior than usual read(2) system calls, in
>> that read failures should be treated with care, as there are both
>> retryable (eg. when no application data was received) and non-retryable
>> errors.
>
> As I was the one who wrote the documentation for IO::Socket::SSL: the part
> about sysread failing was supposed to be about non-blocking sockets. A
> blocking socket was not expected to return nothing just because the
> handshake was not done.

Then I don't get it. AFAICT the current documentation reflects what happens with blocking sockets and OpenSSL <1.1.1a, namely that IO::Socket::SSL::sysread() returns undef when the TLS ≤1.2 session is renegotiated. Are you saying that the intention was to keep retrying (either Perl-side, or within SSL_read() via SSL_MODE_AUTO_RETRY with OpenSSL ≥0.9.6) until application data is received or a fatal error is encountered?

> Do you know a relevant module or actually used application which has
> problems because of the default mode of SSL_MODE_AUTO_RETRY, where the
> ability to change it would fix the related problems and not introduce new
> ones, i.e. where such a fix would be the right thing to do?

Not anything using IO::Socket::SSL, I'm afraid. I'm personally more used to the lower-level APIs like libssl and Net::SSLeay. For libssl in particular, a quick codesearch returns a few prominent programs that reverted the new OpenSSL default:

  - Apache2's mod_ssl
    Bug: https://github.com/openssl/openssl/issues/7178
    Commit: https://github.com/apache/httpd/commit/6ee9d597e01281f2ef2e146586129af6aed7854d#diff-b70bd458eb699e70c322ee797a3e0991
  - Neon
    File: http://svn.webdav.org/repos/projects/neon/tags/0.30.2/src/ne_socket.c
    Change: http://lists.manyfish.co.uk/pipermail/neon-commits/2018-September/001077.html
  - YottaDB
    Bug: https://github.com/YottaDB/YDB/commit/1d0a8943d8be24a6f3dd3e4d2c8bfaad1fb75c87

> Another option might be to have an option SSL_mode_auto_retry which by
> default is not set but might be set to either 0 or 1 to get a consistent
> behavior across different OpenSSL versions.

I quite like this approach :-)

Anyway this bug can probably be closed now, maybe we should follow up elsewhere. Thanks for the discussion & analysis!

-- 
Guilhem.
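The mode handling discussed here is plain bit arithmetic on the SSL_CTX mode mask: SSL_CTX_set_mode() ORs bits in and SSL_CTX_clear_mode() masks them out. Below is a small pure-Perl model of the tri-state option proposed above: left unset, it keeps whatever the library defaults to; set to 1, it forces SSL_MODE_AUTO_RETRY on; set to 0, it forces it off. The value 0x4 is SSL_MODE_AUTO_RETRY from OpenSSL's ssl.h. The sub `effective_mode` is hypothetical, and this is not how IO::Socket::SSL implements anything.

```perl
use strict;
use warnings;

use constant SSL_MODE_AUTO_RETRY => 0x00000004;    # value from OpenSSL's ssl.h

# Hypothetical model of the proposed SSL_mode_auto_retry option.
# $default_mode: mode mask a fresh SSL_CTX starts with (library dependent).
# $opt: undef = keep the default, true = set the bit, false = clear it.
sub effective_mode {
    my ($default_mode, $opt) = @_;
    return $default_mode unless defined $opt;
    return $opt
        ? $default_mode | SSL_MODE_AUTO_RETRY      # like SSL_CTX_set_mode()
        : $default_mode & ~SSL_MODE_AUTO_RETRY;    # like SSL_CTX_clear_mode()
}

# Library default without and with the AUTO_RETRY bit: an explicit 0 or 1
# gives the same result in both cases, leaving it unset does not.
for my $default (0x0, SSL_MODE_AUTO_RETRY) {
    printf "default=0x%x  unset=0x%x  on=0x%x  off=0x%x\n",
        $default,
        effective_mode($default, undef),
        effective_mode($default, 1),
        effective_mode($default, 0);
}
```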
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Mon, May 13, 2019 at 03:18:14PM +0200, Guilhem Moulin wrote:
> On Mon, 13 May 2019 at 06:31:26 +0200, Steffen Ullrich wrote:
> > Applications which relied on blocking I/O in connection with select could
> > also hang before,
>
> Uh, what? “Before” meaning with ≤TLSv1.2, or with OpenSSL <1.1.1a's
> default flags? libssl mentions no such thing beside the new default
> mode. And in fact the s_client() program, *from OpenSSL upstream
> itself*, does precisely that: a select loop in blocking I/O mode (unless
> the ‘-nbio’ flag is set).

"Before" meaning already with previous OpenSSL versions. But the effect was likely hard to notice: an incomplete SSL record would not cause a permanent hang but only a blocking read until the rest of the data was received, i.e. only some slowdown in some situations.

>     https://github.com/openssl/openssl/blob/OpenSSL_1_1_1/apps/s_client.c#L2817
>
> s_client() is roughly speaking a C version of the ‘netcat.pl’ prototype I
> attached earlier. Unsurprisingly, since 1.1.1a the code clears
> SSL_MODE_AUTO_RETRY from the bitmask mode of the newly created SSL_CTX
> object. That change was even done in the very same commit that enabled
> SSL_MODE_AUTO_RETRY by default :-)
>
>     https://github.com/openssl/openssl/commit/693cf80c6ff54ae276a44d305d4ad07168ec6895#diff-7f3b79983f6d53c047c90a62813cc11f

s_client is little more than a test program, not production code which actually cares about edge cases. And the change was likely needed since s_client now had serious problems, and this was the easiest way to address them.

> IMHO using polling sockets in blocking I/O is fairly common. Granted,
> that itself doesn't say much about code quality, but the fact that
> the OpenSSL project *does* use it (and explicitly mentions it in the
> docs) gives me confidence that it's a fine use which should still be
> supported.

From my experience with OpenSSL and its documentation I don't agree with this. Documentation is sometimes wrong or misleading and more often incomplete (but it's actually improving). And while using blocking I/O with polling sockets might be fine with plain sockets, where each byte is part of the application data, it is not fine with SSL, since the unit in SSL is not a byte but a record, and some records might contain application data and some not.

> The documentation (libssl's SSL_read()'s [0], Net::SSLeay::read() [1],
> as well as IO::Socket::SSL::sysread() [2]) all warn that reading from an
> SSL socket has a different behavior than usual read(2) system calls, in
> that read failures should be treated with care, as there are both
> retryable (eg. when no application data was received) and non-retryable
> errors.

As I was the one who wrote the documentation for IO::Socket::SSL: the part about sysread failing was supposed to be about non-blocking sockets. A blocking socket was not expected to return nothing just because the handshake was not done.

> So from my perspective, the expectation that IO::Socket::SSL::sysread()
> behaves like a “normal” sysread doesn't hold, and never did. (There is
> even an entry for “Expecting exactly the same behavior as plain
> sockets” in the “Common errors” section of IO::Socket::SSL's manpage!)

This part talks about specific cases and how to deal with them, but says nothing about blocking sockets failing temporarily.

> For what it's worth I think it's a shame that SSL_MODE_AUTO_RETRY was
> not the default earlier, as it's convenient to be able to write
>
>     $sock->sysread($buf, $len) // die;
>
> and not bother about inspecting $SSL_ERROR. But programs have been

I agree with that. But in the past SSL_MODE_AUTO_RETRY was only relevant with unexpected renegotiations, which nearly never happened. So nobody actually noticed.

> written with the old default in mind, and the fact that SSL_read()
> doesn't behave like a normal read() system call. For these programs to
> keep working, there needs to be a way to switch SSL_MODE_AUTO_RETRY off.
> libssl provides SSL_CTX_clear_mode(), Net::SSLeay has
> CTX_ctrl(,78,SSL_CTRL_CLEAR_MODE,0), and I'd like to have something
> similar in IO::Socket::SSL, too.

I'm not convinced that this is the way to go, i.e. I'd rather see the applications use non-blocking sockets. But I'm also not stubbornly against it. Do you know a relevant module or actually used application which has problems because of the default mode of SSL_MODE_AUTO_RETRY, where the ability to change it would fix the related problems and not introduce new ones, i.e. where such a fix would be the right thing to do? LWP of course does not count, since the right thing to do in LWP was to remove the obsolete patch which kept the socket blocking.

I also have some problems providing a consistent API for this which can actually be understood. In my opinion this would mean that I have a clear default for the value, i.e. that I enable SSL_MODE_AUTO_RETRY by default for older OpenSSL versions the same way it is done in OpenSSL 1.1.1. But
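Steffen's point that the unit in SSL/TLS is a record rather than a byte can be made concrete by looking at the record layer. Every record starts with a 5-byte header (1-byte content type, 2-byte version, 2-byte big-endian length), and the TLS library can only process a record once all of it has arrived. The following pure-Perl sketch uses a hypothetical helper, `split_records`, to split a byte buffer into complete records and report the incomplete remainder. It only reads the outer header. With TLS 1.3, encrypted records all carry the outer type 23 (application_data), so whether a record really holds application data is only known after decryption.

```perl
use strict;
use warnings;

# Content types of the TLS record layer (RFC 5246 / RFC 8446).
my %CONTENT_TYPE = (
    20 => 'change_cipher_spec',
    21 => 'alert',
    22 => 'handshake',
    23 => 'application_data',
);

# Hypothetical helper: split raw bytes into complete TLS records.
# Returns (\@records, $remainder); $remainder holds a trailing partial record.
sub split_records {
    my ($buf) = @_;
    my @records;
    while (length($buf) >= 5) {
        my ($type, $version, $len) = unpack('C n n', $buf);
        last if length($buf) < 5 + $len;    # record not complete yet
        push @records, {
            type    => $CONTENT_TYPE{$type} // "unknown($type)",
            version => sprintf('0x%04x', $version),
            length  => $len,
        };
        substr($buf, 0, 5 + $len, '');      # drop the consumed record
    }
    return (\@records, $buf);
}

# A complete handshake record followed by only the first 2 of 10 payload
# bytes of an application_data record: select() would already report the
# socket readable, but the second record cannot be processed yet.
my $bytes = pack('C n n', 22, 0x0303, 3) . 'abc'
          . pack('C n n', 23, 0x0303, 10) . 'xy';
my ($records, $rest) = split_records($bytes);
printf "%s (%d bytes), %d bytes waiting for the rest of a record\n",
    $records->[0]{type}, $records->[0]{length}, length $rest;
```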
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Mon, 13 May 2019 at 06:31:26 +0200, Steffen Ullrich wrote:
> Additionally switching off SSL_MODE_AUTO_RETRY would actually just add
> a different unexpected behavior: that sysread might return with EAGAIN
> on a blocking socket.

FWIW, as shown below, that's always been the case until OpenSSL 1.1.1a, where SSL_MODE_AUTO_RETRY was switched on by default, so I'd say it's wrong to expect that it won't :-P Also, I'm not arguing that the default should be toggled back in IO::Socket::SSL, but that it should have a flag to optionally revert to the old OpenSSL default.

Please consider the following code, which uses blocking I/O and merely echoes what's being received from the SSL/TLS server.

    perl -I. -MIO::Socket::SSL -e '
        my $sock = IO::Socket::SSL->new(
            PeerAddr    => "127.0.0.1:4433",
            SSL_ca_file => "/tmp/cert.pem"
        ) // die;
        while (1) {
            my $buf = "";
            $sock->sysread($buf, 4096) // die "errno=\"$!\", SSL_ERROR=\"$SSL_ERROR\"\n";
            print $buf;
        }'

Running in a Stretch chroot (libssl1.1 1.1.0j-1~deb9u1, libnet-ssleay-perl 1.80-1, and IO::Socket::SSL upstream 2.066), it dies with the following message when the server renegotiates the TLSv1.2 session (“r\n” command in the `s_server` input):

    errno="Resource temporarily unavailable", SSL_ERROR="SSL wants a read first"

That's expected, because SSL_read() fails and SSL_get_error() returns SSL_ERROR_WANT_READ. That code is broken, as it doesn't inspect $SSL_ERROR on sysread failure and treats retryable errors as fatal. With TLSv1.3 (but ensuring SSL_MODE_AUTO_RETRY is still unset) it's way worse, because it dies immediately after the handshake, not “just” when the session is renegotiated. In that light it makes sense that the OpenSSL developers have switched SSL_MODE_AUTO_RETRY on by default.

Now with an OpenSSL version where SSL_MODE_AUTO_RETRY is set by default, SSL_read() automatically retries and blocks until application data is received, so the above program keeps looping as expected. Automatic retrying in lower-level functions is a fine default, but unfortunately it breaks applications that *were* relying on SSL_read() *not* blocking when only non-application data was received. That's why there needs to be a way to optionally switch it back off. Changing these programs to use non-blocking I/O is clearly much more invasive.

> I've added more information regarding this to the IO::Socket::SSL
> documentation:
> https://github.com/noxxi/p5-io-socket-ssl/commit/ee176e489f02bfaaa479fc8d9312c8458bf55565
|  A sysread on the IO::Socket::SSL socket will not return any data
|  though since it is an abstraction which only returns application data.
|  This causes the sysread to hang in case the socket was blocking

As shown above, this is incorrect for OpenSSL <1.1.1a (or any later OpenSSL version where SSL_MODE_AUTO_RETRY was switched off). There IO::Socket::SSL::sysread() doesn't hang, but instead fails immediately and sets $SSL_ERROR to SSL_ERROR_WANT_READ (and $! to EAGAIN). While setting errno to EAGAIN is specific to IO::Socket::SSL (and AFAICT undocumented for blocking I/O), the manpage for SSL_read() and its higher-level bindings & wrappers, incl. IO::Socket::SSL, explicitly says that upon failure one should first check SSL_get_error() for SSL-specific errors, i.e., not rely on the errno value unless the SSL error code is SSL_ERROR_SYSCALL. That also applies to blocking I/O.

-- 
Guilhem.
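The rule cited above (check SSL_get_error() first, and only look at errno when the result is SSL_ERROR_SYSCALL) can be written as a small classification function. This is a pure-Perl sketch: the numeric codes are the SSL_ERROR_* values from OpenSSL's ssl.h, while the function name `classify_read_failure` and the returned labels are hypothetical.

```perl
use strict;
use warnings;

# SSL_get_error() return values as defined in OpenSSL's ssl.h.
use constant {
    SSL_ERROR_SSL         => 1,
    SSL_ERROR_WANT_READ   => 2,
    SSL_ERROR_WANT_WRITE  => 3,
    SSL_ERROR_SYSCALL     => 5,
    SSL_ERROR_ZERO_RETURN => 6,
};

# Hypothetical classifier for a failed SSL_read(): decide from the SSL error
# code first and consult errno only for SSL_ERROR_SYSCALL.
sub classify_read_failure {
    my ($ssl_error, $errno) = @_;
    return 'retry' if $ssl_error == SSL_ERROR_WANT_READ
                   || $ssl_error == SSL_ERROR_WANT_WRITE;
    return 'eof'   if $ssl_error == SSL_ERROR_ZERO_RETURN;    # peer sent close_notify
    return 'syscall: ' . ($errno // 'unknown')
                   if $ssl_error == SSL_ERROR_SYSCALL;
    return 'fatal';    # SSL_ERROR_SSL and anything unexpected
}

# EAGAIN next to SSL_ERROR_WANT_READ is still just "retry": errno is ignored.
print classify_read_failure(SSL_ERROR_WANT_READ, 'Resource temporarily unavailable'), "\n";
```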
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Mon, 13 May 2019 at 06:31:26 +0200, Steffen Ullrich wrote:
> Applications which relied on blocking I/O in connection with select could
> also hang before,

Uh, what? “Before” meaning with ≤TLSv1.2, or with OpenSSL <1.1.1a's default flags? libssl mentions no such thing beside the new default mode. And in fact the s_client() program, *from OpenSSL upstream itself*, does precisely that: a select loop in blocking I/O mode (unless the ‘-nbio’ flag is set).

    https://github.com/openssl/openssl/blob/OpenSSL_1_1_1/apps/s_client.c#L2817

s_client() is roughly speaking a C version of the ‘netcat.pl’ prototype I attached earlier. Unsurprisingly, since 1.1.1a the code clears SSL_MODE_AUTO_RETRY from the bitmask mode of the newly created SSL_CTX object. That change was even done in the very same commit that enabled SSL_MODE_AUTO_RETRY by default :-)

    https://github.com/openssl/openssl/commit/693cf80c6ff54ae276a44d305d4ad07168ec6895#diff-7f3b79983f6d53c047c90a62813cc11f

IMHO using polling sockets in blocking I/O is fairly common. Granted, that itself doesn't say much about code quality, but the fact that the OpenSSL project *does* use it (and explicitly mentions it in the docs) gives me confidence that it's a fine use which should still be supported.

> only this problem is worse now. Before TLS 1.3 these applications
> could hang if the peer initiated a renegotiation since these were TCP
> level data without any SSL application payload, i.e. select was
> triggered but sysread would not return with data.

I feel that we're talking past each other :-/ select() being triggered indeed means that a subsequent read() system call *won't block*; however read() is not called directly, but through SSL_read(). Whether SSL_read() will block until application data is received, or not, is controlled by the SSL_MODE_AUTO_RETRY flag. If that flag is unset, then SSL_read() returns immediately with a failure, and SSL_get_error() (or $SSL_ERROR in IO::Socket::SSL) reports SSL_ERROR_WANT_READ.

For an IO::Socket::SSL object, $sock->sysread() is AFAICT a mere wrapper for SSL_read():

    https://github.com/noxxi/p5-io-socket-ssl/blob/2.066/lib/IO/Socket/SSL.pm#L1187

So if select() is triggered because the TLS session is renegotiated, but no application data was received, sysread will block iff SSL_MODE_AUTO_RETRY is set, as I showed in the traces enclosed in my previous message.

> Additionally switching off SSL_MODE_AUTO_RETRY would actually just add a
> different unexpected behavior: that sysread might return with EAGAIN on a
> blocking socket. This is not the behavior one expects from a blocking
> socket, i.e. it should block until it returns data, should return no data
> only on connection shutdown or should fail permanently.

The documentation (libssl's SSL_read()'s [0], Net::SSLeay::read() [1], as well as IO::Socket::SSL::sysread() [2]) all warn that reading from an SSL socket has a different behavior than usual read(2) system calls, in that read failures should be treated with care, as there are both retryable (eg. when no application data was received) and non-retryable errors.

So from my perspective, the expectation that IO::Socket::SSL::sysread() behaves like a “normal” sysread doesn't hold, and never did. (There is even an entry for “Expecting exactly the same behavior as plain sockets” in the “Common errors” section of IO::Socket::SSL's manpage!)

For what it's worth I think it's a shame that SSL_MODE_AUTO_RETRY was not the default earlier, as it's convenient to be able to write

    $sock->sysread($buf, $len) // die;

and not bother about inspecting $SSL_ERROR. But programs have been written with the old default in mind, and with the fact that SSL_read() doesn't behave like a normal read() system call. For these programs to keep working, there needs to be a way to switch SSL_MODE_AUTO_RETRY off. libssl provides SSL_CTX_clear_mode(), Net::SSLeay has CTX_ctrl(,78,SSL_CTRL_CLEAR_MODE,0), and I'd like to have something similar in IO::Socket::SSL, too.

> It was just a coincidence that LWP::Protocol::http could deal with
> this situation. And this coincidence came from the fact that this
> code was actually designed for non-blocking sockets and only the
> Debian patch caused it to use a blocking socket instead.

Yup, I now agree with you as far as LWP is concerned.

> I've added more information regarding this to the IO::Socket::SSL
> documentation:
> https://github.com/noxxi/p5-io-socket-ssl/commit/ee176e489f02bfaaa479fc8d9312c8458bf55565
|  A sysread on the IO::Socket::SSL socket will not return any data
|  though since it is an abstraction which only returns application data.
|  This causes the sysread to hang in case the socket was blocking

I believe that statement is incorrect for OpenSSL <1.1.1a (or if SSL_MODE_AUTO_RETRY was toggled via some other means). If you try netcat.pl with an older OpenSSL release, you won't be able to reproduce the TLSv1.2 trace I pasted yesterday; the client doesn't end up stuck in a blocking read :-) If no application data is
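The distinction drawn above can be summarised as a toy model. Without SSL_MODE_AUTO_RETRY, a blocking SSL_read() returns SSL_ERROR_WANT_READ after processing a non-application record. With it set, SSL_read() keeps waiting. The code below is a hypothetical simulation of that documented SSL_read() behaviour, not OpenSSL code. Records are given as already-decrypted `[type, payload]` pairs, and "blocks" stands for the point where the real call would sit in read(2) waiting for more bytes.

```perl
use strict;
use warnings;

# Toy model of one SSL_read() call on a blocking BIO.
# $queue: array ref of records already received, each [$type, $payload].
# $auto_retry: whether SSL_MODE_AUTO_RETRY is set.
sub model_ssl_read {
    my ($queue, $auto_retry) = @_;
    while (my $rec = shift @$queue) {
        my ($type, $payload) = @$rec;
        return ('data', $payload) if $type eq 'application_data';
        # A non-application record (handshake, renegotiation, session ticket)
        # was processed: without AUTO_RETRY the call gives up here.
        return ('want_read') unless $auto_retry;
    }
    return ('blocks');    # nothing left: the real call would wait in read(2)
}

# select() fired because a handshake record arrived, but no application data:
for my $auto_retry (0, 1) {
    my ($outcome) = model_ssl_read([['handshake', '']], $auto_retry);
    print "SSL_MODE_AUTO_RETRY=$auto_retry -> $outcome\n";
}
```

In a select loop, the "want_read" outcome lets the caller go back to select(). The "blocks" outcome is the hang reported in this bug.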
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Mon, May 13, 2019 at 01:02:45AM +0200, Guilhem Moulin wrote:
> Thanks for your analysis, Steffen. Dropping the Debian-specific patch
> is definitely the way to go for libwww/LWP. However I still believe
> IO::Socket::SSL should provide a way to clear SSL_MODE_AUTO_RETRY in
> order to fix applications relying on the former OpenSSL defaults, as
> suggested in the OpenSSL changelog:
>
>   “SSL_MODE_AUTO_RETRY is enabled by default. Applications that use
>   blocking I/O in combination with something like select() or poll()
>   will hang. This can be turned off again using SSL_CTX_clear_mode().”
>
> Otherwise the “usual” way to write event loops in blocking I/O won't be
> possible with IO::Socket::SSL.

Applications which relied on blocking I/O in connection with select could also hang before; only this problem is worse now. Before TLS 1.3 these applications could hang if the peer initiated a renegotiation, since these were TCP-level data without any SSL application payload, i.e. select was triggered but sysread would not return with data. It could also lead to delays when SSL frames larger than the TCP window size were used, or due to interactions with the Nagle algorithm or delayed ACKs. In this case too, TCP-level data were received, but the SSL-level data were only available later once the SSL frame was complete, which means sysread had to wait until the SSL frame was completed.

With TLS 1.3 this problem is now worse since it happens more often. And frankly, I'd rather see the applications fix their wrong approach than give them an opportunity to work around the most common cause and leave them wondering about strange problems which happen from time to time and which nobody can really reproduce.

Additionally, switching off SSL_MODE_AUTO_RETRY would actually just add a different unexpected behavior: that sysread might return with EAGAIN on a blocking socket. This is not the behavior one expects from a blocking socket, i.e. it should block until it returns data, should return no data only on connection shutdown, or should fail permanently.

It was just a coincidence that LWP::Protocol::http could deal with this situation. And this coincidence came from the fact that this code was actually designed for non-blocking sockets, and only the Debian patch caused it to use a blocking socket instead.

I've added more information regarding this to the IO::Socket::SSL documentation:
https://github.com/noxxi/p5-io-socket-ssl/commit/ee176e489f02bfaaa479fc8d9312c8458bf55565

Regards,
Steffen
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
Thanks for your analysis, Steffen. Dropping the Debian-specific patch is definitely the way to go for libwww/LWP. However I still believe IO::Socket::SSL should provide a way to clear SSL_MODE_AUTO_RETRY in order to fix applications relying on the former OpenSSL defaults, as suggested in the OpenSSL changelog: “SSL_MODE_AUTO_RETRY is enabled by default. Applications that use blocking I/O in combination with something like select() or poll() will hang. This can be turned off again using SSL_CTX_clear_mode().” Otherwise the “usual” way to write event loops in blocking I/O won't be possible with IO::Socket::SSL. On Sat, 11 May 2019 at 21:56:01 +0200, Steffen Ullrich wrote: > As far as I can see it has nothing to do with SSL_MODE_AUTO_RETRY but > instead is caused by expectations on the behavior of select which are > wrong with TLS 1.3. Please consider the enclosed netcat-like program. I don't think I'm relying on any particular behavior of a specific TLS version, and follow the practices for polling blocking sockets, as documented in libssl, Net::SSLeay, and IO::Socket::SSL, namely: - If SSL_pending() > 0, skip the (blocking) select() call and instead call SSL_read() to process remaining bytes in the current SSL frame. - If SSL_read() fails and sets SSL_ERROR_WANT_READ, don't treat it as a read error. The last point however relies on SSL_MODE_AUTO_RETRY being *unset*, like it used to be with OpenSSL <1.1.1a. With SSL_MODE_AUTO_RETRY being set, the program doesn't work properly. (*Not only for TLSv1.3, but also for TLSv1.2*). This is expected with the new default: “If the underlying BIO is blocking, a read function will only return once the read operation has been finished or an error occurred, except when a non-application data record has been processed and SSL_MODE_AUTO_RETRY is not set. 
Note that if SSL_MODE_AUTO_RETRY is set and only non-application data is available the call will hang.” — https://www.openssl.org/docs/manmaster/man3/SSL_read.html As seen below, this also breaks with ≤TLSv1.2; but only when the TLS session is renegotiated, not during the initial handshake. Generate a self-signed certificate: $ openssl req -x509 -keyout /tmp/key.pem -out /tmp/cert.pem -subj /CN=127.0.0.1 -nodes Start a TLSv1.2 server on [127.0.0.1]:4433: $ openssl s_server -accept 127.0.0.1:4433 -key /tmp/key.pem -cert /tmp/cert.pem -tls1_2 Now start the enclosed program in another terminal. What's being written in the s_server(1ssl) TTY is echoed on the netcat.pl side, and vice versa. All good. Now trigger renegotiate the TLS session by pressing ‘r\n’. The server prints SSL_do_handshake -> 1 Read BLOCK and netcat ends up being stuck in a blocking read(). So what's being written client-side won't show up anymore in the server window, until data is being sent from the server to the client and makes read() return. openssl s_server … -tls1_2 netcat.pl --- - S: Using default temp DH parameters S: ACCEPT S: -BEGIN SSL SESSION PARAMETERS- S: […] S: --- S: No server certificate CA names sent S: CIPHER is ECDHE-RSA-AES128-GCM-SHA256 S: Secure Renegotiation IS supported S: Entering loop... C: can you hear me now? S: can you hear me now? C: yes S: yes C: good S: good C: starving you now S: starving you now C: r S: SSL_do_handshake -> 1 S: Read BLOCK C: meh, I'm muted C: unstarving S: meh, I'm muted S: unstarving (The ‘C: ’ prefix indicates a line written to the standard input, and the ‘S: ’ prefix a line written to the standard output or error output.) After renegotiation, the client is stuck in a blocking read() until the server sends some data. Same thing with TLSv1.3, but of course without the renegotiation part: this happens right at the begining. 
  openssl s_server … -tls1_3  |  netcat.pl
  ---
  S: Using default temp DH parameters
  S: ACCEPT
  S: -BEGIN SSL SESSION PARAMETERS-
  S: […]
  S: ---
  S: No server certificate CA names sent
  S: CIPHER is TLS_AES_256_GCM_SHA384
  S: Secure Renegotiation IS supported
  S: Entering loop...
  C: can you hear me now?
  C: I guess no...
  C: unstarving
  S: can you hear me now?
  S: I guess no...
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
Control: reassign -1 libwww-perl 6.36-1
Control: retitle -1 libwww-perl: ancient patch causes TLSv1.3 connection deadlock
Control: tag -1 + patch

On Sun, 12 May 2019 07:31:57 +0200, Steffen Ullrich wrote:
> Actually, the fix is probably even simpler.
>
> The bug does not happen with an original libwww. It only happens with an LWP
> which contains Debian patches. Specifically the patch
> drop-non-blocking-socket.patch causes the problem since it keeps the socket
> blocking while the original libwww explicitly sets the socket non-blocking
> in _new_socket.

D'oh!

> From my understanding of the code the original problem which caused the
> patch to be added (#216821) should be no longer relevant since the socket is
> actually connected before it is set to non-blocking.
> Thus I recommend to just remove this old patch (from 2003) instead of
> introducing yet another patch.

I can confirm that removing this patch makes the same test as yesterday work. Thanks for digging this up!

I'll wait a day or two to give others a chance to take a look before I upload the fixed package.

Cheers,
gregor
Processed: Re: Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
Processing control commands:

> reassign -1 libwww-perl 6.36-1
Bug #914034 [libio-socket-ssl-perl] libio-socket-ssl-perl: TLSv1.3 connection deadlock
Bug reassigned from package 'libio-socket-ssl-perl' to 'libwww-perl'.
No longer marked as found in versions libio-socket-ssl-perl/2.060-3.
Ignoring request to alter fixed versions of bug #914034 to the same values previously set
Bug #914034 [libwww-perl] libio-socket-ssl-perl: TLSv1.3 connection deadlock
Marked as found in versions libwww-perl/6.36-1.
> retitle -1 libwww-perl: ancient patch causes TLSv1.3 connection deadlock
Bug #914034 [libwww-perl] libio-socket-ssl-perl: TLSv1.3 connection deadlock
Changed Bug title to 'libwww-perl: ancient patch causes TLSv1.3 connection deadlock' from 'libio-socket-ssl-perl: TLSv1.3 connection deadlock'.
> tag -1 + patch
Bug #914034 [libwww-perl] libwww-perl: ancient patch causes TLSv1.3 connection deadlock
Added tag(s) patch.

-- 
914034: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914034
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
Actually, the fix is probably even simpler.

The bug does not happen with an original libwww. It only happens with an LWP which contains Debian patches. Specifically, the patch drop-non-blocking-socket.patch causes the problem, since it keeps the socket blocking while the original libwww explicitly sets the socket non-blocking in _new_socket.

From my understanding of the code, the original problem which caused the patch to be added (#216821) should no longer be relevant, since the socket is actually connected before it is set to non-blocking.

Thus I recommend just removing this old patch (from 2003) instead of introducing yet another patch.

Regards,
Steffen

On Sun, May 12, 2019 at 01:11:45AM +0200, gregor herrmann wrote:
> On Sat, 11 May 2019 21:56:01 +0200, Steffen Ullrich wrote:
>
> > I think the issue is a bit different than what was analyzed so far.
> […]
>
> Thank you for this detailed analysis!
>
> > The following small patch for LWP::Protocol::http seems to fix the
> > problem for me:
> >
> > --- /usr/share/perl5/LWP/Protocol/http.pm 2019-05-11 19:05:21.488561325 +
> > +++ lib/LWP/Protocol/http.pm 2019-05-11 19:40:49.332810627 +
> > @@ -374,7 +374,9 @@
> >      if (defined($rbits) && $rbits =~ /[^\0]/) {
> >          # readable
> >          my $buf = $socket->_rbuf;
> > +        my $was_blocking = $socket->blocking(0);
> >          my $n = $socket->sysread($buf, 1024, length($buf));
> > +        $socket->blocking(1) if $was_blocking;
> >          unless (defined $n) {
> >              die "read failed: $!" unless $!{EINTR} || $!{EWOULDBLOCK} || $!{EAGAIN};
> >              # if we get here the rest of the block will do nothing
>
> I've now tried the following, as per Guilhem's proposal in message
> #126:
>
> Terminal 1:
>
> % openssl req -x509 -keyout /tmp/key.pem -out /tmp/cert.pem -subj /CN=127.0.0.1 -nodes
> % openssl s_server -accept 127.0.0.1:4433 -key /tmp/key.pem -cert /tmp/cert.pem -tls1_3
>
> Terminal 2:
>
> % strace -y -ttt -etrace=select,read,write -o ~/tmp/strace.orig perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {SSL_ca_file => "/tmp/cert.pem"})->post("https://127.0.0.1:4433", {data => "foo"})'
>
> and with the patch applied:
>
> % strace -y -ttt -etrace=select,read,write -o ~/tmp/strace.patched perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {SSL_ca_file => "/tmp/cert.pem"})->post("https://127.0.0.1:4433", {data => "foo"})'
>
>
> Result:
>
> Terminal 1
>
> Before the patch:
>
> Using default temp DH parameters
> ACCEPT
> -BEGIN SSL SESSION PARAMETERS-
> MH0CAQECAgMEBAITAgQgr5ycydiRESVmPMev7McV6BfGSUqTodBWWKKM08FFNagE
> MOxvDk39ZVjQ0w/KpEvrm839LIrrWeigYwg3ofZJiFoT/Hu9VUbpxzDrx3EOTx3a
> oqEGAgRc11T3ogQCAhwgpAYEBAEAAACuBgIEaCDB8g==
> -END SSL SESSION PARAMETERS-
> Shared ciphers:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES256-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES256-SHA:AES128-SHA:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:AES256-SHA256
> Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:DSA+SHA256:DSA+SHA384:DSA+SHA512
> Shared Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224
> Supported Elliptic Groups: X25519:P-256:X448:P-521:P-384
> Shared Elliptic groups: X25519:P-256:X448:P-521:P-384
> ---
> No server certificate CA names sent
> CIPHER is TLS_AES_256_GCM_SHA384
> Secure Renegotiation IS supported
> ERROR
> shutting down SSL
> CONNECTION CLOSED
>
>
> After the patch:
>
> -BEGIN SSL SESSION PARAMETERS-
> MH4CAQECAgMEBAITAgQgm+J0M38juLT4Xzeo8Vtx8JH/JPianuR8GUsyGMV3MfIE
> MIVGCLjmfpGauD0228pm7jpxjhdcZ1KvRW4ag/Xx9MifU4n/zUKVrnUQrivvcOoq
> caEGAgRc11UeogQCAhwgpAYEBAEAAACuBwIFAOBjKP4=
> -END SSL SESSION PARAMETERS-
> Shared ciphers:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES256-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES256-SHA:AES128-SHA:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:AES256-SHA256
> Signature Algorithms:
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Sat, 11 May 2019 21:56:01 +0200, Steffen Ullrich wrote:
> I think the issue is a bit different than what was analyzed so far.
[…]

Thank you for this detailed analysis!

> The following small patch for LWP::Protocol::http seems to fix the
> problem for me:
>
> --- /usr/share/perl5/LWP/Protocol/http.pm 2019-05-11 19:05:21.488561325 +
> +++ lib/LWP/Protocol/http.pm 2019-05-11 19:40:49.332810627 +
> @@ -374,7 +374,9 @@
>      if (defined($rbits) && $rbits =~ /[^\0]/) {
>          # readable
>          my $buf = $socket->_rbuf;
> +        my $was_blocking = $socket->blocking(0);
>          my $n = $socket->sysread($buf, 1024, length($buf));
> +        $socket->blocking(1) if $was_blocking;
>          unless (defined $n) {
>              die "read failed: $!" unless $!{EINTR} || $!{EWOULDBLOCK} || $!{EAGAIN};
>              # if we get here the rest of the block will do nothing

I've now tried the following, as per Guilhem's proposal in message #126:

Terminal 1:

% openssl req -x509 -keyout /tmp/key.pem -out /tmp/cert.pem -subj /CN=127.0.0.1 -nodes
% openssl s_server -accept 127.0.0.1:4433 -key /tmp/key.pem -cert /tmp/cert.pem -tls1_3

Terminal 2:

% strace -y -ttt -etrace=select,read,write -o ~/tmp/strace.orig perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {SSL_ca_file => "/tmp/cert.pem"})->post("https://127.0.0.1:4433", {data => "foo"})'

and with the patch applied:

% strace -y -ttt -etrace=select,read,write -o ~/tmp/strace.patched perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {SSL_ca_file => "/tmp/cert.pem"})->post("https://127.0.0.1:4433", {data => "foo"})'


Result:

Terminal 1

Before the patch:

Using default temp DH parameters
ACCEPT
-BEGIN SSL SESSION PARAMETERS-
MH0CAQECAgMEBAITAgQgr5ycydiRESVmPMev7McV6BfGSUqTodBWWKKM08FFNagE
MOxvDk39ZVjQ0w/KpEvrm839LIrrWeigYwg3ofZJiFoT/Hu9VUbpxzDrx3EOTx3a
oqEGAgRc11T3ogQCAhwgpAYEBAEAAACuBgIEaCDB8g==
-END SSL SESSION PARAMETERS-
Shared ciphers:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES256-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES256-SHA:AES128-SHA:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:AES256-SHA256
Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:DSA+SHA256:DSA+SHA384:DSA+SHA512
Shared Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224
Supported Elliptic Groups: X25519:P-256:X448:P-521:P-384
Shared Elliptic groups: X25519:P-256:X448:P-521:P-384
---
No server certificate CA names sent
CIPHER is TLS_AES_256_GCM_SHA384
Secure Renegotiation IS supported
ERROR
shutting down SSL
CONNECTION CLOSED


After the patch:

-BEGIN SSL SESSION PARAMETERS-
MH4CAQECAgMEBAITAgQgm+J0M38juLT4Xzeo8Vtx8JH/JPianuR8GUsyGMV3MfIE
MIVGCLjmfpGauD0228pm7jpxjhdcZ1KvRW4ag/Xx9MifU4n/zUKVrnUQrivvcOoq
caEGAgRc11UeogQCAhwgpAYEBAEAAACuBwIFAOBjKP4=
-END SSL SESSION PARAMETERS-
Shared ciphers:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES256-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES256-SHA:AES128-SHA:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:AES256-SHA256
Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:DSA+SHA256:DSA+SHA384:DSA+SHA512
Shared Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224
Supported Elliptic Groups: X25519:P-256:X448:P-521:P-384
Shared Elliptic groups: X25519:P-256:X448:P-521:P-384
---
No server certificate CA names sent
CIPHER is TLS_AES_256_GCM_SHA384
Secure Renegotiation IS supported
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: 127.0.0.1:4433
User-Agent: libwww-perl/6.36
Content-Length: 8
Content-Type: application/x-www-form-urlencoded

data=fooERROR
shutting down SSL
CONNECTION CLOSED

So in the second case, the POST actually
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
I think the issue is a bit different from what has been analyzed so far. As far as I can see, it has nothing to do with SSL_MODE_AUTO_RETRY but is instead caused by expectations about the behavior of select which are wrong with TLS 1.3.

I've added some more debugging to IO::Socket::SSL, and what I saw was:

- With TLS 1.2, LWP first did a write to send the request and then a read to read the response. That's expected.
- With TLS 1.3, LWP first tried to read the response and then hung there, i.e. the request was never even sent to the server.

This is obviously not what is expected. The reason is that LWP::Protocol::http::request does not actually try to write the request first. Instead, it creates a select loop to wait for both read and write events on the socket and then acts based on which event select returns. The implicit expectation is that there is nothing to read at first, but that it will be able to write the request. After the request is sent, it disables selecting for write events and only cares about reads.

However, with TLS 1.3 this expectation is wrong. Up to TLS 1.2, the session tickets issued by the server were part of the TLS handshake. With TLS 1.3 this is no longer the case: they are sent after the TLS handshake is done. This means that the initial select will actually report a read event, since there is actual data to read on the underlying socket: the session tickets sent by the server. But these are not application data. A sysread on a blocking socket is supposed to return only once at least 1 byte has been read, the peer has closed the connection, or a permanent error has occurred. And since no application data were sent, it just hangs.

I don't see how this can be fixed in IO::Socket::SSL alone, since the real problem is that sysread is called at this stage (i.e. with no request sent yet) in the first place.
Instead, the code which makes the wrong assumptions about select needs to be fixed, so that sysread either does not get called at this stage or at least does not get called in blocking mode. The following small patch for LWP::Protocol::http seems to fix the problem for me:

--- /usr/share/perl5/LWP/Protocol/http.pm 2019-05-11 19:05:21.488561325 +
+++ lib/LWP/Protocol/http.pm 2019-05-11 19:40:49.332810627 +
@@ -374,7 +374,9 @@
     if (defined($rbits) && $rbits =~ /[^\0]/) {
         # readable
         my $buf = $socket->_rbuf;
+        my $was_blocking = $socket->blocking(0);
         my $n = $socket->sysread($buf, 1024, length($buf));
+        $socket->blocking(1) if $was_blocking;
         unless (defined $n) {
             die "read failed: $!" unless $!{EINTR} || $!{EWOULDBLOCK} || $!{EAGAIN};
             # if we get here the rest of the block will do nothing

Regards,
Steffen
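The shape of the select loop this patch targets can be tried out without TLS. The following is only a sketch under stated assumptions: a plain `socketpair` stands in for the TLS connection, a peer that writes first stands in for a TLS 1.3 server sending session tickets, and the variable names and the 0.5 s idle timeout are illustrative, not LWP's actual code. The loop waits for read and write readiness together, and it calls `sysread` only with the socket temporarily switched to non-blocking mode, treating `EAGAIN`/`EWOULDBLOCK`/`EINTR` as "nothing yet", as the patch does.

```perl
use strict;
use warnings;
use IO::Handle;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

socketpair(my $client, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The peer speaks first, so the client's first select() sees the socket
# as readable before the request has been written.
syswrite($peer, "unsolicited") or die "syswrite: $!";

my $request = "POST / HTTP/1.1\r\n\r\n";
my $sent    = 0;
my $rbuf    = '';

while (1) {
    my $rbits = '';
    vec($rbits, fileno($client), 1) = 1;
    my $wbits;                              # undef: don't wait for writability
    if ($sent < length $request) {
        $wbits = '';
        vec($wbits, fileno($client), 1) = 1;
    }
    my $nfound = select($rbits, $wbits, undef, 0.5);
    die "select failed: $!" if $nfound < 0;
    last if $nfound == 0;                   # demo only: stop after 0.5 s of silence

    if ($rbits =~ /[^\0]/) {
        # Same idea as the patch: this read must never block the loop.
        my $was_blocking = $client->blocking(0);
        my $n = sysread($client, $rbuf, 1024, length $rbuf);
        my $would_block = !defined($n) && ($!{EAGAIN} || $!{EWOULDBLOCK} || $!{EINTR});
        $client->blocking(1) if $was_blocking;
        die "read failed: $!" unless defined($n) || $would_block;
        last if defined($n) && $n == 0;     # peer closed the connection
    }
    if (defined $wbits && $wbits =~ /[^\0]/) {
        my $n = syswrite($client, $request, length($request) - $sent, $sent);
        die "write failed: $!" unless defined $n;
        $sent += $n;
    }
}

# With nothing left to read, the temporarily non-blocking read reports
# EAGAIN/EWOULDBLOCK instead of hanging. With TLS, this is what protects the
# loop when select() fired for a record that carried no application data.
my $was_blocking = $client->blocking(0);
my $n = sysread($client, my $scratch, 1024);
my $empty_read_would_block = !defined($n) && ($!{EAGAIN} || $!{EWOULDBLOCK}) ? 1 : 0;
$client->blocking(1) if $was_blocking;

print "request sent: ", ($sent == length $request ? "yes" : "no"), "\n";
print "received: $rbuf\n";
```

Run as-is, it should print `request sent: yes` and `received: unsolicited`: the request goes out even though the socket was readable first. With plain sockets a blocking read would not have hung here, because the pending bytes are real data; the non-blocking read only makes a difference when, as with TLS, readability does not guarantee application data.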
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 7 May 2019 19:39:15 +0200 Guilhem Moulin wrote:
> Hi Dimitri,
>
> On Tue, 07 May 2019 at 15:46:25 +0100, Dimitri John Ledkov wrote:
> > On Tue, 7 May 2019 14:16:43 +0100 Dimitri John Ledkov
> > wrote:
> >> This issue concerns me a lot at the moment. I am currently trying to
> >> upgrade OpenSSL from 1.1.0 to 1.1.1 in Ubuntu 18.04 LTS (bionic). And
> >> as far as I understand all the comments on this Debian bug report,
> >> current applications are potentially broken, and breakage happens more
> >> often with TLSv1.3 and the new OpenSSL 1.1.1 defaults
> >> (SSL_MODE_AUTO_RETRY).
> >>
> >> As far as I understand, we do not have a fixed LWP that works correctly
> >> in blocking, non-blocking, TLS 1.2 and TLS 1.3. To prevent regressing
> >> existing users further, does it make sense for me to make updates in
> >> bionic that:
> >>
> >> 1) limit SSL_new and SSL_CTX_new to TLS v1.2 max
> >> and
> >> 2) disable SSL_MODE_AUTO_RETRY by default for TLS v1.2 connections?
> >>
> >> My goal is to keep existing breakages as they are, without introducing new
> >> ones, whilst getting OpenSSL 1.1.1 into bionic. Granted, this will not
> >> get TLS v1.3 enabled for Perl servers/clients without code changes, but
> >> oh well. Those who want it will be able to force / start using it.
>
> It's IMHO unfortunate to change the default in Net::SSLeay, as TLSv1.3
> brings quite a few improvements (in terms of security as well as
> performance). OpenSSL 1.1.1 was released on 11 Sep 2018 and uploaded to
> sid the day after, breaking programs using select()/poll() with blocking
> I/O; this is not specific to Perl bindings (other languages are also
> affected), yet no one is suggesting to disable TLSv1.3 globally in
> libssl :-)
>
> If TLSv1.3 should be disabled (and the SSL_MODE_AUTO_RETRY flag cleared)
> then IMHO it should be done as close as possible to the leaf
> application (LWP in this case), not in the library itself.
> After all, we have only one RC bug about this, so I guess other programs (in any
> language) 1/ have workarounds; 2/ aren't using select()/poll() with
> blocking I/O; or 3/ aren't affected because they never used SSL_read()
> as documented. IMHO the libssl change shouldn't be a reason to penalize
> all applications, given that most seem unaffected.
>
> I still think the right fix is to make SSL_MODE_AUTO_RETRY (or even the
> whole mode bitmask?) configurable in IO::Socket::SSL, and clear it
> in programs and libraries using select()/poll() with blocking I/O, such
> as LWP in this case. AFAICT that follows the intention of OpenSSL's
> development team, unlike global library changes. AFAICT the attached
> patch (to sid's IO::Socket::SSL and LWP::Protocol::https, respectively
> 2.060-3 and 6.36-1) fixes the problem for me, while preserving TLSv1.3
> support and default.
>

I like this and your proposed patch, as it solved the issue with TLSv1.3, whereas my changes left things broken over TLSv1.3. Can we get this upstreamed? (Or, as you suggest, even expose the whole mode bitmask.)

> > I proposed the following patch upstream / request for comments
> > https://github.com/radiator-software/p5-net-ssleay/pull/139
>
> I personally don't like this change as I hope Buster's Net::SSLeay and
> other SSL libraries will default to TLSv1.3 on capable servers :-) 2
> comments anyway:

I have now self-rejected that merge proposal and reverted it in Ubuntu.

>
> * OpenSSL <1.1.0 has no SSL_CTX_set_max_proto_version(), so an OpenSSL
> version test is lacking (nothing more is needed as <1.1.0 has no TLSv1.3
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
Hi Dimitri,

On Tue, 07 May 2019 at 15:46:25 +0100, Dimitri John Ledkov wrote:
> On Tue, 7 May 2019 14:16:43 +0100 Dimitri John Ledkov wrote:
>> This issue concerns me a lot at the moment. I am currently trying to
>> upgrade OpenSSL from 1.1.0 to 1.1.1 in Ubuntu 18.04 LTS (bionic). And
>> as far as I understand all the comments on this Debian bug report,
>> current applications are potentially broken, and breakage happens more
>> often with TLSv1.3 and the new OpenSSL 1.1.1 defaults
>> (SSL_MODE_AUTO_RETRY).
>>
>> As far as I understand, we do not have a fixed LWP that works correctly
>> in blocking, non-blocking, TLS 1.2 and TLS 1.3. To prevent regressing
>> existing users further, does it make sense for me to make updates in
>> bionic that:
>>
>> 1) limit SSL_new and SSL_CTX_new to TLS v1.2 max
>> and
>> 2) disable SSL_MODE_AUTO_RETRY by default for TLS v1.2 connections?
>>
>> My goal is to keep existing breakages as they are, without introducing new
>> ones, whilst getting OpenSSL 1.1.1 into bionic. Granted, this will not
>> get TLS v1.3 enabled for Perl servers/clients without code changes, but
>> oh well. Those who want it will be able to force / start using it.

It's IMHO unfortunate to change the default in Net::SSLeay, as TLSv1.3 brings quite a few improvements (in terms of security as well as performance). OpenSSL 1.1.1 was released on 11 Sep 2018 and uploaded to sid the day after, breaking programs using select()/poll() with blocking I/O; this is not specific to Perl bindings (other languages are also affected), yet no one is suggesting to disable TLSv1.3 globally in libssl :-)

If TLSv1.3 should be disabled (and the SSL_MODE_AUTO_RETRY flag cleared), then IMHO it should be done as close as possible to the leaf application (LWP in this case), not in the library itself.
After all, we have only one RC bug about this, so I guess other programs (in any language) 1/ have workarounds; 2/ aren't using select()/poll() with blocking I/O; or 3/ aren't affected because they never used SSL_read() as documented. IMHO the libssl change shouldn't be a reason to penalize all applications, given that most seem unaffected.

I still think the right fix is to make SSL_MODE_AUTO_RETRY (or even the whole mode bitmask?) configurable in IO::Socket::SSL, and to clear it in programs and libraries using select()/poll() with blocking I/O, such as LWP in this case. AFAICT that follows the intention of OpenSSL's development team, unlike global library changes. AFAICT the attached patch (to sid's IO::Socket::SSL and LWP::Protocol::https, respectively 2.060-3 and 6.36-1) fixes the problem for me, while preserving TLSv1.3 support and default.

> I proposed the following patch upstream / request for comments
> https://github.com/radiator-software/p5-net-ssleay/pull/139

I personally don't like this change, as I hope Buster's Net::SSLeay and other SSL libraries will default to TLSv1.3 on capable servers :-) Two comments anyway:

 * OpenSSL <1.1.0 has no SSL_CTX_set_max_proto_version(), so an OpenSSL version test is lacking (nothing more is needed, as <1.1.0 has no TLSv1.3 support and SSL_MODE_AUTO_RETRY is unset by default).

 * Disabling TLSv1.3 won't always prevent hangs; you also need to clear SSL_MODE_AUTO_RETRY to revert to the pre-1.1.1 defaults. With TLSv1.2, SSL_read() returns SSL_ERROR_WANT_READ upon renegotiation only if SSL_MODE_AUTO_RETRY is unset; with it set, applications using select()/poll() with blocking I/O hang instead.

Cheers,
-- 
Guilhem, who isn't in the Debian Perl team, but who would be quite sad to have to wait one full release cycle for out-of-the-box TLSv1.3 support.
--- a/IO/Socket/SSL.pm
+++ b/IO/Socket/SSL.pm
@@ -260,6 +260,14 @@ INIT { init() }
     init();
 }
+
+if (!defined &Net::SSLeay::CTX_clear_mode) {
+    # assume SSL_CTRL_CLEAR_MODE being 78 since it was always this way
+    *Net::SSLeay::CTX_clear_mode = sub {
+        my ($ctx,$opt) = @_;
+        Net::SSLeay::CTX_ctrl($ctx,78,$opt,0);
+    };
+}
 }
 
 # global defaults which can be changed using set_defaults
@@ -2433,6 +2441,11 @@
     # cannot guarantee, that the location of the buffer stays constant
     Net::SSLeay::CTX_set_mode( $ctx,
         SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER|SSL_MODE_ENABLE_PARTIAL_WRITE);
+    # Clear SSL_MODE_AUTO_RETRY on request; useful for applications using select/poll
+    # with blocking I/O. (Since OpenSSL 1.1.1 SSL_MODE_AUTO_RETRY is enabled by
+    # default, making such applications hang.)
+    Net::SSLeay::CTX_clear_mode( $ctx, Net::SSLeay::MODE_AUTO_RETRY() )
+        if $arg_hash->{SSL_mode_auto_no_retry};
 
     if ( my $proto_list = $arg_hash->{SSL_npn_protocols} ) {
         return IO::Socket::SSL->_internal_error("NPN not supported in Net::SSLeay",9)
--- a/LWP/Protocol/https.pm
+++ b/LWP/Protocol/https.pm
@@ -32,6 +32,11 @@
             $ssl_opts{SSL_ca_file} = '/etc/ssl/certs/ca-certificates.crt';
         }
     }
+# Clear SSL_MODE_AUTO_RETRY as LWP uses select with blocking I/O.
+# (Since OpenSSL 1.1.1 SSL_MODE_AUTO_RETRY is enabled
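What the proposed `SSL_mode_auto_no_retry` option does to a context is plain bitmask arithmetic: `SSL_CTX_set_mode()` ORs mode bits in and `SSL_CTX_clear_mode()` masks them out. The sketch below reproduces that in pure Perl. The constant values are those of OpenSSL's `<openssl/ssl.h>`; `set_mode` and `clear_mode` are hypothetical stand-ins, not Net::SSLeay functions. It also assumes, for illustration, a starting mode in which only SSL_MODE_AUTO_RETRY is set.

```perl
use strict;
use warnings;

# Mode bits as defined in OpenSSL's <openssl/ssl.h>.
use constant {
    SSL_MODE_ENABLE_PARTIAL_WRITE       => 0x1,
    SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER => 0x2,
    SSL_MODE_AUTO_RETRY                 => 0x4,
};

# Hypothetical pure-Perl stand-ins for what SSL_CTX_set_mode() and
# SSL_CTX_clear_mode() do to the context's mode bitmask.
sub set_mode   { my ($mode, $bits) = @_; return $mode | $bits }
sub clear_mode { my ($mode, $bits) = @_; return $mode & ~$bits }

# Assumed starting point: only SSL_MODE_AUTO_RETRY set (on by default since OpenSSL 1.1.1).
my $mode = SSL_MODE_AUTO_RETRY;

# What IO::Socket::SSL already does for every context ...
$mode = set_mode($mode, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER | SSL_MODE_ENABLE_PARTIAL_WRITE);
printf "after set_mode:   0x%x\n", $mode;    # prints 0x7

# ... and what the proposed option would add on top.
$mode = clear_mode($mode, SSL_MODE_AUTO_RETRY);
printf "after clear_mode: 0x%x\n", $mode;    # prints 0x3
```

Clearing only touches the SSL_MODE_AUTO_RETRY bit, so the partial-write and moving-buffer modes that IO::Socket::SSL relies on stay in place.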
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 7 May 2019 14:16:43 +0100 Dimitri John Ledkov wrote:
> Hi,
>
> On Wed, 10 Apr 2019 15:22:09 +0200 Guilhem Moulin wrote:
> >
> > Not setting the SSL_MODE_AUTO_RETRY flag back after removing O_NONBLOCK
> > (ie commenting out `Net::SSLeay::set_mode($ssl, $mode_auto_retry);` in
> > the patch) solves the problem with blocking I/O and select/poll, but
> > breaks programs expecting SSL_read() to block until application data
> > comes in. (That is, programs not conforming to SSL_read()'s documented
> > behavior, which would hence break on renegotiation with TLS <1.3; or
> > programs relying on SSL_MODE_AUTO_RETRY being set, as in OpenSSL ≥1.1.1's
> > default context flags.)
> >
>
> This issue concerns me a lot at the moment. I am currently trying to
> upgrade OpenSSL from 1.1.0 to 1.1.1 in Ubuntu 18.04 LTS (bionic). And
> as far as I understand all the comments on this Debian bug report,
> current applications are potentially broken, and breakage happens more
> often with TLSv1.3 and the new OpenSSL 1.1.1 defaults
> (SSL_MODE_AUTO_RETRY).
>
> As far as I understand, we do not have a fixed LWP that works correctly
> in blocking, non-blocking, TLS 1.2 and TLS 1.3. To prevent regressing
> existing users further, does it make sense for me to make updates in
> bionic that:
>
> 1) limit SSL_new and SSL_CTX_new to TLS v1.2 max
> and
> 2) disable SSL_MODE_AUTO_RETRY by default for TLS v1.2 connections?
>
> My goal is to keep existing breakages as they are, without introducing new
> ones, whilst getting OpenSSL 1.1.1 into bionic. Granted, this will not
> get TLS v1.3 enabled for Perl servers/clients without code changes, but
> oh well. Those who want it will be able to force / start using it.

I proposed the following patch upstream / request for comments:
https://github.com/radiator-software/p5-net-ssleay/pull/139

Regards,
Dimitri.
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
Hi,

On Wed, 10 Apr 2019 15:22:09 +0200 Guilhem Moulin wrote:
>
> Not setting the SSL_MODE_AUTO_RETRY flag back after removing O_NONBLOCK
> (ie commenting out `Net::SSLeay::set_mode($ssl, $mode_auto_retry);` in
> the patch) solves the problem with blocking I/O and select/poll, but
> breaks programs expecting SSL_read() to block until application data
> comes in. (That is, programs not conforming to SSL_read()'s documented
> behavior, which would hence break on renegotiation with TLS <1.3; or
> programs relying on SSL_MODE_AUTO_RETRY being set, as in OpenSSL ≥1.1.1's
> default context flags.)
>

This issue concerns me a lot at the moment. I am currently trying to upgrade OpenSSL from 1.1.0 to 1.1.1 in Ubuntu 18.04 LTS (bionic). And as far as I understand all the comments on this Debian bug report, current applications are potentially broken, and breakage happens more often with TLSv1.3 and the new OpenSSL 1.1.1 defaults (SSL_MODE_AUTO_RETRY).

As far as I understand, we do not have a fixed LWP that works correctly in blocking, non-blocking, TLS 1.2 and TLS 1.3. To prevent regressing existing users further, does it make sense for me to make updates in bionic that:

1) limit SSL_new and SSL_CTX_new to TLS v1.2 max
and
2) disable SSL_MODE_AUTO_RETRY by default for TLS v1.2 connections?

My goal is to keep existing breakages as they are, without introducing new ones, whilst getting OpenSSL 1.1.1 into bionic. Granted, this will not get TLS v1.3 enabled for Perl servers/clients without code changes, but oh well. Those who want it will be able to force / start using it.

Regards,
Dimitri.
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 09 Apr 2019 at 23:39:31 +0200, Guilhem Moulin wrote:
> AFAICT this worked this time because the socket was *only* marked as
> ready for writing after the first select() call. Only during the second
> call was there some data to be read:
>
>> select(8, [3], [3], NULL, {tv_sec=180, tv_usec=0}) = 1 (out [3], left {tv_sec=179, tv_usec=96})
>> select(8, [3], NULL, NULL, {tv_sec=180, tv_usec=0}) = 1 (in [3], left {tv_sec=179, tv_usec=977469})
>
> I'm unable to reproduce this with v1.3, probably due to race conditions.

Forgot to add this, sorry: perhaps the issue is easier to reproduce when one connects to the loopback interface rather than to a remote TLS termination proxy? (Even though I suppose it doesn't completely eliminate the race.)

In a clean sid chroot, after `apt install --no-install-recommends strace liblwp-protocol-https-perl libio-socket-ssl-perl libnet-ssleay-perl`:

## Start a loopback-bound TLS (v1.3 only) server in a terminal
$ openssl req -x509 -keyout /tmp/key.pem -out /tmp/cert.pem -subj /CN=127.0.0.1 -nodes
$ openssl s_server -accept 127.0.0.1:4433 -key /tmp/key.pem -cert /tmp/cert.pem -tls1_3

## Connect to it from another terminal and send an HTTP POST request
$ patch -p2 new(ssl_opts => {SSL_ca_file => "/tmp/cert.pem"})-> post("https://127.0.0.1:4433", {data => "foo"})'
[…]
select(8, [3], [3], NULL, {tv_sec=180, tv_usec=0}) = 2 (in [3], out [3], left {tv_sec=179, tv_usec=98})
read(3, "…", 5) = 5
read(3, "…", 234) = 234
read(3, "…", 5) = 5
read(3, "…", 250) = 250
read(3,

This does hang *anyway*, but it should hang *after* sending the request out to the server (i.e. when waiting for the HTTP reply), not *before* any application data was sent, as happens above. AFAICT the local server never receives “POST / HTTP/1.1\r\n” when select(2) marks the socket as ready both for reads and writes client-side, whether the patch is applied or not.
Not setting the SSL_MODE_AUTO_RETRY flag back after removing O_NONBLOCK (ie commenting out `Net::SSLeay::set_mode($ssl, $mode_auto_retry);` in the patch) solves the problem with blocking I/O and select/poll, but breaks programs expecting SSL_read() to block until application data comes in. (That is, programs not conforming to SSL_read()'s documented behavior, which would hence break on renegotiation with TLS <1.3; or programs relying on SSL_MODE_AUTO_RETRY being set, as in OpenSSL ≥1.1.1's default context flags.)

-- 
Guilhem.
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 09 Apr 2019 at 17:26:22 +0200, gregor herrmann wrote:
> On Tue, 09 Apr 2019 17:14:32 +0200, Guilhem Moulin wrote:
>> With TLS 1.3? (You can pass ‘SSL_version => "TLSv1_3"’ to ssl_opts to
>> force this.) Doesn't work here, still hangs on read():
>
> Yes, also with using TLSv1_3 explicitly:
> […]
> (trace attached in case it helps)

AFAICT this worked this time because the socket was *only* marked as ready for writing after the first select() call. Only during the second call was there some data to be read:

> select(8, [3], [3], NULL, {tv_sec=180, tv_usec=0}) = 1 (out [3], left {tv_sec=179, tv_usec=96})
> select(8, [3], NULL, NULL, {tv_sec=180, tv_usec=0}) = 1 (in [3], left {tv_sec=179, tv_usec=977469})

I'm unable to reproduce this with v1.3, probably due to race conditions.

Anyway, I fail to see how the patch can help, because, as I wrote in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914034#101, the socket is in blocking mode (hence SSL_MODE_AUTO_RETRY is set) by the time LWP starts its select loop.
This is visible by adding fcntl(2) to the traced set of system calls:

$ strace -etrace=fcntl,select,read perl -MLWP::UserAgent -MIO::Socket::SSL -e \
    '$IO::Socket::SSL::DEBUG = 3; LWP::UserAgent->new(ssl_opts => {SSL_version => "TLSv1_3"})->post("https://facebook.com", { data => "" })'
[…]
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
DEBUG: .../IO/Socket/SSL.pm:831: set socket to non-blocking to enforce timeout=180
DEBUG: .../IO/Socket/SSL.pm:844: call Net::SSLeay::connect
read(3, 0x5628bec16923, 5) = -1 EAGAIN (Resource temporarily unavailable)
DEBUG: .../IO/Socket/SSL.pm:847: done Net::SSLeay::connect -> -1
DEBUG: .../IO/Socket/SSL.pm:857: ssl handshake in progress
DEBUG: .../IO/Socket/SSL.pm:867: waiting for fd to become ready: SSL wants a read first
select(8, [3], NULL, NULL, {tv_sec=180, tv_usec=0}) = 1 (in [3], left {tv_sec=179, tv_usec=988296})
DEBUG: .../IO/Socket/SSL.pm:887: socket ready, retrying connect
DEBUG: .../IO/Socket/SSL.pm:844: call Net::SSLeay::connect
[…]
DEBUG: .../IO/Socket/SSL.pm:847: done Net::SSLeay::connect -> 1
DEBUG: .../IO/Socket/SSL.pm:902: ssl handshake done
fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR) = 0
[…]
select(8, [3], [3], NULL, {tv_sec=180, tv_usec=0}) = 2 (in [3], out [3], left {tv_sec=179, tv_usec=98})
read(3, "…", 5) = 5
read(3, "…", 156) = 156
read(3,

When the non-application record comes in, the socket is marked as ready for reading, but SSL_read() transparently processes the non-application data record and then blocks trying to read an application data record.
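The `fcntl` lines in this trace are what switching a socket between blocking and non-blocking mode looks like at the system-call level. A minimal sketch of the same F_GETFL/F_SETFL round trip in Perl, assuming a Unix-like system and using a `socketpair` instead of the TCP connection from the trace; `is_nonblocking` is a hypothetical helper introduced only for this illustration:

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

socketpair(my $sock, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Hypothetical helper: is O_NONBLOCK currently set on this handle?
sub is_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0) or die "F_GETFL: $!";
    return ($flags & O_NONBLOCK) ? 1 : 0;
}

# fcntl(fd, F_GETFL): a fresh socket starts out in blocking mode.
my $flags = fcntl($sock, F_GETFL, 0) or die "F_GETFL: $!";
my $initially_nonblocking = is_nonblocking($sock);

# fcntl(fd, F_SETFL, flags|O_NONBLOCK), as IO::Socket::SSL does before the
# handshake "to enforce timeout".
fcntl($sock, F_SETFL, $flags | O_NONBLOCK) or die "F_SETFL: $!";
my $during_handshake = is_nonblocking($sock);

# fcntl(fd, F_SETFL, flags & ~O_NONBLOCK): back to blocking mode once the
# handshake is done, which is the mode LWP's select loop then runs in.
fcntl($sock, F_SETFL, $flags & ~O_NONBLOCK) or die "F_SETFL: $!";
my $after_handshake = is_nonblocking($sock);

print "initial: $initially_nonblocking, during: $during_handshake, after: $after_handshake\n";
```

It should print `initial: 0, during: 1, after: 0`, matching the O_RDWR → O_RDWR|O_NONBLOCK → O_RDWR sequence in the trace.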
If one is lucky and the socket is *only* marked as ready for writing (i.e., not for reading as well, like in your trace) when select() returns, then the problem doesn't trigger (at least not right after the handshake; OTOH it might occur later on renegotiation). But AFAICT it's orthogonal to whether the patch is applied or not. We use blocking I/O, so SSL_MODE_AUTO_RETRY is set just like before (`Net::SSLeay::set_mode($ssl, $mode_auto_retry)` is called just before clearing O_NONBLOCK).

If the (blocking) socket is marked for reading when select() returns, then the application assumes that SSL_read() won't block. Setting SSL_MODE_AUTO_RETRY breaks that assumption, as written in the OpenSSL changelog. Instead of a blocking SSL_read(), the application expects SSL_ERROR_WANT_READ to be returned. It then proceeds with SSL_write() if the socket is also ready for writing, like in the trace above.

-- 
Guilhem.
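The two behaviours can be illustrated with a toy model in plain Perl. Everything here is hypothetical: model_ssl_read, select_readable and the record hashes are not OpenSSL or Net::SSLeay APIs, and the real OpenSSL state machine is far more involved. The model only captures the one difference that matters, which is whether the read keeps going after a non-application record:

```perl
use strict;
use warnings;

# Toy model, not OpenSSL: the socket buffer holds whole TLS records, each
# tagged with its (decrypted) content type. select() fires as soon as any
# record is buffered, whatever that type is.
sub select_readable {
    my ($buf) = @_;
    return @$buf ? 1 : 0;
}

# Hypothetical model of SSL_read() on a *blocking* socket.
#   auto_retry => 1 : keep consuming non-application records; if the buffer
#                     runs dry, the real read(2) would block, reported here
#                     as 'WOULD_HANG'.
#   auto_retry => 0 : after one non-application record, hand control back
#                     with 'WANT_READ' so the caller returns to select().
sub model_ssl_read {
    my ($buf, %opt) = @_;
    while (1) {
        return 'WOULD_HANG' unless @$buf;    # blocking read(2) with no data
        my $rec = shift @$buf;
        return "DATA:$rec->{payload}" if $rec->{type} eq 'application_data';
        # e.g. a TLS 1.3 NewSessionTicket: consumed internally, never
        # returned to the caller
        return 'WANT_READ' unless $opt{auto_retry};
    }
}

# Right after the handshake: two session tickets, no HTTP response yet.
my @tickets = map { +{ type => 'handshake', payload => "ticket$_" } } 1 .. 2;

my $readable      = select_readable([@tickets]);
my $with_retry    = model_ssl_read([@tickets], auto_retry => 1);
my $without_retry = model_ssl_read([@tickets], auto_retry => 0);

print "select() reports readable: $readable\n";
print "SSL_MODE_AUTO_RETRY set:     $with_retry\n";
print "SSL_MODE_AUTO_RETRY cleared: $without_retry\n";
```

With only session tickets buffered, select() still reports the socket readable. The auto-retry variant then waits for a record that has not arrived. The other variant returns WANT_READ and lets the loop go on to SSL_write(). Once application data follows the tickets, repeated calls deliver it in both variants.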
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 09 Apr 2019 17:14:32 +0200, Guilhem Moulin wrote:
> On Tue, 09 Apr 2019 at 16:59:20 +0200, gregor herrmann wrote:
>> When I install the package with the patch and run our test case
>> again, I don't get any hangs anymore:
>>
>> % time perl -MLWP::UserAgent -e 'LWP::UserAgent->new->post("https://facebook.com", { data => "foo" }) or die'
>> perl -MLWP::UserAgent -e 0.18s user 0.02s system 22% cpu 0.867 total
>
> With TLS 1.3? (You can pass ‘SSL_version => "TLSv1_3"’ to ssl_opts to
> force this.) Doesn't work here, still hangs on read():

Yes, also when using TLSv1_3 explicitly:

% time perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => { SSL_version => "TLSv1_2"})->post("https://facebook.com", { data => "foo" }) or die'
perl -MLWP::UserAgent -e 0.15s user 0.04s system 39% cpu 0.472 total
% time perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => { SSL_version => "TLSv1_3"})->post("https://facebook.com", { data => "foo" }) or die'
perl -MLWP::UserAgent -e 0.16s user 0.05s system 46% cpu 0.449 total

> $ strace -etrace=select,read perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {SSL_version => "TLSv1_3"})->post("https://facebook.com", { data => "foo" })'
> […]
> select(8, [3], [3], NULL, {tv_sec=180, tv_usec=0}) = 2 (in [3], out [3], left {tv_sec=179, tv_usec=98})
> read(3, "…", 5) = 5
> read(3, "…", 156) = 156
> read(3,

No hang here:

% time strace -etrace=select,read -o ~/tmp/strace perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {SSL_version => "TLSv1_3"})->post("https://facebook.com", { data => "foo" })'
strace -etrace=select,read -o ~/tmp/strace perl -MLWP::UserAgent -e 0.17s user 0.09s system 54% cpu 0.482 total

(trace attached in case it helps)

Cheers,
gregor

read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\\21\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\322\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@l\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260A\2\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\21\0\0\0\0\0\0"..., 832) = 832
read(3, "\257\310~f", 4) = 4
read(4, "package LWP::UserAgent;\n\nuse str"..., 8192) = 8192
read(5, "package strict;\n\n$strict::VERSIO"..., 8192) = 1606
read(5, "use 5.008;\npackage base;\n\nuse st"..., 8192) = 8192
read(5, "ibute\n# (Public, Private, et"..., 8192) = 720
read(5, "package LWP::MemberMixin;\n\nour $"..., 8192) = 875
read(5, "package Carp;\n\n{ use 5.006; }\nus"..., 8192) = 8192
read(6, "# -*- buffer-read-only: t -*-\n# "..., 8192) = 8192
read(6, "recursion'\t\t\t\t=> \"\\x00\\x00\\x00\\x"..., 8192) = 8192
read(6, "hadow'\t\t\t\t=> \"\\x00\\x00\\x00\\x00\\x"..., 8192) = 8133
read(6, "package overloading;\nuse warning"..., 8192) = 964
read(5, "; # allow caller to format refer"..., 8192) = 8192
read(5, " # Perl versions and platfo"..., 8192) = 8192
read(5, "the caller's namespace.\n"..., 8192) = 812
read(5, "package Exporter;\n\nrequire 5.006"..., 8192) = 2367
read(5, "package HTTP::Request;\n\nuse stri"..., 8192) = 8192
read(6, "package HTTP::Message;\n\nuse stri"..., 8192) = 8192
read(6, "ss\" || $ce eq \"x-compress\") {\n\t\t"..., 8192) = 8192
read(6, "SCALAR\") {\n\t# must recalculate n"..., 8192) = 8192
read(6, "HTTP::Message::decodable()\n\nThis"..., 8192) = 6155
read(6, "package HTTP::Headers;\n\nuse stri"..., 8192) = 8192
read(6, "{'content-type'};\n$self->{'c"..., 8192) = 8192
read(6, "t in the\nheader. The field name"..., 8192) = 8192
read(6, "eader(\":foo_bar\" => 1);\n\nThese f"..., 8192) = 517
read(6, "#\n# Copyright (c) 1995-2001, Ra"..., 8192) = 8192
read(7, "package Fcntl;\n\nuse strict;\nour("..., 8192) = 2156
read(7, "", 8192) = 0
read(7, "# Generated from XSLoader_pm.PL "..., 8192) = 3967
read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320!\0\0\0\0\0\0"..., 832) = 832
read(7, "package Exporter::Heavy;\n\nuse st"..., 8192) = 6406
read(7, "", 8192) = 0
read(6, " network order.\n#\nsub nfreeze {\n"..., 8192) = 8192
read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P6\0\0\0\0\0\0"..., 832) = 832
read(6, "package URI;\n\nuse strict;\nuse wa"..., 8192) = 8192
read(7, "package URI::Escape;\n\nuse strict"..., 8192) = 7061
read(7, "", 8192) = 0
read(7, "package overload;\n\nour $VERSION "..., 8192) = 4441
read(8, "package warnings::register;\n\nour"..., 8192) = 488
read(5, "pplication/json; charset=UTF-8']"..., 8192) = 556
read(5,
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 09 Apr 2019 at 16:59:20 +0200, gregor herrmann wrote:
> When I install the package with the patch and run our test case
> again, I don't get any hangs anymore:
>
> % time perl -MLWP::UserAgent -e
> 'LWP::UserAgent->new->post("https://facebook.com", { data => "foo" }) or die'
> perl -MLWP::UserAgent -e 0.18s user 0.02s system 22% cpu 0.867 total

With TLS 1.3? (You can pass ‘SSL_version => "TLSv1_3"’ to ssl_opts to force this.) Doesn't work here, still hangs on read():

$ strace -etrace=select,read perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {SSL_version => "TLSv1_3"})->post("https://facebook.com", { data => "foo" })'
[…]
select(8, [3], [3], NULL, {tv_sec=180, tv_usec=0}) = 2 (in [3], out [3], left {tv_sec=179, tv_usec=98})
read(3, "…", 5) = 5
read(3, "…", 156) = 156
read(3,

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914034#101 and the OpenSSL changelog: “Applications that use blocking I/O in combination with something like select() or poll() will hang”

-- 
Guilhem.
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 09 Apr 2019 07:48:45 +0200, Steffen Ullrich wrote:
>> You're welcome :-) Does clearing the SSL_MODE_AUTO_RETRY context flag
>> (i.e., reverting the default from OpenSSL <1.1.1) solve this for you
>> too? If so, what do you think about my proposed paths forward from

Not yet. I first wanted to check Net::SSLeay and IO::Socket::SSL upstream releases/commits, and ...

> Simply clearing SSL_MODE_AUTO_RETRY will cause problems with blocking
> connections in TLS 1.3.
> I've tried to work around the behavior change by clearing SSL_MODE_AUTO_RETRY
> for non-blocking and setting it again when doing blocking connections.
> Please check if
> https://github.com/noxxi/p5-io-socket-ssl/commit/09bc6a3203bc7bc89078317da42a3e96cdbf94fc
> fixes the problems you see.

... this comes in very handy, thanks Steffen :)

Unfortunately the patch doesn't apply:

patching file lib/IO/Socket/SSL.pm
Hunk #1 FAILED at 73.
Hunk #2 succeeded at 260 with fuzz 2 (offset 128 lines).
Hunk #3 succeeded at 1052 (offset 174 lines).
Hunk #4 FAILED at 1091.
Hunk #5 succeeded at 1126 (offset -38 lines).
2 out of 5 hunks FAILED -- rejects in file lib/IO/Socket/SSL.pm

The problem is that we have IO::Socket::SSL 2.060 and Net::SSLeay 1.85 (with some patches). Because Debian is in freeze for the upcoming release, we can't just update to the newest (dev) releases … [0]

Anyway, next step: I've updated the patch so that it applies against 2.060, and the package builds and passes the test suite. (Patch attached.)

When I install the package with the patch and run our test case again, I don't get any hangs anymore:

% time perl -MLWP::UserAgent -e 'LWP::UserAgent->new->post("https://facebook.com", { data => "foo" }) or die'
perl -MLWP::UserAgent -e 0.18s user 0.02s system 22% cpu 0.867 total

So this looks promising but definitely needs more eyeballs.
Thanks again,
gregor

[0] Status of the packages, with patches in the debian/patches directory, at:
https://salsa.debian.org/perl-team/modules/packages/libio-socket-ssl-perl
https://salsa.debian.org/perl-team/modules/packages/libnet-ssleay-perl

From 09bc6a3203bc7bc89078317da42a3e96cdbf94fc Mon Sep 17 00:00:00 2001
From: Steffen Ullrich
Date: Tue, 9 Apr 2019 00:08:06 +0200
Subject: [PATCH] deal with OpenSSL 1.1.1 switching on SSL_AUTO_RETRY by default by disabling it when non-blocking

---
 lib/IO/Socket/SSL.pm | 35 +++
 1 file changed, 35 insertions(+)

--- a/lib/IO/Socket/SSL.pm
+++ b/lib/IO/Socket/SSL.pm
@@ -67,6 +67,7 @@ my $can_ecdh;        # do we support ECD
 my $can_ocsp;        # do we support OCSP
 my $can_ocsp_staple; # do we support OCSP stapling
 my $can_tckt_keycb;  # TLS ticket key callback
+my $auto_retry;      # (clear|set)_mode SSL_MODE_AUTO_RETRY with OpenSSL 1.1.1+ with non-blocking
 BEGIN {
     $can_client_sni = Net::SSLeay::OPENSSL_VERSION_NUMBER() >= 0x0100;
     $can_server_sni = defined &Net::SSLeay::get_servername;
@@ -260,6 +261,30 @@ DH
     INIT { init() }
     init();
 }
+
+if (!defined &Net::SSLeay::clear_mode) {
+    # assume SSL_CTRL_CLEAR_MODE being 78 since it was always this way
+    *Net::SSLeay::clear_mode = sub {
+        my ($ctx,$opt) = @_;
+        Net::SSLeay::ctrl($ctx,78,$opt,0);
+    };
+}
+
+if (Net::SSLeay::OPENSSL_VERSION_NUMBER() >= 0x10101000) {
+    # openssl 1.1.1 enabled SSL_MODE_AUTO_RETRY by default, which is bad for
+    # non-blocking sockets
+    my $mode_auto_retry =
+        # was always 0x0004
+        eval { Net::SSLeay::MODE_AUTO_RETRY() } || 0x0004;
+    $auto_retry = sub {
+        my ($ssl,$on) = @_;
+        if ($on) {
+            Net::SSLeay::set_mode($ssl, $mode_auto_retry);
+        } else {
+            Net::SSLeay::clear_mode($ssl, $mode_auto_retry);
+        }
+    }
+}
 }
 
 # global defaults which can be changed using set_defaults
@@ -1028,6 +1053,7 @@ sub accept_SSL {
     } else {
         # timeout does not apply because invalid or socket non-blocking
         $timeout = undef;
+        $auto_retry && $auto_retry->($ssl,$socket->blocking);
     }
 
     my $start = defined($timeout) && time();
@@ -1101,6 +1127,14 @@ sub accept_SSL {
 
 ### I/O subroutines
 
+if ($auto_retry) {
+*blocking = sub {
+    my $self = shift;
+    { @_ && $auto_retry->($self->_get_ssl_object || last, @_); }
+    return $self->SUPER::blocking(@_);
+};
+}
+
 sub _generic_read {
     my ($self, $read_func, undef, $length, $offset) = @_;
     my $ssl = ${*$self}{_SSL_object} || return;
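The following is a pure-Perl model of what the refreshed patch does with the mode flag. It is meant to help follow the discussion about why the patch cannot help LWP. Assumptions: only the SSL mode bitmask is modelled, and apply_blocking and openssl_version_string are hypothetical helpers. The values 0x0004 and 0x10101000 are the ones the patch itself uses, and 0x1010102f is taken to be the version number of OpenSSL 1.1.1b:

```perl
use strict;
use warnings;

use constant SSL_MODE_AUTO_RETRY    => 0x0004;      # fallback value used in the patch
use constant FIRST_AFFECTED_VERSION => 0x10101000;  # version threshold used in the patch

# Decode an OPENSSL_VERSION_NUMBER (layout 0xMNNFFPPS) as "major.minor.fix".
sub openssl_version_string {
    my ($v) = @_;
    return sprintf '%d.%d.%d', ($v >> 28) & 0xf, ($v >> 20) & 0xff, ($v >> 12) & 0xff;
}

# Hypothetical model of the patch's $auto_retry closure: only the SSL mode
# bitmask is tracked; nothing here calls Net::SSLeay::set_mode/clear_mode.
sub apply_blocking {
    my ($mode, $blocking, $version) = @_;
    return $mode if $version < FIRST_AFFECTED_VERSION;   # closure not installed
    return $blocking ? ($mode | SSL_MODE_AUTO_RETRY)      # set_mode
                     : ($mode & ~SSL_MODE_AUTO_RETRY);    # clear_mode
}

# OpenSSL 1.1.1b (as in this bug) starts with AUTO_RETRY on by default.
my $v111b   = 0x1010102f;
my $default = SSL_MODE_AUTO_RETRY;

my $nonblocking_mode = apply_blocking($default, 0, $v111b);  # during the handshake
my $blocking_mode    = apply_blocking($default, 1, $v111b);  # LWP's select() loop

printf "OpenSSL %s, non-blocking socket: AUTO_RETRY %s\n",
    openssl_version_string($v111b),
    ($nonblocking_mode & SSL_MODE_AUTO_RETRY) ? 'set' : 'cleared';
printf "OpenSSL %s, blocking socket:     AUTO_RETRY %s\n",
    openssl_version_string($v111b),
    ($blocking_mode & SSL_MODE_AUTO_RETRY) ? 'set' : 'cleared';
```

The patch keys the flag off the blocking state. A socket that is switched back to blocking mode before a select() loop, as LWP's is, therefore ends up with SSL_MODE_AUTO_RETRY set again. This is the objection raised elsewhere in this thread.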
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Tue, 09 Apr 2019 at 07:48:45 +0200, Steffen Ullrich wrote:
> Simply clearing SSL_MODE_AUTO_RETRY will cause problems with blocking
> connections in TLS 1.3.

AFAICT not when SSL_read() is used as documented. Also, the issue is triggered more often with TLS 1.3 than with earlier protocol versions, but it's not specific to TLS 1.3:

“TLSv1.3 sends more non-application data records after the handshake is finished. At least the session ticket and possibly a key update is send after the finished message. With TLSv1.2 it happened in case of renegotiation. SSL_read() has always documented that it can return SSL_ERROR_WANT_READ after processing non-application data, even when there is still data that can be read. When SSL_MODE_AUTO_RETRY is set using SSL_CTX_set_mode() OpenSSL will try to process the next record, and so not return SSL_ERROR_WANT_READ while it still has data available. Because many applications did not handle this properly, SSL_MODE_AUTO_RETRY has been made the default. If the application is using blocking sockets and SSL_MODE_AUTO_RETRY is enabled, and select() is used to check if a socket is readable this results in SSL_read() processing the non-application data records, but then try to read an application data record which might not be available and hang.”
— https://wiki.openssl.org/index.php/TLS1.3#Non-application_data_records

FWIW OpenSSL 1.1.1a's changelog does mention that the new default causes regressions:

“SSL_MODE_AUTO_RETRY is enabled by default. Applications that use blocking I/O in combination with something like select() or poll() will hang. This can be turned off again using SSL_CTX_clear_mode(). Many applications do not properly handle non-application data records, and TLS 1.3 sends more of such records. Setting SSL_MODE_AUTO_RETRY works around the problems in those applications, but can also break some. It's recommended to read the manpages about SSL_read(), SSL_write(), SSL_get_error(), SSL_shutdown(), SSL_CTX_set_mode() and SSL_CTX_set_read_ahead() again.”
— https://github.com/openssl/openssl/blob/OpenSSL_1_1_1a/CHANGES#L153

Programs that *were* broken might work better now. These are programs that would have choked on renegotiation with TLS <1.3, or on a key update / session ticket with TLS 1.3. But it's *really* unfortunate that programs like LWP::Protocol::http, with a properly written select(2) loop (i.e., one able to handle SSL_ERROR_WANT_{READ,WRITE}), are now broken.

> Please check if
> https://github.com/noxxi/p5-io-socket-ssl/commit/09bc6a3203bc7bc89078317da42a3e96cdbf94fc
> fixes the problems you see.

It doesn't, as the socket is in blocking mode when it enters the select loop. As OpenSSL's changelog puts it, “Applications that use blocking I/O in combination with something like select() or poll() will hang”.

I guess a better fix is not to change the OpenSSL default in IO::Socket::SSL. Instead, IO::Socket::SSL could make it configurable with a new option ‘SSL_auto_retry’, and applications with select loops would set that option to 0. AFAICT the alternative would be to refactor all these loops, which is clearly a much bigger task.

This is also not specific to IO::Socket::SSL. Any program with such select/poll loops, written in any language, needs either refactoring or SSL_MODE_AUTO_RETRY to be cleared.

-- 
Guilhem.
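The refactoring alternative comes down to never letting a read block after select() has reported readiness. Below is a minimal sketch of that pattern on a plain socket, under stated assumptions: a socketpair stands in for the connection, there is no TLS at all, and read_available and select_loop are hypothetical helpers. An EAGAIN from a non-blocking sysread plays the role that SSL_ERROR_WANT_READ plays for SSL_read(). It sends the loop back to select() instead of hanging:

```perl
use strict;
use warnings;
use Errno;                                   # for %! (EAGAIN, EWOULDBLOCK)
use IO::Handle;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper: read what is available on a NON-blocking handle.
# Returns the bytes read, '' on EOF, or undef for "nothing yet, go back to
# select()" (the plain-socket counterpart of SSL_ERROR_WANT_READ).
sub read_available {
    my ($fh) = @_;
    my $n = sysread($fh, my $chunk, 4096);
    return $chunk if $n;                              # got some bytes
    return ''     if defined $n;                      # 0 bytes: peer closed
    return undef  if $!{EAGAIN} || $!{EWOULDBLOCK};   # spurious readiness
    die "sysread failed: $!";
}

# Hypothetical select() loop that treats readiness as a hint, not a promise.
sub select_loop {
    my ($fh, $timeout) = @_;
    my $sel = IO::Select->new($fh);
    my $got = '';
    while ($sel->can_read($timeout)) {
        my $chunk = read_available($fh);
        next unless defined $chunk;   # nothing usable yet: select() again
        last if $chunk eq '';         # EOF
        $got .= $chunk;
    }
    return $got;
}

# Usage, with a socketpair standing in for the TLS connection.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";
$client->blocking(0);                         # never block inside read(2)
my $early = read_available($client);          # nothing sent yet
syswrite($server, "HTTP/1.1 200\r\n") or die "syswrite failed: $!";
close $server;
my $reply = select_loop($client, 1);
print defined $early ? "early read returned data\n" : "early read: EAGAIN\n";
print "reply: $reply";
```

IO::Socket::SSL's documentation on non-blocking sockets describes a similar EWOULDBLOCK convention (together with $SSL_ERROR). It is worth checking before adapting this sketch to a real TLS loop such as LWP's.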
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
> You're welcome :-) Does clearing the SSL_MODE_AUTO_RETRY context flag
> (i.e., reverting the default from OpenSSL <1.1.1) solve this for you
> too? If so, what do you think about my proposed paths forward from

Simply clearing SSL_MODE_AUTO_RETRY will cause problems with blocking connections in TLS 1.3.

I've tried to work around the behavior change by clearing SSL_MODE_AUTO_RETRY for non-blocking connections and setting it again for blocking ones. Please check if https://github.com/noxxi/p5-io-socket-ssl/commit/09bc6a3203bc7bc89078317da42a3e96cdbf94fc fixes the problems you see.

Regards,
Steffen Ullrich, Maintainer IO::Socket::SSL.
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Sun, 07 Apr 2019 at 20:56:41 +0200, gregor herrmann wrote:
> Alright, after purging libssl1.0.2 (and the outdated packages which
> depended on it *cough*) I get the hang as well:
> […]
> Thanks for the push in the right direction!

You're welcome :-) Does clearing the SSL_MODE_AUTO_RETRY context flag (i.e., reverting the default from OpenSSL <1.1.1) solve this for you too? If so, what do you think about my proposed paths forward from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914034#71

If there is consensus that libssl's SSL_CTRL_CLEAR_MODE and/or SSL_CTX_clear_mode should be exposed to Net::SSLeay, I'd be happy to propose a patch there. That still leaves the question of which default context flags IO::Socket::SSL (or LWP) should have, though.

-- 
Guilhem.
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Sun, 07 Apr 2019 18:39:44 +0200, Guilhem Moulin wrote:
>> I can't reproduce this problem:
> Interesting, are you talking TLS 1.3?

Good question :)

> $ dpkg-query -l "libssl*" "libnet-ssleay-perl" "liblwp-protocol-https-perl" "libio-socket-ssl-perl"
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name                       Version      Architecture Description
> +++-==========================-============-============-=================================================================
> ii  libio-socket-ssl-perl      2.060-3      all          Perl module implementing object oriented interface to SSL sockets
> ii  liblwp-protocol-https-perl 6.07-2       all          HTTPS driver for LWP::UserAgent
> ii  libnet-ssleay-perl         1.85-2+b1    amd64        Perl module for Secure Sockets Layer (SSL)
> ii  libssl-dev:amd64           1.1.1b-1     amd64        Secure Sockets Layer toolkit - development files
> un  libssl-doc                                           (no description available)
> un  libssl0.9.8                                          (no description available)
> un  libssl1.0-dev                                        (no description available)
> ii  libssl1.1:amd64            1.1.1b-1     amd64        Secure Sockets Layer toolkit - shared libraries

% dpkg -l "libssl*" "libnet-ssleay-perl" "liblwp-protocol-https-perl" "libio-socket-ssl-perl"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                       Version         Architecture Description
+++-==========================-===============-============-=================================================================
ii  libio-socket-ssl-perl      2.060-3         all          Perl module implementing object oriented interface to SSL sockets
ii  liblwp-protocol-https-perl 6.07-2          all          HTTPS driver for LWP::UserAgent
ii  libnet-ssleay-perl         1.85-2+b1       amd64        Perl module for Secure Sockets Layer (SSL)
un  libssl0.9.8                                             (no description available)
un  libssl1.0.0                                             (no description available)
ii  libssl1.0.2:amd64          1.0.2r-1~deb9u1 amd64        Secure Sockets Layer toolkit - shared libraries
ii  libssl1.1:amd64            1.1.1b-1        amd64        Secure Sockets Layer toolkit - shared libraries
ii  libssl1.1:i386             1.1.1b-1        i386         Secure Sockets Layer toolkit - shared libraries

Hm, I note that I still have libssl1.0.2 installed additionally.

Alright, after purging libssl1.0.2 (and the outdated packages which depended on it *cough*) I get the hang as well:

% time perl -MLWP::UserAgent -e 'LWP::UserAgent->new->post("https://facebook.com", { data => "foo" }) or die'
[long time nothing]
perl -MLWP::UserAgent -e 0.18s user 0.02s system 0% cpu 3:06.66 total

Thanks for the push in the right direction!

>> % time perl -MLWP::UserAgent -e
>> 'LWP::UserAgent->new->post("https://twitter.com", { data => "foo" }) or die'
>> perl -MLWP::UserAgent -e 0.13s user 0.02s system 36% cpu 0.415 total
>
> twitter.com doesn't support TLS 1.3 though, right?

Good catch, I just wanted to try a random website which is IPv4-only.

Cheers,
gregor
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Sun, 07 Apr 2019 at 18:12:45 +0200, gregor herrmann wrote:
> On Sun, 18 Nov 2018 19:41:05 +0200, Niko Tyni wrote:
>> Reiterating a bit: the underlying issue with TLSv1.3 seems to be related
>> to handling of 'non-application_data_records'.
>>
>> The client tries to POST but gets an 'SSL wants a read first' error,
>> then waits until timeout for the socket to become writable.
>>
>> A simple way to reproduce it here is
>>
>> perl -MLWP::UserAgent -e 'LWP::UserAgent->new->post("https://facebook.com",
>> { data => "foo" }) or die'
>>
>> which deadlocks for me.
>
> I can't reproduce this problem:

Interesting, are you talking TLS 1.3?

$ dpkg-query -l "libssl*" "libnet-ssleay-perl" "liblwp-protocol-https-perl" "libio-socket-ssl-perl"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                       Version      Architecture Description
+++-==========================-============-============-=================================================================
ii  libio-socket-ssl-perl      2.060-3      all          Perl module implementing object oriented interface to SSL sockets
ii  liblwp-protocol-https-perl 6.07-2       all          HTTPS driver for LWP::UserAgent
ii  libnet-ssleay-perl         1.85-2+b1    amd64        Perl module for Secure Sockets Layer (SSL)
ii  libssl-dev:amd64           1.1.1b-1     amd64        Secure Sockets Layer toolkit - development files
un  libssl-doc                                           (no description available)
un  libssl0.9.8                                          (no description available)
un  libssl1.0-dev                                        (no description available)
ii  libssl1.1:amd64            1.1.1b-1     amd64        Secure Sockets Layer toolkit - shared libraries

$ openssl req -x509 -newkey rsa:4096 -keyout /tmp/key.pem -out /tmp/cert.pem -subj /CN=example.net -nodes
$ openssl s_server -accept 127.0.0.1:4433 -key /tmp/key.pem -cert /tmp/cert.pem -tls1_3
[…]

Then, in a separate terminal, with SSL_MODE_AUTO_RETRY set (the default), it blocks on read(2):

$ strace -eselect,read,write perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {verify_hostname => 0, SSL_ca_file => "/tmp/cert.pem"})->post("https://127.0.0.1:4433", { data => "foo" })'
[…]
select(8, [3], [3], NULL, {tv_sec=180, tv_usec=0}) = 2 (in [3], out [3], left {tv_sec=179, tv_usec=98})
read(3, "…", 5) = 5
read(3, "…", 250) = 250
read(3, "…", 5) = 5
read(3, "…", 250) = 250
read(3,

With SSL_MODE_AUTO_RETRY cleared, the handshake completes and it waits for the reply from the server:

$ strace -eselect,read,write perl -MLWP::UserAgent -e 'LWP::UserAgent->new(ssl_opts => {verify_hostname => 0, SSL_ca_file => "/tmp/cert.pem"})->post("https://127.0.0.1:4433", { data => "foo" })'
[…]
select(8, [3], [3], NULL, {tv_sec=180, tv_usec=0}) = 2 (in [3], out [3], left {tv_sec=179, tv_usec=98})
read(3, "…", 5) = 5
read(3, "…", 250) = 250
write(3, "…", 216) = 216
select(8, [3], NULL, NULL, {tv_sec=180, tv_usec=0}) = 1 (in [3], left {tv_sec=179, tv_usec=99})
read(3, "…", 5) = 5
read(3, "…", 250) = 250
select(8, [3], NULL, NULL, {tv_sec=180, tv_usec=0}

(and the connection closes gracefully when I write “HTTP/1.1 200\r\nContent-Length: 0\r\n\r\n” from the server)

> % time perl -MLWP::UserAgent -e
> 'LWP::UserAgent->new->post("https://twitter.com", { data => "foo" }) or die'
> perl -MLWP::UserAgent -e 0.13s user 0.02s system 36% cpu 0.415 total

twitter.com doesn't support TLS 1.3 though, right?

$ openssl s_client -4 -connect twitter.com:443 -servername twitter.com -tls1_3
CONNECTED(0003)
139682444989504:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:../ssl/record/rec_layer_s3.c:1536:SSL alert number 40

-- 
Guilhem.
Bug#914034: Bug#911938: libhttp-daemon-ssl-perl FTBFS: tests fail: Connection refused
On Sun, 18 Nov 2018 19:41:05 +0200, Niko Tyni wrote:
> Reiterating a bit: the underlying issue with TLSv1.3 seems to be related
> to handling of 'non-application_data_records'.
>
> The client tries to POST but gets an 'SSL wants a read first' error,
> then waits until timeout for the socket to become writable.
>
> A simple way to reproduce it here is
>
> perl -MLWP::UserAgent -e 'LWP::UserAgent->new->post("https://facebook.com",
> { data => "foo" }) or die'
>
> which deadlocks for me.

I can't reproduce this problem:

% time perl -MLWP::UserAgent -e 'LWP::UserAgent->new->post("https://facebook.com", { data => "foo" }) or die'
perl -MLWP::UserAgent -e 0.15s user 0.01s system 40% cpu 0.397 total

Has something changed in LWP::Protocol::https, Net::HTTPS, IO::Socket::SSL, Net::SSLeay or something else, or is this a local environment issue?

Also no issue with IPv4-only hosts:

% time perl -MLWP::UserAgent -e 'LWP::UserAgent->new->post("https://twitter.com", { data => "foo" }) or die'
perl -MLWP::UserAgent -e 0.13s user 0.02s system 36% cpu 0.415 total

Cheers,
gregor, confused, as Guilhem (in message #71) could still reproduce it on 7 Apr 2019