Re: h2 broken in 2.4.36 with OpenSSL 1.1.1? Related to SSL_MODE_AUTO_RETRY?

2018-10-16 Thread Rainer Jung

Hi Stefan, Joe and all,

Am 16.10.2018 um 11:15 schrieb Joe Orton:

On Mon, Oct 15, 2018 at 12:55:45PM +0200, Rainer Jung wrote:

I'm currently testing the following patch which looks OK wrt. test suite
results. Need to run more combinations (OpenSSL version client versus
server) though. Server with 1.1.1 and with 1.0.2p both look OK (including
the h2 tests). Maybe some cases could be folded together or be dropped, but
I tried to make the logic changes not to big. The SSL_ERROR_ZERO_RETURN part
is new, because without that we get an ssl:info log line AH01992 with error
6 (SSL_ERROR_ZERO_RETURN) at the end of the response (at least with 1.1.1).


Thanks a lot Rainer & Stefan, sorry I didn't follow through on that
ticket/issue far enough.  Strike it down as another way that 1.1.1
really is ABI-incompatible with <1.1.1 :(

The change committed to ssl_engine_io.c makes sense to me. I wonder if
mod_ssl should also handle SSL_ERROR_WANT_WRITE here as well. It will be
clearly logged if that happens ("SSL library error 3 reading data") so
we should find out anyway.


Thanks to you both for double checking.

I tried to keep behavior change restricted to the observed problem but 
yes, it might be, that a more complete approach would cover more cases, 
that we are just not aware right now. It is just that I don't feel 
myself in a position for that more complete approach.


I will propose the current trunk change for 2.4 but any suggestions for 
improvements would be highly welcome. My testing currently at least 
shows no problems with the httpd test suite.


Regards,

Rainer


Re: h2 broken in 2.4.36 with OpenSSL 1.1.1? Related to SSL_MODE_AUTO_RETRY?

2018-10-16 Thread Joe Orton
On Mon, Oct 15, 2018 at 12:55:45PM +0200, Rainer Jung wrote:
> I'm currently testing the following patch which looks OK wrt. test suite
> results. Need to run more combinations (OpenSSL version client versus
> server) though. Server with 1.1.1 and with 1.0.2p both look OK (including
> the h2 tests). Maybe some cases could be folded together or be dropped, but
> I tried to make the logic changes not to big. The SSL_ERROR_ZERO_RETURN part
> is new, because without that we get an ssl:info log line AH01992 with error
> 6 (SSL_ERROR_ZERO_RETURN) at the end of the response (at least with 1.1.1).

Thanks a lot Rainer & Stefan, sorry I didn't follow through on that 
ticket/issue far enough.  Strike it down as another way that 1.1.1 
really is ABI-incompatible with <1.1.1 :(

The change committed to ssl_engine_io.c makes sense to me. I wonder if 
mod_ssl should also handle SSL_ERROR_WANT_WRITE here as well. It will be 
clearly logged if that happens ("SSL library error 3 reading data") so 
we should find out anyway.

Regards, Joe



Re: h2 broken in 2.4.36 with OpenSSL 1.1.1? Related to SSL_MODE_AUTO_RETRY?

2018-10-16 Thread Stefan Eissing
> Am 15.10.2018 um 23:16 schrieb Rainer Jung :
> 
> Adjusted SSL_read() rc value 0 handling applied in r1843954 to trunk. I'll 
> let it sit there until tomorrow for comments and then suggest for backport.

Thanks, Rainer! 

The h2 test suite runs as smoothly as it did before on my machine with OpenSSL 
1.1.1 in trunk. I am all for backporting and testing von 2.4.x on this.

-Stefan


> 
> Am 15.10.2018 um 12:55 schrieb Rainer Jung:
>> Am 15.10.2018 um 10:02 schrieb Stefan Eissing:
>>> 
>>> 
 Am 14.10.2018 um 00:46 schrieb Rainer Jung :
 
 It seems the h2 failure only happens when building httpd against OpenSSL 
 1.1.1 (independent of TLS version used). I did a quick check with an httpd 
 build against 1.1.0i and there the same vhost of the test framework 
 instance worked with the same clients, that failed for 1.1.1.
 
 The client side OpenSSL version seems not to matter.
 
 When using 1.1.1 in the server even browsers seem to fail with h2.
 
 Anyone who can successfully use h2 with 2.4.36 build against OpenSSL 1.1.1?
>>> 
>>> Me, and I got reports from others.
>>> 
 When comparing trace logs between the 1.1.1 server and the 1.1.0i server, 
 one can see, that a failing OpenSSL read (0 bytes) results in APR_EOF 
 (70014) for 1.1.1 but in errno 11 (EAGAIN) for 1.1.0i. I wonder whether 
 this is due to the new SSL_MODE_AUTO_RETRY and maybe the following change:
 
 +#if OPENSSL_VERSION_NUMBER >= 0x1010100fL
 +/* For OpenSSL >=1.1.1, disable auto-retry mode so it's possible
 + * to consume handshake records without blocking for app-data.
 + * https://github.com/openssl/openssl/issues/7178 */
 +SSL_CTX_clear_mode(ctx, SSL_MODE_AUTO_RETRY);
 +#endif
>>> 
>>> Hmm, just read the ticket made by Joe regarding this change.
>>> 
>>> I try to summarize my understanding:
>>> 
>>> - this has nothing to do with HTTP/2 in particular. It's only triggered by 
>>> the h2 calling sequence.
>>> - SSL_read() only returns transport data, but sometimes reads (part of) TLS 
>>> meta data, such as handshake messages.
>>> - When it reads this meta data (and has processed it), it can either do 
>>> another read directly after or return with
>>>its equivalent of EAGAIN. By default, it does the first.
>>> - The change in handshake handling between TLSv1.2 and TLSv1.3 led, in 
>>> Joe's testing, to the case where SSL_read()'s
>>>internal re-read() on the socket blocked, as the client was not sending 
>>> anything.
>>> - So, after discussion on the openssl issue tracker, Joe changed the 
>>> OpenSSL behaviour of this by inserting the code above.
>>> - Now, when SSL_read() is called, processes some meta data, it will not 
>>> read() on the socket again, but return a read of length 0.
>>> - mod_ssl interprets this as EOF:
>>> ssl_engine_io.c, lines 673..
>>> 
>>>  else if (rc == 0) {
>>>  /* If EAGAIN, we will loop given a blocking read,
>>>   * otherwise consider ourselves at EOF.
>>>   */
>>> 
>>> and returns APR_EOF to h2 which then shuts the connection down.
>>> 
>>> To my understanding, mod_ssl's ssl_engine_io.c, ssl_io_input_read() has the 
>>> special case handling that:
>>> - if it gets a SSL_ERROR_WANT_READ from SSL_read(), it remembers that, 
>>> calls again
>>> - if SSL_read() == 0, non-blocking, it
>>>a) on having seen SSL_ERROR_WANT_READ before, return APR_EGAIN
>>>b) otherwise, return APR_EOF
>>> 
>>> OpenSSL's documentation of SSL_read() now states:
>>> 
 Old documentation indicated a difference between 0 and -1, and that -1 was 
 retry-able.
 You should instead call SSL_get_error() to find out if it's retry-able.
>>> 
>>> So, we should change our logic here. Anyone having the courage to do so? ;-)
>> Thanks for the further investigations.
>> I'm currently testing the following patch which looks OK wrt. test suite 
>> results. Need to run more combinations (OpenSSL version client versus 
>> server) though. Server with 1.1.1 and with 1.0.2p both look OK (including 
>> the h2 tests). Maybe some cases could be folded together or be dropped, but 
>> I tried to make the logic changes not to big. The SSL_ERROR_ZERO_RETURN part 
>> is new, because without that we get an ssl:info log line AH01992 with error 
>> 6 (SSL_ERROR_ZERO_RETURN) at the end of the response (at least with 1.1.1).
>> --- modules/ssl/ssl_engine_io.c.orig2018-08-15 17:01:08.0 +0200
>> +++ modules/ssl/ssl_engine_io.c 2018-10-15 11:46:01.258042000 +0200
>> @@ -680,35 +680,32 @@
>>  }
>>  return inctx->rc;
>>  }
>> -else if (rc == 0) {
>> -/* If EAGAIN, we will loop given a blocking read,
>> - * otherwise consider ourselves at EOF.
>> - */
>> -if (APR_STATUS_IS_EAGAIN(inctx->rc)
>> -|| APR_STATUS_IS_EINTR(inctx->rc)) {
>> -/* Already read something, return 

Re: h2 broken in 2.4.36 with OpenSSL 1.1.1? Related to SSL_MODE_AUTO_RETRY?

2018-10-15 Thread Rainer Jung
Adjusted SSL_read() rc value 0 handling applied in r1843954 to trunk. 
I'll let it sit there until tomorrow for comments and then suggest for 
backport.


Am 15.10.2018 um 12:55 schrieb Rainer Jung:

Am 15.10.2018 um 10:02 schrieb Stefan Eissing:




Am 14.10.2018 um 00:46 schrieb Rainer Jung :

It seems the h2 failure only happens when building httpd against 
OpenSSL 1.1.1 (independent of TLS version used). I did a quick check 
with an httpd build against 1.1.0i and there the same vhost of the 
test framework instance worked with the same clients, that failed for 
1.1.1.


The client side OpenSSL version seems not to matter.

When using 1.1.1 in the server even browsers seem to fail with h2.

Anyone who can successfully use h2 with 2.4.36 build against OpenSSL 
1.1.1?


Me, and I got reports from others.

When comparing trace logs between the 1.1.1 server and the 1.1.0i 
server, one can see, that a failing OpenSSL read (0 bytes) results in 
APR_EOF (70014) for 1.1.1 but in errno 11 (EAGAIN) for 1.1.0i. I 
wonder whether this is due to the new SSL_MODE_AUTO_RETRY and maybe 
the following change:


+#if OPENSSL_VERSION_NUMBER >= 0x1010100fL
+    /* For OpenSSL >=1.1.1, disable auto-retry mode so it's possible
+ * to consume handshake records without blocking for app-data.
+ * https://github.com/openssl/openssl/issues/7178 */
+    SSL_CTX_clear_mode(ctx, SSL_MODE_AUTO_RETRY);
+#endif


Hmm, just read the ticket made by Joe regarding this change.

I try to summarize my understanding:

- this has nothing to do with HTTP/2 in particular. It's only 
triggered by the h2 calling sequence.
- SSL_read() only returns transport data, but sometimes reads (part 
of) TLS meta data, such as handshake messages.
- When it reads this meta data (and has processed it), it can either 
do another read directly after or return with

   its equivalent of EAGAIN. By default, it does the first.
- The change in handshake handling between TLSv1.2 and TLSv1.3 led, in 
Joe's testing, to the case where SSL_read()'s
   internal re-read() on the socket blocked, as the client was not 
sending anything.
- So, after discussion on the openssl issue tracker, Joe changed the 
OpenSSL behaviour of this by inserting the code above.
- Now, when SSL_read() is called, processes some meta data, it will 
not read() on the socket again, but return a read of length 0.

- mod_ssl interprets this as EOF:
ssl_engine_io.c, lines 673..

 else if (rc == 0) {
 /* If EAGAIN, we will loop given a blocking read,
  * otherwise consider ourselves at EOF.
  */

and returns APR_EOF to h2 which then shuts the connection down.

To my understanding, mod_ssl's ssl_engine_io.c, ssl_io_input_read() 
has the special case handling that:
- if it gets a SSL_ERROR_WANT_READ from SSL_read(), it remembers that, 
calls again

- if SSL_read() == 0, non-blocking, it
   a) on having seen SSL_ERROR_WANT_READ before, return APR_EGAIN
   b) otherwise, return APR_EOF

OpenSSL's documentation of SSL_read() now states:

Old documentation indicated a difference between 0 and -1, and that 
-1 was retry-able.

You should instead call SSL_get_error() to find out if it's retry-able.


So, we should change our logic here. Anyone having the courage to do 
so? ;-)


Thanks for the further investigations.

I'm currently testing the following patch which looks OK wrt. test suite 
results. Need to run more combinations (OpenSSL version client versus 
server) though. Server with 1.1.1 and with 1.0.2p both look OK 
(including the h2 tests). Maybe some cases could be folded together or 
be dropped, but I tried to make the logic changes not to big. The 
SSL_ERROR_ZERO_RETURN part is new, because without that we get an 
ssl:info log line AH01992 with error 6 (SSL_ERROR_ZERO_RETURN) at the 
end of the response (at least with 1.1.1).


--- modules/ssl/ssl_engine_io.c.orig    2018-08-15 17:01:08.0 +0200
+++ modules/ssl/ssl_engine_io.c 2018-10-15 11:46:01.258042000 +0200
@@ -680,35 +680,32 @@
  }
  return inctx->rc;
  }
-    else if (rc == 0) {
-    /* If EAGAIN, we will loop given a blocking read,
- * otherwise consider ourselves at EOF.
- */
-    if (APR_STATUS_IS_EAGAIN(inctx->rc)
-    || APR_STATUS_IS_EINTR(inctx->rc)) {
-    /* Already read something, return APR_SUCCESS instead.
- * On win32 in particular, but perhaps on other kernels,
- * a blocking call isn't 'always' blocking.
+    else /* (rc <= 0) */ {
+    if (rc == 0) {
+    /* If EAGAIN, we will loop given a blocking read,
+ * otherwise consider ourselves at EOF.
   */
-    if (*len > 0) {
-    inctx->rc = APR_SUCCESS;
-    break;
-    }
-    if (inctx->block == APR_NONBLOCK_READ) {
-    break;
-    }
-   

Re: h2 broken in 2.4.36 with OpenSSL 1.1.1? Related to SSL_MODE_AUTO_RETRY?

2018-10-15 Thread Rainer Jung

Am 15.10.2018 um 10:02 schrieb Stefan Eissing:




Am 14.10.2018 um 00:46 schrieb Rainer Jung :

It seems the h2 failure only happens when building httpd against OpenSSL 1.1.1 
(independent of TLS version used). I did a quick check with an httpd build 
against 1.1.0i and there the same vhost of the test framework instance worked 
with the same clients, that failed for 1.1.1.

The client side OpenSSL version seems not to matter.

When using 1.1.1 in the server even browsers seem to fail with h2.

Anyone who can successfully use h2 with 2.4.36 build against OpenSSL 1.1.1?


Me, and I got reports from others.


When comparing trace logs between the 1.1.1 server and the 1.1.0i server, one 
can see, that a failing OpenSSL read (0 bytes) results in APR_EOF (70014) for 
1.1.1 but in errno 11 (EAGAIN) for 1.1.0i. I wonder whether this is due to the 
new SSL_MODE_AUTO_RETRY and maybe the following change:

+#if OPENSSL_VERSION_NUMBER >= 0x1010100fL
+/* For OpenSSL >=1.1.1, disable auto-retry mode so it's possible
+ * to consume handshake records without blocking for app-data.
+ * https://github.com/openssl/openssl/issues/7178 */
+SSL_CTX_clear_mode(ctx, SSL_MODE_AUTO_RETRY);
+#endif


Hmm, just read the ticket made by Joe regarding this change.

I try to summarize my understanding:

- this has nothing to do with HTTP/2 in particular. It's only triggered by the 
h2 calling sequence.
- SSL_read() only returns transport data, but sometimes reads (part of) TLS 
meta data, such as handshake messages.
- When it reads this meta data (and has processed it), it can either do another 
read directly after or return with
   its equivalent of EAGAIN. By default, it does the first.
- The change in handshake handling between TLSv1.2 and TLSv1.3 led, in Joe's 
testing, to the case where SSL_read()'s
   internal re-read() on the socket blocked, as the client was not sending 
anything.
- So, after discussion on the openssl issue tracker, Joe changed the OpenSSL 
behaviour of this by inserting the code above.
- Now, when SSL_read() is called, processes some meta data, it will not read() 
on the socket again, but return a read of length 0.
- mod_ssl interprets this as EOF:
ssl_engine_io.c, lines 673..

 else if (rc == 0) {
 /* If EAGAIN, we will loop given a blocking read,
  * otherwise consider ourselves at EOF.
  */

and returns APR_EOF to h2 which then shuts the connection down.

To my understanding, mod_ssl's ssl_engine_io.c, ssl_io_input_read() has the 
special case handling that:
- if it gets a SSL_ERROR_WANT_READ from SSL_read(), it remembers that, calls 
again
- if SSL_read() == 0, non-blocking, it
   a) on having seen SSL_ERROR_WANT_READ before, return APR_EGAIN
   b) otherwise, return APR_EOF

OpenSSL's documentation of SSL_read() now states:


Old documentation indicated a difference between 0 and -1, and that -1 was 
retry-able.
You should instead call SSL_get_error() to find out if it's retry-able.


So, we should change our logic here. Anyone having the courage to do so? ;-)


Thanks for the further investigations.

I'm currently testing the following patch which looks OK wrt. test suite 
results. Need to run more combinations (OpenSSL version client versus 
server) though. Server with 1.1.1 and with 1.0.2p both look OK 
(including the h2 tests). Maybe some cases could be folded together or 
be dropped, but I tried to make the logic changes not to big. The 
SSL_ERROR_ZERO_RETURN part is new, because without that we get an 
ssl:info log line AH01992 with error 6 (SSL_ERROR_ZERO_RETURN) at the 
end of the response (at least with 1.1.1).


--- modules/ssl/ssl_engine_io.c.orig2018-08-15 17:01:08.0 +0200
+++ modules/ssl/ssl_engine_io.c 2018-10-15 11:46:01.258042000 +0200
@@ -680,35 +680,32 @@
 }
 return inctx->rc;
 }
-else if (rc == 0) {
-/* If EAGAIN, we will loop given a blocking read,
- * otherwise consider ourselves at EOF.
- */
-if (APR_STATUS_IS_EAGAIN(inctx->rc)
-|| APR_STATUS_IS_EINTR(inctx->rc)) {
-/* Already read something, return APR_SUCCESS instead.
- * On win32 in particular, but perhaps on other kernels,
- * a blocking call isn't 'always' blocking.
+else /* (rc <= 0) */ {
+if (rc == 0) {
+/* If EAGAIN, we will loop given a blocking read,
+ * otherwise consider ourselves at EOF.
  */
-if (*len > 0) {
-inctx->rc = APR_SUCCESS;
-break;
-}
-if (inctx->block == APR_NONBLOCK_READ) {
-break;
-}
-}
-else {
-if (*len > 0) {
-inctx->rc = APR_SUCCESS;
+if (APR_STATUS_IS_EAGAIN(inctx->rc)
+|| 

Re: h2 broken in 2.4.36 with OpenSSL 1.1.1? Related to SSL_MODE_AUTO_RETRY?

2018-10-15 Thread Stefan Eissing



> Am 14.10.2018 um 00:46 schrieb Rainer Jung :
> 
> It seems the h2 failure only happens when building httpd against OpenSSL 
> 1.1.1 (independent of TLS version used). I did a quick check with an httpd 
> build against 1.1.0i and there the same vhost of the test framework instance 
> worked with the same clients, that failed for 1.1.1.
> 
> The client side OpenSSL version seems not to matter.
> 
> When using 1.1.1 in the server even browsers seem to fail with h2.
> 
> Anyone who can successfully use h2 with 2.4.36 build against OpenSSL 1.1.1?

Me, and I got reports from others.

> When comparing trace logs between the 1.1.1 server and the 1.1.0i server, one 
> can see, that a failing OpenSSL read (0 bytes) results in APR_EOF (70014) for 
> 1.1.1 but in errno 11 (EAGAIN) for 1.1.0i. I wonder whether this is due to 
> the new SSL_MODE_AUTO_RETRY and maybe the following change:
> 
> +#if OPENSSL_VERSION_NUMBER >= 0x1010100fL
> +/* For OpenSSL >=1.1.1, disable auto-retry mode so it's possible
> + * to consume handshake records without blocking for app-data.
> + * https://github.com/openssl/openssl/issues/7178 */
> +SSL_CTX_clear_mode(ctx, SSL_MODE_AUTO_RETRY);
> +#endif

Hmm, just read the ticket made by Joe regarding this change.

I try to summarize my understanding:

- this has nothing to do with HTTP/2 in particular. It's only triggered by the 
h2 calling sequence.
- SSL_read() only returns transport data, but sometimes reads (part of) TLS 
meta data, such as handshake messages.
- When it reads this meta data (and has processed it), it can either do another 
read directly after or return with
  its equivalent of EAGAIN. By default, it does the first.
- The change in handshake handling between TLSv1.2 and TLSv1.3 led, in Joe's 
testing, to the case where SSL_read()'s
  internal re-read() on the socket blocked, as the client was not sending 
anything.
- So, after discussion on the openssl issue tracker, Joe changed the OpenSSL 
behaviour of this by inserting the code above.
- Now, when SSL_read() is called, processes some meta data, it will not read() 
on the socket again, but return a read of length 0.
- mod_ssl interprets this as EOF:
ssl_engine_io.c, lines 673..

else if (rc == 0) {
/* If EAGAIN, we will loop given a blocking read,
 * otherwise consider ourselves at EOF.
 */

and returns APR_EOF to h2 which then shuts the connection down.

To my understanding, mod_ssl's ssl_engine_io.c, ssl_io_input_read() has the 
special case handling that:
- if it gets a SSL_ERROR_WANT_READ from SSL_read(), it remembers that, calls 
again
- if SSL_read() == 0, non-blocking, it
  a) on having seen SSL_ERROR_WANT_READ before, return APR_EGAIN
  b) otherwise, return APR_EOF

OpenSSL's documentation of SSL_read() now states:

> Old documentation indicated a difference between 0 and -1, and that -1 was 
> retry-able. 
> You should instead call SSL_get_error() to find out if it's retry-able.

So, we should change our logic here. Anyone having the courage to do so? ;-)

-Stefan

> 
> The logs:
> 
> < OpenSSL 1.1.1
> > OpenSSL 1.1.0i
> 
> < 23:35:16.021201  h2_session.c(1704): AH03078: h2_session(81,BUSY,0): 
> transit [INIT] -- init --> [BUSY]
> > 23:39:55.715439  h2_session.c(1670): AH03078: h2_session(66,BUSY,0): 
> > transit [INIT] -- init --> [BUSY]
> 
> Some additional early stuff for 1.1.0i
> 
> > 23:39:55.715448 h2_filter.c(145): h2_session(66): read, NONBLOCK_READ, 
> > mode=0, readbytes=65536
> > 23:39:55.715470 ssl_engine_io.c(): OpenSSL: I/O error, 5 bytes expected 
> > to read on BIO#7efee0005280 [mem: 7efee0014ee3]
> > 23:39:55.715480 h2_filter.c(190): (11)Resource temporarily unavailable: 
> > h2_session(66): read
> > 23:39:55.715508 h2_mplx.c(1240): h2_mplx(66): dispatch events
> > 23:39:55.715536 h2_session.c(589): AH03068: h2_session(66,BUSY,0): sent 
> > FRAME[SETTINGS[length=6, stream=0]], frames=0/1 (r/s)
> > 23:39:55.715548 h2_session.c(589): AH03068: h2_session(66,BUSY,0): sent 
> > FRAME[WINDOW_UPDATE[stream=0, incr=2147418112]], frames=0/2 (r/s)
> > 23:39:55.715565 h2_conn_io.c(122): h2_session(66)-out: heap[28] flush
> > 23:39:55.715590 core_filters.c(580): brigade contains: bytes: 57, non-file 
> > bytes: 57, eor buckets: 0, morphing buckets: 0
> > 23:39:55.715598 ssl_engine_io.c(2213): OpenSSL: write 57/57 bytes to 
> > BIO#7efee00021b0 [mem: 7efee0014ee3]
> > 23:39:55.715648 core_filters.c(525): will flush because of FLUSH bucket
> > 23:39:55.715653 core_filters.c(535): seen in brigade so far: bytes: 57, 
> > non-file bytes: 57, eor buckets: 0, morphing buckets: 0
> > 23:39:55.715657 core_filters.c(554): flushing now
> > 23:39:55.715683 core_filters.c(569): total bytes written: 2466
> > 23:39:55.715690 core_filters.c(580): brigade contains: bytes: 0, non-file 
> > bytes: 0, eor buckets: 0, morphing buckets: 0
> 
> Then many things identical
> 
> < 23:35:16.021217 h2_filter.c(145): h2_session(81): read, NONBLOCK_READ, 
> mode=0,