Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Jacob Champion

On 02/02/2017 03:05 PM, Yann Ylavic wrote:

Couldn't htcacheclean or alike do something like this?
"EnableMMAP off" could definitely help here.


(Didn't mean to ignore this part of your email, but I don't have much 
experience with htcacheclean yet so I can't really comment...)


--Jacob



Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Jacob Champion

On 02/02/2017 03:05 PM, Yann Ylavic wrote:

Hmm, Linux raises SIGBUS if an mmap is used after the underlying file
has been truncated (see [1]).


See also https://bz.apache.org/bugzilla/show_bug.cgi?id=46688 .

Niklas, just to clarify: you're not willfully truncating large files as 
they're being served, right? I *can* reproduce a SIGBUS if I start 
truncating files during my stress testing. But EnableMMAP's 
documentation calls that case out explicitly.


--Jacob


Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Yann Ylavic
On Thu, Feb 2, 2017 at 11:36 PM, Jacob Champion  wrote:
> On 02/02/2017 02:32 PM, Yann Ylavic wrote:
>>
>> On Thu, Feb 2, 2017 at 11:19 PM, Jacob Champion 
>> wrote:
>>>
>>> Idle thoughts: "Cannot access memory" in this case could be a red
>>> herring,
>>> if Niklas' gdb can't peer into mmap'd memory spaces [1]. It seems
>>> reasonable
>>> that the data in question could be mmap'd, given the nice round address
>>> and
>>> 4 MiB length (equal to APR_MMAP_LIMIT).
>>>
>>> That doesn't mean we're looking in the wrong place, though, since SIGBUS
>>> can
>>> also be generated by an out-of-bounds access to an mmap'd region.
>>
>>
>> Right, looks like the memory has been unmapped though (SIGBUS) before
>> being (re)used.
>
> Oh, I thought an access after an unmap would SIGSEGV instead of SIGBUS. I
> haven't ever tested that out; I should try it...

Hmm, Linux raises SIGBUS if an mmap is used after the underlying file
has been truncated (see [1]).

Couldn't htcacheclean or alike do something like this?
"EnableMMAP off" could definitely help here.

[1] http://man7.org/linux/man-pages/man2/mmap.2.html


Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Jacob Champion

On 02/02/2017 02:32 PM, Yann Ylavic wrote:

On Thu, Feb 2, 2017 at 11:19 PM, Jacob Champion  wrote:


Idle thoughts: "Cannot access memory" in this case could be a red herring,
if Niklas' gdb can't peer into mmap'd memory spaces [1]. It seems reasonable
that the data in question could be mmap'd, given the nice round address and
4 MiB length (equal to APR_MMAP_LIMIT).

That doesn't mean we're looking in the wrong place, though, since SIGBUS can
also be generated by an out-of-bounds access to an mmap'd region.


Right, looks like the memory has been unmapped though (SIGBUS) before
being (re)used.


Oh, I thought an access after an unmap would SIGSEGV instead of SIGBUS. 
I haven't ever tested that out; I should try it...


--Jacob


Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Yann Ylavic
On Thu, Feb 2, 2017 at 11:19 PM, Jacob Champion  wrote:
>
> Idle thoughts: "Cannot access memory" in this case could be a red herring,
> if Niklas' gdb can't peer into mmap'd memory spaces [1]. It seems reasonable
> that the data in question could be mmap'd, given the nice round address and
> 4 MiB length (equal to APR_MMAP_LIMIT).
>
> That doesn't mean we're looking in the wrong place, though, since SIGBUS can
> also be generated by an out-of-bounds access to an mmap'd region.

Right, looks like the memory has been unmapped though (SIGBUS) before
being (re)used.
Does "EnableMMAP off" help or produce another backtrace?

>
> Niklas, what version of APR are you using? Are you serving large (> 4 MiB)
> static files? I have not been able to reproduce so far (Ubuntu 16.04, httpd
> 2.4.25 + mod_ssl + mpm_event).

The original file bucket comes from mod_cache, and indeed looks larger than 4MB.
If it were (htcache)cleaned while being served, SIGBUS shouldn't
happen still since we hold an fd (and reference) on it...


Regards,
Yann.


Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Jacob Champion

On 02/02/2017 02:04 AM, Yann Ylavic wrote:

Hi Niklas,

On Wed, Feb 1, 2017 at 7:02 PM, Niklas Edmundsson  wrote:


We've started to see spurious segfaults with httpd 2.4.25, mpm_event, ssl on
Ubuntu 14.04LTS. Not frequent, but none the less happening.

#4  ssl_io_filter_output (f=0x7f507013cfe0, bb=0x7f4f840be168) at
ssl_engine_io.c:1746
data = 0x7f5075518000 
len = 4194304
bucket = 0x7f4f840b1ba8
status = 
filter_ctx = 0x7f507013cf88
inctx = 
outctx = 0x7f507013d008
rblock = APR_NONBLOCK_READ


Idle thoughts: "Cannot access memory" in this case could be a red 
herring, if Niklas' gdb can't peer into mmap'd memory spaces [1]. It 
seems reasonable that the data in question could be mmap'd, given the 
nice round address and 4 MiB length (equal to APR_MMAP_LIMIT).


That doesn't mean we're looking in the wrong place, though, since SIGBUS 
can also be generated by an out-of-bounds access to an mmap'd region.


Niklas, what version of APR are you using? Are you serving large (> 4 
MiB) static files? I have not been able to reproduce so far (Ubuntu 
16.04, httpd 2.4.25 + mod_ssl + mpm_event).


--Jacob

[1] 
https://stackoverflow.com/questions/654393/examining-mmaped-addresses-using-gdb


Re: Autobuild Progress (was Re: Automated tests)

2017-02-02 Thread Jacob Champion

On 01/30/2017 12:02 PM, Jacob Champion wrote:

- run per-commit incremental builds
- run nightly clean builds


These two are implemented. Every commit you make to trunk (well, group 
of commits, within fifteen seconds of each other) is run through an 
incremental build, which takes about ten seconds. Every eight hours, the 
build tree is clobbered, resync'd, and built from scratch, which takes 
about a minute.


Reporting is still IRC-only. I want to keep an eye on the bots for bit 
before unleashing them upon the mailing list.


--Jacob



Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Niklas Edmundsson

On Thu, 2 Feb 2017, Niklas Edmundsson wrote:


On Thu, 2 Feb 2017, Yann Ylavic wrote:


Are we hitting a corner case of process cleanup that plays merry hell with
https/ssl, or are we just having bad luck? Ideas? Suggestions?


2.4.25 is eager to terminate/shutdown keepalive connections more
quickly (than previous versions) on graceful shutdown (e.g.
MaxConnectionsPerChild reached).

What might happen in ssl_io_filter_output() is that buffered output
data (already deleted but not cleared) end up being reused on
shutdown.

Could you please try the attached patch?


Built and deployed, waiting for the most affected host to drain in order to 
restart it.


I'll also lower MaxConnectionsPerChild a bit more in order to stress this.


Still seems to SIGBUS. backtraces attached.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 New Borg Software Package : Locutus 1-2-3
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

bt.2201.bz2
Description: Binary data


bt.6429.bz2
Description: Binary data


bt.8071.bz2
Description: Binary data


Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

2017-02-02 Thread Yann Ylavic
On Thu, Feb 2, 2017 at 5:16 PM, Yann Ylavic  wrote:
> On Thu, Feb 2, 2017 at 4:51 PM, Luca Toscano  wrote:
>>
>> 2017-01-30 21:58 GMT+01:00 Yann Ylavic :
>>>
>>> Maybe what is missing is nonblocking reads on the backend side, and on
>>> EAGAIN flush on the client side to detect potential socket errors?
>>
>> Interesting.. So the idea would be to be non blocking while reading from the
>> FCGI backend, and in case of EAGAIN flush to the client in order to force
>> ap_pass_brigade to detect if the client aborted the connection.
>
> Yes, poll() => POLLIN => read() while data available => EAGAIN =>
> flush => poll().

The "read() while data available => EAGAIN" may be replaced by "read()
while read buffer is full" so to avoid a (likely) useless last
non-blocking read.

>
> The flush is necessary because output filters down the road might have
> buffered, so it makes so that everything so far reaches the socket
> (with potential error).
>
>> The flushing
>> part might be tricky to implement (at least for me, but I'll try to work on
>> it :).
>
> It should be as easy as pushing a FLUSH bucket at the end of the
> brigade passed to the client ;)
>
> The hard part may be more the non-blocking read (and handling of
> incomplete headers/data).
>
> Regards,
> Yann.


Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

2017-02-02 Thread Yann Ylavic
On Thu, Feb 2, 2017 at 4:51 PM, Luca Toscano  wrote:
>
> 2017-01-30 21:58 GMT+01:00 Yann Ylavic :
>>
>> Maybe what is missing is nonblocking reads on the backend side, and on
>> EAGAIN flush on the client side to detect potential socket errors?
>
> Interesting.. So the idea would be to be non blocking while reading from the
> FCGI backend, and in case of EAGAIN flush to the client in order to force
> ap_pass_brigade to detect if the client aborted the connection.

Yes, poll() => POLLIN => read() while data available => EAGAIN =>
flush => poll().

The flush is necessary because output filters down the road might have
buffered, so it makes so that everything so far reaches the socket
(with potential error).

> The flushing
> part might be tricky to implement (at least for me, but I'll try to work on
> it :).

It should be as easy as pushing a FLUSH bucket at the end of the
brigade passed to the client ;)

The hard part may be more the non-blocking read (and handling of
incomplete headers/data).

Regards,
Yann.


Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

2017-02-02 Thread Luca Toscano
2017-01-30 21:58 GMT+01:00 Yann Ylavic :

> On Mon, Jan 30, 2017 at 7:17 PM, Luca Toscano 
> wrote:
> >
> > The use case that I had (the one that caused me to check the original
> > bugzilla task/patch and work on it) was a long running PHP script
> (running
> > on HHVM) that wasn't returning anything until the end of the job
> (minutes),
> > causing mod-proxy-fcgi to keep waiting even if the client disconnected.
> In
> > the beginning I also thought that there was a way to "signal" the FCGI
> > backend to stop whatever was doing (FCGI_ABORT), but it turned out to be
> not
> > widely implemented (at least from what I have checked, more info in
> > bugzilla). Timeout and ProxyTimeout could be a good compromise for this
> > particular issue, but it is a "one size fits all" solution that some
> users
> > didn't like (providing a patch with the pollset solution).
>
> I think that's what CGIDScriptTimeout or ProxyTimeout are for: avoid
> waiting too much for the script or backend (above some reasonably
> configured limit).
>
> >
> > If we remove the above use case (I fixed it in my Production environment
> > with a long Timeout value), would it be fine to rely on something like
> the
> > conn->aborted flag (afaics set by the output filter's chain while
> writing to
> > the client's conn)? It would simplify a lot the problem :)
>
> AFAICT, if forwarding data to the client fails (e.g. socket
> disconnected), ap_pass_brigade() returns an error and the loop exits.
>

Yes confirmed with a test (that flushes data to output periodically), the
connection->aborted flag is set and mod_proxy_fcgi behaves correctly.


> Maybe what is missing is nonblocking reads on the backend side, and on
> EAGAIN flush on the client side to detect potential socket errors?
>

Interesting.. So the idea would be to be non blocking while reading from
the FCGI backend, and in case of EAGAIN flush to the client in order to
force ap_pass_brigade to detect if the client aborted the connection. The
flushing part might be tricky to implement (at least for me, but I'll try
to work on it :).

In any case, it seems like we are in agreement to discard the client
connection polling idea (if anybody has more thoughts please speak up). In
any case, I believe that this use case would need to be documented properly
in the docs to warn users assuming that FCGI_ABORT is implemented.

Thanks for the review!

Luca


Re: Underscores in hostnames

2017-02-02 Thread Reindl Harald



Am 02.02.2017 um 14:22 schrieb Reindl Harald:



Am 02.02.2017 um 13:53 schrieb Joe Orton:

Another 2.4.25 regression reported from a Fedora user is that
underscores in hostnames are rejected by default now.  I couldn't see a
specific discussion of this, was it deliberate?


underscores are not allowed in host names by RFC and many things will
break at all with them because in different layers of client software
things just break by using them

we had that more than once in devel environments where strange bugs
turend out to be another case where sombody used a underline in
/etc/hosts and his local webserver


here you go:
https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_hostnames

The Internet standards (Requests for Comments) for protocols mandate 
that component hostname labels may contain only the ASCII letters 'a' 
through 'z' (in a case-insensitive manner), the digits '0' through '9', 
and the hyphen ('-'). The original specification of hostnames in RFC 
952, mandated that labels could not start with a digit or with a hyphen, 
and must not end with a hyphen. However, a subsequent specification (RFC 
1123) permitted hostname labels to start with digits. No other symbols, 
punctuation characters, or white space are permitted.


While a hostname may not contain other characters, such as the 
underscore character (_), other DNS names may contain the underscore.[4] 
Systems such as DomainKeys and service records use the underscore as a 
means to assure that their special character is not confused with 
hostnames. For example, _http._sctp.www.example.com specifies a service 
pointer for an SCTP capable webserver host (www) in the domain 
example.com. Note that some applications (e.g. Microsoft Internet 
Explorer) won't work correctly if any part of the hostname contains an 
underscore character.[5]


Re: Underscores in hostnames

2017-02-02 Thread Reindl Harald



Am 02.02.2017 um 13:53 schrieb Joe Orton:

Another 2.4.25 regression reported from a Fedora user is that
underscores in hostnames are rejected by default now.  I couldn't see a
specific discussion of this, was it deliberate?


underscores are not allowed in host names by RFC and many things will 
break at all with them because in different layers of client software 
things just break by using them


we had that more than once in devel environments where strange bugs 
turend out to be another case where sombody used a underline in 
/etc/hosts and his local webserver




Re: Underscores in hostnames

2017-02-02 Thread Issac Goldstand
AFAIK, underscores are forbidden from being part of a host name as per 
RFC 1123 Sec 2.1/RFC 952 (Assummptions Sec 1)


It's also spelled out in RFC 3986:
"
  A registered name intended for lookup in the DNS (...)
  consists of a sequence of domain labels separated by ".",
   each domain label starting and ending with an alphanumeric character
   and possibly also containing "-" characters.

"
  Issac

On 2/2/2017 2:53 PM, Joe Orton wrote:

Another 2.4.25 regression reported from a Fedora user is that
underscores in hostnames are rejected by default now.  I couldn't see a
specific discussion of this, was it deliberate?

Following breadcrumbs...

https://tools.ietf.org/html/rfc7230#section-5.4
  Host = uri-host [ ":" port ] ; Section 2.7.1

https://tools.ietf.org/html/rfc7230#section-2.7
uri-host = 

https://tools.ietf.org/html/rfc3986#section-3.2.2
   host= IP-literal / IPv4address / reg-name
...
   reg-name= *( unreserved / pct-encoded / sub-delims )

https://tools.ietf.org/html/rfc3986#section-2.3
   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"


From google I can see that _ in hostnames has various issues, so I'm not

sure what is right here.  It's simple enough to relax the check, seems
unlikely allowing ~ is a good idea.

Index: server/vhost.c
===
--- server/vhost.c  (revision 1781359)
+++ server/vhost.c  (working copy)
@@ -757,7 +757,10 @@
  int is_dotted_decimal = 1, leading_zeroes = 0, dots = 0;

  for (ch = host; *ch; ch++) {
-if (apr_isalpha(*ch) || *ch == '-') {
+/* This should allow any character in 'uri-host' per RFC
+ * 7320s5.4, which is 'host' by RFC 3986, which matches any
+ * 'unreserved' character. */
+if (apr_isalpha(*ch) || *ch == '-' || *ch == '_') {
  is_dotted_decimal = 0;
  }
  else if (ch[0] == '.') {





Underscores in hostnames

2017-02-02 Thread Joe Orton
Another 2.4.25 regression reported from a Fedora user is that 
underscores in hostnames are rejected by default now.  I couldn't see a 
specific discussion of this, was it deliberate?

Following breadcrumbs...

https://tools.ietf.org/html/rfc7230#section-5.4
 Host = uri-host [ ":" port ] ; Section 2.7.1

https://tools.ietf.org/html/rfc7230#section-2.7
   uri-host = 

https://tools.ietf.org/html/rfc3986#section-3.2.2
  host= IP-literal / IPv4address / reg-name
...
  reg-name= *( unreserved / pct-encoded / sub-delims )

https://tools.ietf.org/html/rfc3986#section-2.3
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

>From google I can see that _ in hostnames has various issues, so I'm not 
sure what is right here.  It's simple enough to relax the check, seems 
unlikely allowing ~ is a good idea.

Index: server/vhost.c
===
--- server/vhost.c  (revision 1781359)
+++ server/vhost.c  (working copy)
@@ -757,7 +757,10 @@
 int is_dotted_decimal = 1, leading_zeroes = 0, dots = 0;
 
 for (ch = host; *ch; ch++) {
-if (apr_isalpha(*ch) || *ch == '-') {
+/* This should allow any character in 'uri-host' per RFC
+ * 7320s5.4, which is 'host' by RFC 3986, which matches any
+ * 'unreserved' character. */
+if (apr_isalpha(*ch) || *ch == '-' || *ch == '_') {
 is_dotted_decimal = 0;
 }
 else if (ch[0] == '.') {



AW: httpd 2.4.25, mpm_event, ssl: Status of async write completion?

2017-02-02 Thread Plüm , Rüdiger , Vodafone Group


> -Ursprüngliche Nachricht-
> Von: Niklas Edmundsson [mailto:ni...@acc.umu.se]
> Gesendet: Donnerstag, 2. Februar 2017 13:31
> An: httpd-dev 
> Betreff: httpd 2.4.25, mpm_event, ssl: Status of async write completion?
> 
> 
> Hi all!
> 
> As we're seeing more and more https on ftp.acc.umu.se I've noticed
> that the number of threads listed as state W in server-status has
> skyrocketed.
> 
> From 2-4 threads busy using http we're talking 70-100 threads for the
> same bandwidth when the machine is pushing a mighty 2.7% average CPU
> load.
> 
> This is on a large-file workload, serving plain files.
> 
> I anticipated a few more threads due to ssl, but for slow downloaders
> I would expect connections doing async write and mostly waiting for
> the bits to dribble through the Internet Tubes inbetween the
> occational wakeup to do ssl
> 
> Are we not doing async write completion at all on https/ssl and
> falling back to the old worker behaviour of one thread per connection?


We do async with SSL as well, but in a worse way than http with static 
resources. For static resources
in the form of files we can deliver the whole thing async without switching
back to sync with http. For SSL we are in sync mode each time we need to 
encrypt. Only encrypted
data that cannot be sent immediately is sent async.

So the increase you see can be said to be expected. Trunk has approaches to 
overcome this, though there
has been reported that there are still some bugs with it.

Regards

Rüdiger



httpd 2.4.25, mpm_event, ssl: Status of async write completion?

2017-02-02 Thread Niklas Edmundsson


Hi all!

As we're seeing more and more https on ftp.acc.umu.se I've noticed 
that the number of threads listed as state W in server-status has 
skyrocketed.


From 2-4 threads busy using http we're talking 70-100 threads for the 
same bandwidth when the machine is pushing a mighty 2.7% average CPU 
load.


This is on a large-file workload, serving plain files.

I anticipated a few more threads due to ssl, but for slow downloaders 
I would expect connections doing async write and mostly waiting for 
the bits to dribble through the Internet Tubes inbetween the 
occational wakeup to do ssl


Are we not doing async write completion at all on https/ssl and 
falling back to the old worker behaviour of one thread per connection?



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 At last I'm organized", he sighed, and died.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Niklas Edmundsson

On Thu, 2 Feb 2017, Yann Ylavic wrote:


Are we hitting a corner case of process cleanup that plays merry hell with
https/ssl, or are we just having bad luck? Ideas? Suggestions?


2.4.25 is eager to terminate/shutdown keepalive connections more
quickly (than previous versions) on graceful shutdown (e.g.
MaxConnectionsPerChild reached).

What might happen in ssl_io_filter_output() is that buffered output
data (already deleted but not cleared) end up being reused on
shutdown.

Could you please try the attached patch?


Built and deployed, waiting for the most affected host to drain in 
order to restart it.


I'll also lower MaxConnectionsPerChild a bit more in order to stress 
this.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 At last I'm organized", he sighed, and died.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_http2 and Frequent wake-ups for mpm_event

2017-02-02 Thread Stefan Priebe - Profihost AG

Am 02.02.2017 um 11:09 schrieb Yann Ylavic:
> Hi Stefan,
> 
> On Tue, Jan 31, 2017 at 4:01 PM, Stefan Priebe - Profihost AG
>  wrote:
>>
>> any ideas?
> 
> I wonder if the attached patch (related to mod_ssl and proposed for
> another segfault report) could help in your case.
> 
> Would you mind give it a try?

Up and running. It will take some time until i can report back as those
crashes are very rare.

Stefan


Re: mod_http2 and Frequent wake-ups for mpm_event

2017-02-02 Thread Yann Ylavic
Hi Stefan,

On Tue, Jan 31, 2017 at 4:01 PM, Stefan Priebe - Profihost AG
 wrote:
>
> any ideas?

I wonder if the attached patch (related to mod_ssl and proposed for
another segfault report) could help in your case.

Would you mind give it a try?


Thanks,
Yann.
Index: modules/ssl/ssl_engine_io.c
===
--- modules/ssl/ssl_engine_io.c	(revision 1781324)
+++ modules/ssl/ssl_engine_io.c	(working copy)
@@ -138,6 +138,7 @@ static int bio_filter_out_pass(bio_filter_out_ctx_
 
 outctx->rc = ap_pass_brigade(outctx->filter_ctx->pOutputFilter->next,
  outctx->bb);
+apr_brigade_cleanup(outctx->bb);
 /* Fail if the connection was reset: */
 if (outctx->rc == APR_SUCCESS && outctx->c->aborted) {
 outctx->rc = APR_ECONNRESET;
@@ -1699,13 +1700,12 @@ static apr_status_t ssl_io_filter_output(ap_filter
 while (!APR_BRIGADE_EMPTY(bb) && status == APR_SUCCESS) {
 apr_bucket *bucket = APR_BRIGADE_FIRST(bb);
 
-if (APR_BUCKET_IS_METADATA(bucket)) {
+if (APR_BUCKET_IS_METADATA(bucket) || !filter_ctx->pssl) {
 /* Pass through metadata buckets untouched.  EOC is
  * special; terminate the SSL layer first. */
 if (AP_BUCKET_IS_EOC(bucket)) {
 ssl_filter_io_shutdown(filter_ctx, f->c, 0);
 }
-AP_DEBUG_ASSERT(APR_BRIGADE_EMPTY(outctx->bb));
 
 /* Metadata buckets are passed one per brigade; it might
  * be more efficient (but also more complex) to use
@@ -1712,11 +1712,10 @@ static apr_status_t ssl_io_filter_output(ap_filter
  * outctx->bb as a true buffer and interleave these with
  * data buckets. */
 APR_BUCKET_REMOVE(bucket);
-APR_BRIGADE_INSERT_HEAD(outctx->bb, bucket);
-status = ap_pass_brigade(f->next, outctx->bb);
-if (status == APR_SUCCESS && f->c->aborted)
-status = APR_ECONNRESET;
-apr_brigade_cleanup(outctx->bb);
+APR_BRIGADE_INSERT_TAIL(outctx->bb, bucket);
+if (bio_filter_out_pass(outctx) < 0) {
+status = outctx->rc;
+}
 }
 else {
 /* Filter a data bucket. */


Re: httpd 2.4.25, mpm_event, ssl: segfaults

2017-02-02 Thread Yann Ylavic
Hi Niklas,

On Wed, Feb 1, 2017 at 7:02 PM, Niklas Edmundsson  wrote:
>
> We've started to see spurious segfaults with httpd 2.4.25, mpm_event, ssl on
> Ubuntu 14.04LTS. Not frequent, but none the less happening.
>
> #4  ssl_io_filter_output (f=0x7f507013cfe0, bb=0x7f4f840be168) at
> ssl_engine_io.c:1746
> data = 0x7f5075518000  0x7f5075518000>
> len = 4194304
> bucket = 0x7f4f840b1ba8
> status = 
> filter_ctx = 0x7f507013cf88
> inctx = 
> outctx = 0x7f507013d008
> rblock = APR_NONBLOCK_READ

I suspect some cleanup ordering issue happening in
ssl_io_filter_output(), when the EOC bucket is found.

>
> Are we hitting a corner case of process cleanup that plays merry hell with
> https/ssl, or are we just having bad luck? Ideas? Suggestions?

2.4.25 is eager to terminate/shutdown keepalive connections more
quickly (than previous versions) on graceful shutdown (e.g.
MaxConnectionsPerChild reached).

What might happen in ssl_io_filter_output() is that buffered output
data (already deleted but not cleared) end up being reused on
shutdown.

Could you please try the attached patch?


Regards,
Yann.
Index: modules/ssl/ssl_engine_io.c
===
--- modules/ssl/ssl_engine_io.c	(revision 1781324)
+++ modules/ssl/ssl_engine_io.c	(working copy)
@@ -138,6 +138,7 @@ static int bio_filter_out_pass(bio_filter_out_ctx_
 
 outctx->rc = ap_pass_brigade(outctx->filter_ctx->pOutputFilter->next,
  outctx->bb);
+apr_brigade_cleanup(outctx->bb);
 /* Fail if the connection was reset: */
 if (outctx->rc == APR_SUCCESS && outctx->c->aborted) {
 outctx->rc = APR_ECONNRESET;
@@ -1699,13 +1700,12 @@ static apr_status_t ssl_io_filter_output(ap_filter
 while (!APR_BRIGADE_EMPTY(bb) && status == APR_SUCCESS) {
 apr_bucket *bucket = APR_BRIGADE_FIRST(bb);
 
-if (APR_BUCKET_IS_METADATA(bucket)) {
+if (APR_BUCKET_IS_METADATA(bucket) || !filter_ctx->pssl) {
 /* Pass through metadata buckets untouched.  EOC is
  * special; terminate the SSL layer first. */
 if (AP_BUCKET_IS_EOC(bucket)) {
 ssl_filter_io_shutdown(filter_ctx, f->c, 0);
 }
-AP_DEBUG_ASSERT(APR_BRIGADE_EMPTY(outctx->bb));
 
 /* Metadata buckets are passed one per brigade; it might
  * be more efficient (but also more complex) to use
@@ -1712,11 +1712,10 @@ static apr_status_t ssl_io_filter_output(ap_filter
  * outctx->bb as a true buffer and interleave these with
  * data buckets. */
 APR_BUCKET_REMOVE(bucket);
-APR_BRIGADE_INSERT_HEAD(outctx->bb, bucket);
-status = ap_pass_brigade(f->next, outctx->bb);
-if (status == APR_SUCCESS && f->c->aborted)
-status = APR_ECONNRESET;
-apr_brigade_cleanup(outctx->bb);
+APR_BRIGADE_INSERT_TAIL(outctx->bb, bucket);
+if (bio_filter_out_pass(outctx) < 0) {
+status = outctx->rc;
+}
 }
 else {
 /* Filter a data bucket. */