Re: [users@httpd] graceful-stop closes established connections without response

2024-01-31 Thread Sherrard Burton




On 1/31/24 3:11 PM, Sherrard Burton wrote:


two is about par for the course _when there are resets_. but it doesn't 
necessarily happen on every test run. for example, after initially 
applying the v1 patch (and having fully composed a response to say that 
the patch seemed to have worked :-)), i discovered that i had forgotten 
to switch back from prefork to event. ie, i ran the test a few times in 
a row with no resets, even though the patch was not in play.


i have not previously, but i will try to gather some statistics about 
number of resets per failed run and failed vs successful runs, since the 
combination of the two is needed to make any quantitative assessment.


correction: two is par for the course when the client connection is 
remote. in order to have an iterable test setup, i created a bash loop 
that started the tcpdump, strace and curl instances all on the server 
while simultaneously gracefully stopping apache. with client connection 
coming from the localhost, we end up with many more reset connections 
per run, and failures present in each test.


but ultimately, the results are such that i don't think any changes are 
warranted.


stock debian apache 2.4.57-2, 100 iterations:
average number of established connections/submitted requests:
67284 / 100 = 672.84
average number of responses received:
65162 / 100 = 651.62
average number of reset connections:
2122 / 100 = 21.22

patched with v3, 100 iterations:
average number of established connections/submitted requests:
68319 / 100 = 683.19
average number of responses received:
66082 / 100 = 660.82
average number of reset connections:
2237 / 100 = 22.37


not sure if the slight increase in connections established and handled 
under the patched version indicates that there might be some benefit to 
the patch, or if the difference is small enough that it can be chalked 
up slight differences in resource demands in the VM host at the time 
that the tests were running.


-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-31 Thread Sherrard Burton




On 1/31/24 06:24 AM, Yann Ylavic wrote:

On Tue, Jan 30, 2024 at 8:24 PM Sherrard Burton  wrote:


i have confirmed that the patch has been applied, and the behavior still
persists, as confirmed by comparing the counts of [SYN,ACK] and accept()

~$ tcpdump -n -r /tmp/tcpdump.pcap | grep -Fc '[S.]'; grep -Fh 'accept4'
/tmp/strace-apache2.out.* | grep -Fc .240.209
reading from file /tmp/tcpdump.pcap, link-type LINUX_SLL2 (Linux cooked
v2), snapshot length 262144
Warning: interface names might be incorrect
3485
3483


This means those two connections came in (or were made available by
the system) after the last accept() call, which is the race condition
that httpd can do nothing about unfortunately.

How much does it improve compared to non-patched httpd, how many reset
connections without the patch?
If not significant I don't think it's worth attempting to do something
about it..



two is about par for the course _when there are resets_. but it doesn't 
necessarily happen on every test run. for example, after initially 
applying the v1 patch (and having fully composed a response to say that 
the patch seemed to have worked :-)), i discovered that i had forgotten 
to switch back from prefork to event. ie, i ran the test a few times in 
a row with no resets, even though the patch was not in play.


i have not previously, but i will try to gather some statistics about 
number of resets per failed run and failed vs successful runs, since the 
combination of the two is needed to make any quantitative assessment.


-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-31 Thread Yann Ylavic
On Tue, Jan 30, 2024 at 8:24 PM Sherrard Burton  wrote:
>
> i have confirmed that the patch has been applied, and the behavior still
> persists, as confirmed by comparing the counts of [SYN,ACK] and accept()
>
> ~$ tcpdump -n -r /tmp/tcpdump.pcap | grep -Fc '[S.]'; grep -Fh 'accept4'
> /tmp/strace-apache2.out.* | grep -Fc .240.209
> reading from file /tmp/tcpdump.pcap, link-type LINUX_SLL2 (Linux cooked
> v2), snapshot length 262144
> Warning: interface names might be incorrect
> 3485
> 3483

This means those two connections came in (or were made available by
the system) after the last accept() call, which is the race condition
that httpd can do nothing about unfortunately.

How much does it improve compared to non-patched httpd, how many reset
connections without the patch?
If not significant I don't think it's worth attempting to do something
about it..

Regards;
Yann.

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-30 Thread Sherrard Burton
i have confirmed that the patch has been applied, and the behavior still 
persists, as confirmed by comparing the counts of [SYN,ACK] and accept()


~$ tcpdump -n -r /tmp/tcpdump.pcap | grep -Fc '[S.]'; grep -Fh 'accept4' 
/tmp/strace-apache2.out.* | grep -Fc .240.209
reading from file /tmp/tcpdump.pcap, link-type LINUX_SLL2 (Linux cooked 
v2), snapshot length 262144

Warning: interface names might be incorrect
3485
3483

as well as

~$ for i in '> GET /index.html' '< HTTP/1.1 200 OK'; do echo -e 
"'$i':\t$( cat /tmp/curl-[0-9]*.out | grep -Fc "$i" )"; done

'> GET /index.html': 3485
'< HTTP/1.1 200 OK': 3483

~$ grep -F reset /tmp/curl-[0-9]*.out
/tmp/curl-13.out:* Recv failure: Connection reset by peer
/tmp/curl-13.out:curl: (56) Recv failure: Connection reset by peer
/tmp/curl-2.out:* Recv failure: Connection reset by peer
/tmp/curl-2.out:curl: (56) Recv failure: Connection reset by peer

which all seem to agree that there were two failed connections

and, in case this shines some light on things:

~$ grep -F '19:10:50' /var/log/apache2/error.log | grep -F XXX
[Tue Jan 30 19:10:50.357074 2024] [mpm_event:notice] [pid 151381:tid 
140267709454016] XXX: may exit (0, 1)
[Tue Jan 30 19:10:50.357498 2024] [mpm_event:notice] [pid 151382:tid 
140267709454016] XXX: may exit (0, 2)
[Tue Jan 30 19:10:50.357610 2024] [mpm_event:notice] [pid 151381:tid 
140267709454016] XXX: draining
[Tue Jan 30 19:10:50.358011 2024] [mpm_event:notice] [pid 151381:tid 
140267709454016] XXX: exiting
[Tue Jan 30 19:10:50.358091 2024] [mpm_event:notice] [pid 151382:tid 
140267709454016] XXX: draining
[Tue Jan 30 19:10:50.358201 2024] [mpm_event:notice] [pid 151381:tid 
140267709454016] XXX: closing
[Tue Jan 30 19:10:50.358492 2024] [mpm_event:notice] [pid 151382:tid 
140267709454016] XXX: exiting
[Tue Jan 30 19:10:50.358906 2024] [mpm_event:notice] [pid 151382:tid 
140267709454016] XXX: closing
[Tue Jan 30 19:10:50.366948 2024] [mpm_event:notice] [pid 151381:tid 
140267709454016] XXX: exited
[Tue Jan 30 19:10:50.367124 2024] [mpm_event:notice] [pid 151382:tid 
140267709454016] XXX: exited



On 1/30/24 06:09 AM, Yann Ylavic wrote:

On Tue, Jan 30, 2024 at 11:54 AM Yann Ylavic  wrote:


On Tue, Jan 30, 2024 at 4:37 AM Sherrard Burton  wrote:


i was going to add some debugging lines, but when i took a quick look at
the patch, i wasn't clear on which sections of the code i should be
guaranteed to hit. can you be so kind as to send an updated patch with
some gratuitous logging in the appropriate sections so that there will
be positive affirmation that the patch has (or hasn't) been applied and
is falling into the expected sections?


Sure, here is a v2 (which also includes a fix w.r.t. v1).


Argh, please use this v3 instead, I missed that EINTR could interfere
and should be ignored while draining.



Regards;
Yann.


-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org


-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-30 Thread Yann Ylavic
On Tue, Jan 30, 2024 at 11:54 AM Yann Ylavic  wrote:
>
> On Tue, Jan 30, 2024 at 4:37 AM Sherrard Burton  wrote:
> >
> > i was going to add some debugging lines, but when i took a quick look at
> > the patch, i wasn't clear on which sections of the code i should be
> > guaranteed to hit. can you be so kind as to send an updated patch with
> > some gratuitous logging in the appropriate sections so that there will
> > be positive affirmation that the patch has (or hasn't) been applied and
> > is falling into the expected sections?
>
> Sure, here is a v2 (which also includes a fix w.r.t. v1).

Argh, please use this v3 instead, I missed that EINTR could interfere
and should be ignored while draining.

>
> Regards;
> Yann.
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1915442)
+++ server/mpm/event/event.c	(working copy)
@@ -174,7 +174,7 @@ static int had_healthy_child = 0;
 static volatile int dying = 0;
 static volatile int workers_may_exit = 0;
 static volatile int start_thread_may_exit = 0;
-static volatile int listener_may_exit = 0;
+static volatile apr_uint32_t listener_may_exit = 0;
 static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */
 static int num_listensocks = 0;
 static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access
@@ -481,8 +481,7 @@ static void disable_listensocks(void)
 static void enable_listensocks(void)
 {
 int i;
-if (listener_may_exit
-|| apr_atomic_cas32(_disabled, 0, 1) != 1) {
+if (dying || apr_atomic_cas32(_disabled, 0, 1) != 1) {
 return;
 }
 ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, APLOGNO(00457)
@@ -575,8 +574,7 @@ static void wakeup_listener(void)
 ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf,
  "wake up listener%s", listener_may_exit ? " again" : "");
 
-listener_may_exit = 1;
-disable_listensocks();
+apr_atomic_cas32(_may_exit, 1, 0);
 
 /* Unblock the listener if it's poll()ing */
 if (event_pollset && listener_is_wakeable) {
@@ -1184,12 +1182,9 @@ read_request:
 cs->pub.state = CONN_STATE_READ_REQUEST_LINE;
 goto read_request;
 }
-else if (!listener_may_exit) {
+else {
 cs->pub.state = CONN_STATE_CHECK_REQUEST_LINE_READABLE;
 }
-else {
-cs->pub.state = CONN_STATE_LINGER;
-}
 }
 
 if (cs->pub.state == CONN_STATE_CHECK_REQUEST_LINE_READABLE) {
@@ -1256,18 +1251,21 @@ static void check_infinite_requests(void)
 }
 }
 
-static int close_listeners(int *closed)
+static int close_listeners(void)
 {
 ap_log_error(APLOG_MARK, APLOG_TRACE6, 0, ap_server_conf,
  "clos%s listeners (connection_count=%u)",
- *closed ? "ed" : "ing", apr_atomic_read32(_count));
-if (!*closed) {
+ dying ? "ed" : "ing", apr_atomic_read32(_count));
+if (!dying) {
 int i;
 
+dying = 1; /* once */
+
+ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, ap_server_conf,
+ "XXX: closing");
+
 ap_close_listeners_ex(my_bucket->listeners);
-*closed = 1; /* once */
 
-dying = 1;
 ap_scoreboard_image->parent[ap_child_slot].quiescing = 1;
 for (i = 0; i < threads_per_child; ++i) {
 ap_update_child_status_from_indexes(ap_child_slot, i,
@@ -1654,8 +1652,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 proc_info *ti = dummy;
 int process_slot = ti->pslot;
 struct process_score *ps = ap_get_scoreboard_process(process_slot);
-int closed = 0;
-int have_idle_worker = 0;
+int have_idle_worker = 0, exiting = 0;
 apr_time_t last_log;
 
 last_log = apr_time_now();
@@ -1678,8 +1675,8 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 if (conns_this_child <= 0)
 check_infinite_requests();
 
-if (listener_may_exit) {
-int first_close = close_listeners();
+if (exiting) {
+int first_close = close_listeners();
 
 if (terminate_mode == ST_UNGRACEFUL
 || apr_atomic_read32(_count) == 0)
@@ -1710,7 +1707,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_
  apr_atomic_read32(keepalive_q->total),
  apr_atomic_read32(_count),
  apr_atomic_read32(_count));
-if (dying) {
+if (exiting) {
 ap_log_error(APLOG_MARK, APLOG_TRACE6, 0, ap_server_conf,
  "%u/%u workers shutdown",
  apr_atomic_read32(_shutdown),
@@ -1792,6 +1789,13 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 }
 num = 0;
 }
+if (!exiting && apr_atomic_read32(_may_exit)) {
+

Re: [users@httpd] graceful-stop closes established connections without response

2024-01-30 Thread Yann Ylavic
On Tue, Jan 30, 2024 at 4:37 AM Sherrard Burton  wrote:
>
> i was going to add some debugging lines, but when i took a quick look at
> the patch, i wasn't clear on which sections of the code i should be
> guaranteed to hit. can you be so kind as to send an updated patch with
> some gratuitous logging in the appropriate sections so that there will
> be positive affirmation that the patch has (or hasn't) been applied and
> is falling into the expected sections?

Sure, here is a v2 (which also includes a fix w.r.t. v1).

Regards;
Yann.
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1915442)
+++ server/mpm/event/event.c	(working copy)
@@ -174,7 +174,7 @@ static int had_healthy_child = 0;
 static volatile int dying = 0;
 static volatile int workers_may_exit = 0;
 static volatile int start_thread_may_exit = 0;
-static volatile int listener_may_exit = 0;
+static volatile apr_uint32_t listener_may_exit = 0;
 static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */
 static int num_listensocks = 0;
 static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access
@@ -481,8 +481,7 @@ static void disable_listensocks(void)
 static void enable_listensocks(void)
 {
 int i;
-if (listener_may_exit
-|| apr_atomic_cas32(_disabled, 0, 1) != 1) {
+if (dying || apr_atomic_cas32(_disabled, 0, 1) != 1) {
 return;
 }
 ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, APLOGNO(00457)
@@ -575,8 +574,7 @@ static void wakeup_listener(void)
 ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf,
  "wake up listener%s", listener_may_exit ? " again" : "");
 
-listener_may_exit = 1;
-disable_listensocks();
+apr_atomic_cas32(_may_exit, 1, 0);
 
 /* Unblock the listener if it's poll()ing */
 if (event_pollset && listener_is_wakeable) {
@@ -1184,12 +1182,9 @@ read_request:
 cs->pub.state = CONN_STATE_READ_REQUEST_LINE;
 goto read_request;
 }
-else if (!listener_may_exit) {
+else {
 cs->pub.state = CONN_STATE_CHECK_REQUEST_LINE_READABLE;
 }
-else {
-cs->pub.state = CONN_STATE_LINGER;
-}
 }
 
 if (cs->pub.state == CONN_STATE_CHECK_REQUEST_LINE_READABLE) {
@@ -1256,18 +1251,21 @@ static void check_infinite_requests(void)
 }
 }
 
-static int close_listeners(int *closed)
+static int close_listeners(void)
 {
 ap_log_error(APLOG_MARK, APLOG_TRACE6, 0, ap_server_conf,
  "clos%s listeners (connection_count=%u)",
- *closed ? "ed" : "ing", apr_atomic_read32(_count));
-if (!*closed) {
+ dying ? "ed" : "ing", apr_atomic_read32(_count));
+if (!dying) {
 int i;
 
+dying = 1; /* once */
+
+ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, ap_server_conf,
+ "XXX: closing");
+
 ap_close_listeners_ex(my_bucket->listeners);
-*closed = 1; /* once */
 
-dying = 1;
 ap_scoreboard_image->parent[ap_child_slot].quiescing = 1;
 for (i = 0; i < threads_per_child; ++i) {
 ap_update_child_status_from_indexes(ap_child_slot, i,
@@ -1654,8 +1652,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 proc_info *ti = dummy;
 int process_slot = ti->pslot;
 struct process_score *ps = ap_get_scoreboard_process(process_slot);
-int closed = 0;
-int have_idle_worker = 0;
+int have_idle_worker = 0, exiting = 0;
 apr_time_t last_log;
 
 last_log = apr_time_now();
@@ -1678,8 +1675,8 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 if (conns_this_child <= 0)
 check_infinite_requests();
 
-if (listener_may_exit) {
-int first_close = close_listeners();
+if (exiting) {
+int first_close = close_listeners();
 
 if (terminate_mode == ST_UNGRACEFUL
 || apr_atomic_read32(_count) == 0)
@@ -1710,7 +1707,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_
  apr_atomic_read32(keepalive_q->total),
  apr_atomic_read32(_count),
  apr_atomic_read32(_count));
-if (dying) {
+if (exiting) {
 ap_log_error(APLOG_MARK, APLOG_TRACE6, 0, ap_server_conf,
  "%u/%u workers shutdown",
  apr_atomic_read32(_shutdown),
@@ -1792,6 +1789,10 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 }
 num = 0;
 }
+if (!exiting && apr_atomic_read32(_may_exit)) {
+ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, ap_server_conf,
+ "XXX: may exit (%d, %d)", rc, num);
+}
 
 if (APLOGtrace7(ap_server_conf)) {
 now = 

Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Sherrard Burton




On 1/29/24 12:25 PM, Yann Ylavic wrote:


That's where we are, I think, if this first/light patch eventually
helps significantly with the "local" graceful-stop which you care
about still, it's possibly worth it since it requires no opt-in (but
needs testing..), but going further looks overkill/risky for httpd.


i (theoretically) applied your patch to the debian source and rebuilt 
and installed the packages, but the behavior still persists. it has been 
a long time since i have had to patch debian source (it was apache back 
then as well :-) thundering herd patch IIRC) so i'm not 100% sure that 
i've gotten it right.


mod_mpm_event.so does indeed have an updated modtime

-rw-r--r-- 1 root root 75984 Jan 30 00:13 
/usr/lib/apache2/modules/mod_mpm_event.so


vs

-rw-r--r-- 1 root root 75984 Apr 13  2023 
/usr/lib/apache2/modules/mod_mpm_event.so



but the identical file size makes me wonder whether i was successful in 
applying the patch.


i was going to add some debugging lines, but when i took a quick look at 
the patch, i wasn't clear on which sections of the code i should be 
guaranteed to hit. can you be so kind as to send an updated patch with 
some gratuitous logging in the appropriate sections so that there will 
be positive affirmation that the patch has (or hasn't) been applied and 
is falling into the expected sections?


-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Yann Ylavic
On Mon, Jan 29, 2024 at 4:59 PM Sherrard Burton  wrote:
>
> On 1/29/24 10:17 AM, Yann Ylavic wrote:
> > On Mon, Jan 29, 2024 at 3:06 PM Eric Covener  wrote:
> >
> > The patch helps in this case because we no longer close the listening
> > sockets unconditionally, I mean without first checking if there are
> > new connections in the backlog. So I thought the option was needed
> > because if nothing stops new connections from arriving it could
> > prevent the child from stopping indefinitely? How could we know if a
> > LB/VIP is in place?
>
> it sounds like this issue is all but resolved, but i would like to
> understand whether the above (preventing the child from stopping
> indefinitely) is an actual possibility.
>
> my (naive) expectation is that if a given child has been signaled while
> handling an existing request then it "knows" not to accept() a new
> request after completing the existing request. so it seems that the
> child is not under any danger of continuing indefinitely, regardless of
> the contents of the backlog.

Yes, a stopping child won't accept any new connection currently in
httpd, but this is what I proposed to change: each child continues to
accept new connections after the graceful signal, until there is
nothing to accept anymore. Though this needs an opt-in obviously.
Sorry for the confusion because this is not what the patch I initially
proposed is doing either, the patch simply allows for one more try at
emptying the backlog after the signal was received, so it won't by
itself prevent the child from stopping, but it might (likely) not be
enough if resets don't happen mainly because of some bad timing in the
listener thread (which this patch addresses, only).
So before we go to the opt-in, as Eric said, we might as well consider
that since it's not fully addressable in httpd anyway (without races),
we'd rather let this be handled outside httpd (better/fully).
That's where we are, I think, if this first/light patch eventually
helps significantly with the "local" graceful-stop which you care
about still, it's possibly worth it since it requires no opt-in (but
needs testing..), but going further looks overkill/risky for httpd.

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Sherrard Burton




On 1/29/24 10:17 AM, Yann Ylavic wrote:

On Mon, Jan 29, 2024 at 3:06 PM Eric Covener  wrote:

The patch helps in this case because we no longer close the listening
sockets unconditionally, I mean without first checking if there are
new connections in the backlog. So I thought the option was needed
because if nothing stops new connections from arriving it could
prevent the child from stopping indefinitely? How could we know if a
LB/VIP is in place?


it sounds like this issue is all but resolved, but i would like to 
understand whether the above (preventing the child from stopping 
indefinitely) is an actual possibility.


my (naive) expectation is that if a given child has been signaled while 
handling an existing request then it "knows" not to accept() a new 
request after completing the existing request. so it seems that the 
child is not under any danger of continuing indefinitely, regardless of 
the contents of the backlog.


-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Sherrard Burton




On 1/29/24 9:05 AM, Eric Covener wrote:

Maybe I wasn't clear enough but this patch makes sense only if there
is something in place that prevents new connections from arriving at
the stopping httpd children processes (like a frontend/load-balancer
or a tcp/bpf filter), otherwise they may never really stop which does
not help for a graceful stop/restart obviously. So this change (if
useful) should be guarded by a GracefulDrain on/off or something
config option to not hurt the other use cases.


Thanks Yann!

It seems to me If there is no such LB/VIP that stops new connections
from landing on this server, the new option should be avoided.
But if there is such a LB/VIP, the option is not really needed.  Is it fair?


indeed. i tried to convey this in my most-recent response, but i think 
you put it much more clearly.


we have essentially starting removing servers from the backend pool 
before issuing the graceful-stop. _when_ this is feasible, it does 
obviate the need for any changes at the httpd level. but it still leaves 
us with the possibility of resetting established connections during 
unplanned shutdown, such as an unexpected host reboot.


but, as indicated previously, i think i am satisfied with this approach 
after gaining understanding the underlying mechanisms.


-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Sherrard Burton




On 1/29/24 8:59 AM, Yann Ylavic wrote:


Maybe I wasn't clear enough but this patch makes sense only if there
is something in place that prevents new connections from arriving at
the stopping httpd children processes (like a frontend/load-balancer
or a tcp/bpf filter), otherwise they may never really stop which does
not help for a graceful stop/restart obviously. So this change (if
useful) should be guarded by a GracefulDrain on/off or something
config option to not hurt the other use cases.



Yann,
thank you for the feedback and the thorough explanation (i hadn't 
thought about the OS-level backlog). after discovery of this problem i 
began removing the target server from the backend pool in the 
load-balancer before issuing the graceful-stop. this seems to work as 
expected for planned maintenance, but the problem can still occur if the 
backend server ends up needing to be stopped outside of my direct command.


we know that we cannot guard against all situations, but the systemd 
unit files on debian seem to trigger graceful-stop for its 'stop' action


ExecStop=/usr/sbin/apachectl graceful-stop

so if indeed this were addressable within httpd (which it sounds like it 
is not) then we would theoretically be protected from all but unexpected 
hard crashes in httpd or at the OS level.


my takeaway at this point, not having yet finished reading Eric's other 
responses on this thread, is that this mostly "is what it is", and i 
think that i am (provisionally) satisfied with knowing that the behavior 
that i am observing is not a bug, per-se.



ps--is there any possibility of "shrinking" this window by tuning either 
ListenBacklog or any analogous settings at the OS level? or do i risk 
creating a different problem?


i ask because theoretically, the backlog/queuing (from the client 
perspective) should mostly be handled by the load-balancer, which is 
currently L4, not L7. so if reducing/eliminating the backlog on the 
server might positively impact this situation then i may spend some time 
on it.


thanks again

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Yann Ylavic
On Mon, Jan 29, 2024 at 4:21 PM Eric Covener  wrote:
>
> > > It seems to me If there is no such LB/VIP that stops new connections
> > > from landing on this server, the new option should be avoided.
> >
> > Correct.
> >
> > > But if there is such a LB/VIP, the option is not really needed.  Is it 
> > > fair?
> >
> > The patch helps in this case because we no longer close the listening
> > sockets unconditionally, I mean without first checking if there are
> > new connections in the backlog. So I thought the option was needed
> > because if nothing stops new connections from arriving it could
> > prevent the child from stopping indefinitely? How could we know if a
> > LB/VIP is in place?
>
> I mean the initial patch vs. the status quo, not just the opt-in part.

Even if there is a LB that stops routing new connections to the
stopping httpd we might kill the ones that are in the backlog already.
But yes I suppose that the switch on the LB could precede the
graceful-stop by a few seconds to let httpd drain the backlog
normally, in any case the race is hardly addressable fully in httpd so
we might consider doing nothing to minimize it too, that's fair enough
:)

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Eric Covener
> > It seems to me If there is no such LB/VIP that stops new connections
> > from landing on this server, the new option should be avoided.
>
> Correct.
>
> > But if there is such a LB/VIP, the option is not really needed.  Is it fair?
>
> The patch helps in this case because we no longer close the listening
> sockets unconditionally, I mean without first checking if there are
> new connections in the backlog. So I thought the option was needed
> because if nothing stops new connections from arriving it could
> prevent the child from stopping indefinitely? How could we know if a
> LB/VIP is in place?

I mean the initial patch vs. the status quo, not just the opt-in part.

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Yann Ylavic
On Mon, Jan 29, 2024 at 3:06 PM Eric Covener  wrote:
>
> > Maybe I wasn't clear enough but this patch makes sense only if there
> > is something in place that prevents new connections from arriving at
> > the stopping httpd children processes (like a frontend/load-balancer
> > or a tcp/bpf filter), otherwise they may never really stop which does
> > not help for a graceful stop/restart obviously. So this change (if
> > useful) should be guarded by a GracefulDrain on/off or something
> > config option to not hurt the other use cases.
>
> Thanks Yann!
>
> It seems to me If there is no such LB/VIP that stops new connections
> from landing on this server, the new option should be avoided.

Correct.

> But if there is such a LB/VIP, the option is not really needed.  Is it fair?

The patch helps in this case because we no longer close the listening
sockets unconditionally, I mean without first checking if there are
new connections in the backlog. So I thought the option was needed
because if nothing stops new connections from arriving it could
prevent the child from stopping indefinitely? How could we know if a
LB/VIP is in place?

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Eric Covener
> Maybe I wasn't clear enough but this patch makes sense only if there
> is something in place that prevents new connections from arriving at
> the stopping httpd children processes (like a frontend/load-balancer
> or a tcp/bpf filter), otherwise they may never really stop which does
> not help for a graceful stop/restart obviously. So this change (if
> useful) should be guarded by a GracefulDrain on/off or something
> config option to not hurt the other use cases.

Thanks Yann!

It seems to me If there is no such LB/VIP that stops new connections
from landing on this server, the new option should be avoided.
But if there is such a LB/VIP, the option is not really needed.  Is it fair?

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Yann Ylavic
On Mon, Jan 29, 2024 at 2:23 PM Yann Ylavic  wrote:
>
> On Sun, Jan 28, 2024 at 5:26 AM Sherrard Burton  wrote:
> >
> > On 1/27/24 09:46 PM, Eric Covener wrote:
> > >
> > > Both worker and event MPMs have a dedicated listener thread per child
> > > process, so it will close those copies of the listening sockets much
> > > more quickly.
> >
> > so that i am clear, are you saying that this behavior is still possible,
> > although less likely under the worker and event MPMs?
>
> I think it's possible regardless of the MPM, and there is quite little
> a server can do about it without the help of the system or some
> tcp/bpf filter (something that prepares the graceful shutdown at the
> system level to prevent the 3-way handshake from completing).
> This is because when the connections are ready to be accept()ed (i.e.
> in the listening socket's backlog), they are already fully established
> and likely contain the request data (at least partly), the system has
> done this underneath httpd already.
> So if/when httpd closes its listening socket(s) all the connections in
> the backlog(s) are lost/reset anyway, and there is always going to be
> a race condition with the draining of the backlog if nothing stops new
> connections from being established at the system level.
>
> To minimize the race condition maybe httpd can do better at trying to
> drain the backlog before closing the listeners. Does the attached
> patch help for instance (it's against mpm_event 2.4.x)?
> But I don't think it can be fully solved at httpd level anyway, with
> this change the effective stop could be longer (so long as there are
> incoming/pending connections routed to each child by the system), it
> could even last forever theoretically if connections keep coming
> indefinitely..

Maybe I wasn't clear enough but this patch makes sense only if there
is something in place that prevents new connections from arriving at
the stopping httpd children processes (like a frontend/load-balancer
or a tcp/bpf filter), otherwise they may never really stop which does
not help for a graceful stop/restart obviously. So this change (if
useful) should be guarded by a GracefulDrain on/off or something
config option to not hurt the other use cases.

>
> Regards;
> Yann.

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-29 Thread Yann Ylavic
On Sun, Jan 28, 2024 at 5:26 AM Sherrard Burton  wrote:
>
> On 1/27/24 09:46 PM, Eric Covener wrote:
> >
> > Both worker and event MPMs have a dedicated listener thread per child
> > process, so it will close those copies of the listening sockets much
> > more quickly.
>
> so that i am clear, are you saying that this behavior is still possible,
> although less likely under the worker and event MPMs?

I think it's possible regardless of the MPM, and there is quite little
a server can do about it without the help of the system or some
tcp/bpf filter (something that prepares the graceful shutdown at the
system level to prevent the 3-way handshake from completing).
This is because when the connections are ready to be accept()ed (i.e.
in the listening socket's backlog), they are already fully established
and likely contain the request data (at least partly), the system has
done this underneath httpd already.
So if/when httpd closes its listening socket(s) all the connections in
the backlog(s) are lost/reset anyway, and there is always going to be
a race condition with the draining of the backlog if nothing stops new
connections from being established at the system level.

To minimize the race condition maybe httpd can do better at trying to
drain the backlog before closing the listeners. Does the attached
patch help for instance (it's against mpm_event 2.4.x)?
But I don't think it can be fully solved at httpd level anyway, with
this change the effective stop could be longer (so long as there are
incoming/pending connections routed to each child by the system), it
could even last forever theoretically if connections keep coming
indefinitely..

Regards;
Yann.
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1915442)
+++ server/mpm/event/event.c	(working copy)
@@ -174,7 +174,7 @@ static int had_healthy_child = 0;
 static volatile int dying = 0;
 static volatile int workers_may_exit = 0;
 static volatile int start_thread_may_exit = 0;
-static volatile int listener_may_exit = 0;
+static volatile apr_uint32_t listener_may_exit = 0;
 static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */
 static int num_listensocks = 0;
 static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access
@@ -481,8 +481,7 @@ static void disable_listensocks(void)
 static void enable_listensocks(void)
 {
 int i;
-if (listener_may_exit
-|| apr_atomic_cas32(_disabled, 0, 1) != 1) {
+if (dying || apr_atomic_cas32(_disabled, 0, 1) != 1) {
 return;
 }
 ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, APLOGNO(00457)
@@ -575,8 +574,7 @@ static void wakeup_listener(void)
 ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf,
  "wake up listener%s", listener_may_exit ? " again" : "");
 
-listener_may_exit = 1;
-disable_listensocks();
+apr_atomic_cas32(_may_exit, 1, 0);
 
 /* Unblock the listener if it's poll()ing */
 if (event_pollset && listener_is_wakeable) {
@@ -1184,12 +1182,9 @@ read_request:
 cs->pub.state = CONN_STATE_READ_REQUEST_LINE;
 goto read_request;
 }
-else if (!listener_may_exit) {
+else {
 cs->pub.state = CONN_STATE_CHECK_REQUEST_LINE_READABLE;
 }
-else {
-cs->pub.state = CONN_STATE_LINGER;
-}
 }
 
 if (cs->pub.state == CONN_STATE_CHECK_REQUEST_LINE_READABLE) {
@@ -1654,7 +1649,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 proc_info *ti = dummy;
 int process_slot = ti->pslot;
 struct process_score *ps = ap_get_scoreboard_process(process_slot);
-int closed = 0;
+int may_exit = 0, closed = 0;
 int have_idle_worker = 0;
 apr_time_t last_log;
 
@@ -1678,7 +1673,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 if (conns_this_child <= 0)
 check_infinite_requests();
 
-if (listener_may_exit) {
+if (may_exit) {
 int first_close = close_listeners();
 
 if (terminate_mode == ST_UNGRACEFUL
@@ -1899,7 +1894,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_
  "Idle workers: %u",
  ap_queue_info_num_idlers(worker_queue_info));
 }
-else if (!listener_may_exit) {
+else {
 void *csd = NULL;
 ap_listen_rec *lr = (ap_listen_rec *) pt->baton;
 apr_pool_t *ptrans; /* Pool for per-transaction stuff */
@@ -1960,6 +1955,14 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 }   /* if:else on pt->type */
 } /* for processing poll */
 
+/* On graceful shutdown/stop we can close the listening sockets
+ * since the backlog should be drained now.
+ */
+if 

Re: [users@httpd] graceful-stop closes established connections without response

2024-01-27 Thread Sherrard Burton

Eric,
thanks for the quick reply. follow-up inline below:

On 1/27/24 09:46 PM, Eric Covener wrote:

apache2: 2.4.56-1~deb11u2, prefork MPM, mod_perl


I think it's a large window on prefork where this can happen.  If any
process is busy processing a request, it cannot close its copy of the
listening socket. The OS will continue to complete TCP connections and
acknowledge (some) data with nobody calling accept().  When the last
of the listening sockets is finally closed, the TCP connections that
arrived in this timeframe will be abruptly closed.


is this window essentially the time between when the parent process 
calls close() on its copy of the listening socket and when the last of 
the child processes is handling its request and can call close() on the 
last remaining copy of the listening socket? if so, then it sounds like 
the window becomes wider and this situation becomes more likely if the 
children are handling longer-running requests.





Both worker and event MPMs have a dedicated listener thread per child
process, so it will close those copies of the listening sockets much
more quickly.


so that i am clear, are you saying that this behavior is still possible, 
although less likely under the worker and event MPMs?


NVM, actually, it appears that i may have already answered this question 
by not paying enough attention to detail when setting up my test 
environment. i unintentionally replicated the behavior under the event 
MPM when my intention was to use prefork for the sake of consistency.


but i would still like to better understand "how much" the window is 
reduced under worker and event. for example, would they be 
similarly-effected by a child that is still handling a long-running 
request, or would the listener thread be able to close() its copy of the 
socket while the handling thread is still serving content to the client?


thanks again

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] graceful-stop closes established connections without response

2024-01-27 Thread Eric Covener
> apache2: 2.4.56-1~deb11u2, prefork MPM, mod_perl

I think it's a large window on prefork where this can happen.  If any
process is busy processing a request, it cannot close its copy of the
listening socket. The OS will continue to complete TCP connections and
acknowledge (some) data with nobody calling accept().  When the last
of the listening sockets is finally closed, the TCP connections that
arrived in this timeframe will be abruptly closed.

Both worker and event MPMs have a dedicated listener thread per child
process, so it will close those copies of the listening sockets much
more quickly.

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org