Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-26 Thread ste...@eissing.org



> Am 26.08.2021 um 16:44 schrieb Joe Orton :
> 
> On Thu, Aug 26, 2021 at 01:11:16PM +0200, ste...@eissing.org wrote:
>> Joe: with r1892615 I fixed a bug in waiting for the server to become
>> unresponsive after a stop. This *should* make timed waits unnecessary
>> in the http2 test suite.
> 
> Great stuff, thanks for all your work Stefan.  
> 
> It worked locally and once in Travis, merged to trunk in r1892620.

\o/ Now, when I break things, everyone will see it...wait!


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-26 Thread Joe Orton
On Thu, Aug 26, 2021 at 01:11:16PM +0200, ste...@eissing.org wrote:
> Joe: with r1892615 I fixed a bug in waiting for the server to become
> unresponsive after a stop. This *should* make timed waits unnecessary
> in the http2 test suite.

Great stuff, thanks for all your work Stefan.  

It worked locally and once in Travis, merged to trunk in r1892620.

Regards, Joe



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-26 Thread ste...@eissing.org
Joe: with r1892615 I fixed a bug in waiting for the server to become
unresponsive after a stop. This *should* make timed waits unnecessary
in the http2 test suite.

- Stefan

> Am 26.08.2021 um 10:43 schrieb ste...@eissing.org:
> 
> 
> 
>> Am 26.08.2021 um 10:27 schrieb Ruediger Pluem :
>> 
>> 
>> 
>> On 8/26/21 10:21 AM, ste...@eissing.org wrote:
>>> 
>>> 
 Am 26.08.2021 um 10:16 schrieb Joe Orton :
 
 On Thu, Aug 26, 2021 at 09:51:07AM +0200, ste...@eissing.org wrote:
> https://app.travis-ci.com/github/apache/httpd
> 
>>assert env.apache_restart() == 0
> 3026E   assert 1 == 0
> 3027E+  where 1 =  >()
> 3028E+where  > =  at 0x7ffb03366a60>.apache_restart
> 3029
> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
> 3031 Captured stderr setup 
> -
> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: 
> AH00072: make_sock: could not bind to address [::]:40001
> 3033(98)Address already in use: AH00072: make_sock: could not bind to 
> address 0.0.0.0:40001
> 3034AH03272: no listening sockets available, shutting down
> 3035
> 
> Looks like the stop/start cycle instead of reload is not timing safe. 
> If httpd is no longer responding, it does not mean that the ports are 
> available again for the new process. Reloads are just better here, as 
> wait loops can be avoided. Hmmthinking...
 
 Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2) 
 after running apachectl, that was sufficient for me when I was testing 
 locally before.
>>> 
>>> Can we add a configuration directive for this server behaviour? Like
>>> 
>>> AcceptOnGraceful on|off
>>> 
>>> So that people who do not want the server to accept new connections
>>> during graceful reloads can influence this?
>> 
>> What about a graceful stop first and a start afterwards?
> 
> Just tested: "graceful-stop" seems to make it worse. Connections are 
> being served using the old configuration.
> 
>> 
>> Regards
>> 
>> Rüdiger



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-26 Thread ste...@eissing.org



> Am 26.08.2021 um 10:27 schrieb Ruediger Pluem :
> 
> 
> 
> On 8/26/21 10:21 AM, ste...@eissing.org wrote:
>> 
>> 
>>> Am 26.08.2021 um 10:16 schrieb Joe Orton :
>>> 
>>> On Thu, Aug 26, 2021 at 09:51:07AM +0200, ste...@eissing.org wrote:
 https://app.travis-ci.com/github/apache/httpd
 
> assert env.apache_restart() == 0
 3026E   assert 1 == 0
 3027E+  where 1 = >>> >()
 3028E+where >>> > = >>> 0x7ffb03366a60>.apache_restart
 3029
 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
 3031 Captured stderr setup 
 -
 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: 
 AH00072: make_sock: could not bind to address [::]:40001
 3033(98)Address already in use: AH00072: make_sock: could not bind to 
 address 0.0.0.0:40001
 3034AH03272: no listening sockets available, shutting down
 3035
 
 Looks like the stop/start cycle instead of reload is not timing safe. 
 If httpd is no longer responding, it does not mean that the ports are 
 available again for the new process. Reloads are just better here, as 
 wait loops can be avoided. Hmmthinking...
>>> 
>>> Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2) 
>>> after running apachectl, that was sufficient for me when I was testing 
>>> locally before.
>> 
>> Can we add a configuration directive for this server behaviour? Like
>> 
>> AcceptOnGraceful on|off
>> 
>> So that people who do not want the server to accept new connections
>> during graceful reloads can influence this?
> 
> What about a graceful stop first and a start afterwards?

Just tested: "graceful-stop" seems to make it worse. Connections are 
being served using the old configuration.

> 
> Regards
> 
> Rüdiger
> 



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-26 Thread Ruediger Pluem



On 8/26/21 10:21 AM, ste...@eissing.org wrote:
> 
> 
>> Am 26.08.2021 um 10:16 schrieb Joe Orton :
>>
>> On Thu, Aug 26, 2021 at 09:51:07AM +0200, ste...@eissing.org wrote:
>>> https://app.travis-ci.com/github/apache/httpd
>>>
  assert env.apache_restart() == 0
>>> 3026E   assert 1 == 0
>>> 3027E+  where 1 = >> >()
>>> 3028E+where >> > = >> 0x7ffb03366a60>.apache_restart
>>> 3029
>>> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
>>> 3031 Captured stderr setup 
>>> -
>>> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: 
>>> make_sock: could not bind to address [::]:40001
>>> 3033(98)Address already in use: AH00072: make_sock: could not bind to 
>>> address 0.0.0.0:40001
>>> 3034AH03272: no listening sockets available, shutting down
>>> 3035
>>>
>>> Looks like the stop/start cycle instead of reload is not timing safe. 
>>> If httpd is no longer responding, it does not mean that the ports are 
>>> available again for the new process. Reloads are just better here, as 
>>> wait loops can be avoided. Hmmthinking...
>>
>> Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2) 
>> after running apachectl, that was sufficient for me when I was testing 
>> locally before.
> 
> Can we add a configuration directive for this server behaviour? Like
> 
> AcceptOnGraceful on|off
> 
> So that people who do not want the server to accept new connections
> during graceful reloads can influence this?

What about a graceful stop first and a start afterwards?

Regards

Rüdiger



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-26 Thread ste...@eissing.org



> Am 26.08.2021 um 10:16 schrieb Joe Orton :
> 
> On Thu, Aug 26, 2021 at 09:51:07AM +0200, ste...@eissing.org wrote:
>> https://app.travis-ci.com/github/apache/httpd
>> 
>>>  assert env.apache_restart() == 0
>> 3026E   assert 1 == 0
>> 3027E+  where 1 = > >()
>> 3028E+where > > = > 0x7ffb03366a60>.apache_restart
>> 3029
>> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
>> 3031 Captured stderr setup 
>> -
>> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: 
>> make_sock: could not bind to address [::]:40001
>> 3033(98)Address already in use: AH00072: make_sock: could not bind to 
>> address 0.0.0.0:40001
>> 3034AH03272: no listening sockets available, shutting down
>> 3035
>> 
>> Looks like the stop/start cycle instead of reload is not timing safe. 
>> If httpd is no longer responding, it does not mean that the ports are 
>> available again for the new process. Reloads are just better here, as 
>> wait loops can be avoided. Hmmthinking...
> 
> Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2) 
> after running apachectl, that was sufficient for me when I was testing 
> locally before.

Can we add a configuration directive for this server behaviour? Like

AcceptOnGraceful on|off

So that people who do not want the server to accept new connections
during graceful reloads can influence this?

- Stefan

Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-26 Thread Joe Orton
On Thu, Aug 26, 2021 at 09:51:07AM +0200, ste...@eissing.org wrote:
> https://app.travis-ci.com/github/apache/httpd
> 
> >   assert env.apache_restart() == 0
> 3026E   assert 1 == 0
> 3027E+  where 1 =  >()
> 3028E+where  > =  0x7ffb03366a60>.apache_restart
> 3029
> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
> 3031 Captured stderr setup 
> -
> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: 
> make_sock: could not bind to address [::]:40001
> 3033(98)Address already in use: AH00072: make_sock: could not bind to address 
> 0.0.0.0:40001
> 3034AH03272: no listening sockets available, shutting down
> 3035
> 
> Looks like the stop/start cycle instead of reload is not timing safe. 
> If httpd is no longer responding, it does not mean that the ports are 
> available again for the new process. Reloads are just better here, as 
> wait loops can be avoided. Hmmthinking...

Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2) 
after running apachectl, that was sufficient for me when I was testing 
locally before.

Regards, Joe



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-26 Thread ste...@eissing.org



> Am 25.08.2021 um 17:34 schrieb Joe Orton :
> 
> On Wed, Aug 25, 2021 at 05:08:06PM +0200, ste...@eissing.org wrote:
>> 
>> 
>>> Am 25.08.2021 um 17:06 schrieb Yann Ylavic :
>>> 
>>> Thanks, looks good, it probably needs a backport to 2.4.x since the
>>> first travis failure was there (Build #1819).
>>> Let's see how it works after several runs before maybe.
>> 
>> +1
>> 
>> Let's see what Joe and Travis can do to wreck it.
> 
> Well I could accidentally cancel your builds, how's that for starters?
> 
> (I restarted the most recent one, was trying to cancel the older PR 
> build, but hit the wrong button on the wrong page, sorry!)
> 
> Nice job getting this one fixed.
> 
> Regards, Joe
> 
> p.s. did you forget the APLOGNO()?
> p.p.s just kidding
> 

https://app.travis-ci.com/github/apache/httpd

>   assert env.apache_restart() == 0
3026E   assert 1 == 0
3027E+  where 1 = >()
3028E+where > = .apache_restart
3029
3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
3031 Captured stderr setup 
-
3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: 
make_sock: could not bind to address [::]:40001
3033(98)Address already in use: AH00072: make_sock: could not bind to address 
0.0.0.0:40001
3034AH03272: no listening sockets available, shutting down
3035

Looks like the stop/start cycle instead of reload is not timing safe. If httpd 
is no longer responding, it does not mean that the ports are available again 
for the new process. Reloads are just better here, as wait loops can be 
avoided. Hmmthinking...



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 17:34 schrieb Joe Orton :
> 
> On Wed, Aug 25, 2021 at 05:08:06PM +0200, ste...@eissing.org wrote:
>> 
>> 
>>> Am 25.08.2021 um 17:06 schrieb Yann Ylavic :
>>> 
>>> Thanks, looks good, it probably needs a backport to 2.4.x since the
>>> first travis failure was there (Build #1819).
>>> Let's see how it works after several runs before maybe.
>> 
>> +1
>> 
>> Let's see what Joe and Travis can do to wreck it.
> 
> Well I could accidentally cancel your builds, how's that for starters?
> 
> (I restarted the most recent one, was trying to cancel the older PR 
> build, but hit the wrong button on the wrong page, sorry!)
> 
> Nice job getting this one fixed.
> 
> Regards, Joe
> 
> p.s. did you forget the APLOGNO()?
> p.p.s just kidding

I see that we are getting into the right spirit here! ;-)


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Joe Orton
On Wed, Aug 25, 2021 at 05:08:06PM +0200, ste...@eissing.org wrote:
> 
> 
> > Am 25.08.2021 um 17:06 schrieb Yann Ylavic :
> > 
> > Thanks, looks good, it probably needs a backport to 2.4.x since the
> > first travis failure was there (Build #1819).
> > Let's see how it works after several runs before maybe.
> 
> +1
> 
> Let's see what Joe and Travis can do to wreck it.

Well I could accidentally cancel your builds, how's that for starters?

(I restarted the most recent one, was trying to cancel the older PR 
build, but hit the wrong button on the wrong page, sorry!)

Nice job getting this one fixed.

Regards, Joe

p.s. did you forget the APLOGNO()?
p.p.s just kidding



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 17:06 schrieb Yann Ylavic :
> 
> Thanks, looks good, it probably needs a backport to 2.4.x since the
> first travis failure was there (Build #1819).
> Let's see how it works after several runs before maybe.

+1

Let's see what Joe and Travis can do to wreck it.

> On Wed, Aug 25, 2021 at 5:02 PM ste...@eissing.org  wrote:
>> 
>> It should be fixed now with r1892599, I believe.
>> 
>>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
>>> 
>>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
 
 apache / httpd
 
 trunk
 
 Build #1831 was broken
 21 mins and 33 secs
 Yann Ylavic
 243c5fa CHANGESET →
 
 mpm_{event,worker,prefork}: late stop of children processes on restart.
>>> 
>>> (unrelated to this change, Build #1819 failed the same earlier).
>>> 
>>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>>> 
>>> It seems that when exiting a stream can be destroyed while in
>>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>>> when the mplx lock is released (thus stream->id faults).
>>> 
>>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>>> the lock) instead?
>>> This would avoid the fault but that's still a potentially destroyed
>>> stream being fifo'ed, though we are exiting so it might not matter..
>>> 
>>> A better fix maybe, Stefan?
>>> 
>>> Cheers;
>>> Yann.
>> 



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
Thanks, looks good, it probably needs a backport to 2.4.x since the
first travis failure was there (Build #1819).
Let's see how it works after several runs before maybe.

On Wed, Aug 25, 2021 at 5:02 PM ste...@eissing.org  wrote:
>
> It should be fixed now with r1892599, I believe.
>
> > Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
> >
> > On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
> >>
> >> apache / httpd
> >>
> >> trunk
> >>
> >> Build #1831 was broken
> >> 21 mins and 33 secs
> >> Yann Ylavic
> >> 243c5fa CHANGESET →
> >>
> >> mpm_{event,worker,prefork}: late stop of children processes on restart.
> >
> > (unrelated to this change, Build #1819 failed the same earlier).
> >
> > Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
> >
> > It seems that when exiting a stream can be destroyed while in
> >  h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> > when the mplx lock is released (thus stream->id faults).
> >
> > Should the caller(s) of mst_check_data_for() pass stream->id (under
> > the lock) instead?
> > This would avoid the fault but that's still a potentially destroyed
> > stream being fifo'ed, though we are exiting so it might not matter..
> >
> > A better fix maybe, Stefan?
> >
> > Cheers;
> > Yann.
>


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org
It should be fixed now with r1892599, I believe.

> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
>> 
>> apache / httpd
>> 
>> trunk
>> 
>> Build #1831 was broken
>> 21 mins and 33 secs
>> Yann Ylavic
>> 243c5fa CHANGESET →
>> 
>> mpm_{event,worker,prefork}: late stop of children processes on restart.
> 
> (unrelated to this change, Build #1819 failed the same earlier).
> 
> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
> 
> It seems that when exiting a stream can be destroyed while in
>  h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> when the mplx lock is released (thus stream->id faults).
> 
> Should the caller(s) of mst_check_data_for() pass stream->id (under
> the lock) instead?
> This would avoid the fault but that's still a potentially destroyed
> stream being fifo'ed, though we are exiting so it might not matter..
> 
> A better fix maybe, Stefan?
> 
> Cheers;
> Yann.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org
Ok, brute-forced it a bit. The test suite now always does
a stop/start cycle on config changes as reload now longer
works reliably. Takes a bit more time, but not overly so
on my machine.

Cheers,
Stefan

> Am 25.08.2021 um 16:22 schrieb ste...@eissing.org:
> 
> 
> 
>> Am 25.08.2021 um 16:16 schrieb Yann Ylavic :
>> 
>> On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic  wrote:
>>> 
>>> On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  
>>> wrote:
 
 I could add that in the test config as header to a certain location.
 
 However, will all new connections get to new gen children once the
 new number has been spotted?
>>> 
>>> Yes, that's guaranteed by the MPM which increases the gen number just
>>> before creating new children (after having asked the old ones to
>>> stop), no old child will ever be created with a new generation number.
>>> ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
>>> always return the one that was forked.
>> 
>> Hmm, you asked for a different question actually, the new
>> *connections* done after the gen bump are not guaranteed to be handled
>> by a new gen child, because it may take some (short) time before the
>> old gen listeners stop listening.
>> There is no way to guarantee where connections will go in the first
>> times of a restart I'm afraid.
>> At least your test client can know whether it's handled by an old or
>> new gen and drop the connection in the former case..
> 
> Thanks for confirming what I suspected. Concurrency is a bitch.
> 
> Hmm, hate to add timed waits here, as who knows how Travis will
> work with those. Will experiment a bit.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 16:16 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic  wrote:
>> 
>> On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  
>> wrote:
>>> 
>>> I could add that in the test config as header to a certain location.
>>> 
>>> However, will all new connections get to new gen children once the
>>> new number has been spotted?
>> 
>> Yes, that's guaranteed by the MPM which increases the gen number just
>> before creating new children (after having asked the old ones to
>> stop), no old child will ever be created with a new generation number.
>> ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
>> always return the one that was forked.
> 
> Hmm, you asked for a different question actually, the new
> *connections* done after the gen bump are not guaranteed to be handled
> by a new gen child, because it may take some (short) time before the
> old gen listeners stop listening.
> There is no way to guarantee where connections will go in the first
> times of a restart I'm afraid.
> At least your test client can know whether it's handled by an old or
> new gen and drop the connection in the former case..

Thanks for confirming what I suspected. Concurrency is a bitch.

Hmm, hate to add timed waits here, as who knows how Travis will
work with those. Will experiment a bit.

Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic  wrote:
>
> On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  wrote:
> >
> > I could add that in the test config as header to a certain location.
> >
> > However, will all new connections get to new gen children once the
> > new number has been spotted?
>
> Yes, that's guaranteed by the MPM which increases the gen number just
> before creating new children (after having asked the old ones to
> stop), no old child will ever be created with a new generation number.
> ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
> always return the one that was forked.

Hmm, you asked for a different question actually, the new
*connections* done after the gen bump are not guaranteed to be handled
by a new gen child, because it may take some (short) time before the
old gen listeners stop listening.
There is no way to guarantee where connections will go in the first
times of a restart I'm afraid.
At least your test client can know whether it's handled by an old or
new gen and drop the connection in the former case..


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 16:04 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  wrote:
>> 
>> 
>> 
>>> Am 25.08.2021 um 15:54 schrieb Yann Ylavic :
>>> 
>>> On Wed, Aug 25, 2021 at 3:37 PM ste...@eissing.org  
>>> wrote:
 
 Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
 test suite, it works fine.
 
 Speculating: my test suite does commonly a GET after a reload to
 see if the reloaded server is responding. And then fires requests
 against the new configuration.
 
 Now, if the server is always responding, the test sill get - sometimes -
 send request to an old gen child which then answers incorrectly. So
 I see 404 responses where a 200 should be as the config does not match.
 
 In this new world of always responding servers, when is a test suite
 supposed to know that now the server will respond with the newly laoded
 config?
>>> 
>>> Maybe we could have access to the generation number
>>> (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
>>> mod_headers could use to set in a response header?
>>> Or a test only module which does that specifically could be compiled
>>> by the framework and LoadModule'd?
>> 
>> I could add that in the test config as header to a certain location.
>> 
>> However, will all new connections get to new gen children once the
>> new number has been spotted?
> 
> Yes, that's guaranteed by the MPM which increases the gen number just
> before creating new children (after having asked the old ones to
> stop), no old child will ever be created with a new generation number.
> ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
> always return the one that was forked.

Thanks, I'll give this a shot. 

Joe: as long as that is not solved, the tests will not be reliable, I'm afraid.

- Stefan

Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  wrote:
>
>
>
> > Am 25.08.2021 um 15:54 schrieb Yann Ylavic :
> >
> > On Wed, Aug 25, 2021 at 3:37 PM ste...@eissing.org  
> > wrote:
> >>
> >> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
> >> test suite, it works fine.
> >>
> >> Speculating: my test suite does commonly a GET after a reload to
> >> see if the reloaded server is responding. And then fires requests
> >> against the new configuration.
> >>
> >> Now, if the server is always responding, the test sill get - sometimes -
> >> send request to an old gen child which then answers incorrectly. So
> >> I see 404 responses where a 200 should be as the config does not match.
> >>
> >> In this new world of always responding servers, when is a test suite
> >> supposed to know that now the server will respond with the newly laoded
> >> config?
> >
> > Maybe we could have access to the generation number
> > (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
> > mod_headers could use to set in a response header?
> > Or a test only module which does that specifically could be compiled
> > by the framework and LoadModule'd?
>
> I could add that in the test config as header to a certain location.
>
> However, will all new connections get to new gen children once the
> new number has been spotted?

Yes, that's guaranteed by the MPM which increases the gen number just
before creating new children (after having asked the old ones to
stop), no old child will ever be created with a new generation number.
ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
always return the one that was forked.


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 15:54 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 3:37 PM ste...@eissing.org  wrote:
>> 
>> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
>> test suite, it works fine.
>> 
>> Speculating: my test suite does commonly a GET after a reload to
>> see if the reloaded server is responding. And then fires requests
>> against the new configuration.
>> 
>> Now, if the server is always responding, the test sill get - sometimes -
>> send request to an old gen child which then answers incorrectly. So
>> I see 404 responses where a 200 should be as the config does not match.
>> 
>> In this new world of always responding servers, when is a test suite
>> supposed to know that now the server will respond with the newly laoded
>> config?
> 
> Maybe we could have access to the generation number
> (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
> mod_headers could use to set in a response header?
> Or a test only module which does that specifically could be compiled
> by the framework and LoadModule'd?

I could add that in the test config as header to a certain location.

However, will all new connections get to new gen children once the
new number has been spotted?

> 
> Cheers;
> Yann.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 3:37 PM ste...@eissing.org  wrote:
>
> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
> test suite, it works fine.
>
> Speculating: my test suite does commonly a GET after a reload to
> see if the reloaded server is responding. And then fires requests
> against the new configuration.
>
> Now, if the server is always responding, the test sill get - sometimes -
> send request to an old gen child which then answers incorrectly. So
> I see 404 responses where a 200 should be as the config does not match.
>
> In this new world of always responding servers, when is a test suite
> supposed to know that now the server will respond with the newly laoded
> config?

Maybe we could have access to the generation number
(AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
mod_headers could use to set in a response header?
Or a test only module which does that specifically could be compiled
by the framework and LoadModule'd?

Cheers;
Yann.


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 15:15 schrieb ste...@eissing.org:
> 
> 
> 
>> Am 25.08.2021 um 15:06 schrieb ste...@eissing.org:
>> 
>> 
>> 
>>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
>>> 
>>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
 
 apache / httpd
 
 trunk
 
 Build #1831 was broken
 21 mins and 33 secs
 Yann Ylavic
 243c5fa CHANGESET →
 
 mpm_{event,worker,prefork}: late stop of children processes on restart.
>>> 
>>> (unrelated to this change, Build #1819 failed the same earlier).
>>> 
>>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>>> 
>>> It seems that when exiting a stream can be destroyed while in
>>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>>> when the mplx lock is released (thus stream->id faults).
>>> 
>>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>>> the lock) instead?
>>> This would avoid the fault but that's still a potentially destroyed
>>> stream being fifo'ed, though we are exiting so it might not matter..
>>> 
>>> A better fix maybe, Stefan?
>> 
>> I think the "stream->id" needs to be save before giving up the lock. 
>> I'll make the change, but currently my test suite throws several 
>> other errors in trunk. Something changed...
>> 
>> Btw. I am working on a real fix of this messy code in http2, using 
>> a apr_pollset for monitoring connections and ongoing requests. Also
>> cleaning up other parts of the code, reducing complexity.
>> 
>> 
>> 
>> I'll see what version of trunk still runs my tests and report back.
> 
> 
> Ok, r1892586 still runs the http2 tests nicely. Seems that r1892587 
> broke things. Note that the http2 test suite does a lot of server
> reloads.

Follow up: if I do a "time.sleep(2)" after each "apachectl" in my 
test suite, it works fine.

Speculating: my test suite does commonly a GET after a reload to 
see if the reloaded server is responding. And then fires requests
against the new configuration.

Now, if the server is always responding, the test sill get - sometimes -
send request to an old gen child which then answers incorrectly. So
I see 404 responses where a 200 should be as the config does not match.

In this new world of always responding servers, when is a test suite
supposed to know that now the server will respond with the newly laoded
config?

- Stefan

> 
> Stefan
> 
>> 
>> - Stefan
>> 
>>> 
>>> Cheers;
>>> Yann.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 15:06 schrieb ste...@eissing.org:
> 
> 
> 
>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
>> 
>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
>>> 
>>> apache / httpd
>>> 
>>> trunk
>>> 
>>> Build #1831 was broken
>>> 21 mins and 33 secs
>>> Yann Ylavic
>>> 243c5fa CHANGESET →
>>> 
>>> mpm_{event,worker,prefork}: late stop of children processes on restart.
>> 
>> (unrelated to this change, Build #1819 failed the same earlier).
>> 
>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>> 
>> It seems that when exiting a stream can be destroyed while in
>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>> when the mplx lock is released (thus stream->id faults).
>> 
>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>> the lock) instead?
>> This would avoid the fault but that's still a potentially destroyed
>> stream being fifo'ed, though we are exiting so it might not matter..
>> 
>> A better fix maybe, Stefan?
> 
> I think the "stream->id" needs to be save before giving up the lock. 
> I'll make the change, but currently my test suite throws several 
> other errors in trunk. Something changed...
> 
> Btw. I am working on a real fix of this messy code in http2, using 
> a apr_pollset for monitoring connections and ongoing requests. Also
> cleaning up other parts of the code, reducing complexity.
> 
> 
> 
> I'll see what version of trunk still runs my tests and report back.


Ok, r1892586 still runs the http2 tests nicely. Seems that r1892587 
broke things. Note that the http2 test suite does a lot of server
reloads.

Stefan

> 
> - Stefan
> 
>> 
>> Cheers;
>> Yann.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
>> 
>> apache / httpd
>> 
>> trunk
>> 
>> Build #1831 was broken
>> 21 mins and 33 secs
>> Yann Ylavic
>> 243c5fa CHANGESET →
>> 
>> mpm_{event,worker,prefork}: late stop of children processes on restart.
> 
> (unrelated to this change, Build #1819 failed the same earlier).
> 
> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
> 
> It seems that when exiting a stream can be destroyed while in
>  h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> when the mplx lock is released (thus stream->id faults).
> 
> Should the caller(s) of mst_check_data_for() pass stream->id (under
> the lock) instead?
> This would avoid the fault but that's still a potentially destroyed
> stream being fifo'ed, though we are exiting so it might not matter..
> 
> A better fix maybe, Stefan?

I think the "stream->id" needs to be save before giving up the lock. 
I'll make the change, but currently my test suite throws several 
other errors in trunk. Something changed...

Btw. I am working on a real fix of this messy code in http2, using 
a apr_pollset for monitoring connections and ongoing requests. Also
cleaning up other parts of the code, reducing complexity.



I'll see what version of trunk still runs my tests and report back.

- Stefan

> 
> Cheers;
> Yann.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
>
> apache / httpd
>
> trunk
>
> Build #1831 was broken
> 21 mins and 33 secs
> Yann Ylavic
> 243c5fa CHANGESET →
>
> mpm_{event,worker,prefork}: late stop of children processes on restart.

(unrelated to this change, Build #1819 failed the same earlier).

Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103

It seems that when exiting a stream can be destroyed while in
  h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
when the mplx lock is released (thus stream->id faults).

Should the caller(s) of mst_check_data_for() pass stream->id (under
the lock) instead?
This would avoid the fault but that's still a potentially destroyed
stream being fifo'ed, though we are exiting so it might not matter..

A better fix maybe, Stefan?

Cheers;
Yann.