Re: Frequent wake-ups for mpm_event

2017-01-17 Thread Stefan Priebe

Hi Yann,

While testing V6 I'm experiencing segfaults.

server-error.log:
AH00052: child pid 14110 exit signal Segmentation fault (11)

Currently I'm trying to grab a core dump.
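Since AH00052 only logs the signal, a core file is needed for a usable backtrace. A minimal sketch for Linux, assuming the stock `CoreDumpDirectory` directive; the dump directory path is an example:

```shell
# Allow core files in the environment that starts httpd.
ulimit -c unlimited

# Create a directory writable by the httpd child user and point the
# CoreDumpDirectory directive in httpd.conf at it, e.g.:
#   CoreDumpDirectory /tmp/apache-cores
mkdir -p /tmp/apache-cores

# Check the kernel's core pattern: a plain name like "core" lands in
# CoreDumpDirectory; a piped pattern (|/usr/lib/...) goes to a collector
# such as systemd-coredump instead.
cat /proc/sys/kernel/core_pattern 2>/dev/null || echo "no /proc (non-Linux?)"
```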

Greets,
Stefan

On 26.12.2016 at 21:18, Stefan Priebe - Profihost AG wrote:
> On 23.12.2016 at 01:41, Yann Ylavic wrote:
>> Hi Stefan,
>>
>> On Tue, Dec 20, 2016 at 1:52 PM, Stefan Priebe - Profihost AG
>>  wrote:
>>>
>>> Today i had another server giving no answers to any requests. apache
>>> fullstatus did not respond.
>>
>> Since v5 of the patch, I committed another related change in trunk,
>> namely: http://svn.apache.org/r1774538
>> It's about lingering keepalive connections on graceful restart which
>> may not cause a wakeup.
>> Does it help?
>
> I'll try that but wanted to rebuild based on httpd 2.4.25. But your
> mpm_event_listener_wakeup_bug57399_V5 patch no longer applies to httpd
> 2.4.25. Can you rebase it?
>
>>> gdb bt shows this for all httpd children:
>>
>> These backtraces are the ones of the main thread, probably not the culprit.
>> What does "thread apply all bt" say?
>
> Will redo / save that output next time.
>
> Thanks!
>
> Greets,
> Stefan
>
>> Regards,
>> Yann.



Re: Frequent wake-ups for mpm_event

2016-12-26 Thread Stefan Priebe - Profihost AG
On 23.12.2016 at 01:41, Yann Ylavic wrote:
> Hi Stefan,
> 
> On Tue, Dec 20, 2016 at 1:52 PM, Stefan Priebe - Profihost AG
>  wrote:
>>
>> Today i had another server giving no answers to any requests. apache
>> fullstatus did not respond.
> 
> Since v5 of the patch, I committed another related change in trunk,
> namely: http://svn.apache.org/r1774538
> It's about lingering keepalive connections on graceful restart which
> may not cause a wakeup.
> Does it help?

I'll try that but wanted to rebuild based on httpd 2.4.25. But your
mpm_event_listener_wakeup_bug57399_V5 patch no longer applies to httpd
2.4.25. Can you rebase it?

> >> gdb bt shows this for all httpd children:
> 
> These backtraces are the ones of the main thread, probably not the culprit.
> What does "thread apply all bt" say?

Will redo / save that output next time.

Thanks!

Greets,
Stefan

> Regards,
> Yann.
> 


Re: Frequent wake-ups for mpm_event

2016-12-22 Thread Yann Ylavic
Hi Stefan,

On Tue, Dec 20, 2016 at 1:52 PM, Stefan Priebe - Profihost AG
 wrote:
>
> Today i had another server giving no answers to any requests. apache
> fullstatus did not respond.

Since v5 of the patch, I committed another related change in trunk,
namely: http://svn.apache.org/r1774538
It's about lingering keepalive connections on graceful restart which
may not cause a wakeup.
Does it help?

>
> gdb bt shows this for all httpd children:

These backtraces are the ones of the main thread, probably not the culprit.
What does "thread apply all bt" say?
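The per-thread backtraces Yann asks for can be captured non-interactively with gdb's batch mode; a sketch (the wrapper name is made up, the gdb options are standard):

```shell
# Dump backtraces of all threads of a running httpd process without an
# interactive gdb session; pass the child PID (e.g. taken from ps aux).
dump_backtraces() {
    gdb -p "$1" -batch \
        -ex 'set pagination off' \
        -ex 'thread apply all bt'
}

# Usage: dump_backtraces 14110 > httpd-14110-threads.txt
```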

Regards,
Yann.


Re: Frequent wake-ups for mpm_event

2016-12-20 Thread Stefan Priebe - Profihost AG
Hi Yann,

here is a follow up on this old thread with some problems again. See below.

On 23.09.2016 at 21:26, Stefan Priebe - Profihost AG wrote:
> Hi Yann,
>   Hi Luca,
> 
> right now i had another server which shows:
> AH00485: scoreboard is full, not at MaxRequestWorkers msg.
> 
> I never saw that before applying the event patch.
> 
> A fullstatus request is not answered - so the whole Apache instance
> hangs. It also has 84 running processes instead of the usual 4-8.
> # ps aux|grep httpd | wc -l
> 84
> 
> There are also not many connections:
> # netstat -na | egrep ":80|:443"|wc -l
> 110
> 
> Even after a restart, all the httpd processes (and even more) are still there:
> # ps aux|grep httpd | wc -l
> 89
> 
> ps aux also tells me that there are a lot of pretty old processes:
> # ps aux|grep httpd | grep Sep22 | wc -l
> 40
> 
> and a lot of processes are even from Sep 23, many hours ago. So I don't
> think this is bug #53555.
> 
> It just seems a lot of threads are hanging / deadlocked.
> 
> gdb says:
> (gdb) bt
> #0  0x7f9235b634db in pthread_join () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x7f9235d9e9bb in apr_thread_join () from
> /usr/lib/x86_64-linux-gnu/libapr-1.so.0
> #2  0x0055b283 in join_workers ()
> #3  0x0055b8cc in child_main ()
> #4  0x0055ba3f in make_child ()
> #5  0x0055c21e in perform_idle_server_maintenance ()
> #6  0x0055c619 in server_main_loop ()
> #7  0x0055c959 in event_run ()
> #8  0x00445f91 in ap_run_mpm ()
> #9  0x0043dd19 in main ()
> 
> 
> bt from all threads of an old httpd process:
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> 0x7f9235b634db in pthread_join () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> 
> Thread 52 (Thread 0x7f92325ee700 (LWP 13704)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x005286f9 in get_mplx_next ()
> #2  0x0052fb34 in execute ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Thread 51 (Thread 0x7f9231ded700 (LWP 13705)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x005286f9 in get_mplx_next ()
> #2  0x0052fb34 in execute ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Thread 50 (Thread 0x7f92315ec700 (LWP 13706)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x005286f9 in get_mplx_next ()
> #2  0x0052fb34 in execute ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Thread 49 (Thread 0x7f9230deb700 (LWP 13707)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x005286f9 in get_mplx_next ()
> #2  0x0052fb34 in execute ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Thread 48 (Thread 0x7f92305ea700 (LWP 13708)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x005286f9 in get_mplx_next ()
> #2  0x0052fb34 in execute ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Thread 47 (Thread 0x7f922fde9700 (LWP 13709)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x005286f9 in get_mplx_next ()
> #2  0x0052fb34 in execute ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Thread 46 (Thread 0x7f922f5e8700 (LWP 13710)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x005286f9 in get_mplx_next ()
> #2  0x0052fb34 in execute ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Thread 45 (Thread 0x7f922ede7700 (LWP 13711)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x005286f9 in get_mplx_next ()
> #2  0x0052fb34 in execute ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f

Re: Frequent wake-ups for mpm_event

2016-10-12 Thread Luca Toscano
2016-10-10 22:01 GMT+02:00 Stefan Priebe - Profihost AG <
[email protected]>:

> Hi Luca,
> On 10.10.2016 at 16:48, Luca Toscano wrote:
> > Hi Stefan,
> >
> > 2016-10-07 10:11 GMT+02:00 Stefan Priebe - Profihost AG
> > <[email protected]>:
> >
> > On 05.10.2016 at 12:50, Luca Toscano wrote:
> > > Hi Stefan,
> > >
> > > 2016-09-26 14:26 GMT+02:00 Stefan Priebe - Profihost AG
> > > <[email protected]>:
> > >
> > > currently no deadlocks since V5 also no old httpd processes
> anymore.
> > >
> > >
> > > if you still have patience and time, I'd like to ask you a
> question:
> > > have you noticed any performance improvement after applying the
> patch?
> > > For example (Yann correct me if I am wrong) I'd expect some
> reduction in
> > > system CPU utilization since the number of wake ups / context
> switches
> > > has been reduced dramatically. Moreover, it would be great to know
> some
> > > info about the status of the worker threads over time from the
> > > scoreboard, since it would be great not to introduce weird
> regressions
> > > with this patch :)
> >
> > Dear Luca,
> >
> > sure what can i exactly provide?
> >
> >
> > I would be interested in scoreboard related metrics over time, namely if
> > the number of busy/idle/graceful/etc.. worker statuses change from a
> > "regular" 2.4.23 httpd. I don't expect a bit difference, but I'd really
> > like to make sure that no big regression is introduced (like let's say,
> > an increase over time of worker threads stuck in a certain status like G
> > or more busy workers at request peak load time).
> >
> > Related to the above: have you noticed any improvement in
> > latency/throughput metrics? For example, an improvement in total amount
> > of requests handled in the same period of time or more generally in the
> > responsiveness of the websites served?
>
> It seems i cannot provide any of the requested data ;-(
>

You have already provided an awesome support to this patch Stefan, thanks a
lot!

Luca


Re: Frequent wake-ups for mpm_event

2016-10-10 Thread Stefan Priebe - Profihost AG
Hi Luca,
On 10.10.2016 at 16:48, Luca Toscano wrote:
> Hi Stefan,
> 
> 2016-10-07 10:11 GMT+02:00 Stefan Priebe - Profihost AG
> <[email protected]>:
> 
> On 05.10.2016 at 12:50, Luca Toscano wrote:
> > Hi Stefan,
> >
> > 2016-09-26 14:26 GMT+02:00 Stefan Priebe - Profihost AG
> > <[email protected]>:
> >
> > currently no deadlocks since V5 also no old httpd processes anymore.
> >
> >
> > if you still have patience and time, I'd like to ask you a question:
> > have you noticed any performance improvement after applying the patch?
> > For example (Yann correct me if I am wrong) I'd expect some reduction in
> > system CPU utilization since the number of wake ups / context switches
> > has been reduced dramatically. Moreover, it would be great to know some
> > info about the status of the worker threads over time from the
> > scoreboard, since it would be great not to introduce weird regressions
> > with this patch :)
> 
> Dear Luca,
> 
> sure what can i exactly provide?
> 
> 
> I would be interested in scoreboard related metrics over time, namely if
> the number of busy/idle/graceful/etc.. worker statuses change from a
> "regular" 2.4.23 httpd. I don't expect a bit difference, but I'd really
> like to make sure that no big regression is introduced (like let's say,
> an increase over time of worker threads stuck in a certain status like G
> or more busy workers at request peak load time). 
> 
> Related to the above: have you noticed any improvement in
> latency/throughput metrics? For example, an improvement in total amount
> of requests handled in the same period of time or more generally in the
> responsiveness of the websites served?

It seems i cannot provide any of the requested data ;-(

This is an example output of a server running with the patch:
   Current Time: Monday, 10-Oct-2016 21:57:49 CEST
   Restart Time: Wednesday, 05-Oct-2016 06:30:31 CEST
   Parent Server Config. Generation: 12
   Parent Server MPM Generation: 11
   Server uptime: 5 days 15 hours 27 minutes 17 seconds
   Server load: 4.01 4.22 4.10
   Total accesses: 6485371 - Total Traffic: 139.0 GB
   CPU Usage: u2686.2 s1651.82 cu0 cs0 - .89% CPU load
   13.3 requests/sec - 298.9 kB/second - 22.5 kB/request
   10 requests currently being processed, 115 idle workers

   Slot  PID    Stopping  Connections        Threads      Async connections
                          total  accepting   busy  idle   writing  keep-alive  closing
   0     6044   no        8      yes         2     23     0        1           4
   1     15254  no        3      yes         2     23     0        0           2
   2     15189  no        5      yes         3     22     0        0           2
   3     15081  no        5      yes         0     25     0        0           4
   4     15082  no        1      yes         3     22     0        0           0
   Sum   5      0         22                 10    115    0        1           12

L__W__W__W_W__W_
__W_W__L_W___...

Stefan


Re: Frequent wake-ups for mpm_event

2016-10-10 Thread Luca Toscano
Hi Stefan,

2016-10-07 10:11 GMT+02:00 Stefan Priebe - Profihost AG <
[email protected]>:

> > On 05.10.2016 at 12:50, Luca Toscano wrote:
> > Hi Stefan,
> >
> > 2016-09-26 14:26 GMT+02:00 Stefan Priebe - Profihost AG
> > <[email protected]>:
> >
> > currently no deadlocks since V5 also no old httpd processes anymore.
> >
> >
> > if you still have patience and time, I'd like to ask you a question:
> > have you noticed any performance improvement after applying the patch?
> > For example (Yann correct me if I am wrong) I'd expect some reduction in
> > system CPU utilization since the number of wake ups / context switches
> > has been reduced dramatically. Moreover, it would be great to know some
> > info about the status of the worker threads over time from the
> > scoreboard, since it would be great not to introduce weird regressions
> > with this patch :)
>
> Dear Luca,
>
> sure what can i exactly provide?
>

I would be interested in scoreboard related metrics over time, namely if
the number of busy/idle/graceful/etc.. worker statuses change from a
"regular" 2.4.23 httpd. I don't expect a bit difference, but I'd really
like to make sure that no big regression is introduced (like let's say, an
increase over time of worker threads stuck in a certain status like G or
more busy workers at request peak load time).

Related to the above: have you noticed any improvement in
latency/throughput metrics? For example, an improvement in total amount of
requests handled in the same period of time or more generally in the
responsiveness of the websites served?

I don't expect big changes but I am curious :)
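One low-effort way to gather the scoreboard numbers over time is mod_status's machine-readable output (`/server-status?auto`), which includes BusyWorkers, IdleWorkers, and the Scoreboard string. A sketch; the helper name is made up:

```shell
# Summarize a "?auto" status page read on stdin: busy/idle worker counts
# plus a tally of each scoreboard state character (W=writing, K=keepalive,
# G=gracefully finishing, _=waiting, .=open slot, etc.).
scoreboard_counts() {
    awk '
        /^BusyWorkers:/ { print "busy=" $2 }
        /^IdleWorkers:/ { print "idle=" $2 }
        /^Scoreboard:/ {
            for (i = 1; i <= length($2); i++)
                count[substr($2, i, 1)]++
            for (s in count)
                printf "state[%s]=%d\n", s, count[s]
        }'
}

# Usage (e.g. once a minute from cron):
#   curl -s http://localhost/server-status?auto | scoreboard_counts
```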

Thanks again!

Regards,

Luca


Re: Frequent wake-ups for mpm_event

2016-10-07 Thread Stefan Priebe - Profihost AG
On 05.10.2016 at 12:50, Luca Toscano wrote:
> Hi Stefan,
> 
> 2016-09-26 14:26 GMT+02:00 Stefan Priebe - Profihost AG
> <[email protected]>:
> 
> currently no deadlocks since V5 also no old httpd processes anymore. 
> 
> 
> if you still have patience and time, I'd like to ask you a question:
> have you noticed any performance improvement after applying the patch?
> For example (Yann correct me if I am wrong) I'd expect some reduction in
> system CPU utilization since the number of wake ups / context switches
> has been reduced dramatically. Moreover, it would be great to know some
> info about the status of the worker threads over time from the
> scoreboard, since it would be great not to introduce weird regressions
> with this patch :)

Dear Luca,

sure what can i exactly provide?

I don't see a difference in CPU usage, but the machines are pretty big
(dual Xeon E5-2667) so I'm not sure it is noticeable.

Greets,
Stefan



Re: Frequent wake-ups for mpm_event

2016-10-05 Thread Luca Toscano
Hi Stefan,

2016-09-26 14:26 GMT+02:00 Stefan Priebe - Profihost AG <
[email protected]>:

> currently no deadlocks since V5 also no old httpd processes anymore.
>

if you still have patience and time, I'd like to ask you a question: have
you noticed any performance improvement after applying the patch? For
example (Yann correct me if I am wrong) I'd expect some reduction in system
CPU utilization since the number of wake ups / context switches has been
reduced dramatically. Moreover, it would be great to know some info about
the status of the worker threads over time from the scoreboard, since it
would be great not to introduce weird regressions with this patch :)

Thanks again!

Luca


Re: Frequent wake-ups for mpm_event

2016-09-26 Thread Stefan Priebe - Profihost AG
currently no deadlocks since V5 also no old httpd processes anymore.

Stefan
On 24.09.2016 at 17:32, Yann Ylavic wrote:
> On Sat, Sep 24, 2016 at 5:13 PM, Stefan Priebe - Profihost AG
>  wrote:
>>
>> saw v5 attached to bugzilla. Which one should I take?
> 
> Either; v5 is the same as v4 plus the fix (hopefully) for when no
> wakeable poll method exists on the system, which shouldn't be the case
> for you (although it was triggered in your previous testing with v3,
> where Linux' epoll was not detected as wakeable; this is addressed in
> both v4 and v5).
> 
> Regards,
> Yann.
> 


Re: Frequent wake-ups for mpm_event

2016-09-24 Thread Yann Ylavic
On Sat, Sep 24, 2016 at 5:13 PM, Stefan Priebe - Profihost AG
 wrote:
>
> saw v5 attached to bugzilla. Which one should I take?

Either; v5 is the same as v4 plus the fix (hopefully) for when no
wakeable poll method exists on the system, which shouldn't be the case
for you (although it was triggered in your previous testing with v3,
where Linux' epoll was not detected as wakeable; this is addressed in
both v4 and v5).

Regards,
Yann.


Re: Frequent wake-ups for mpm_event

2016-09-24 Thread Stefan Priebe - Profihost AG
Hi,

saw v5 attached to bugzilla. Which one should I take?

Greets,
Stefan

Excuse my typos; sent from my mobile phone.

> On 24.09.2016 at 15:05, Yann Ylavic wrote:
> 
> Hi Stefan,
> 
> thanks again for the tests and very useful feedbacks!
> 
> On Fri, Sep 23, 2016 at 9:26 PM, Stefan Priebe - Profihost AG
>  wrote:
>> 
>> It just seems a lot of threads are hanging / deadlocked.
> 
> Clearly...
> 
>> 
>> gdb says:
> [...]
>> 
>> Thread 27 (Thread 0x7f92255d4700 (LWP 13783)):
>> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0055fa26 in ap_queue_pop_something ()
>> #2  0x0055a633 in worker_thread ()
>> #3  0x7f9235b620a4 in start_thread () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> [...]
>> 
>> Thread 2 (Thread 0x7f92147e8700 (LWP 13808)):
>> #0  0x7f9235897bb3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x7f9235d99fc3 in ?? () from /usr/lib/x86_64-linux-gnu/libapr-1.so.0
>> #2  0x005595bd in listener_thread ()
>> #3  0x7f9235b620a4 in start_thread () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> 
>> Thread 1 (Thread 0x7f9237159780 (LWP 13703)):
>> #0  0x7f9235b634db in pthread_join () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x7f9235d9e9bb in apr_thread_join () from
>> /usr/lib/x86_64-linux-gnu/libapr-1.so.0
>> #2  0x0055b283 in join_workers ()
>> #3  0x0055b8cc in child_main ()
>> #4  0x0055ba3f in make_child ()
>> #5  0x0055c21e in perform_idle_server_maintenance ()
>> #6  0x0055c619 in server_main_loop ()
>> #7  0x0055c959 in event_run ()
>> #8  0x00445f91 in ap_run_mpm ()
>> #9  0x0043dd19 in main ()
> 
> So the listener is not woken up while the child process is being
> gracefully stopped, and no worker seems to be working.
> 
> It seems that 2.4.x is missing an important fix for my patch to work
> correctly (namely http://svn.apache.org/r1732228, which was never
> backported to 2.4).
> Without it, the listener/poll()ing wake-up thing is not enabled on linux...
> 
> This says that my patch is broken when no "good" polling method is
> available on the system (I can probably fix that later), but this
> shouldn't be the case for Linux' epoll.
> 
> Could you please give a(nother) try to this new v4 patch (attached)?
> This is simply v3 (from PR 57399) plus trunk's r1732228 mentioned above.
> 
> Regards,
> Yann.
> 


Re: Frequent wake-ups for mpm_event

2016-09-24 Thread Yann Ylavic
Hi Stefan,

thanks again for the tests and very useful feedbacks!

On Fri, Sep 23, 2016 at 9:26 PM, Stefan Priebe - Profihost AG
 wrote:
>
> It just seems a lot of threads are hanging / deadlocked.

Clearly...

>
> gdb says:
[...]
>
> Thread 27 (Thread 0x7f92255d4700 (LWP 13783)):
> #0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0055fa26 in ap_queue_pop_something ()
> #2  0x0055a633 in worker_thread ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
[...]
>
> Thread 2 (Thread 0x7f92147e8700 (LWP 13808)):
> #0  0x7f9235897bb3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x7f9235d99fc3 in ?? () from /usr/lib/x86_64-linux-gnu/libapr-1.so.0
> #2  0x005595bd in listener_thread ()
> #3  0x7f9235b620a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
> Thread 1 (Thread 0x7f9237159780 (LWP 13703)):
> #0  0x7f9235b634db in pthread_join () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x7f9235d9e9bb in apr_thread_join () from
> /usr/lib/x86_64-linux-gnu/libapr-1.so.0
> #2  0x0055b283 in join_workers ()
> #3  0x0055b8cc in child_main ()
> #4  0x0055ba3f in make_child ()
> #5  0x0055c21e in perform_idle_server_maintenance ()
> #6  0x0055c619 in server_main_loop ()
> #7  0x0055c959 in event_run ()
> #8  0x00445f91 in ap_run_mpm ()
> #9  0x0043dd19 in main ()

So the listener is not woken up while the child process is being
gracefully stopped, and no worker seems to be working.

It seems that 2.4.x is missing an important fix for my patch to work
correctly (namely http://svn.apache.org/r1732228, which was never
backported to 2.4).
Without it, the listener/poll()ing wake-up thing is not enabled on linux...

This says that my patch is broken when no "good" polling method is
available on the system (I can probably fix that later), but this
shouldn't be the case for Linux' epoll.

Could you please give a(nother) try to this new v4 patch (attached)?
This is simply v3 (from PR 57399) plus trunk's r1732228 mentioned above.

Regards,
Yann.
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1761356)
+++ server/mpm/event/event.c	(working copy)
@@ -100,6 +100,8 @@
 #include  /* for INT_MAX */
 
 
+#define VOLATILE_READ(T, x) (*(volatile T *)&(x))
+
 /* Limit on the total --- clients will be locked out if more servers than
  * this are needed.  It is intended solely to keep the server from crashing
  * when things get out of hand.
@@ -174,6 +176,7 @@ static int dying = 0;
 static int workers_may_exit = 0;
 static int start_thread_may_exit = 0;
 static int listener_may_exit = 0;
+static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */
 static int num_listensocks = 0;
 static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access
in listener thread */
@@ -194,6 +197,17 @@ module AP_MODULE_DECLARE_DATA mpm_event_module;
 struct event_srv_cfg_s;
 typedef struct event_srv_cfg_s event_srv_cfg;
 
+/*
+ * The pollset for sockets that are in any of the timeout queues. Currently
+ * we use the timeout_mutex to make sure that connections are added/removed
+ * atomically to/from both event_pollset and a timeout queue. Otherwise
+ * some confusion can happen under high load if timeout queues and pollset
+ * get out of sync.
+ * XXX: It should be possible to make the lock unnecessary in many or even all
+ * XXX: cases.
+ */
+static apr_pollset_t *event_pollset;
+
 struct event_conn_state_t {
 /** APR_RING of expiration timeouts */
 APR_RING_ENTRY(event_conn_state_t) timeout_list;
@@ -239,24 +253,52 @@ static struct timeout_queue *write_completion_q,
 *keepalive_q,
 *linger_q,
 *short_linger_q;
+static apr_time_t queues_next_expiry;
 
 static apr_pollfd_t *listener_pollfd;
 
+/* Prevent extra poll/wakeup calls for timeouts close in the future (queues
+ * have the granularity of a second anyway).
+ * XXX: Wouldn't 0.5s (instead of 0.1s) be "enough"?
+ */
+#define TIMEOUT_FUDGE_FACTOR APR_TIME_C(10) /* 100 ms */
+
+/* Same goal as for TIMEOUT_FUDGE_FACTOR (avoid extra poll calls), but applied
+ * to timers. Since their timeouts are custom (user defined), we can't be too
+ * approximative here (hence using 0.01s).
+ */
+#define EVENT_FUDGE_FACTOR APR_TIME_C(1) /* 10 ms */
+
 /*
  * Macros for accessing struct timeout_queue.
  * For TO_QUEUE_APPEND and TO_QUEUE_REMOVE, timeout_mutex must be held.
  */
-#define TO_QUEUE_APPEND(q, el)   

Re: Frequent wake-ups for mpm_event

2016-09-23 Thread Stefan Priebe - Profihost AG
Hi Yann,
  Hi Luca,

right now i had another server which shows:
AH00485: scoreboard is full, not at MaxRequestWorkers msg.

I never saw that before applying the event patch.

A fullstatus request is not answered - so the whole Apache instance
hangs. It also has 84 running processes instead of the usual 4-8.
# ps aux|grep httpd | wc -l
84

There are also not many connections:
# netstat -na | egrep ":80|:443"|wc -l
110

Even after a restart, all the httpd processes (and even more) are still there:
# ps aux|grep httpd | wc -l
89

ps aux also tells me that there are a lot of pretty old processes:
# ps aux|grep httpd | grep Sep22 | wc -l
40

and a lot of processes are even from Sep 23, many hours ago. So I don't
think this is bug #53555.
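The old-process check above can be scripted; a sketch built on procps's elapsed-seconds field (`etimes`), with a made-up helper name:

```shell
# Filter "PID ETIMES COMMAND" lines (as produced by
#   ps -C httpd -o pid=,etimes=,args=
# ) and keep only processes older than $1 seconds (default: one day).
old_httpd_procs() {
    awk -v max="${1:-86400}" '$2 > max'
}

# Usage: ps -C httpd -o pid=,etimes=,args= | old_httpd_procs 3600
```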

It just seems a lot of threads are hanging / deadlocked.

gdb says:
(gdb) bt
#0  0x7f9235b634db in pthread_join () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x7f9235d9e9bb in apr_thread_join () from
/usr/lib/x86_64-linux-gnu/libapr-1.so.0
#2  0x0055b283 in join_workers ()
#3  0x0055b8cc in child_main ()
#4  0x0055ba3f in make_child ()
#5  0x0055c21e in perform_idle_server_maintenance ()
#6  0x0055c619 in server_main_loop ()
#7  0x0055c959 in event_run ()
#8  0x00445f91 in ap_run_mpm ()
#9  0x0043dd19 in main ()


bt from all threads of an old httpd process:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x7f9235b634db in pthread_join () from
/lib/x86_64-linux-gnu/libpthread.so.0

Thread 52 (Thread 0x7f92325ee700 (LWP 13704)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 51 (Thread 0x7f9231ded700 (LWP 13705)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 50 (Thread 0x7f92315ec700 (LWP 13706)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 49 (Thread 0x7f9230deb700 (LWP 13707)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 48 (Thread 0x7f92305ea700 (LWP 13708)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 47 (Thread 0x7f922fde9700 (LWP 13709)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 46 (Thread 0x7f922f5e8700 (LWP 13710)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 45 (Thread 0x7f922ede7700 (LWP 13711)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f92358975dd in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 44 (Thread 0x7f922e5e6700 (LWP 13712)):
#0  0x7f9235b6608f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005286f9 in get_mplx_next ()
#2  0x0052fb34 in execute ()
#3  0x7f9235b620a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7f923589

Re: Frequent wake-ups for mpm_event

2016-09-23 Thread Stefan Priebe - Profihost AG
Thanks Yann. I'll wait a few days first to check whether I'll see this on
other hosts too. I'll also try to grab the server-status, but I'm not sure
the server will answer this request if the scoreboard is full.

Greets,
Stefan

On 23.09.2016 at 16:53, Yann Ylavic wrote:
> With the patch this time...
> 
> On Fri, Sep 23, 2016 at 4:53 PM, Yann Ylavic  wrote:
>> On Fri, Sep 23, 2016 at 3:13 PM, Yann Ylavic  wrote:
>>>
>>> If you can try another (additive) patch on the faulty server, it would
>>> be awesome to test the one from [1], not sure it applies correctly
>>> with my 2.4.x patch though, possibly better with the trunk version
>>> over [1] (let me know otherwise).
>>
>> Not a big deal actually, the attached patch includes both [1] and [2] 
>> combined.
>>
>> Regards,
>> Yann.
>>
>> [1] https://bz.apache.org/bugzilla/show_bug.cgi?id=53555#c55
>> [2] https://bz.apache.org/bugzilla/show_bug.cgi?id=57399#c9


Re: Frequent wake-ups for mpm_event

2016-09-23 Thread Yann Ylavic
With the patch this time...

On Fri, Sep 23, 2016 at 4:53 PM, Yann Ylavic  wrote:
> On Fri, Sep 23, 2016 at 3:13 PM, Yann Ylavic  wrote:
>>
>> If you can try another (additive) patch on the faulty server, it would
>> be awesome to test the one from [1], not sure it applies correctly
>> with my 2.4.x patch though, possibly better with the trunk version
>> over [1] (let me know otherwise).
>
> Not a big deal actually, the attached patch includes both [1] and [2] 
> combined.
>
> Regards,
> Yann.
>
> [1] https://bz.apache.org/bugzilla/show_bug.cgi?id=53555#c55
> [2] https://bz.apache.org/bugzilla/show_bug.cgi?id=57399#c9
Index: modules/generators/mod_status.c
===
--- modules/generators/mod_status.c	(revision 1761356)
+++ modules/generators/mod_status.c	(working copy)
@@ -531,7 +531,7 @@ static int status_handler(request_rec *r)
 
 if (is_async) {
 int write_completion = 0, lingering_close = 0, keep_alive = 0,
-connections = 0;
+connections = 0, stopping = 0, procs = 0;
 /*
  * These differ from 'busy' and 'ready' in how gracefully finishing
  * threads are counted. XXX: How to make this clear in the html?
@@ -539,13 +539,15 @@ static int status_handler(request_rec *r)
 int busy_workers = 0, idle_workers = 0;
 if (!short_report)
 ap_rputs("\n\n\n"
- "PID"
+ "Slot"
+ "PID"
+ "Stopping"
  "Connections\n"
  "Threads"
- "Async connections\n"
+ "Async connections\n"
  "totalaccepting"
- "busyidlewriting"
- "keep-aliveclosing\n", r);
+ "busyidle"
+ "writingkeep-aliveclosing\n", r);
 for (i = 0; i < server_limit; ++i) {
 ps_record = ap_get_scoreboard_process(i);
 if (ps_record->pid) {
@@ -555,26 +557,45 @@ static int status_handler(request_rec *r)
 lingering_close  += ps_record->lingering_close;
 busy_workers += thread_busy_buffer[i];
 idle_workers += thread_idle_buffer[i];
-if (!short_report)
-ap_rprintf(r, "%" APR_PID_T_FMT "%u"
-  "%s%u%u"
+if (!short_report) {
+const char *dying = "no";
+const char *old = "";
+if (ps_record->quiescing) {
+dying = "yes";
+stopping++;
+}
+if (ps_record->generation != mpm_generation)
+old = " (old gen)";
+procs++;
+ap_rprintf(r, "%u%" APR_PID_T_FMT ""
+  "%s%s"
+  "%u%s"
+  "%u%u"
   "%u%u%u"
   "\n",
-   ps_record->pid, ps_record->connections,
+   i, ps_record->pid,
+   dying, old,
+   ps_record->connections,
ps_record->not_accepting ? "no" : "yes",
-   thread_busy_buffer[i], thread_idle_buffer[i],
+   thread_busy_buffer[i],
+   thread_idle_buffer[i],
ps_record->write_completion,
ps_record->keep_alive,
ps_record->lingering_close);
+}
 }
 }
 if (!short_report) {
-ap_rprintf(r, "Sum%d %d"
-  "%d%d%d%d"
+ap_rprintf(r, "Sum"
+  "%d%d"
+  "%d "
+  "%d%d"
+  "%d%d%d"
   "\n\n",
-  connections, busy_workers, idle_workers,
+  procs, stopping,
+  connections,
+  busy_workers, idle_workers,
   write_completion, keep_alive, lingering_close);
-
 }
 else {
 ap_rprintf(r, "ConnsTotal: %d\n"
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1761356)
+++ server/mpm/event/event.c	(working copy)
@@ -100,6 +100,8 @@
 #include  /* for INT_MAX */
 
 
+#define VOLATILE_READ(T, x) (*(volatile T *)&(x))
+
 /* Limit on the total --- clients will be locked out if more servers than
 * this are needed.  It is intended solely to keep the server from crashing

Re: Frequent wake-ups for mpm_event

2016-09-23 Thread Yann Ylavic
On Fri, Sep 23, 2016 at 3:13 PM, Yann Ylavic  wrote:
>
> If you can try another (additive) patch on the faulty server, it would
> be awesome to test the one from [1], not sure it applies correctly
> with my 2.4.x patch though, possibly better with the trunk version
> over [1] (let me know otherwise).

Not a big deal actually, the attached patch includes both [1] and [2] combined.

Regards,
Yann.

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=53555#c55
[2] https://bz.apache.org/bugzilla/show_bug.cgi?id=57399#c9


Re: Frequent wake-ups for mpm_event

2016-09-23 Thread Yann Ylavic
Hi Stefan,

On Fri, Sep 23, 2016 at 1:15 PM, Stefan Priebe - Profihost AG
 wrote:
>
> I applied the latest patch (incl. fudge factor) to 1500 real-life servers.

Thank you very much for testing in real conditions.

>
> Today I had the situation that no websites were delivered. The error.log
> says:
>
> [Fri Sep 23 09:02:41.974245 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:42.975267 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:43.976217 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:44.977276 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:45.978314 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:46.979222 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:47.980304 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:48.981213 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
>
> Not sure if this is related to your patch.

Hm, this never happened before you applied the patch?
Did you observe a difference in the overall traffic (req/s, latency,
...) and/or CPU usage before it stopped responding?

What's your MaxRequestsPerChild setting, and if it's zero, do you
issue frequent graceful restarts?

It could of course be caused by (a bug in) the patch, but for now I
don't see how it could change the number of active/idle worker threads
on its own, only under different traffic.
If it never happened before it may well be the case, though...
It would help to know the (mod_)status of these workers.

> I also found this one:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

If you can try another (additive) patch on the faulty server, it would
be awesome to test the one from [1]; I'm not sure it applies correctly
on top of my 2.4.x patch though, possibly better with the trunk version
over [1] (let me know otherwise).


Regards,
Yann.

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=53555#c55


Re: Frequent wake-ups for mpm_event

2016-09-23 Thread Stefan Priebe - Profihost AG
Hi Luca,
Am 23.09.2016 um 13:47 schrieb Luca Toscano:
> Hi Stefan,
> 
> 2016-09-23 13:15 GMT+02:00 Stefan Priebe - Profihost AG
> mailto:[email protected]>>:
> 
> Hi Yann,
> 
> I applied the latest patch (incl. fudge factor) to 1500 real-life
> servers.
> 
> Today I had the situation that no websites were delivered. The error.log
> says:
> 
> [Fri Sep 23 09:02:41.974245 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:42.975267 2016] [mpm_event:error] [pid 6495:tid
> [..]
> [Fri Sep 23 09:02:48.981213 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> 
> Not sure if this is related to your patch. I also found this one:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=53555
> 
> 
> 
> Thanks a lot for the test and really sorry for the trouble. I have a
> couple of followup questions for you:

No problem - was just one server.

> 1) Did it happen to all the patched servers or just to one of them?
Just to one.

> Moreover, did it happen right after the deploy/upgrade or after a long
> time?
After 24hours.

> I am asking because I don't know your settings and it would help
> to know the scale of the impact on the 1500 servers.
> 
> 2) Have you checked the scoreboard in mod_status when the issue happened
> by any chance? It would be useful to know the status of the workers when
> they reached AH00485.
Ah no, I did not.

Regards,
Stefan

> It could be an occurrence of bug 53555 or maybe a new behavior of
> mpm_event after this patch..
> 
> Regards,
> 
> Luca
> 
>  


Re: Frequent wake-ups for mpm_event

2016-09-23 Thread Luca Toscano
Hi Stefan,

2016-09-23 13:15 GMT+02:00 Stefan Priebe - Profihost AG <
[email protected]>:

> Hi Yann,
>
> I applied the latest patch (incl. fudge factor) to 1500 real-life servers.
>
> Today I had the situation that no websites were delivered. The error.log
> says:
>
> [Fri Sep 23 09:02:41.974245 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
> [Fri Sep 23 09:02:42.975267 2016] [mpm_event:error] [pid 6495:tid
> [..]
> [Fri Sep 23 09:02:48.981213 2016] [mpm_event:error] [pid 6495:tid
> 139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
>
> Not sure if this is related to your patch. I also found this one:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=53555
>
>
Thanks a lot for the test and really sorry for the trouble. I have a couple
of followup questions for you:

1) Did it happen to all the patched servers or just to one of them?
Moreover, did it happen right after the deploy/upgrade or after a long
time? I am asking because I don't know your settings and it would help to
know the scale of the impact on the 1500 servers.

2) Have you checked the scoreboard in mod_status when the issue happened by
any chance? It would be useful to know the status of the workers when they
reached AH00485.

It could be an occurrence of bug 53555, or maybe a new behavior of mpm_event
after this patch.

Regards,

Luca


Re: Frequent wake-ups for mpm_event

2016-09-23 Thread Stefan Priebe - Profihost AG
Hi Yann,

I applied the latest patch (incl. fudge factor) to 1500 real-life servers.

Today I had the situation that no websites were delivered. The error.log
says:

[Fri Sep 23 09:02:41.974245 2016] [mpm_event:error] [pid 6495:tid
139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
[Fri Sep 23 09:02:42.975267 2016] [mpm_event:error] [pid 6495:tid
139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
[Fri Sep 23 09:02:43.976217 2016] [mpm_event:error] [pid 6495:tid
139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
[Fri Sep 23 09:02:44.977276 2016] [mpm_event:error] [pid 6495:tid
139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
[Fri Sep 23 09:02:45.978314 2016] [mpm_event:error] [pid 6495:tid
139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
[Fri Sep 23 09:02:46.979222 2016] [mpm_event:error] [pid 6495:tid
139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
[Fri Sep 23 09:02:47.980304 2016] [mpm_event:error] [pid 6495:tid
139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers
[Fri Sep 23 09:02:48.981213 2016] [mpm_event:error] [pid 6495:tid
139689321596800] AH00485: scoreboard is full, not at MaxRequestWorkers

Not sure if this is related to your patch. I also found this one:
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Greets,
Stefan

Am 18.09.2016 um 14:16 schrieb Yann Ylavic:
> On Sun, Sep 18, 2016 at 12:16 PM, Luca Toscano  wrote:
>>
>> 2016-09-15 11:41 GMT+02:00 Stefan Priebe - Profihost AG
>> :
>>>
>>> that sounds great. Is there a version available that applies to 2.4 ?
>>
>> I tried to study/port Yann's patch to 2.4.x. but it is not that
>> straightforward since there are a lot of differences between the two
>> branches, most of them afaics are related to the skiplist's handling and
>> usage.
> 
> Patch against 2.4.x attached.
> 
> Regards,
> Yann.
> 


Re: Frequent wake-ups for mpm_event

2016-09-18 Thread Stefan Priebe - Profihost AG
Hi Yann,
Am 18.09.2016 um 14:16 schrieb Yann Ylavic:
> On Sun, Sep 18, 2016 at 12:16 PM, Luca Toscano  wrote:
>>
>> 2016-09-15 11:41 GMT+02:00 Stefan Priebe - Profihost AG
>> :
>>>
>>> that sounds great. Is there a version available that applies to 2.4 ?
>>
>> I tried to study/port Yann's patch to 2.4.x. but it is not that
>> straightforward since there are a lot of differences between the two
>> branches, most of them afaics are related to the skiplist's handling and
>> usage.
> 
> Patch against 2.4.x attached.

thanks a lot.

Stefan


Re: Frequent wake-ups for mpm_event

2016-09-18 Thread Stefan Priebe - Profihost AG
Hi Luca,

Am 18.09.2016 um 12:16 schrieb Luca Toscano:
> Hi Stefan,
> 
> 2016-09-15 11:41 GMT+02:00 Stefan Priebe - Profihost AG
> mailto:[email protected]>>:
> 
> Hi,
> 
> Am 15.09.2016 um 11:20 schrieb Stefan Eissing:
> > The patch works nicely on my OS X workhorse. Without it, I see 5-15 
> reactivations of the child processes every second, with the patch it stays at 
> 0 most of the time.
> >
> > Very nice work, Yann!
> 
> that sounds great. Is there a version available that applies to 2.4 ?
> 
> 
> I tried to study/port Yann's patch to 2.4.x. but it is not that
> straightforward since there are a lot of differences between the two
> branches, most of them afaics are related to the skiplist's handling and
> usage. In my opinion it would be great to use this patch to also make
> future backports easier, but I'll wait for other opinions :)
> 
> If you have a way to test the 2.4.x patch (when it will be ready) in a
> "real" environment and give us feedback it would be really awesome.

Yes will do so. Already installed it on 3 production machines.


Greets,
Stefan
> 
> 
> 


Re: Frequent wake-ups for mpm_event

2016-09-18 Thread Yann Ylavic
On Sun, Sep 18, 2016 at 12:16 PM, Luca Toscano  wrote:
>
> 2016-09-15 11:41 GMT+02:00 Stefan Priebe - Profihost AG
> :
>>
>> that sounds great. Is there a version available that applies to 2.4 ?
>
> I tried to study/port Yann's patch to 2.4.x. but it is not that
> straightforward since there are a lot of differences between the two
> branches, most of them afaics are related to the skiplist's handling and
> usage.

Patch against 2.4.x attached.

Regards,
Yann.
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1761316)
+++ server/mpm/event/event.c	(working copy)
@@ -100,6 +100,8 @@
 #include  /* for INT_MAX */
 
 
+#define VOLATILE_READ(T, x) (*(volatile T *)&(x))
+
 /* Limit on the total --- clients will be locked out if more servers than
  * this are needed.  It is intended solely to keep the server from crashing
  * when things get out of hand.
@@ -174,6 +176,7 @@ static int dying = 0;
 static int workers_may_exit = 0;
 static int start_thread_may_exit = 0;
 static int listener_may_exit = 0;
+static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */
 static int num_listensocks = 0;
 static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access
in listener thread */
@@ -239,21 +242,45 @@ static struct timeout_queue *write_completion_q,
 *keepalive_q,
 *linger_q,
 *short_linger_q;
+static apr_time_t queues_next_expiry;
 
 static apr_pollfd_t *listener_pollfd;
 
 /*
+ * The pollset for sockets that are in any of the timeout queues. Currently
+ * we use the timeout_mutex to make sure that connections are added/removed
+ * atomically to/from both event_pollset and a timeout queue. Otherwise
+ * some confusion can happen under high load if timeout queues and pollset
+ * get out of sync.
+ * XXX: It should be possible to make the lock unnecessary in many or even all
+ * XXX: cases.
+ */
+static apr_pollset_t *event_pollset;
+
+/*
  * Macros for accessing struct timeout_queue.
  * For TO_QUEUE_APPEND and TO_QUEUE_REMOVE, timeout_mutex must be held.
  */
-#define TO_QUEUE_APPEND(q, el)\
-do {  \
-APR_RING_INSERT_TAIL(&(q)->head, el, event_conn_state_t,  \
- timeout_list);   \
-++*(q)->total;\
-++(q)->count; \
-} while (0)
+static void TO_QUEUE_APPEND(struct timeout_queue *q, event_conn_state_t *el)
+{
+APR_RING_INSERT_TAIL(&q->head, el, event_conn_state_t, timeout_list);
+++*q->total;
+++q->count;
 
+/* Cheaply update the overall queues' next expiry according to the
+ * first entry of this queue (oldest), if necessary.
+ */
+el = APR_RING_FIRST(&q->head);
+if (!queues_next_expiry
+|| queues_next_expiry > el->queue_timestamp + q->timeout) {
+queues_next_expiry = el->queue_timestamp + q->timeout;
+/* Unblock the listener if it's waiting on a longer timeout. */
+if (listener_is_wakeable) {
+apr_pollset_wakeup(event_pollset);
+}
+}
+}
+
 #define TO_QUEUE_REMOVE(q, el)\
 do {  \
 APR_RING_REMOVE(el, timeout_list);\
@@ -462,6 +489,11 @@ static void wakeup_listener(void)
 return;
 }
 
+/* unblock the listener if it's poll()ing */
+if (listener_is_wakeable) {
+apr_pollset_wakeup(event_pollset);
+}
+
 /* unblock the listener if it's waiting for a worker */
 ap_queue_info_term(worker_queue_info);
 
@@ -656,7 +688,11 @@ static apr_status_t decrement_connection_count(voi
 default:
 break;
 }
-apr_atomic_dec32(&connection_count);
+/* Unblock the listener if it's waiting for connection_count = 0 */
+if (!apr_atomic_dec32(&connection_count)
+ && listener_is_wakeable && listener_may_exit) {
+apr_pollset_wakeup(event_pollset);
+}
 return APR_SUCCESS;
 }
 
@@ -819,6 +855,7 @@ static void notify_resume(event_conn_state_t *cs,
 
 static int start_lingering_close_common(event_conn_state_t *cs, int in_worker)
 {
+int done = 0;
 apr_status_t rv;
 struct timeout_queue *q;
 apr_socket_t *csd = cs->pfd.desc.s;
@@ -830,7 +867,6 @@ static int start_lingering_close_common(event_conn
 #else
 apr_socket_timeout_set(csd, 0);
 #endif
-cs->queue_timestamp = apr_time_now();
 /*
  * If some module requested a shortened waiting period, only wait for
  * 2s

Re: Frequent wake-ups for mpm_event

2016-09-18 Thread Luca Toscano
Hi Stefan,

2016-09-15 11:41 GMT+02:00 Stefan Priebe - Profihost AG <
[email protected]>:

> Hi,
>
> Am 15.09.2016 um 11:20 schrieb Stefan Eissing:
> > The patch works nicely on my OS X workhorse. Without it, I see 5-15
> reactivations of the child processes every second, with the patch it stays
> at 0 most of the time.
> >
> > Very nice work, Yann!
>
> that sounds great. Is there a version available that applies to 2.4 ?


I tried to study/port Yann's patch to 2.4.x, but it is not that
straightforward since there are a lot of differences between the two
branches; most of them, afaics, are related to the skiplist's handling and
usage. In my opinion it would be great to use this patch to also make
future backports easier, but I'll wait for other opinions :)

If you have a way to test the 2.4.x patch (when it will be ready) in a
"real" environment and give us feedback it would be really awesome.

Thanks!

Regards,

Luca


Re: Frequent wake-ups for mpm_event

2016-09-15 Thread Stefan Priebe - Profihost AG
Hi,

Am 15.09.2016 um 11:20 schrieb Stefan Eissing:
> The patch works nicely on my OS X workhorse. Without it, I see 5-15 
> reactivations of the child processes every second, with the patch it stays at 
> 0 most of the time.
> 
> Very nice work, Yann!

that sounds great. Is there a version available that applies to 2.4 ?

Greets,
Stefan

>> Am 14.09.2016 um 18:47 schrieb Luca Toscano :
>>
>>
>>
>> 2016-08-25 16:45 GMT+02:00 Yann Ylavic :
>>
>> Possibly others will test it too, and report...
>>
>>
>>
>> Ping to everybody interested in the list to check Yann's patch in 
>> https://bz.apache.org/bugzilla/show_bug.cgi?id=57399 and provide feedback :)
>>
>> The end goal would be to reduce as much as possible the amount of wake-ups 
>> done by the mpm-event's listener thread. 
>>
>> Thanks!
>>
>> Regards,
>>
>> Luca
>>
> 


Re: Frequent wake-ups for mpm_event

2016-09-15 Thread Stefan Eissing
The patch works nicely on my OS X workhorse. Without it, I see 5-15
reactivations of the child processes every second; with the patch it stays at 0
most of the time.

Very nice work, Yann!

> Am 14.09.2016 um 18:47 schrieb Luca Toscano :
> 
> 
> 
> 2016-08-25 16:45 GMT+02:00 Yann Ylavic :
> 
> Possibly others will test it too, and report...
> 
> 
> 
> Ping to everybody interested in the list to check Yann's patch in 
> https://bz.apache.org/bugzilla/show_bug.cgi?id=57399 and provide feedback :)
> 
> The end goal would be to reduce as much as possible the amount of wake-ups 
> done by the mpm-event's listener thread. 
> 
> Thanks!
> 
> Regards,
> 
> Luca
> 



Re: Frequent wake-ups for mpm_event

2016-09-14 Thread Luca Toscano
2016-08-25 16:45 GMT+02:00 Yann Ylavic :

>
> Possibly others will test it too, and report...
>
>

Ping to everybody interested in the list to check Yann's patch in
https://bz.apache.org/bugzilla/show_bug.cgi?id=57399 and provide feedback :)

The end goal would be to reduce as much as possible the amount of wake-ups
done by the mpm-event's listener thread.

Thanks!

Regards,

Luca


Re: Frequent wake-ups for mpm_event

2016-08-25 Thread Yann Ylavic
Hi Luca,

On Thu, Aug 25, 2016 at 3:33 PM, Luca Toscano  wrote:
>
> It looks really great, I like a lot the overall solution but I haven't
> reviewed (and understood) the whole set of changes yet.

Thanks, there is still room for improvement I think; for example, we
can probably avoid some close wakeups from TO_QUEUE_APPEND() by using
the TIMEOUT_FUDGE_FACTOR (likewise in event_get_timer_event() with
EVENT_FUDGE_FACTOR, provided timers are sensibly above it).

> Other than pure code
> review, I would like to know how changes like this one have been
> tested/accepted/rejected in the past, since it would be really great not to
> lose momentum imho.

AFAICT no change has ever been rejected if it:
- is sufficiently tested,
- improves things (performance here, supported by numbers),
- does not break API/ABI (not an issue here).

So let's work on testing and see if it helps.
I have no hardware these days to do such tests; I can't start before September.
In the meantime, I also proposed the patch to PR 60006 (the OP seems
to be testing this area), hopefully we'll have some numbers from there
already.
Possibly others will test it too, and report...

The latest patch is currently in [1], but I think I'll push updates in
PR 60006 instead, which is more related to performance.


Regards,
Yann.

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=57399#c5


Re: Frequent wake-ups for mpm_event

2016-08-25 Thread Luca Toscano
Hi Yann,

2016-08-21 4:14 GMT+02:00 Yann Ylavic :

> On Thu, Aug 18, 2016 at 3:17 PM, Yann Ylavic  wrote:
> >
> > I was thinking of something like the attached patch, where we compute
> > the exact/minimal timeout needed for the queues.
> >
> > It requires walking/locking the queues (looking at the first item
> > only), but besides getting the exact poll() timeout which helps
> > spurious wakeups, it also allows to avoid walking these queues after
> > the poll() (previously each 0.1s, for maintenance) when not necessary.
> > So overall I think it's a gain.
>
> Finally I came up with another way which requires no walking (hence
> locking) in the listener outside queues/timers processing (patch
> attached).
> It looks good to me, not heavily tested/debugged still...
>
> The approach is, for both timeout queues and timers, to maintain the
> next (lowest) expiry time of all entries (namely queues_next_expiry
> and timers_next_expiry).
> These global values are kept up to date in TO_QUEUE_APPEND (for
> timeout queues) and event_get_timer_event (for timers, where the
> skiplist is filled) by comparing the expiry of newly added entries
> (already protected by their respective timeout/skiplist mutex).
> This happens in the workers, and is cheap (locking was there already
> to protect from concurrent workers), it just requires an
> apr_pollset_wakeup to notice the listener whenever one of these
> expiries is updated.
>
> So now, in the listener, we can read these values directly to compute
> the min poll() timeout to use, and also for the timeout queues to
> determine whether or not it's time to process them.
> If an expiry is updated while poll()ing, it's woken up and the timeout
> is also updated.
> No locking needed here, I use volatile reads which is enough (no
> atomic/ordered read required, no apr_atomic_read64 to handle an
> apr_time_t anyway) since the listener is guaranteed to be woken up when
> needed.
>
> This also benefits the !listener_is_wakeable case (with a maximum
> poll() timeout of 100ms), since we avoid the lock when no timer is
> armed (we don't use timers in 2.4.x afaict), and also avoid spurious
> timeout queues processing.
>
> The patch includes Luca's http://apaste.info/mke (base of
> APR_POLLSET_WAKEABLE handling) and also tries to make existing
> critical sections as short as possible (barely related, but while at
> it :)
>
> Thoughts?
>

It looks really great; I like the overall solution a lot, but I haven't
reviewed (and understood) the whole set of changes yet. Other than pure
code review, I would like to know how changes like this one have been
tested/accepted/rejected in the past, since it would be really great not to
lose momentum imho.

Thanks a lot!

Luca


Re: Frequent wake-ups for mpm_event

2016-08-22 Thread Stefan Eissing
+1 Interesting.

If this proves to work reliably for mpm_event, it could be reused in mod_http2 
as a stepping stone for further mpm integration. ATM mod_http2 uses a
condition variable for wait/signalling, with short timeouts and read attempts on its
main connection. If it used a wakeup pipe instead of the condition variable, no
timeout stuttering would be needed.

> Am 21.08.2016 um 04:14 schrieb Yann Ylavic :
> 
> On Thu, Aug 18, 2016 at 3:17 PM, Yann Ylavic  wrote:
>> 
>> I was thinking of something like the attached patch, where we compute
>> the exact/minimal timeout needed for the queues.
>> 
>> It requires walking/locking the queues (looking at the first item
>> only), but besides getting the exact poll() timeout which helps
>> spurious wakeups, it also allows to avoid walking these queues after
>> the poll() (previously each 0.1s, for maintenance) when not necessary.
>> So overall I think it's a gain.
> 
> Finally I came up with another way which requires no walking (hence
> locking) in the listener outside queues/timers processing (patch
> attached).
> It looks good to me, not heavily tested/debugged still...
> 
> The approach is, for both timeout queues and timers, to maintain the
> next (lowest) expiry time of all entries (namely queues_next_expiry
> and timers_next_expiry).
> These global values are kept up to date in TO_QUEUE_APPEND (for
> timeout queues) and event_get_timer_event (for timers, where the
> skiplist is filled) by comparing the expiry of newly added entries
> (already protected by their respective timeout/skiplist mutex).
> This happens in the workers, and is cheap (locking was there already
> to protect from concurrent workers), it just requires an
> apr_pollset_wakeup to notice the listener whenever one of these
> expiries is updated.
> 
> So now, in the listener, we can read these values directly to compute
> the min poll() timeout to use, and also for the timeout queues to
> determine whether or not it's time to process them.
> If an expiry is updated while poll()ing, it's woken up and the timeout
> is also updated.
> No locking needed here, I use volatile reads which is enough (no
> atomic/ordered read required, no apr_atomic_read64 to handle an
> apr_time_t anyway) since the listener is guaranteed to be woken up when
> needed.
> 
> This also benefits the !listener_is_wakeable case (with a maximum
> poll() timeout of 100ms), since we avoid the lock when no timer is
> armed (we don't use timers in 2.4.x afaict), and also avoid spurious
> timeout queues processing.
> 
> The patch includes Luca's http://apaste.info/mke (base of
> APR_POLLSET_WAKEABLE handling) and also tries to make existing
> critical sections as short as possible (barely related, but while at
> it :)
> 
> Thoughts?
> 
> 
> Regards,
> Yann.
> 



Re: Frequent wake-ups for mpm_event

2016-08-20 Thread Yann Ylavic
On Thu, Aug 18, 2016 at 3:17 PM, Yann Ylavic  wrote:
>
> I was thinking of something like the attached patch, where we compute
> the exact/minimal timeout needed for the queues.
>
> It requires walking/locking the queues (looking at the first item
> only), but besides getting the exact poll() timeout which helps
>> spurious wakeups, it also allows to avoid walking these queues after
> the poll() (previously each 0.1s, for maintenance) when not necessary.
> So overall I think it's a gain.

Finally I came up with another way which requires no walking (hence
locking) in the listener outside queues/timers processing (patch
attached).
It looks good to me, though not heavily tested/debugged yet...

The approach is, for both timeout queues and timers, to maintain the
next (lowest) expiry time of all entries (namely queues_next_expiry
and timers_next_expiry).
These global values are kept up to date in TO_QUEUE_APPEND (for
timeout queues) and event_get_timer_event (for timers, where the
skiplist is filled) by comparing the expiry of newly added entries
(already protected by their respective timeout/skiplist mutex).
This happens in the workers, and is cheap (the locking was already there
to protect from concurrent workers); it just requires an
apr_pollset_wakeup to notify the listener whenever one of these
expiries is updated.

So now, in the listener, we can read these values directly to compute
the min poll() timeout to use, and also for the timeout queues to
determine whether or not it's time to process them.
If an expiry is updated while poll()ing, the listener is woken up and
the timeout is also updated.
No locking is needed here; I use volatile reads, which is enough (no
atomic/ordered read required, and no apr_atomic_read64 to handle an
apr_time_t anyway) since the listener is guaranteed to be woken up when
needed.

This also benefits the !listener_is_wakeable case (with a maximum
poll() timeout of 100ms), since we avoid the lock when no timer is
armed (we don't use timers in 2.4.x afaict), and also avoid spurious
timeout-queue processing.

The patch includes Luca's http://apaste.info/mke (base of
APR_POLLSET_WAKEABLE handling) and also tries to make existing
critical sections as short as possible (barely related, but while at
it :)

Thoughts?


Regards,
Yann.
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1757019)
+++ server/mpm/event/event.c	(working copy)
@@ -107,6 +107,8 @@
 #include "serf.h"
 #endif
 
+#define VOLATILE_READ(T, x) (*(volatile T *)&(x))
+
 /* Limit on the total --- clients will be locked out if more servers than
  * this are needed.  It is intended solely to keep the server from crashing
  * when things get out of hand.
@@ -182,6 +184,7 @@ static int dying = 0;
 static int workers_may_exit = 0;
 static int start_thread_may_exit = 0;
 static int listener_may_exit = 0;
+static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */
 static int num_listensocks = 0;
 static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access
in listener thread */
@@ -204,6 +207,17 @@ module AP_MODULE_DECLARE_DATA mpm_event_module;
 struct event_srv_cfg_s;
 typedef struct event_srv_cfg_s event_srv_cfg;
 
+/*
+ * The pollset for sockets that are in any of the timeout queues. Currently
+ * we use the timeout_mutex to make sure that connections are added/removed
+ * atomically to/from both event_pollset and a timeout queue. Otherwise
+ * some confusion can happen under high load if timeout queues and pollset
+ * get out of sync.
+ * XXX: It should be possible to make the lock unnecessary in many or even all
+ * XXX: cases.
+ */
+static apr_pollset_t *event_pollset;
+
 struct event_conn_state_t {
 /** APR_RING of expiration timeouts */
 APR_RING_ENTRY(event_conn_state_t) timeout_list;
@@ -249,6 +263,7 @@ static struct timeout_queue *write_completion_q,
 *keepalive_q,
 *linger_q,
 *short_linger_q;
+static apr_time_t queues_next_expiry;
 
 static apr_pollfd_t *listener_pollfd;
 
@@ -256,14 +271,26 @@ static apr_pollfd_t *listener_pollfd;
  * Macros for accessing struct timeout_queue.
  * For TO_QUEUE_APPEND and TO_QUEUE_REMOVE, timeout_mutex must be held.
  */
-#define TO_QUEUE_APPEND(q, el)\
-do {  \
-APR_RING_INSERT_TAIL(&(q)->head, el, event_conn_state_t,  \
- timeout_list);   \
-++*(q)->total;\
-++(q)->count; \
-} while (0)
+static void TO_QUEUE_APPEND(struct timeout_queue *q, event_conn_state_t *el)
+{
+APR_RING_INSERT_TAIL(&q->head, el, event_conn_state_t, timeout_list);

Re: Frequent wake-ups for mpm_event

2016-08-18 Thread Yann Ylavic
On Thu, Aug 18, 2016 at 3:53 PM, Plüm, Rüdiger, Vodafone Group
 wrote:
>
> From a brief look it seems good, but you missed adjusting ps->keep_alive in case
> keepalives are killed due to busy workers.

Thanks for the review, fixed in the version I'm currently testing.

> And I don't get why we need this additional num = 0.

When apr_pollset_poll() returns an error, 'num' is not really/always
reliable (cf. recent r1755758, and corresponding backports to APR
1.5/1.6).
We'd better not fall through to num > 0 here...


Regards,
Yann.


Re: Frequent wake-ups for mpm_event

2016-08-18 Thread Yann Ylavic
On Thu, Aug 18, 2016 at 11:04 AM, Plüm, Rüdiger, Vodafone Group
 wrote:
>
> From: Luca Toscano [mailto:[email protected]]
> Sent: Freitag, 12. August 2016 15:42
> To: Apache HTTP Server Development List
> Subject: Re: Frequent wake-ups for mpm_event
>
>> This patch might also miss another point, namely the calls to
>> process_timeout_queue to manage Keep Alive timeouts, lingering closes and
>> write completion. IIUC mpm-event explicitly process them after each wake up
>> (every 100ms).
>>
>> Will work more on it, in the meantime thanks to all for the feedback!
>
> Does it make sense to get the shortest timeout from keepalive_q,
> write_completion_q, linger_q and short_linger_q and set this value instead
> of -1?

I was thinking of something like the attached patch, where we compute
the exact/minimal timeout needed for the queues.

It requires walking/locking the queues (looking at the first item
only), but besides getting the exact poll() timeout, which avoids
spurious wakeups, it also lets us skip walking these queues after
the poll() (previously done every 0.1s, for maintenance) when it isn't
necessary. So overall I think it's a gain.
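The principle of the patch — derive the poll() timeout from the head of each timeout queue instead of a fixed 100 ms — can be sketched with a simplified model (the struct and function names here are illustrative, not the actual event.c types):

```c
#include <assert.h>

/* Simplified stand-in for mpm_event's timeout queues: entries are ordered
 * by queue_timestamp, so only the head (oldest) entry of each queue matters
 * for the next poll() deadline. */
typedef struct {
    long head_timestamp; /* queue_timestamp of the oldest entry, -1 if empty */
    long timeout;        /* per-queue timeout (e.g. the keepalive timeout) */
} toq_t;

/* Return the poll() timeout needed so the listener wakes exactly when the
 * earliest queued connection expires; -1 means "block indefinitely".
 * Mirrors the walk in the patch: look at the first item of each queue. */
static long min_poll_timeout(const toq_t *qs, int n, long now)
{
    long interval = -1;
    for (int i = 0; i < n; ++i) {
        if (qs[i].head_timestamp < 0)
            continue;               /* empty queue contributes nothing */
        long left = qs[i].head_timestamp + qs[i].timeout - now;
        if (left <= 0)
            return 1;               /* already expired: wake (almost) now */
        if (interval < 0 || left < interval)
            interval = left;
    }
    return interval;
}
```

With all queues empty the listener can block indefinitely (until a pollset wakeup); with any entry queued it sleeps exactly until the earliest expiry, which is the gain described above.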

The patch is only compile-tested and may not work as is; I just wanted
to show the principle and gather feedback. WDYT?


Regards,
Yann.

PS: I seem to have read some proposal on our bz (edit: PR 60006) to
replace the g_timer_skiplist_mtx with a spinlock for performance.
This looks like a good idea if contention is low (it is acquired on
each polling loop), and possibly also for the queues' mutex...
APR lacks spinlocks, though.
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c(revision 1756556)
+++ server/mpm/event/event.c(working copy)
@@ -1742,6 +1742,13 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 apr_time_t timeout_time = 0, last_log;
 int closed = 0, listeners_disabled = 0;
 int have_idle_worker = 0;
+struct timeout_queue *timeout_queues[] = {
+keepalive_q,
+write_completion_q,
+short_linger_q,
+linger_q,
+NULL
+};
 
 last_log = apr_time_now();
 free(ti);
@@ -1776,6 +1783,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 apr_int32_t num = 0;
 apr_uint32_t c_count, l_count, i_count;
 apr_interval_time_t timeout_interval;
+struct timeout_queue **qp;
 apr_time_t now;
 int workers_were_busy = 0;
 if (listener_may_exit) {
@@ -1821,20 +1829,9 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 #endif
 
 now = apr_time_now();
+timeout_time = timeout_interval = -1;
 apr_thread_mutex_lock(g_timer_skiplist_mtx);
-te = apr_skiplist_peek(timer_skiplist);
-if (te) {
-if (te->when > now) {
-timeout_interval = te->when - now;
-}
-else {
-timeout_interval = 1;
-}
-}
-else {
-timeout_interval = apr_time_from_msec(100);
-}
-while (te) {
+while ((te = apr_skiplist_peek(timer_skiplist))) {
 if (te->when < now + EVENT_FUDGE_FACTOR) {
 apr_skiplist_pop(timer_skiplist, NULL);
 if (!te->canceled) { 
@@ -1853,12 +1850,35 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 }
 }
 else {
+timeout_interval = te->when - now;
 break;
 }
-te = apr_skiplist_peek(timer_skiplist);
 }
 apr_thread_mutex_unlock(g_timer_skiplist_mtx);
 
+apr_thread_mutex_lock(timeout_mutex);
+for (qp = timeout_queues; *qp; ++qp) {
+struct timeout_queue *q = *qp;
+do {
+event_conn_state_t *cs = APR_RING_FIRST(&q->head);
+if (cs != APR_RING_SENTINEL(&q->head, event_conn_state_t,
+timeout_list)) {
+apr_time_t left = cs->queue_timestamp + q->timeout - now;
+if (left <= 0 || cs->queue_timestamp > now + q->timeout) {
+timeout_time = now + 1;
+timeout_interval = 1;
+}
+else if (timeout_interval < 0 || timeout_interval > left) {
+timeout_interval = left;
+}
+if (timeout_time < 0 || timeout_time > now + left) {
+timeout_time = now + left;
+}
+}
+} while ((q = q->next));
+}
+apr_thread_mutex_unlock(timeout_mutex);
+
 rc = apr_pollset_poll(event_pollset, timeout_interval, &num, &out_pfd);
 if (rc != APR_SUCCESS) {
if (APR_STATUS_IS_EINTR(rc)) {

RE: Frequent wake-ups for mpm_event

2016-08-18 Thread Plüm, Rüdiger, Vodafone Group


From: Luca Toscano [mailto:[email protected]]
Sent: Freitag, 12. August 2016 15:42
To: Apache HTTP Server Development List
Subject: Re: Frequent wake-ups for mpm_event


This patch might also miss another point, namely the calls to
process_timeout_queue to manage Keep-Alive timeouts, lingering closes and
write completion. IIUC mpm_event explicitly processes them after each wake-up
(every 100 ms).

Will work more on it, in the meantime thanks to all for the feedback!

Does it make sense to get the shortest timeout from keepalive_q, 
write_completion_q, linger_q and short_linger_q and set this value instead of 
-1?

Regards

Rüdiger

Luca



Re: Frequent wake-ups for mpm_event

2016-08-12 Thread Luca Toscano
2016-08-11 8:43 GMT+02:00 Luca Toscano :

>
>
> 2016-08-09 15:02 GMT+02:00 Luca Toscano :
>
>> Thanks a lot for the feedback, trying to answer to everybody:
>>
>> 2016-08-05 15:19 GMT+02:00 Jim Jagielski :
>>
>>> If APR_POLLSET_WAKEABLE was more universal and, therefore,
>>> more widely tested, I'd be +1... as it is, let's see what
>>> the feedback is.
>>
>>
>> Definitely, we should find a good way to test this if we'll reach
>> consensus on a patch. Maybe some stress tests plus "incubation" in trunk
>> for a while could be good enough, but I agree that it would be an important
>> change to make with extreme care.
>>
>> 2016-08-05 15:26 GMT+02:00 Yann Ylavic :
>>
>>> On Fri, Aug 5, 2016 at 3:19 PM, Yann Ylavic 
>>> wrote:
>>> > Hi Luca,
>>> >
>>> > On Fri, Aug 5, 2016 at 2:58 PM, Luca Toscano 
>>> wrote:
>>> >>
>>> >> 2016-08-04 17:56 GMT+02:00 Luca Toscano :
>>> >>>
>>> >>> Would it be possible to avoid them adding APR_POLLSET_WAKEABLE to the
>>> >>> event_pollset flags and calling apr_pollset_wakeup right after
>>> >>> apr_skiplist_insert?
>>> >
>>> > That might be possible, but there is more than apr_skiplist_insert()
>>> > to handle I'm afraid, see below.
>>> >
>>> >>
>>> >>  Very high level idea: http://apaste.info/MKE
>>> >
>>> > The timeout_interval also handles the listener_may_exit case (and
>>> > connection_count == 0 in graceful mode); your patch doesn't seem to
>>> > handle them...
>>>
>>
>> You are right, and IIUC these are the use cases to cover:
>>
>> 1) listener thread(s) blocked on apr_pollset_poll and a call to
>> wakeup_listener (that afaiu is the only one setting listener_may_exit to 1);
>> 2) listener thread(s) blocked on apr_pollset_poll, listener_may_exit = 1
>> and connection_count still greater than zero. The last connection cleanup
>> (that calls decrement_connection_count registered previously
>> with apr_pool_cleanup_register) should be able to wake up the listener
>> that in turn will be able to execute the break in the following block:
>>
>> if (listener_may_exit) {
>> close_listeners(process_slot, &closed);
>> if (terminate_mode == ST_UNGRACEFUL
>> || apr_atomic_read32(&connection_count) == 0)
>> break;
>> }
>>
>> I would also add another one, namely if the pollset create does not
>> support APR_POLLSET_WAKEABLE. This would mean that in absence of
>> apr_pollset_wakeup we should rely on apr_pollset_poll with the 100 ms
>> timeout, keeping it as "fallback" to ensure compatibility with most of the
>> systems.
>>
>>
>>> > The listener_may_exit case may be easily handled in wakeup_listener
>>> > (one more apr_pollset_wakeup() there), however the connection_count
>>> > looks more complicated.
>>> >
>>> > IOW, not sure something like the below would be enough...
>>> >
>>> > Index: server/mpm/event/event.c
>>> > ===
>>> > --- server/mpm/event/event.c(revision 1755288)
>>> > +++ server/mpm/event/event.c(working copy)
>>> > @@ -484,6 +484,7 @@ static void close_worker_sockets(void)
>>> >  static void wakeup_listener(void)
>>> >  {
>>> >  listener_may_exit = 1;
>>> > +apr_pollset_wakeup(event_pollset);
>>> >  if (!listener_os_thread) {
>>> >  /* XXX there is an obscure path that this doesn't handle
>>> perfectly:
>>> >   * right after listener thread is created but before
>>>
>>> Or rather, in wakeup_listener():
>>>
>>> @@ -493,6 +493,9 @@ static void wakeup_listener(void)
>>>  return;
>>>  }
>>>
>>> +/* unblock the listener if it's waiting for a timer */
>>> +apr_pollset_wakeup(event_pollset);
>>> +
>>>  /* unblock the listener if it's waiting for a worker */
>>>  ap_queue_info_term(worker_queue_info);
>>>
>>> _
>>>
>>> > @@ -696,7 +697,9 @@ static apr_status_t decrement_connection_count(voi
>>> >  default:
>>> >  break;
>>> >  }
>>> > -apr_atomic_dec32(&connection_count);
>>> > +if (!apr_atomic_dec32(&connection_count) && listener_may_exit) {
>>> > +apr_pollset_wakeup(event_pollset);
>>> > +}
>>> >  return APR_SUCCESS;
>>> >  }
>>>
>>
>>  I like these suggestions, they look good! (This assuming that what I
>> have written above is correct, otherwise I didn't get them :)
>>
>
> I applied Yann's suggestions and added some checks to allow wakeable and
> non wakeable listeners: http://apaste.info/mke
>
> This is a very high level prototype, please let me know if you have more
> suggestions (of course the discussion around testing and reliability of
> APR_POLLSET_WAKEABLE still holds).
>

This patch might also miss another point, namely the calls to
process_timeout_queue to manage Keep-Alive timeouts, lingering closes and
write completion. IIUC mpm_event explicitly processes them after each wake-up
(every 100 ms).

Will work more on it, in the meantime thanks to all for the feedback!

Luca


Re: Frequent wake-ups for mpm_event

2016-08-10 Thread Luca Toscano
2016-08-09 15:02 GMT+02:00 Luca Toscano :

> Thanks a lot for the feedback, trying to answer to everybody:
>
> 2016-08-05 15:19 GMT+02:00 Jim Jagielski :
>
>> If APR_POLLSET_WAKEABLE was more universal and, therefore,
>> more widely tested, I'd be +1... as it is, let's see what
>> the feedback is.
>
>
> Definitely, we should find a good way to test this if we'll reach
> consensus on a patch. Maybe some stress tests plus "incubation" in trunk
> for a while could be good enough, but I agree that it would be an important
> change to make with extreme care.
>
> 2016-08-05 15:26 GMT+02:00 Yann Ylavic :
>
>> On Fri, Aug 5, 2016 at 3:19 PM, Yann Ylavic  wrote:
>> > Hi Luca,
>> >
>> > On Fri, Aug 5, 2016 at 2:58 PM, Luca Toscano 
>> wrote:
>> >>
>> >> 2016-08-04 17:56 GMT+02:00 Luca Toscano :
>> >>>
>> >>> Would it be possible to avoid them adding APR_POLLSET_WAKEABLE to the
>> >>> event_pollset flags and calling apr_pollset_wakeup right after
>> >>> apr_skiplist_insert?
>> >
>> > That might be possible, but there is more than apr_skiplist_insert()
>> > to handle I'm afraid, see below.
>> >
>> >>
>> >>  Very high level idea: http://apaste.info/MKE
>> >
>> > The timeout_interval also handles the listener_may_exit case (and
>> > connection_count == 0 in graceful mode); your patch doesn't seem to
>> > handle them...
>>
>
> You are right, and IIUC these are the use cases to cover:
>
> 1) listener thread(s) blocked on apr_pollset_poll and a call to
> wakeup_listener (that afaiu is the only one setting listener_may_exit to 1);
> 2) listener thread(s) blocked on apr_pollset_poll, listener_may_exit = 1
> and connection_count still greater than zero. The last connection cleanup
> (that calls decrement_connection_count registered previously
> with apr_pool_cleanup_register) should be able to wake up the listener
> that in turns will be able to execute the break in the following block:
>
> if (listener_may_exit) {
> close_listeners(process_slot, &closed);
> if (terminate_mode == ST_UNGRACEFUL
> || apr_atomic_read32(&connection_count) == 0)
> break;
> }
>
> I would also add another one, namely if the pollset create does not
> support APR_POLLSET_WAKEABLE. This would mean that in absence of
> apr_pollset_wakeup we should rely on apr_pollset_poll with the 100 ms
> timeout, keeping it as "fallback" to ensure compatibility with most of the
> systems.
>
>
>> > The listener_may_exit case may be easily handled in wakeup_listener
>> > (one more apr_pollset_wakeup() there), however the connection_count
>> > looks more complicated.
>> >
>> > IOW, not sure something like the below would be enough...
>> >
>> > Index: server/mpm/event/event.c
>> > ===
>> > --- server/mpm/event/event.c(revision 1755288)
>> > +++ server/mpm/event/event.c(working copy)
>> > @@ -484,6 +484,7 @@ static void close_worker_sockets(void)
>> >  static void wakeup_listener(void)
>> >  {
>> >  listener_may_exit = 1;
>> > +apr_pollset_wakeup(event_pollset);
>> >  if (!listener_os_thread) {
>> >  /* XXX there is an obscure path that this doesn't handle
>> perfectly:
>> >   * right after listener thread is created but before
>>
>> Or rather, in wakeup_listener():
>>
>> @@ -493,6 +493,9 @@ static void wakeup_listener(void)
>>  return;
>>  }
>>
>> +/* unblock the listener if it's waiting for a timer */
>> +apr_pollset_wakeup(event_pollset);
>> +
>>  /* unblock the listener if it's waiting for a worker */
>>  ap_queue_info_term(worker_queue_info);
>>
>> _
>>
>> > @@ -696,7 +697,9 @@ static apr_status_t decrement_connection_count(voi
>> >  default:
>> >  break;
>> >  }
>> > -apr_atomic_dec32(&connection_count);
>> > +if (!apr_atomic_dec32(&connection_count) && listener_may_exit) {
>> > +apr_pollset_wakeup(event_pollset);
>> > +}
>> >  return APR_SUCCESS;
>> >  }
>>
>
>  I like these suggestions, they look good! (This assuming that what I have
> written above is correct, otherwise I didn't get them :)
>

I applied Yann's suggestions and added some checks to allow wakeable and
non-wakeable listeners: http://apaste.info/mke

This is a very high level prototype, please let me know if you have more
suggestions (of course the discussion around testing and reliability of
APR_POLLSET_WAKEABLE still holds).

Thanks!

Regards,

Luca


Re: Frequent wake-ups for mpm_event

2016-08-09 Thread Jim Jagielski

> On Aug 9, 2016, at 9:02 AM, Luca Toscano  wrote:
> 
> 
>  I like these suggestions, they look good! (This assuming that what I have 
> written above is correct, otherwise I didn't get them :)
> 

+1



Re: Frequent wake-ups for mpm_event

2016-08-09 Thread Luca Toscano
Thanks a lot for the feedback; trying to answer everybody:

2016-08-05 15:19 GMT+02:00 Jim Jagielski :

> If APR_POLLSET_WAKEABLE was more universal and, therefore,
> more widely tested, I'd be +1... as it is, let's see what
> the feedback is.


Definitely, we should find a good way to test this if we reach consensus
on a patch. Maybe some stress tests plus "incubation" in trunk for a while
could be good enough, but I agree that it would be an important change to
make with extreme care.

2016-08-05 15:26 GMT+02:00 Yann Ylavic :

> On Fri, Aug 5, 2016 at 3:19 PM, Yann Ylavic  wrote:
> > Hi Luca,
> >
> > On Fri, Aug 5, 2016 at 2:58 PM, Luca Toscano 
> wrote:
> >>
> >> 2016-08-04 17:56 GMT+02:00 Luca Toscano :
> >>>
> >>> Would it be possible to avoid them adding APR_POLLSET_WAKEABLE to the
> >>> event_pollset flags and calling apr_pollset_wakeup right after
> >>> apr_skiplist_insert?
> >
> > That might be possible, but there is more than apr_skiplist_insert()
> > to handle I'm afraid, see below.
> >
> >>
> >>  Very high level idea: http://apaste.info/MKE
> >
> > The timeout_interval also handles the listener_may_exit case (and
> > connection_count == 0 in graceful mode); your patch doesn't seem to
> > handle them...
>

You are right, and IIUC these are the use cases to cover:

1) listener thread(s) blocked on apr_pollset_poll and a call to
wakeup_listener (which, afaiu, is the only one setting listener_may_exit to 1);
2) listener thread(s) blocked on apr_pollset_poll, listener_may_exit = 1
and connection_count still greater than zero. The last connection cleanup
(which calls decrement_connection_count, registered previously
with apr_pool_cleanup_register) should be able to wake up the listener,
which in turn will be able to execute the break in the following block:

if (listener_may_exit) {
close_listeners(process_slot, &closed);
if (terminate_mode == ST_UNGRACEFUL
|| apr_atomic_read32(&connection_count) == 0)
break;
}

I would also add another one, namely the case where the pollset creation does
not support APR_POLLSET_WAKEABLE. In the absence of apr_pollset_wakeup we
should rely on apr_pollset_poll with the 100 ms timeout, keeping it as a
"fallback" to ensure compatibility with most systems.
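The two modes can be sketched in plain POSIX (assumed names; APR's internal wakeup pipe is modelled directly with pipe()/poll() rather than the APR API): a wakeable listener blocks indefinitely and is interrupted through the pipe, while the non-wakeable fallback polls with the bounded 100 ms timeout.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of what APR_POLLSET_WAKEABLE provides internally: a wakeup pipe
 * added to the pollset so another thread/process can interrupt a blocking
 * poll().  Where the feature is unsupported (APR_ENOTIMPL), the caller
 * falls back to the bounded 100 ms timeout described above. */
static int wakeup_fds[2];           /* [0] = poll end, [1] = wakeup end */

/* Block until woken (wakeable) or for at most 100 ms (fallback mode).
 * Returns 1 if the wakeup pipe fired, 0 on timeout, -1 on error. */
static int listener_wait(int wakeable)
{
    struct pollfd pfd = { .fd = wakeup_fds[0], .events = POLLIN };
    int rc = poll(&pfd, 1, wakeable ? -1 : 100);
    if (rc < 0)
        return -1;
    if (rc == 1 && (pfd.revents & POLLIN)) {
        char b;
        (void)!read(wakeup_fds[0], &b, 1);  /* drain the wakeup byte */
        return 1;
    }
    return 0;                       /* plain timeout, nothing to do */
}

static int run_demo(void)
{
    if (pipe(wakeup_fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0) {                 /* "waker": plays wakeup_listener() */
        usleep(50 * 1000);          /* simulate work finishing later */
        char b = 1;
        (void)!write(wakeup_fds[1], &b, 1);
        _exit(0);
    }
    int woken = listener_wait(1);   /* blocks until the child writes */
    waitpid(pid, NULL, 0);
    close(wakeup_fds[0]);
    close(wakeup_fds[1]);
    return woken;
}
```

In the fallback mode, listener_wait(0) simply returns 0 every 100 ms so the caller can re-check listener_may_exit and the timeout queues, which is exactly the pre-patch behaviour being kept for compatibility.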


> > The listener_may_exit case may be easily handled in wakeup_listener
> > (one more apr_pollset_wakeup() there), however the connection_count
> > looks more complicated.
> >
> > IOW, not sure something like the below would be enough...
> >
> > Index: server/mpm/event/event.c
> > ===
> > --- server/mpm/event/event.c(revision 1755288)
> > +++ server/mpm/event/event.c(working copy)
> > @@ -484,6 +484,7 @@ static void close_worker_sockets(void)
> >  static void wakeup_listener(void)
> >  {
> >  listener_may_exit = 1;
> > +apr_pollset_wakeup(event_pollset);
> >  if (!listener_os_thread) {
> >  /* XXX there is an obscure path that this doesn't handle
> perfectly:
> >   * right after listener thread is created but before
>
> Or rather, in wakeup_listener():
>
> @@ -493,6 +493,9 @@ static void wakeup_listener(void)
>  return;
>  }
>
> +/* unblock the listener if it's waiting for a timer */
> +apr_pollset_wakeup(event_pollset);
> +
>  /* unblock the listener if it's waiting for a worker */
>  ap_queue_info_term(worker_queue_info);
>
> _
>
> > @@ -696,7 +697,9 @@ static apr_status_t decrement_connection_count(voi
> >  default:
> >  break;
> >  }
> > -apr_atomic_dec32(&connection_count);
> > +if (!apr_atomic_dec32(&connection_count) && listener_may_exit) {
> > +apr_pollset_wakeup(event_pollset);
> > +}
> >  return APR_SUCCESS;
> >  }
>

 I like these suggestions, they look good! (This assumes that what I have
written above is correct; otherwise I didn't get them :)

Regards,

 Luca


Re: Frequent wake-ups for mpm_event

2016-08-05 Thread Yann Ylavic
On Fri, Aug 5, 2016 at 3:19 PM, Yann Ylavic  wrote:
> Hi Luca,
>
> On Fri, Aug 5, 2016 at 2:58 PM, Luca Toscano  wrote:
>>
>> 2016-08-04 17:56 GMT+02:00 Luca Toscano :
>>>
>>> Would it be possible to avoid them adding APR_POLLSET_WAKEABLE to the
>>> event_pollset flags and calling apr_pollset_wakeup right after
>>> apr_skiplist_insert?
>
> That might be possible, but there is more than apr_skiplist_insert()
> to handle I'm afraid, see below.
>
>>
>>  Very high level idea: http://apaste.info/MKE
>
> The timeout_interval also handles the listener_may_exit case (and
> connection_count == 0 in graceful mode); your patch doesn't seem to handle
> them...
>
> The listener_may_exit case may be easily handled in wakeup_listener
> (one more apr_pollset_wakeup() there), however the connection_count
> looks more complicated.
>
> IOW, not sure something like the below would be enough...
>
> Index: server/mpm/event/event.c
> ===
> --- server/mpm/event/event.c(revision 1755288)
> +++ server/mpm/event/event.c(working copy)
> @@ -484,6 +484,7 @@ static void close_worker_sockets(void)
>  static void wakeup_listener(void)
>  {
>  listener_may_exit = 1;
> +apr_pollset_wakeup(event_pollset);
>  if (!listener_os_thread) {
>  /* XXX there is an obscure path that this doesn't handle perfectly:
>   * right after listener thread is created but before

Or rather, in wakeup_listener():

@@ -493,6 +493,9 @@ static void wakeup_listener(void)
 return;
 }

+/* unblock the listener if it's waiting for a timer */
+apr_pollset_wakeup(event_pollset);
+
 /* unblock the listener if it's waiting for a worker */
 ap_queue_info_term(worker_queue_info);

_

> @@ -696,7 +697,9 @@ static apr_status_t decrement_connection_count(voi
>  default:
>  break;
>  }
> -apr_atomic_dec32(&connection_count);
> +if (!apr_atomic_dec32(&connection_count) && listener_may_exit) {
> +apr_pollset_wakeup(event_pollset);
> +}
>  return APR_SUCCESS;
>  }
>
> _
>
> Regards,
> Yann.


Re: Frequent wake-ups for mpm_event

2016-08-05 Thread Yann Ylavic
Hi Luca,

On Fri, Aug 5, 2016 at 2:58 PM, Luca Toscano  wrote:
>
> 2016-08-04 17:56 GMT+02:00 Luca Toscano :
>>
>> Would it be possible to avoid them adding APR_POLLSET_WAKEABLE to the
>> event_pollset flags and calling apr_pollset_wakeup right after
>> apr_skiplist_insert?

That might be possible, but there is more than apr_skiplist_insert()
to handle, I'm afraid; see below.

>
>  Very high level idea: http://apaste.info/MKE

The timeout_interval also handles the listener_may_exit case (and
connection_count == 0 in graceful mode); your patch doesn't seem to handle
them...

The listener_may_exit case may be easily handled in wakeup_listener
(one more apr_pollset_wakeup() there), however the connection_count
looks more complicated.

IOW, not sure something like the below would be enough...

Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c(revision 1755288)
+++ server/mpm/event/event.c(working copy)
@@ -484,6 +484,7 @@ static void close_worker_sockets(void)
 static void wakeup_listener(void)
 {
 listener_may_exit = 1;
+apr_pollset_wakeup(event_pollset);
 if (!listener_os_thread) {
 /* XXX there is an obscure path that this doesn't handle perfectly:
  * right after listener thread is created but before
@@ -696,7 +697,9 @@ static apr_status_t decrement_connection_count(voi
 default:
 break;
 }
-apr_atomic_dec32(&connection_count);
+if (!apr_atomic_dec32(&connection_count) && listener_may_exit) {
+apr_pollset_wakeup(event_pollset);
+}
 return APR_SUCCESS;
 }

_

Regards,
Yann.
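The decrement_connection_count() hunk above can be modelled in a few lines of C11 (illustrative names; note that apr_atomic_dec32() returns the *new* value, so the patch tests !apr_atomic_dec32(...), whereas C11 atomic_fetch_sub() returns the *previous* value):

```c
#include <assert.h>
#include <stdatomic.h>

/* Models the idea in the hunk: only the cleanup of the *last* connection
 * (count drops to zero) needs to wake the listener, and only when it is
 * actually waiting to exit. */
static atomic_uint connection_count;
static atomic_int listener_may_exit;
static int wakeups;                 /* stand-in for apr_pollset_wakeup() */

static void decrement_connection_count(void)
{
    /* atomic_fetch_sub returns the previous value, so "== 1" means we
     * just released the last connection (new count is zero). */
    if (atomic_fetch_sub(&connection_count, 1) == 1
        && atomic_load(&listener_may_exit)) {
        ++wakeups;                  /* would call apr_pollset_wakeup() */
    }
}
```

The point of gating on both conditions is that during normal operation connections come and go constantly, and an unconditional wakeup on every decrement would reintroduce the spurious wakeups the thread is trying to remove.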


Re: Frequent wake-ups for mpm_event

2016-08-05 Thread Jim Jagielski
If APR_POLLSET_WAKEABLE was more universal and, therefore,
more widely tested, I'd be +1... as it is, let's see what
the feedback is.

> On Aug 5, 2016, at 8:58 AM, Luca Toscano  wrote:
> 
> 
> 
> 2016-08-04 17:56 GMT+02:00 Luca Toscano :
> Hi Apache Devs,
> 
> there is an interesting bugzilla ticket about mpm_event and frequent 
> wake-ups: https://bz.apache.org/bugzilla/show_bug.cgi?id=57399
> 
> Would it be possible to avoid them adding APR_POLLSET_WAKEABLE to the 
> event_pollset flags and calling apr_pollset_wakeup right after 
> apr_skiplist_insert?
> 
> I might have completely misunderstood the code, but I wanted to bring up the 
> issue to have more expert opinions from the list. The skiplist use case is 
> still a bit unclear to me even if we discussed it a while ago in another 
> email thread.
> 
>  Very high level idea: http://apaste.info/MKE
> 
> Remarks from APR:
> 
>  * @remark If flags contains APR_POLLSET_WAKEABLE, then a pollset is
>  * created with additional internal pipe object used for the
>  * apr_pollset_wakeup() call. The actual size of pollset is
>  * in that case size + 1. This feature is only supported on some
>  * platforms; the apr_pollset_create_ex() call will fail with
>  * APR_ENOTIMPL on platforms where it is not supported.
> 
> Moreover for apr_pollset_wakeup: "If the pollset was not created with 
> APR_POLLSET_WAKEABLE the return value is APR_EINIT."
> 
> This is only an idea of course, I am probably missing a lot of pieces, but 
> let me know if you have any feedback.
> 
> Regards,
> 
> Luca
> 



Re: Frequent wake-ups for mpm_event

2016-08-05 Thread Luca Toscano
2016-08-04 17:56 GMT+02:00 Luca Toscano :

> Hi Apache Devs,
>
> there is an interesting bugzilla ticket about mpm_event and frequent
> wake-ups: https://bz.apache.org/bugzilla/show_bug.cgi?id=57399
>
> Would it be possible to avoid them adding APR_POLLSET_WAKEABLE to the
> event_pollset flags and calling apr_pollset_wakeup right after
> apr_skiplist_insert?
>
> I might have completely misunderstood the code, but I wanted to bring up
> the issue to have more expert opinions from the list. The skiplist use case
> is still a bit unclear to me even if we discussed it a while ago in another
> email thread.
>

 Very high level idea: http://apaste.info/MKE

Remarks from APR:

 * @remark If flags contains APR_POLLSET_WAKEABLE, then a pollset is
 * created with additional internal pipe object used for the
 * apr_pollset_wakeup() call. The actual size of pollset is
 * in that case size + 1. This feature is only supported on some
 * platforms; the apr_pollset_create_ex() call will fail with
 * APR_ENOTIMPL on platforms where it is not supported.

Moreover for apr_pollset_wakeup: "If the pollset was not created with
APR_POLLSET_WAKEABLE the return value is APR_EINIT."

This is only an idea of course, I am probably missing a lot of pieces, but
let me know if you have any feedback.

Regards,

Luca