Re: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
On Tue, 27 Oct 2015 11:10:08 -0500, William A Rowe Jr wrote:

> In general, the thread safety does work, but is not as efficient as it
> could be.

Last I looked, PHP throws in quite a kitchen sink, including old libraries (like libgif, libjpeg) written back in the 1980s for command-line and desktop programs. Far from thread-safe. It also did some Bad Things like global customisation of libraries like libxml2, so that another application might unintentionally get PHP's substitute handlers, usually leading to a segfault and potentially worse. Though my information there is very out of date.

Hence, it is always best to run it in its own fastcgi environment, where it won't mess with anything else in the server.

--
Nick Kew
Re: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
On Oct 27, 2015 05:38, "Arkadiusz Miśkiewicz" wrote:
>
> On Monday 26 of October 2015, Yehezkel Horowitz wrote:
> > First, thanks Nick for the feedback.
> >
> > I have submitted https://bz.apache.org/bugzilla/show_bug.cgi?id=58550 as
> > you suggested.
> >
> > >If a threaded MPM really isn't an option (for most users the obvious
> > >solution), then the question is what works for you.
> >
> > I can't use threaded MPM as PHP (at least my version) doesn't support it.
>
> Not only yours. php doesn't support thread safety for normal usage... it's
> marked experimental for ages:
>
> From php 5.6/7.0 configure help:
>
> " --enable-maintainer-zts Enable thread safety - for code maintainers only!!"

In general, the thread safety does work, but is not as efficient as it could be. That is why most php developers and users rely on fastcgi (in httpd, either through mod_fcgid or mod_proxy_fcgi).

It is generally cleaner and more efficient to run a smaller pool of php fcgi responders to service most applications, and keep the benefits of the event (or worker) mpm in httpd. mod_php is a bit heavyweight, since it holds memory reservations on every httpd worker.

Properly tuned, you should see excellent performance, depending on whether your php scripts tend to block (remote SQL access, for example). Efficient php can probably be tuned starting at 2 workers per core, adjusting from there.
Re: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
On Monday 26 of October 2015, Yehezkel Horowitz wrote:
> First, thanks Nick for the feedback.
>
> I have submitted https://bz.apache.org/bugzilla/show_bug.cgi?id=58550 as
> you suggested.
>
> >If a threaded MPM really isn't an option (for most users the obvious
> >solution), then the question is what works for you.
>
> I can't use threaded MPM as PHP (at least my version) doesn't support it.

Not only yours. php doesn't support thread safety for normal usage... it has been marked experimental for ages:

From php 5.6/7.0 configure help:

" --enable-maintainer-zts Enable thread safety - for code maintainers only!!"

--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
Re: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
On Mon, Oct 26, 2015 at 12:45 PM, Yehezkel Horowitz wrote:
>
>> The following patch was recently backported to v2.4, how similar is your
>> patch to this one?
>
>> *) MPMs: Support SO_REUSEPORT to create multiple duplicated listener
>>    records for scalability. [Yingqi Lu, Jeff Trawick, Jim Jagielski,
>>    Yann Ylavic]
>
> Both patches might come to solve similar problem, but SO_REUSEPORT requires
> Linux 3.9 (which is quite new in Linux terms).

Maybe this requirement could be relaxed so that the listener buckets (and their own accept mutex) would be available even without the SO_REUSEPORT option. Could you test this?

Regards,
Yann.
RE: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
>Just to clarify, all updates to httpd need to be made to trunk first, which
>then allow it to be backported to v2.4, and then v2.2.
>The reason for this is we don’t want features added to v2.2 that then
>subsequently vanish when v2.4 comes out (and so on).

Understood. I just want to get some feedback about my initial implementation. If there is interest, I’ll be glad to write an updated patch to be applied to trunk.

>The following patch was recently backported to v2.4, how similar is your patch
>to this one?

> *) MPMs: Support SO_REUSEPORT to create multiple duplicated listener
>    records for scalability. [Yingqi Lu <yingqi...@intel.com>, Jeff Trawick,
>    Jim Jagielski, Yann Ylavic]

Both patches might solve a similar problem, but SO_REUSEPORT requires Linux 3.9 (which is quite new in Linux terms).

Thanks for the feedback,
Yehezkel Horowitz
Check Point Software Technologies Ltd.
RE: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
First, thanks Nick for the feedback.

I have submitted https://bz.apache.org/bugzilla/show_bug.cgi?id=58550 as you suggested.

>If a threaded MPM really isn't an option (for most users the obvious
>solution), then the question is what works for you.

I can't use a threaded MPM, as PHP (at least my version) doesn't support it.

The patch worked very well for me, but I'm not sure I haven't missed some pitfalls that someone with much more knowledge of Apache internals (especially on Linux) would easily see.

>How well does your patch apply to trunk?

You can't apply my patch to 2.4 or trunk, because since 2.4 there is a "prefork_child_bucket" concept, whose role (and how it relates to other MPMs) I don't fully understand.

I'll be happy to write an updated patch if someone could explain to me the role of the "prefork_child_bucket".

Regards,
Yehezkel Horowitz
Check Point Software Technologies Ltd.
Re: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
On 26 Oct 2015, at 10:45 AM, Yehezkel Horowitz wrote:

> Any chance someone could take a short look and provide me a feedback (of any
> kind)?
>
> I know your focus is on 2.4 and trunk, but there are still many 2.2 servers
> out there…
>
> Patch attached again for you convenience.…

Just to clarify, all updates to httpd need to be made to trunk first, which then allows them to be backported to v2.4, and then v2.2.

The reason for this is we don’t want features added to v2.2 that then subsequently vanish when v2.4 comes out (and so on).

The following patch was recently backported to v2.4, how similar is your patch to this one?

 *) MPMs: Support SO_REUSEPORT to create multiple duplicated listener
    records for scalability. [Yingqi Lu, Jeff Trawick, Jim Jagielski,
    Yann Ylavic]

Regards,
Graham
Re: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
On Mon, 2015-10-26 at 08:45 +, Yehezkel Horowitz wrote:
> Any chance someone could take a short look and provide me a feedback
> (of any kind)?

A patch posted here may get lost, especially if it's not simple and obvious enough for instant review and understanding. Posting it as an Enhancement request in Bugzilla would leave a record of it.

> 1. Do you think this is a good implementation of the suggested
> idea?

If a threaded MPM really isn't an option (for most users the obvious solution), then the question is what works for you.

> 3. Would you consider accepting this patch to the project?
> If so, could you guide me what else needs to be done for acceptances?
> I know there is a need for configuration & documentation work - I’ll
> work on once the patch will be approved…

It is unlikely to get into a future 2.2 release unless it fixed something much more than an arcane performance problem (arcane because it only happens when you reject conventional ways to boost performance, like another MPM).

How well does your patch apply to trunk?

If you don't want to go in that direction, you could post it somewhere always available for anyone interested. Our bugzilla would serve, as would somewhere else you publish from, like github or a personal site.

--
Nick Kew
RE: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
Any chance someone could take a short look and provide me a feedback (of any kind)?

I know your focus is on 2.4 and trunk, but there are still many 2.2 servers out there...

Patch attached again for your convenience.

Yehezkel Horowitz
Check Point Software Technologies Ltd.

From: Yehezkel Horowitz
Sent: Monday, October 19, 2015 6:14 PM
To: dev@httpd.apache.org
Subject: Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)

multi-accept-mutexes.patch
Description: multi-accept-mutexes.patch
Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
Hello Apache gurus.

I was working on a project that used Apache 2.2.x with the prefork MPM (using flock as the mutex method) on a Linux machine (with 20 cores), and ran into the following problem.

During load, when the number of Apache child processes grew beyond some point (~3000 processes), Apache didn't accept incoming connections in reasonable time (seen in netstat as SYN_RECV).

I found a document about Apache Performance Tuning [1], which contains an idea for improving performance:

"Another solution that has been considered but never implemented is to partially serialize the loop -- that is, let in a certain number of processes. This would only be of interest on multiprocessor boxes where it's possible that multiple children could run simultaneously, and the serialization actually doesn't take advantage of the full bandwidth. This is a possible area of future investigation, but priority remains low because highly parallel web servers are not the norm."

I wrote a small patch (aligned to 2.2.31) that implements this idea: create 4 mutexes and spread the child processes across the mutexes (by getpid() % mutex_number).

So at any given time, 4 idle child processes are expected [2] to wait in the "select loop". Once a new connection arrives, 4 processes are woken by the OS: 1 will succeed in accepting the socket (and will release its mutex) and 3 will return to the "select loop".

This solved my specific problem and allowed me to put more load on the machine.

My questions to this forum are:

1. Do you think this is a good implementation of the suggested idea?
2. Any pitfalls I missed?
3. Would you consider accepting this patch into the project? If so, could you guide me on what else needs to be done for acceptance? I know there is a need for configuration & documentation work - I'll work on it once the patch is approved...
4. Do you think '4' is a good default for the number of mutexes? What considerations should determine the default?
5. Is such an implementation relevant for other MPMs (worker/event)?

Any other feedback is welcome.

[1] http://httpd.apache.org/docs/2.2/misc/perf-tuning.html, "Accept Serialization - Multiple Sockets" section.

[2] There is no guarantee that exactly 4 processes will wait, as all processes with "getpid() % mutex_number == 0" might be busy at a given time. But this sounds to me like a fair limitation.

Note: flock gives me the best results, but it still seems to have n^2 complexity (where 'n' is the number of waiting processes), so reducing the number of processes waiting on each mutex gives a quadratic improvement.

Regards,
Yehezkel Horowitz
Check Point Software Technologies Ltd.

multi-accept-mutexes.patch
Description: multi-accept-mutexes.patch
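
The scheme described in this message can be sketched in a few lines of C. This is only an illustration of the idea under stated assumptions, not the contents of the attached patch: the names NUM_ACCEPT_MUTEXES, pick_mutex_index, open_mutex_file, accept_mutex_on and accept_mutex_off are hypothetical, and each mutex is assumed to be backed by its own lock file taken with flock().

```c
#define _DEFAULT_SOURCE        /* make flock() visible under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

#define NUM_ACCEPT_MUTEXES 4   /* hypothetical name; the message uses 4 */

/* Map a process ID onto one of nmutexes lock slots (pid % nmutexes). */
static int pick_mutex_index(pid_t pid, int nmutexes)
{
    assert(nmutexes > 0);
    return (int)(pid % nmutexes);
}

/* Open (creating if needed) the lock file "<prefix>.<idx>" for a slot.
 * Returns a file descriptor, or -1 on error. */
static int open_mutex_file(const char *prefix, int idx)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "%s.%d", prefix, idx);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return open(path, O_CREAT | O_RDWR, 0600);
}

/* Block until the calling process holds the lock, retrying on EINTR. */
static int accept_mutex_on(int fd)
{
    int rc;

    do {
        rc = flock(fd, LOCK_EX);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Release the lock so another child in the same slot can proceed. */
static int accept_mutex_off(int fd)
{
    return flock(fd, LOCK_UN);
}
```

In a child process, the slot would be chosen once at startup with pick_mutex_index(getpid(), NUM_ACCEPT_MUTEXES), and the accept loop would then wrap its select/accept step between accept_mutex_on and accept_mutex_off. With roughly 3000 children and 4 slots, each lock has about 750 waiters instead of 3000, which is where the reduced contention comes from.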