On 11/03/2010 10:24 PM, [email protected] wrote:
> On Wed, 3 Nov 2010, Paul Graydon wrote:
>
>> On 11/3/2010 6:32 PM, [email protected] wrote:
>>> On Wed, 3 Nov 2010, Paul Graydon wrote:
>>>
>>>> I'm facing an interesting challenge at the moment. Our Apache httpd
>>>> based load balancers are starting to show signs of strain. Nothing too
>>>> bad but a good indicator that as the amount of traffic to our sites
>>>> increases there will come a point when they can't cope. I've been
>>>> expecting this but at the moment as a "Standalone" sysadmin I've got too
>>>> much on my plate to even get on to anything pro-active that requires
>>>> more than a few hours work.. with inevitable consequences, though I'm
>>>> making favourable progress. Load is now reaching a stage where it's
>>>> spawning enough httpd sessions to be of some concern and at a level that
>>>> seems to be resulting in latency for requests.
>>>
>>> a couple quick comments
>>>
>>> 1. Nginx is single-threaded, so while it's screaming fast, it won't
>>> use more than one core.
>>
>> Hmm.. given we use 2 IP addresses for what I'm assuming are
>> historical reasons (1 virtually does nothing), I suppose I could do
>> an ugly manual load balance and run two instances of nginx, but
>> that's not so ideal!
>
> you could run LVS to load balance across multiple instances of nginx
> if you had to, but that starts getting ugly. the question is where are
> you spending your time. If all of your time is being spent in openssl
> libraries doing SSL, then changing to nginx won't buy you very much,
> but going to faster cpus (or more cores) or adding an SSL accelerator
> card will give you huge benefits.
>
> it could also be that your apache boxes end up with a lot of processes
> running because they are waiting for the back-end servers to respond.
> If this is the case then no change you make on the front-end will help.
>
>>>
>>> 2.
>>> If you are doing a lot of SSL operations, consider adding an SSL
>>> accelerator card, that can effectively eliminate the overhead of SSL.
>>>
>>> 3. how tuned is your apache instance?
>>>
>>> I've seen 10x performance improvements by doing things like
>>> compiling the modules I need in (instead of using .so modules) and
>>> not loading any modules that I don't need. Combined with newer
>>> hardware (two sockets will get you 12 cores nowadays), you could
>>> easily scale quite a bit from your existing capabilities without
>>> having to take the risk of changing technologies.
>>
>> At the moment it's using stock CentOS packages, I was hoping to avoid
>> compiling from source but if that's the best bet and will have that
>> kind of an impact it'll be worth the trade off.
>
> it depends on how many things you need, but there are a lot of things
> in the stock apache config that really hurt performance.
>
> for example, just supporting .htaccess means that for every hit,
> apache must check to see if a .htaccess file exists in that directory,
> or any parent directory, before being able to service the request.
>
> definitely give this a try. I only use distro apache packages when the
> apache instance is being used for very low-volume tasks. If apache is
> the central reason for the box existing, I always take the time to
> compile and optimize it.
>
>>> apache is pretty inefficient in how it logs, try logging to a
>>> ramdisk and see if that makes any difference.
>> Hmm.. I'd really rather not run any risk of losing logs, but one of
>> the logs could probably go that way.
>
> try it for a short time to see if it hurts you or not.
> If this matters, get yourself a raid card with battery backup on it and
> you will get speeds almost as fast as a ramdisk (and if you are that
> worried about logs, you need to be writing them to a mirrored array at
> least anyway)
>
>>>
>>> check what you have set for your ssl session cache, if it's not in
>>> shared memory, move it there (the overhead of filesystem operations
>>> for shared disk, even if you almost always operate in ram disk
>>> buffers, can be noticeable at high traffic levels)
>> Hmm.. /var/cache/mod_ssl . That's definitely something that can be
>> easily moved. Thanks.
>>>
>>> Definitely measure where your latency is happening. It could be that
>>> apache is the problem, but it could also be that you are running
>>> into something else.
>
> take a look at ab (apache benchmark, part of apache) and httperf for
> tools to allow you to throw load at the system, create a custom log
> format that logs the performance stats (%D among others). you need
> more info to see what's going on.
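
For anyone wanting to try the %D suggestion quoted above, here is a minimal sketch of how the resulting log could be summarised into per-host averages. It assumes a hypothetical log format (not from this thread) in which %v, the virtual host name, is the first field and %D, the request time in microseconds, is the last. The function name and the sample lines are made up for illustration.

```python
# Assumption: each access log line was written with a format along the lines of
#   LogFormat "%v %h %l %u %t \"%r\" %>s %b %D" vhost_timed
# so the first field is the virtual host (%v) and the last field is the
# time taken to serve the request in microseconds (%D).
from collections import defaultdict


def average_response_seconds(lines):
    """Return {vhost: mean response time in seconds} for %D-suffixed log lines."""
    totals = defaultdict(lambda: [0, 0])  # vhost -> [sum of microseconds, request count]
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[-1].isdigit():
            continue  # skip blank or malformed lines
        entry = totals[fields[0]]
        entry[0] += int(fields[-1])
        entry[1] += 1
    return {host: usec / count / 1_000_000 for host, (usec, count) in totals.items()}


# Made-up sample lines, purely illustrative.
sample = [
    'web1 10.0.0.1 - - [04/Nov/2010:09:00:00 -1000] "GET / HTTP/1.1" 200 512 150000',
    'web1 10.0.0.2 - - [04/Nov/2010:09:00:01 -1000] "GET /a.css HTTP/1.1" 200 128 50000',
    'web5 10.0.0.3 - - [04/Nov/2010:09:00:02 -1000] "GET / HTTP/1.1" 200 512 30000',
]
for host, seconds in sorted(average_response_seconds(sample).items()):
    print(f"{host}  {seconds:.3f}")
```

In practice the lines would come from the real access log (for example `open(path)` passed straight in), and the same grouping could be done on the request path to separate static files from dynamic pages.
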

A (weird, but effective) anti-scraper script that runs on the box
collects certain data and throws it off to a back-end database
(extremely lightweight, not tied in to apache directly and run at low
priority). Whilst it was grabbing response times it wasn't actually
pushing the data into the database. Quick modification this morning and
I'm now logging relevant data.

Left column is AVG response times in seconds, right the hostnames (names
altered to protect the innocent). Web1 and 2 are both older boxes that
handle the bulk of the traffic currently, web5 is a new box that is
slowly getting stuff migrated to it once various bits of testing have
been carried out (different JVM and a few other bits). Web4 doesn't
actually host live sites, it's got a CMS on there that generates static
content which web1 & 2 host.

| 0.03180483 | web5 |
| 0.11877206 | web2 |
| 0.12424236 | web3 |
| 0.14441832 | web1 |
| 0.21145667 | web4 |

Those look like pretty reasonable response times to me, I'm happy with
that in general. A quick look at css and jpg specifically shows < 0.09s
responses, mostly even < 0.009s.

> also look at autobench, it uses httperf to run a series of tests
> against the box, increasing the number of simultaneous connections so
> you can map where the box (or in your case, system of boxes) starts to
> fail. then you can try changing things and see that number shift.

>>> how many processes are you seeing that is making you concerned?
>>
>> I couldn't give you a solid figure, but based on memory usage
>> compared to current I'd guesstimate at 120+ and I swear we're not
>> doing that much traffic. I've added that to zabbix so I'll have a
>> better idea tomorrow.
>> Even now during what is a quiet time for us I'm
>> bouncing between 50 and 80, tuned:
>>
>> StartServers 8
>> MinSpareServers 5
>> MaxSpareServers 20
>> ServerLimit 256
>> MaxClients 512
>> MaxRequestsPerChild 20000
>
> set maxspare servers _much_ higher, if this box is dedicated to the
> task, set the maxspare to the same as your serverlimit, and set
> startservers pretty high as well.
>
> even with the phenomenal forking speed that linux has, there is still
> a lot of overhead in starting and stopping an apache process (less in
> the fork than in all the other setup for starting, but it hurts). just
> avoiding the thrashing can gain you quite a bit.
>
> also install the sysstat package and run iostat during heavy load to
> see what your disk I/O is looking like, you may be surprised at how
> much you are hitting it.
>
> depending on what you are using it for and how you have it configured,
> apache can handle from hundreds of connections/sec to 10s of thousands
> of connections a sec on the same hardware (admittedly, at 10s of
> thousands, all it can do is serve static or cached content, but
> sometimes that's what you need)
>
> David Lang

I've followed your advice and bumped maxspare to the same as ServerLimit
(and fixed that MaxClients entry). CPU usage dropped a good 10% on each
core, and we seem to have peaked at 216. I'll be keeping an eye on it
over the next few days and tweaking it some more.

Tests of MPM Worker in the dev environment are fairly encouraging, it
doesn't look like anything is broken so far, but it'll need at least
another couple of weeks before I'll be happy with it. Is it generally
considered worth pursuing or am I wasting my time with it?

Paul
_______________________________________________
Tech mailing list
[email protected]
http://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
