Re: Is async the answer?
On Jan 28, 2008 9:57 PM, William A. Rowe, Jr. <[EMAIL PROTECTED]> wrote: > Olaf van der Spek wrote: > > > > I agree that FastCGI is the better technical solution, I'm just > > stating that neither the Apache documentation nor the PHP > > documentation seems to state that. Even worse, they hardly document > > the FastCGI way at all. > > FastCGI is a technically subpar way to execute trusted, valid PHP. Why? Isn't memory (and other resource) consumption a lot lower because you don't need a PHP 'engine' for every thread/process? Even valid PHP code can crash, given bugs in PHP itself. And I think tons of users sometimes run untrusted or invalid PHP. > People have always been under some preconception that it's good to run > untrusted code in-process within httpd, while numerous "vulnerability" > reports in the past (and many to appear over the future) all bear out > that it's a really stupid idea. Given that the alternatives (FastCGI) isn't well documented, I don't think that's strange. > FastCGI is also a so-so way to get around libraries which aren't thread- > safe, running worker or event mpm's. Of course, using the 21st century > equivalents of those libraries probably isn't a bad solution either. Olaf
Re: Is async the answer?
On Jan 28, 2008, at 15:41, Akins, Brian wrote: On 1/28/08 3:29 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: I agree that FastCGI is the better technical solution, I'm just stating that neither the Apache documentation nor the PHP documentation seems to state that. Even worse, they hardly document the FastCGI way at all. The only reason I know is because at Apachecon in Austin (?) the php and httpd guys kissed and made up and said a bunch of stuff about fastcgi in a presentation. Unfortunately, neither he (John Coggeshall) nor I followed up by doing anything useful to the documentation of either product to reflect that. :-( -- Speech is conveniently located midway between thought and action, where it often substitutes for both. John Andrew Holmes
Re: Is async the answer?
Olaf van der Spek wrote: I agree that FastCGI is the better technical solution, I'm just stating that neither the Apache documentation nor the PHP documentation seems to state that. Even worse, they hardly document the FastCGI way at all. FastCGI is a technically subpar way to execute trusted, valid PHP. So is the handler method, the most efficient is the httpd 2 filter method which should work fine since John and I spent a bunch of time on it. However, only a CGI sapi or FastCGI can compartmentalize your untrusted PHP applications. People have always been under some preconception that it's good to run untrusted code in-process within httpd, while numerous "vulnerability" reports in the past (and many to appear over the future) all bear out that it's a really stupid idea. FastCGI is also a so-so way to get around libraries which aren't thread- safe, running worker or event mpm's. Of course, using the 21st century equivalents of those libraries probably isn't a bad solution either. Bill
Re: Is async the answer?
On 1/28/08 3:29 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: > I agree that FastCGI is the better technical solution, I'm just > stating that neither the Apache documentation nor the PHP > documentation seems to state that. Even worse, they hardly document > the FastCGI way at all. The only reason I know is because at Apachecon in Austin (?) the php and httpd guys kissed and made up and said a bunch of stuff about fastcgi in a presentation. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer?
On Jan 28, 2008 9:22 PM, Jim Jagielski <[EMAIL PROTECTED]> wrote: > >> http://www.php.net/manual/en/ > >> faq.installation.php#faq.installation.apache2 > > > > "If you feel you have to use a threaded MPM, look at a FastCGI > > configuration where PHP is running in its own memory space." > > > > Is that what is meant by "Fastcgi is the "recommended way of using php > > and httpd 2, AFAIK. Isn't it???"? > > A single line seems a bit odd for the recommended approach. > > > > Consider that, for many people, the main advantage of Apache2 over > Apache1 is the worker MPM. Also consider that a threaded MPM and > mod_php aren't a happy couple. If using prefork, mod_php works > just dandy... but for other reasons you'd likely want to > consider FastCGI anyway... I agree that FastCGI is the better technical solution, I'm just stating that neither the Apache documentation nor the PHP documentation seems to state that. Even worse, they hardly document the FastCGI way at all.
Re: Is async the answer?
On Jan 28, 2008, at 2:35 PM, Olaf van der Spek wrote: On Jan 28, 2008 8:04 PM, Eric Covener <[EMAIL PROTECTED]> wrote: On Jan 28, 2008 12:36 PM, Olaf van der Spek <[EMAIL PROTECTED]> wrote: On Jan 25, 2008 6:18 PM, Akins, Brian <[EMAIL PROTECTED]> wrote: On 1/24/08 3:14 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: Working on making a FastCGI based setup the recommended approach instead of mod_php is probably more important then async. Actually, it's a prerequisite. Fastcgi is the "recommended way of using php and httpd 2, AFAIK. Isn't it??? Where can I read about that recommendation? I can't find it in the Apache or PHP manuals. mod_php appears to be *the* solution. http://www.php.net/manual/en/ faq.installation.php#faq.installation.apache2 "If you feel you have to use a threaded MPM, look at a FastCGI configuration where PHP is running in its own memory space." Is that what is meant by "Fastcgi is the "recommended way of using php and httpd 2, AFAIK. Isn't it???"? A single line seems a bit odd for the recommended approach. Consider that, for many people, the main advantage of Apache2 over Apache1 is the worker MPM. Also consider that a threaded MPM and mod_php aren't a happy couple. If using prefork, mod_php works just dandy... but for other reasons you'd likely want to consider FastCGI anyway...
Re: Is async the answer?
On Jan 28, 2008 8:04 PM, Eric Covener <[EMAIL PROTECTED]> wrote: > On Jan 28, 2008 12:36 PM, Olaf van der Spek <[EMAIL PROTECTED]> wrote: > > On Jan 25, 2008 6:18 PM, Akins, Brian <[EMAIL PROTECTED]> wrote: > > > On 1/24/08 3:14 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: > > > > > > > > > > Working on making a FastCGI based setup the recommended approach > > > > instead of mod_php is probably more important then async. Actually, > > > > it's a prerequisite. > > > > > > Fastcgi is the "recommended way of using php and httpd 2, AFAIK. Isn't > > > it??? > > > > Where can I read about that recommendation? > > I can't find it in the Apache or PHP manuals. > > mod_php appears to be *the* solution. > > http://www.php.net/manual/en/faq.installation.php#faq.installation.apache2 "If you feel you have to use a threaded MPM, look at a FastCGI configuration where PHP is running in its own memory space." Is that what is meant by "Fastcgi is the "recommended way of using php and httpd 2, AFAIK. Isn't it???"? A single line seems a bit odd for the recommended approach.
Re: Is async the answer?
On Jan 28, 2008 12:36 PM, Olaf van der Spek <[EMAIL PROTECTED]> wrote: > On Jan 25, 2008 6:18 PM, Akins, Brian <[EMAIL PROTECTED]> wrote: > > On 1/24/08 3:14 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: > > > > > > > Working on making a FastCGI based setup the recommended approach > > > instead of mod_php is probably more important then async. Actually, > > > it's a prerequisite. > > > > Fastcgi is the "recommended way of using php and httpd 2, AFAIK. Isn't it??? > > Where can I read about that recommendation? > I can't find it in the Apache or PHP manuals. > mod_php appears to be *the* solution. http://www.php.net/manual/en/faq.installation.php#faq.installation.apache2 -- Eric Covener [EMAIL PROTECTED]
Re: Is async the answer?
On Jan 25, 2008 6:18 PM, Akins, Brian <[EMAIL PROTECTED]> wrote: > On 1/24/08 3:14 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: > > > > Working on making a FastCGI based setup the recommended approach > > instead of mod_php is probably more important then async. Actually, > > it's a prerequisite. > > Fastcgi is the "recommended way of using php and httpd 2, AFAIK. Isn't it??? Where can I read about that recommendation? I can't find it in the Apache or PHP manuals. mod_php appears to be *the* solution. > > What about a hybrid approach? > > Async for network IO and other stuff that doesn't require sync calls, > > worker threads for other parts? > > That's kind of what I was thinking after Apachecon US this year. I won't > speak for others, but it seemed reasonable to most. However, after doing > several real world tests, I just don't honestly see that async will be a > huge improvement. Please prove me wrong with real world results. I'd be > more than happy to be wrong on this, really. I don't have real world test results. Have you tested the 30k scenario with an async web server? And do all platforms have such cheap threading as your test platform? > To be honest, I don't have strong feelings either way. I was surprised by > my results. I, now, think that completely rewriting the core to be async > *may be* a "waste of resources." If it fits nicely into some ideas on > reengineering buckets and brigades (ala serf stuff), and does not actually > decrease overall performance, then by all means do it. > > Remember, I'm partially playing devil's advocate as well... I noticed. ;)
Re: Is async the answer?
On 1/24/08 3:14 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: > Working on making a FastCGI based setup the recommended approach > instead of mod_php is probably more important then async. Actually, > it's a prerequisite. Fastcgi is the "recommended" way of using php and httpd 2, AFAIK. Isn't it??? > Having 30k threads still seems like a waste of resource to me though. Not if the system is handling the load very well and "needs" 30k threads. My point was that 30k threads did not seem to be a "waste of resources." I doubt an async server would have used a significantly lower amount of resources because worker did not use a significant amount of resources. > What about a hybrid approach? > Async for network IO and other stuff that doesn't require sync calls, > worker threads for other parts? That's kind of what I was thinking after Apachecon US this year. I won't speak for others, but it seemed reasonable to most. However, after doing several real world tests, I just don't honestly see that async will be a huge improvement. Please prove me wrong with real world results. I'd be more than happy to be wrong on this, really. To be honest, I don't have strong feelings either way. I was surprised by my results. I, now, think that completely rewriting the core to be async *may be* a "waste of resources." If it fits nicely into some ideas on reengineering buckets and brigades (ala serf stuff), and does not actually decrease overall performance, then by all means do it. Remember, I'm partially playing devil's advocate as well... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer?
> We were using normal worker MPM with keepalives for this test. The current > "stable" event would have helped with idle keepalive threads, but the system > didn't seem to care. But when using mod_php, worker is not recommended, right? I doubt prefork scales as well as worker. Working on making a FastCGI based setup the recommended approach instead of mod_php is probably more important then async. Actually, it's a prerequisite. Having 30k threads still seems like a waste of resource to me though. What about a hybrid approach? Async for network IO and other stuff that doesn't require sync calls, worker threads for other parts? Olaf
Re: Is async the answer
On 1/20/08 10:44 AM, "Graham Leggett" <[EMAIL PROTECTED]> wrote: > In terms of space, caches are not infinite in size, but then neither are > the majority of backend websites either. 73GB is pretty big for a reverse proxy cache. And fast SAS drives are pretty cheap. > Sure, but I think the point that Brian was making was that you could > support the kind of large load sizes that are traditionally associated > with event based models using a prefork or worker setup, simply by > making sure you have enough RAM. And to stimulate some conversation. I just don't want us to "buy into" the "async is better" idea because that's the "trend" in servers nowadays. If async truly is better, then let's use it. Just don't want to do it "just because everyone else is." Also, this test included all sorts of clients (slow, fast, in between). A blocking thread didn't seem to hurt the server. I'm guessing that 48k blocking threads wouldn't hurt it too bad either. Also, I'm going to look at the serf "buckets" when I get time. Story of my life, though, no time... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
On 1/19/08 6:29 PM, "Davi Arnaut" <[EMAIL PROTECTED]> wrote: > This is true for expensive hardware and very well designed operating > systems and file systems.. and the space is not infinite. It depends on your definition of "expensive." All of our servers are fairly "commodity." The new Linux fileserver I built at home is faster than most of ours, and it cost me less than $1k. It's all the "management/redundancy" stuff that makes "real servers" so expensive. A dual dual-core Opteron with 8GB RAM is not all that "exotic." -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
Davi Arnaut wrote: This is true for expensive hardware and very well designed operating systems and file systems.. and the space is not infinite. Not at all - commodity hardware will serve just as well. The real killer in this case is the slow client, which can be one, two or three orders of magnitude slower than the average client. This means that it will hog one, two or three orders of magnitude more of the server backend's resources than the average request, and this is where a cache can be most effective. In terms of space, caches are not infinite in size, but then neither are the majority of backend websites either. But... OK. Back to the topic I thought that one of the key points of async/event based servers were that we use software to scale and not hardware (so that hardware is not the bottleneck)... like serving thousands of slow clients from commodity hardware. Sure, but I think the point that Brian was making was that you could support the kind of large load sizes that are traditionally associated with event based models using a prefork or worker setup, simply by making sure you have enough RAM. Very useful information to know. Regards, Graham
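To put rough numbers on the slow-client point above, here is a small back-of-the-envelope sketch. The response size and both client rates are invented for illustration and are not measurements from this thread; the function name is hypothetical.

```python
def backend_hold_seconds(response_bytes, client_bytes_per_s):
    """Time a backend connection stays busy if it must feed the client directly."""
    return response_bytes / client_bytes_per_s


# All three figures below are assumptions chosen only to illustrate the ratio.
RESPONSE_BYTES = 1_000_000   # assumed 1 MB response
FAST_CLIENT = 10_000_000     # assumed client draining 10 MB/s
SLOW_CLIENT = 10_000         # assumed client draining 10 kB/s

fast_hold = backend_hold_seconds(RESPONSE_BYTES, FAST_CLIENT)
slow_hold = backend_hold_seconds(RESPONSE_BYTES, SLOW_CLIENT)
slowdown = FAST_CLIENT // SLOW_CLIENT  # how many times longer the slow client holds on
```

With these assumed figures the slow client keeps the backend busy for 100 seconds instead of 0.1, three orders of magnitude longer, unless a cache or spool file absorbs the response first and frees the backend.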
Re: Is async the answer
Graham Leggett wrote: > Davi Arnaut wrote: > >>> The proxy that the LiveJournal folks wrote, I think, copies all the data >>> from the origin server into a file and then uses sendfile to send to the >>> client... >> Doesn't this limit the network bandwidth to the bandwidth of the disk >> and/or file system? > > Yes, and the effective bandwidth of the disk can be significantly higher > than both the cache backend (which is often expensive) and the network > frontend (which has slow potential slow clients typing up your resources). > > Don't forget that your cache disk is most often RAM backed, meaning > effectively your cache disk is a ramdisk, with all the speed advantages > that go with it. > This is true for expensive hardware and very well designed operating systems and file systems.. and the space is not infinite. But... OK. Back to the topic I thought that one of the key points of async/event based servers were that we use software to scale and not hardware (so that hardware is not the bottleneck)... like serving thousands of slow clients from commodity hardware. -- Davi Arnaut
Re: Is async the answer
On Sat, 2008-01-19 at 09:57 -0500, Davi Arnaut wrote: > Doesn't this limit the network bandwidth to the bandwidth of the disk > and/or file system? Depends on the working set and your amount of memory. If it's just temporary storage then no, as most data won't even hit the disk. If it's more of a cache then partially. Updates will use write bandwidth to the disks, and not so frequently accessed objects will use read bandwidth as well. The filesystem "bandwidth" overhead is pretty negligible these days: very close to raw I/O + memory cache. Regards Henrik
Re: Is async the answer
On Fri, 2008-01-18 at 16:17 -0500, Akins, Brian wrote: > Paul Q and I have been kicking around the idea that even if we go to a > completely async core, etc. that modules could mark some hooks as "blocking" > and they would run basically how they do today. (One day, Paul, I'll > actually think about this more...) In the end you need a bit of a mixture of the models to make it work: threads or even processes for complex processing or libraries outside your control, and async for the basic core to keep it lightweight in resources per request/connection. There is quite a bit of research on programming models supporting mixed async/threaded/tasklet scheduling without forcing the programmer to know all the details. Quite interesting reading if you haven't read those papers yet. For example the tame approach (C++ preprocessor using libasync) used by OKWS and its related cousin tamer (more lightweight library) is quite fun to work with, at least in theory. Regarding CPU performance, you need a more complex workload than pure sendfile() shuffling of data to see much of a difference between threaded and async models. Especially if you look at smaller requests, where the two almost converge to the same model (N threads doing fast successive batch processing one request at a time with no wait time, or an event loop doing pretty much the same batching). Regards Henrik
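The mixed model described above (an async core, with threads for blocking work) can be sketched in a few lines. This is only an illustration using Python's asyncio, not how httpd or tame does it; blocking_lookup, handle_request and serve are hypothetical names, and the sleep stands in for a blocking library call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(key):
    """Stand-in for a blocking call in a library outside our control."""
    time.sleep(0.05)
    return key.upper()


async def handle_request(loop, pool, key):
    # The event loop stays free for other connections while the
    # blocking call runs in a worker thread from the pool.
    result = await loop.run_in_executor(pool, blocking_lookup, key)
    return f"response for {result}"


async def serve(keys):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # gather() returns results in the same order as the inputs.
        return await asyncio.gather(
            *(handle_request(loop, pool, k) for k in keys)
        )


def run_demo():
    return asyncio.run(serve(["a", "b", "c"]))
```

Calling run_demo() handles the three hypothetical requests concurrently: the core never blocks, and only the code marked as blocking occupies a thread.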
Re: Is async the answer
Davi Arnaut wrote: The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... Doesn't this limit the network bandwidth to the bandwidth of the disk and/or file system? Yes, and the effective bandwidth of the disk can be significantly higher than both the cache backend (which is often expensive) and the network frontend (which has potentially slow clients tying up your resources). Don't forget that your cache disk is most often RAM backed, meaning effectively your cache disk is a ramdisk, with all the speed advantages that go with it. Regards, Graham
Re: Is async the answer
On Jan 18, 2008, at 2:16 PM, Justin Erenkrantz wrote: On Jan 18, 2008 10:52 AM, Akins, Brian <[EMAIL PROTECTED]> wrote: Which is why I hate to see a ton of work go into async core if it actually does very little to help performance (or if it hurts it) and makes writing modules harder. It braindead simple nowadays to write well behaved high performance modules (well, mostly) bcs you rarely worry about threads, reads/writes, etc. Full async programming is just as challenging as handling a ton of threads yourself. Speaking for myself, I think writing and using buckets with serf is more straightforward than our complicated bucket brigade system with mixed push/pull paradigms. +1... Although the whole concept of buckets and their brigades has some cool advantages, they are also a semi-constant source of issues...
Re: Is async the answer?
On Jan 18, 2008, at 12:03 PM, Akins, Brian wrote: This is just some ramblings based on some observations, theories, and tests. Partially "devil's advocate" as well. Most of us seem to have convinced our self that high performance network applications (including web servers) must be asynchronous in order to scale. Is this still valid? For that matter, was it ever? http://www.jimjag.com/imo/index.php?/archives/150-Long-time.html
Re: Is async the answer
Akins, Brian wrote: > On 1/18/08 3:07 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote: >> That's not even a consideration, >> async is really for dynamic content, proxies, and other non-sendfile >> content. > > For dynamic stuff, "X-sendfile" works well. (Just really starting to play > with that, liking it so far). > > The proxy that the LiveJournal folks wrote, I think, copies all the data > from the origin server into a file and then uses sendfile to send to the > client... Doesn't this limit the network bandwidth to the bandwidth of the disk and/or file system? -- Davi Arnaut
Re: Is async the answer
Akins, Brian wrote: The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... The proxy enhancements that Niklas contributed do exactly this as well. It has a number of other advantages, such as slow clients not tying up a fast backend. It works very well. Regards, Graham
Re: Is async the answer
On Fri, 18 Jan 2008, Ruediger Pluem wrote: The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... Erm, so does the one we wrote, mod_disk_cache ;p IMHO it doesn't for the first request of the entity (the request that causes the entity to be cached) Which is why it doesn't scale with large files, and I hacked it to do that to be usable with DVD images on ftp.acc.umu.se (http://issues.apache.org/bugzilla/show_bug.cgi?id=39380 - you might remember the first try to merge some of it). Yes, it has its flaws, but it solves the problem for us. I think that some people have tried it in a proxy setting too, with pretty OK results. But this was really off-topic ;)
Getting to the point, I share Brian's concerns about going async just for async's sake, for similar reasons:
- People are having problems making modules even thread safe (see mod_example); forcibly adding async to the mix will raise the bar even higher for people who need to whip up a simple module.
- Callback semantics are messy when they go wrong, and debugging can be a pain.
- Threads are rather cheap, even on Linux since the advent of NPTL.
- Performance benefits are unclear.
Given that, there are obvious optimisations that can be, and have been, made. The ones in trunk, for example, aim at not hogging a worker thread simply to write the remaining data to the client. From what I've understood this class of changes doesn't really affect modules. Also, if there is a way of adding async while keeping it optional for modules, then I see no problem with adding it as long as there are cases where it actually helps, other than adding it to the supported buzzwords list ;)
/Nikke - who probably ended up off topic after all ;)
-- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- I'd love to, but I'm worried about my vertical hold.
Re: Is async the answer
On Jan 18, 2008 2:30 PM, Ruediger Pluem <[EMAIL PROTECTED]> wrote: > IMHO it doesn't for the first request of the entity (the request that causes > the entity to be cached) I'd expect the predominance of large numbers would reduce the impact of the one-time performance hit...but that conversion away from a fd and into flat buffers, I feel, is more of a bucket brigade problem than an intrinsic fault of mod_disk_cache. -- justin
Re: Is async the answer
On 01/18/2008 10:29 PM, Colm MacCarthaigh wrote: > On Fri, Jan 18, 2008 at 04:17:16PM -0500, Akins, Brian wrote: >> For dynamic stuff, "X-sendfile" works well. (Just really starting to play >> with that, liking it so far). > > It's not a solve-all though, I mean even though CGI's or whatever > /could/ write their output to a file and then call X-sendfile, it'd be a > disaster latency-wise. Ironically enough the only way to solve that is > ... async ;-) > >> The proxy that the LiveJournal folks wrote, I think, copies all the data >> from the origin server into a file and then uses sendfile to send to the >> client... > > Erm, so does the one we wrote, mod_disk_cache ;p IMHO it doesn't for the first request of the entity (the request that causes the entity to be cached) Regards Rüdiger
Re: Is async the answer
On Fri, Jan 18, 2008 at 04:17:16PM -0500, Akins, Brian wrote: > For dynamic stuff, "X-sendfile" works well. (Just really starting to play > with that, liking it so far). It's not a solve-all though; I mean, even though CGIs or whatever /could/ write their output to a file and then call X-sendfile, it'd be a disaster latency-wise. Ironically enough the only way to solve that is ... async ;-) > The proxy that the LiveJournal folks wrote, I think, copies all the data > from the origin server into a file and then uses sendfile to send to the > client... Erm, so does the one we wrote, mod_disk_cache ;p -- Colm MacCárthaigh Public Key: [EMAIL PROTECTED]
Re: Is async the answer
On 1/18/08 3:07 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote: > That's not even a consideration, > async is really for dynamic content, proxies, and other non-sendfile > content. For dynamic stuff, "X-sendfile" works well. (Just really starting to play with that, liking it so far). The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... Also, we have driven apache as a proxy as far as we have squid... Paul Q and I have been kicking around the idea that even if we go to a completely async core, etc. that modules could mark some hooks as "blocking" and they would run basically how they do today. (One day, Paul, I'll actually think about this more...) Having a request tied to one thread for its lifetime does make some things easier. If the underlying IO is asynchronous and it's faster/scalable/fun, then, all the better. I just am not a big fan of the "callback" method that squid uses (or used last time I looked at it). Yes, it's doable, but just seems "not quite right" to me. That's just my opinion. I'd like to be able to say, "hey httpd, write this stuff to the client" and have it just happen wonderfully fast :) Currently, worker is doing a great job for us. Maybe async would be fine as well, especially if the serf buckets are as easy to use as Justin says. I just don't want us to say "we must be async" with no real reason other than "we must." -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
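For readers unfamiliar with the "X-sendfile" trick mentioned above, a minimal sketch follows. It assumes a front-end server configured to honour an X-Sendfile response header (typically via an add-on module); the file path and the call_app helper are made up for illustration, and nothing here is taken from a setup described in this thread.

```python
def app(environ, start_response):
    """Hypothetical WSGI app: decide which file to send, let the server send it."""
    # Dynamic logic (authentication, lookups, ...) would run here.
    path = "/var/cache/site/report.pdf"  # hypothetical file path
    headers = [
        ("Content-Type", "application/pdf"),
        ("X-Sendfile", path),  # asks the front-end server to stream this file
    ]
    start_response("200 OK", headers)
    return [b""]  # empty body; the server is expected to substitute the file


def call_app(application):
    """Invoke a WSGI app once and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body
```

The dynamic code finishes as soon as it has named the file, so the expensive process is not tied up while a slow client downloads; the transfer itself is left to the server's sendfile path.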
Re: Is async the answer
On Fri, Jan 18, 2008 at 02:31:11PM -0500, Akins, Brian wrote: > On 1/18/08 2:20 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote: > > > I think so, in some environments anyway. If you have a server tuned for > > high throughput accross large bandwidth-delay product links then you > > have the general problem of equal-priority threads sitting around with > > quite a lot of large impending writes. > > Doesn't sendfile (and others) help in that case? Also RAM is cheap, > bandwidth isn't :) Oh if you can use sendfile, you use it sure, and whether it's used async or not isn't going to make a big difference; all of the benefits are the zero copy, the DMA, the TOE, and so on. That's not even a consideration; async is really for dynamic content, proxies, and other non-sendfile content. -- Colm MacCárthaigh Public Key: [EMAIL PROTECTED]
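As a small illustration of the sendfile path discussed above, here is a sketch using Python's socket.sendfile(), which uses os.sendfile() where the platform supports it and falls back to ordinary send() otherwise. The helper names and the 4096-byte payload are assumptions for the demo; it says nothing about how httpd itself drives sendfile.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send the whole file at `path` over `sock`; return the bytes sent."""
    with open(path, "rb") as f:  # sendfile() requires a binary-mode file
        return sock.sendfile(f)


def demo():
    payload = b"x" * 4096  # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    a, b = socket.socketpair()  # connected pair standing in for server/client
    try:
        sent = send_file_zero_copy(a, path)
        a.close()  # signal end of stream to the reader
        received = b""
        while True:
            chunk = b.recv(65536)
            if not chunk:
                break
            received += chunk
        return sent, received == payload
    finally:
        a.close()
        b.close()
        os.unlink(path)
```

The application never copies the file contents into its own buffers on the fast path, which is the benefit referred to above, and it applies whether the surrounding server is threaded or event driven.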
Re: Is async the answer
On 1/18/08 2:20 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote: > I think so, in some environments anyway. If you have a server tuned for > high throughput accross large bandwidth-delay product links then you > have the general problem of equal-priority threads sitting around with > quite a lot of large impending writes. Doesn't sendfile (and others) help in that case? Also RAM is cheap, bandwidth isn't :) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
On 1/18/08 2:16 PM, "Justin Erenkrantz" <[EMAIL PROTECTED]> wrote: > Speaking for myself, I think writing and using buckets with serf is > more straightforward than our complicated bucket brigade system with > mixed push/pull paradigms. It very well may be. Async may be easy. Except when my db connection blocks.. On stat calls.. Etc. I am by no means defending the buckets! Or anything for that matter... Just some observations. I just no longer buy into the idea that async is somehow inherently superior. It sounds good in theory, but in the "real world" I am just not seeing it. The whole reason I brought this up was to stimulate discussion. I really really would hate for us to spend many months porting everything over to async to discover that it made no positive impact on performance. Worse, it made extending httpd (or "D") much harder. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
On Fri, Jan 18, 2008 at 01:52:02PM -0500, Akins, Brian wrote: > On 1/18/08 12:18 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote: > > Hmmm, it depends what you mean by scale really. Async doesn't help a > > daemon scale in terms of concurrency or throughput, if anything it might > > even impede it, but it certainly can help improve latency and > > responsivity greatly. On the whole, it's easy to see how it might make > > the end user experience of a very busy server much more pleasant. > > I also wonder is that has actually been tested or if it's just a "factoid"? I've tested, and it met my expectations on Linux 2.6 on Itanium, but I can't guarantee that the experiments were free from my own bias I guess. > >> Response time never increased in any measurable amount. > > > > I suspect it might though if the scheduler became bound, async would > > route the interupts more efficiently. > > But, I wonder if the scheduler would become bound in a "reasonable" amount > of traffic. I think so, in some environments anyway. If you have a server tuned for high throughput accross large bandwidth-delay product links then you have the general problem of equal-priority threads sitting around with quite a lot of large impending writes. Having them all in the polling loop is inefficient, and async is going to reduce the latency a little, though granted these days we may be talking about nanoseconds. And I guess responsivity and high BDP don't go together anyway, due to the speed of light. > > The scalability wars should really be over, > > everyone won - kernel's rule :-) > > Which is why I hate to see a ton of work go into async core if it actually > does very little to help performance (or if it hurts it) and makes writing > modules harder. It braindead simple nowadays to write well behaved high > performance modules (well, mostly) bcs you rarely worry about threads, > reads/writes, etc. Full async programming is just as challenging as > handling a ton of threads yourself. 
I think if it interests people and they want to work on it, cool stuff, but don't necessarily expect any actual pay-off in terms of performance. One of the great things about an open source project is that sometimes what gets worked on isn't driven by considerations other than what people feel like working on. I'd be less worried about the effect on modules; many module authors already can't be bothered to make their modules thread-safe, but prefork still exists (and scales quite well, on many platforms). -- Colm MacCárthaigh Public Key: [EMAIL PROTECTED]
Re: Is async the answer
On Jan 18, 2008 10:52 AM, Akins, Brian <[EMAIL PROTECTED]> wrote: > Which is why I hate to see a ton of work go into async core if it actually > does very little to help performance (or if it hurts it) and makes writing > modules harder. It braindead simple nowadays to write well behaved high > performance modules (well, mostly) bcs you rarely worry about threads, > reads/writes, etc. Full async programming is just as challenging as > handling a ton of threads yourself. Speaking for myself, I think writing and using buckets with serf is more straightforward than our complicated bucket brigade system with mixed push/pull paradigms. YMMV. -- justin
Re: Is async the answer
On 1/18/08 12:18 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote:
> Hmmm, it depends what you mean by scale really. Async doesn't help a
> daemon scale in terms of concurrency or throughput, if anything it might
> even impede it, but it certainly can help improve latency and
> responsivity greatly. On the whole, it's easy to see how it might make
> the end user experience of a very busy server much more pleasant.

I also wonder if that has actually been tested or if it's just a "factoid"?

>> Response time never increased in any measurable amount.
>
> I suspect it might though if the scheduler became bound, async would
> route the interrupts more efficiently.

But, I wonder if the scheduler would become bound in a "reasonable" amount
of traffic.

> discussions on scalability baffling, the reality is that modern hardware
> can outscale pretty much any amount of bandwidth you can buy regardless
> of the software.

Bandwidth generally isn't an issue for us anymore (thanks to gzip). We can
still overrun the CPU with small object requests/responses. On "large"
objects (i.e., over 16k or so), the CPU is bored when multiple gig
interfaces are full.

> The scalability wars should really be over,
> everyone won - kernels rule :-)

Which is why I hate to see a ton of work go into async core if it actually
does very little to help performance (or if it hurts it) and makes writing
modules harder. It's braindead simple nowadays to write well behaved high
performance modules (well, mostly) because you rarely worry about threads,
reads/writes, etc. Full async programming is just as challenging as
handling a ton of threads yourself.

My $.02 US worth (which ain't much).

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
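The small-object versus large-object point above comes down to how many responses per second it takes to fill a link. A minimal sketch of that arithmetic follows. It assumes "16k" means 16 KiB and ignores HTTP header and TCP overhead, and the function and constant names are hypothetical.

```python
def responses_per_second_to_fill(link_bits_per_second, object_bytes):
    """Responses per second needed to saturate a link with objects of one size."""
    return link_bits_per_second / (object_bytes * 8)


GIGABIT = 1_000_000_000  # one gigabit interface, in bits per second

# Smaller objects need far more requests (and per-request CPU work)
# to move the same number of bytes.
for size in (1 * 1024, 16 * 1024, 256 * 1024):
    rate = responses_per_second_to_fill(GIGABIT, size)
    print(f"{size // 1024:>4} KiB objects: ~{rate:,.0f} responses/s per gigabit")
```

Per-request costs (parsing, logging, handler dispatch) are paid once per response regardless of size, so the CPU becomes the limit long before the interface does when objects are small.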
Re: Is async the answer?
On Fri, Jan 18, 2008 at 12:03:02PM -0500, Akins, Brian wrote:
> Most of us seem to have convinced ourselves that high performance network
> applications (including web servers) must be asynchronous in order to scale.
> Is this still valid? For that matter, was it ever?

Hmmm, it depends what you mean by scale really. Async doesn't help a
daemon scale in terms of concurrency or throughput, if anything it might
even impede it, but it certainly can help improve latency and
responsivity greatly. On the whole, it's easy to see how it might make
the end user experience of a very busy server much more pleasant.

> It seems that modern OS's (this was Linux 2.6.something) deal with the
> "thread overhead" and all the context switches very well. All the stuff
> mentioned in the "the c10k problem" ( http://www.kegel.com/c10k.html) didn't
> seem to apply. We could have easily doubled the amount of connections to
> the server, I think.

The c10k page has been hopelessly out of date for a long long time, I
wrote to Dan Kegel some time ago (maybe 3 or 4 years) pointing this out,
but there's been no update :/

> Response time never increased in any measurable amount.

I suspect it might though if the scheduler became bound, async would
route the interrupts more efficiently.

> Yes, we are using sendfile, mmap, etc., so zero-copy helps us a lot.
>
> So, do we need apache 3 (or whatever it's called) to be fully asynchronous?
> Is that just us reacting to "the market" trends, i.e., lighttpd?

Who knows, no harm in doing it anyway, if it's what interests people,
cool. Personally I find comparisons between webservers and most
discussions on scalability baffling, the reality is that modern hardware
can outscale pretty much any amount of bandwidth you can buy regardless
of the software. And to that end, the software is all near identical in
the pipelines of syscalls used (hell even IIS) - which is what really
matters.

Most discussions seem to centre on some mindlessly ignorant comparison
based on the suitability of defaults to a particular set of circumstances
coupled with religion. The scalability wars should really be over,
everyone won - kernels rule :-)

> All the apache httpd "is bloated and slow" is just plain horse crap. It's
> not that hard to configure apache to be "fast." C programming is my
> "hobby," and it's not that hard to write modules that don't do stupid things
> and kill the performance.

Yep!

--
Colm MacCárthaigh    Public Key: [EMAIL PROTECTED]
Is async the answer?
This is just some ramblings based on some observations, theories, and
tests. Partially "devil's advocate" as well.

Most of us seem to have convinced ourselves that high performance network
applications (including web servers) must be asynchronous in order to
scale. Is this still valid? For that matter, was it ever?

We just ran a large scale test on a busy website (won't mention the
name...) and ran about 95% of production traffic on a single server. This
was about 30k connections. We set MaxClients to 50k. Server did fine, had
about 4GB RAM free and 55% CPU idle. This was the full production config,
not stripped down or anything, using our cache and proxy and several
"stock" modules. Granted these were fairly "beefy" servers, but nothing
extraordinary: 2x dual-core 2.4 GHz CPUs with 8GB RAM, normal non-TOE
Ethernet (but with checksums on card).

It seems that modern OS's (this was Linux 2.6.something) deal with the
"thread overhead" and all the context switches very well. All the stuff
mentioned in the "the c10k problem" ( http://www.kegel.com/c10k.html)
didn't seem to apply. We could have easily doubled the amount of
connections to the server, I think.

We were using the normal worker MPM with keepalives for this test. The
current "stable" event MPM would have helped with idle keepalive threads,
but the system didn't seem to care.

Response time never increased in any measurable amount.

Yes, we are using sendfile, mmap, etc., so zero-copy helps us a lot.

So, do we need apache 3 (or whatever it's called) to be fully
asynchronous? Is that just us reacting to "the market" trends, i.e.,
lighttpd?

All the apache httpd "is bloated and slow" is just plain horse crap. It's
not that hard to configure apache to be "fast." C programming is my
"hobby," and it's not that hard to write modules that don't do stupid
things and kill the performance.

The biggest thing we do in our modules is to make trade-offs to avoid
locking. I.e., we would rather "waste" a few MB of RAM on some "per-module
scoreboards" than use per-proc or global locking. Most of our counters
are per thread, and we just add them up when someone accesses the counter
(i.e., via mod_status). This made a huge difference in our performance.

Also, don't get in httpd's way. Let the core handlers handle as much as
possible. Just "encourage" them when needed. They are battle tested, and
improvements made there help everything out if you haven't tried to
rewrite it in your own module.

Like I said, just some ramblings. We may experiment with the current
event MPM some more, but I honestly do not see the huge benefit to moving
to a fully async IO architecture. It's very easy to write modules in the
current "one thread per request" model. People will screw up the async
thing and make it slower anyway, probably.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
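The per-thread counter trade-off described above can be sketched roughly as follows. This is an illustration of the idea only, not the modules' actual code: a real httpd module would be written in C with per-thread slots in its own scoreboard, and every name here (PerThreadCounter, incr, total, run_demo) is hypothetical.

```python
import threading


class PerThreadCounter:
    """Counter with one private slot per thread; slots are summed only on read."""

    def __init__(self):
        self._local = threading.local()
        self._slots = []  # one single-element list per thread that has counted
        # Taken once per thread at registration and on reads, never per increment.
        self._register_lock = threading.Lock()

    def _slot(self):
        slot = getattr(self._local, "slot", None)
        if slot is None:
            slot = [0]
            with self._register_lock:
                self._slots.append(slot)
            self._local.slot = slot
        return slot

    def incr(self, n=1):
        # Hot path: only the owning thread ever writes this slot, so no lock.
        self._slot()[0] += n

    def total(self):
        # Cold path (think mod_status): add up every thread's slot.
        with self._register_lock:
            return sum(slot[0] for slot in self._slots)


def run_demo(threads=8, hits_per_thread=10_000):
    """Count hits from several threads and return the summed total."""
    counter = PerThreadCounter()

    def worker():
        for _ in range(hits_per_thread):
            counter.incr()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.total()
```

The cost is memory (one slot per thread per counter) and a total that may lag slightly while threads are still counting. In exchange, the request path never waits on a shared lock.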