Re: [Catalyst] Scalable Catalyst
I think that the mod_perl mailing list would also be interested in this - there are very few people on that list with practical examples of multi-threading. As far as I'm aware, pre-fork is still pretty much the only model recommended.

Alejandro Imass wrote:
Ok. What would you have done? - not meant as a defensive question but really, we would like to hear options for this application!

I would've probably pushed for a change in the architecture, so that the browser makes a request and then polls for results. Don't underestimate the ability of users to hammer the F5 button because the page has taken 2 seconds longer to come back than they expected!

However, I do find your choice of solution interesting, as you've essentially managed to get a fairly out-of-the-box solution working. There are a bunch of things that could be done to process this type of workload more quickly, but with the disadvantage that you'd have a bigger custom code-base to maintain.

I'm curious about the memory differences between pre-fork and threaded mod_perl from your testing. General mod_perl advice is to pre-load as much Perl code and data as possible and take advantage of the copy-on-write aspects of the VM. Did you push this? How much difference was there between the models?

Carl

___
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/
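The request-then-poll architecture Carl suggests can be sketched minimally as follows. This is an illustrative sketch only (not code from the thread): a hypothetical submit/poll pair in Python, where an in-memory dict stands in for a real job store and a short sleep simulates the slow remote call.

```python
import threading
import time
import uuid

# In-memory job store standing in for a real queue or database (illustrative only).
jobs = {}

def submit(payload):
    """Accept the request immediately and hand back a job id the client can poll."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}

    def worker():
        time.sleep(0.1)  # stands in for the 1-7 second slow remote call
        jobs[job_id] = {"status": "done", "result": payload.upper()}

    threading.Thread(target=worker, daemon=True).start()
    return job_id

def poll(job_id):
    """Cheap, non-blocking status check the browser can hammer safely (F5-proof)."""
    return jobs[job_id]

job = submit("hello")
while poll(job)["status"] != "done":
    time.sleep(0.02)
print(poll(job)["result"])  # HELLO
```

The point of the pattern is that no app-server worker is tied up for the duration of the slow call; the client's polling requests each return in milliseconds.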
Re: [Catalyst] Scalable Catalyst
On Thu, Jul 2, 2009 at 4:08 AM, Carl Johnstone catal...@fadetoblack.me.uk wrote:
I think that the mod_perl mailing list would also be interested in this - there are very few people on that list with practical examples of multi-threading. As far as I'm aware pre-fork is still pretty much the only model recommended.

Hmmm. Well, I think that many people are still stuck in the paradigms of mod_perl prior to 2.0 and Apache 2.0.x. I didn't try this by mere chance... after some research we found two interesting references that led me in this direction:

1) "Scaling Apache 2.x beyond 20,000 concurrent downloads" by Colm MacCárthaigh c...@stdlib.net, 21st July 2005. Look at 3.1, "Choosing an MPM". The results in 2005 were promising, and I figured that by 2008 many of these issues had been overcome. This paper also helped me fine-tune the Linux VM and led me to try FreeBSD.

2) "Practical mod_perl", O'Reilly, May 2003 edition. Look at 24.3, "What's new in mod_perl 2.0", specifically 24.3.1, "Thread Support". When I read this section I decided to give this a try, as it was perfect for our blocking-processes problem (i.e. having many light-weight threads blocking for several seconds each).

Alejandro Imass wrote:
Ok. What would you have done? - not meant as a defensive question but really, we would like to hear options for this application!

I would've probably pushed for a change in the architecture, so that the browser makes a request then polls for results. Don't under-estimate the ability of users to hammer the F5 button because the page has taken 2 seconds longer to come back than they expected!

There was no change of architecture possible. The other service takes 1 to 7 seconds to respond. We are a B2B server: there is no end user here. We are just an intermediate server that needs to hold the load for the slow service.

However I do find your choice of solution interesting, as you've essentially managed to get a fairly out-of-the-box solution working.
There are a bunch of things that could be done to process this type of workload quicker, but with the disadvantage that you've got a bigger custom code-base to maintain.

Not at all the case. We chose Catalyst for its design pattern and multiple deployment options, plus the fact that we do a lot of work in Catalyst and in Perl in general. Our own performance was not an issue. The issue was the blocking calls to the other service.

I'm curious about the memory differences between pre-fork and threaded in mod_perl from your testing. General mod_perl advice is to pre-load as much perl code and data as possible and take advantage of the copy-on-write aspects of VM. Did you push this? How much difference was there between the models?

Oh, of course. We did some profiling to find and reduce memory problems, including valgrind and the like, and we stripped our Catalyst app to the bare essentials. For example, our API has HTML form-based as well as XML capabilities, and FormBuilder was used for form validations, etc.; this proved particularly costly, so FormBuilder had to be removed, as were several other plugins for the same reason.

For your reference, the app was about 30MB in the stand-alone Catalyst server. Curiously, it was not much bigger on 64 bits, and in pre-fork the average size of each Apache process was not very different either: I don't have the exact numbers with me, but they were around 40MB RSS each. With 64 threads, the base Apache process grew to 115MB (170MB VSZ) and each actual serving process flattened out at 200MB RSS (750MB VSZ) after several thousand hits. So if you have enough CPU power, the RAM benefits are great. What I have seen happen many times, including in our case, is that the CPU is either under-utilized or very busy waiting on swap. Just FYI: a similar prototype in POE was about 20MB, so no major savings there! That's when we looked at AnyEvent with EV (from Perl).
This last prototype did prove a viable option, but the TCO was worrying and we had already invested in and completed the Catalyst app. If I were to re-write from scratch I would probably revisit the AnyEvent/EV path. For an app that has to block thousands of incoming HTTP requests for several seconds there are really very few options, and at some point you have to dedicate a process or thread (an LWP on Linux and the like) to each one. When we realized this, we started looking for references and just decided to try threaded mod_perl. The results, as you can see, were really amazing. Furthermore, the ability to easily develop the app logic with the stand-alone server (the MVC design pattern, an ORM like CDBI, etc.) and then deal with the deployment problems with little or no change in the code was just phenomenal, proving that Catalyst is a very good choice (if not the best) for developing any high-end application. IMHO a lot of the thread paranoia was true before Perl 5.8, mod_perl 2.0 and Apache 2.0.x. Can't say for sure, as I am no expert by any means,
Re: [Catalyst] Scalable Catalyst
Hi! Sorry for the lethargy, I've been buried in a project and just recently saw the light of day :-)

Yes, you are correct [Tomas], BUT it all depends on the type of application. Web concurrency is often misinterpreted. The application I was referring to needs the ability to have many, many concurrent processes waiting for a response from another service which has a long response time. So in this case, having many, many threads sitting there waiting for a response is the way to go.

Web concurrency is usually a balance between: 1) the available RAM, which limits the number of processes/threads you can load; and 2) the time it takes you to process a given request, and the CPU power required to do it. Concurrency tests with httperf and ab will give you a good idea of requests per second for a given number of processes/threads, and you can monitor CPU usage with top or similar; it's empirical, but it works. This way you can estimate the limit on the actual number of parallel workers (be they processes or threads) that your machine is really able to crank.

Multi-threaded mod_perl (with Apache's worker MPM) will only be an advantage if you actually have the CPU power to process the threads in parallel. If not, it just becomes sequential in the CPU time available per thread. On the other hand, the usual case is that CPU load is low relative to RAM usage in the traditional process-only model (pre-fork): because each process is so large, your RAM fills up with very few processes, so you're not taking full advantage of your CPU power. By using mod_perl under the worker MPM you can use considerably less RAM and put more actual work on the CPUs, but that comes back to my original comment at the top: it all depends on the application. There are always too many things to consider, such as static content, file uploads, streaming content and other stuff which is most surely managed better outside of your application.
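The balance described above can be roughed out with Little's law (mean concurrency = arrival rate × mean residence time) plus a RAM budget. A small sketch, using hypothetical numbers in the spirit of the thread rather than figures anyone actually reported:

```python
def required_workers(requests_per_sec, avg_block_seconds):
    # Little's law: mean number in flight L = arrival rate lambda * residence time W.
    return requests_per_sec * avg_block_seconds

def max_workers_by_ram(total_ram_mb, per_worker_mb):
    # How many workers (processes or threads) fit in the RAM budget.
    return total_ram_mb // per_worker_mb

# Hypothetical: 500 req/s, each blocking ~5 s on the slow service,
# needs ~2500 workers in flight at once.
print(required_workers(500, 5))        # 2500

# Pre-fork at ~40 MB/process vs threads at ~3 MB incremental each, in 12 GB:
print(max_workers_by_ram(12_000, 40))  # 300 processes - not enough
print(max_workers_by_ram(12_000, 3))   # 4000 threads - plenty of headroom
```

This is why a blocking workload favors the threaded model: the workers spend their residence time waiting, not computing, so RAM per worker, not CPU, is the binding constraint.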
Also, as you state, today's large applications should run behind reverse proxies/balancers that can also pick up the tab on static serving and other optimizations.

This is a very interesting, diverse and complex subject, but the main idea of my post was to state that Catalyst works well under multi-threaded Apache with mod_perl, allowing, _in some cases_, better usage of the available resources. It does not apply, of course, to all cases, and your insight explains this very well.

BTW, Ashley suggested I write a how-to on the wiki or something like that. Could someone suggest exactly where, and I may have time to do that this week.

Best,

Alejandro Imass

On Fri, May 1, 2009 at 8:01 AM, Tomas Doran bobtf...@bobtfish.net wrote:
Alejandro Imass wrote:
Anyway, the message is that with mod_worker/mod_perl you can spawn _thousands_ of threads, getting impressive concurrency (without counting the mutex). We have tested Catalyst applications that handle _thousands_ of concurrent requests using off-the-shelf AMD 64-bit HW and 12Gb RAM, with a Catalyst app of about 20MB RSS.

There is a big difference between having thousands of requests in-flight at once and serving thousands of new requests a second. You're saying that mod_worker can do the former well, without mentioning the latter. I'd very much guess that in your configuration, most of your workers (and requests) are just pushing bytes to the user, which isn't really a hard job.. :-) The reason that normal mod_perl fails at this is that you have one process per request, and so having many many requests in flight at once hurts. However, if you have thousands of requests all trying to generate pages at once, you're going to fail and die - full stop...
perl -e 'system("perl -e \"while (1) {}\" &") for (1..1000)'

will convince you of this if you aren't already :)

You can trivially get around this by having a _small_ number of mod_perl processes behind a proxy, so that your (expensive/large) mod_perl process generates a page, then throws it at network speed (1Gb/s or higher if you're on localhost) to the proxy, which then streams it to the user much, much more slowly. This frees up your mod_perl processes as quickly as possible to get on with useful work.

I'd also note that having more threads/processes generating pages than you have CPU cores is fairly inefficient, as the more processes you have, the greater the penalty you're going to incur from increased context-switching overhead. (Quite often you block on the database in most apps, which means that one process per CPU core doesn't hold totally true for best throughput, so YMMV..)

For the record, one of my apps can trivially do 200 requests a second, with 3000+ concurrent requests in-flight, using a single 4Gb dual-core x64 box with one disk, running both the application _and_ the mysql server.. It flattens the 100Mb pipe to the internet I have in the office waaay before the system actually starts to struggle from a load perspective.. That's nginx / fastcgi with 3 fcgi worker
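The small-backend-behind-a-buffering-proxy setup described above might look roughly like this in nginx. This is an illustrative sketch only: the thread doesn't give the actual configuration, and a FastCGI deployment like the one mentioned would use fastcgi_pass rather than the generic proxy_pass shown here.

```nginx
# Sketch: nginx buffers the app's response and dribbles it out to slow
# clients, freeing the expensive backend worker almost immediately.
upstream app_backend {
    server 127.0.0.1:3001;   # a small, fixed pool of app workers
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
        proxy_buffering on;      # take the page at LAN speed, serve at client speed
        proxy_buffers 16 64k;
    }

    location /static/ {
        root /var/www;           # let the proxy serve static files itself
    }
}
```

The key directive is the buffering: the backend is released as soon as the proxy has swallowed the response, so a handful of workers can keep thousands of slow client connections fed.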
Re: [Catalyst] Scalable Catalyst
On 30 Jun 2009, at 11:58, Alejandro Imass wrote:
Hi! Sorry for the lethargy, I've been buried in a project and just recently saw the light of day :-) Yes, you are correct [Tomas], BUT it all depends on the type of application. Web concurrency is often misinterpreted. The application I was referring to needs the ability to have many, many concurrent processes waiting for a response from another service which has a long response time. So in this case, having many, many threads sitting there waiting for a response is the way to go.

You're doing it wrong. Don't block app server threads on a remote service if you have a slow remote service; the only thing that lies down that route is doom and fail. Use a job queue or something so that your application servers don't sit waiting for slow remote services. If you really, actually need to block architecturally, then a heavyweight application server is just the wrong solution, full stop.

I'm sure that the worker MPM will give you more headroom than prefork if you have loads of mod_perl processes blocking, but I very much consider this to be an optimisation, not a solution. That said, I'm not trying to be disparaging, and I'm happy this works for you and is a viable option :)

The one thing that worries me about this is that it uses threads, and threads and perl don't get on.. For example, Moose's string-style type constraints are a bad world (because the regex used makes various versions of at least perl 5.8 core dump). I don't think that this is an issue for anything in the Catalyst stack, as we're either type-constraint free, or use exclusively MooseX::Types stuff (which doesn't use those regular expressions, and therefore is safe, I think) - but it may be a problem for other code.. I can certainly remember an un-fun world of apache puking its guts on a coworker's machine as he was using the worker mpm.. So YMMV, depending on what portion of CPAN you use, and/or what your codebase looks like...
:-/

This is a very interesting, diverse and complex subject, but the main idea of my post was to state that Catalyst works well under multi-threaded Apache with mod_perl, allowing, _in some cases_, better usage of the available resources. It does not apply, of course, to all cases, and your insight explains this very well.

Indeed - it all very much depends on the application load profile etc. - so all of this is just painting the bike shed, unless we're discussing a specific application on a real (i.e. something like production) set of hardware, and have actual performance metrics we want to ensure it fulfills. :)

BTW, Ashley suggested I write a how-to on the Wiki or something like that. Could someone suggest exactly where, and I may have time to do that this week.

http://dev.catalyst.perl.org/wiki/deployment

Making a section linked from 'Apache' in there, briefly outlining your config and what benefits you get from doing mod_perl this way, would be great :)

Cheers
t0m
Re: [Catalyst] Scalable Catalyst
On Tue, Jun 30, 2009 at 2:42 PM, Tomas Doran bobtf...@bobtfish.net wrote:
You're doing it wrong. Don't block app server threads on a remote service if you have a slow remote service, the only thing that lies down that route is doom and fail.

I don't see the problem. In fact, this was the _main and central_ point of our application: to handle as many concurrent requests as possible, acting like a dam for the slower service. At first it seemed like a perfect application for EDA, but every attempt in this direction failed miserably...

Use a job queue or something so that your application servers don't sit waiting for slow remote services.

I tried POE, AnyEvent and several other EDA patterns, tests, etc., building an external job manager, and our customer even resorted to hiring someone to hack the Catalyst engine to make it event-based, and it got really ugly in the actual tests. Threads with the worker MPM worked fine and proved to be far more stable/scalable than any other option we tried. Incredible as it sounds, multi-threading was way more stable than the other paths we tried. The way I see it, an LWP per request was exactly the solution we needed for this.

If you really actually need to block architecturally, then a heavy weight application server is just the wrong solution, full stop.

In this case, practice proved better than theory. We tested heavily on FreeBSD 6 and 7 and Linux 2.6.20+. The only thing we found was that Linux's VM still needs a lot of work: once you start heavily using swap it usually doesn't recover; it's usually faster at switching than FreeBSD 6, and even 7, but a lot less stable. FreeBSD recovered in every case.

I'm sure that the worker mpm will give you more headroom if you have loads of mod_perl processes blocking than prefork, but I very much consider this to be an optimisation, not a solution.

Ok. What would you have done? - not meant as a defensive question but really, we would like to hear options for this application! Believe me, we tried almost everything.
The event-based Catalyst engine was a great idea, but the actual tests proved that it really had no real advantage over process/thread with the worker MPM. Besides, it proved very problematic, unstable and hard to maintain in the long run. I remembered my previous experience working on Enterprise Service Bus technology, and our results were very similar to Welsh's final conclusions on SEDA: "It is also worth noting that a number of recent research papers have demonstrated that the SEDA prototype (in Java) performs poorly compared to threaded or event-based systems implemented in C. This would seem to contradict the findings in my work. (For more information I invite you to read recent papers by Vivek Pai's group at Princeton and the Capriccio work from UC Berkeley.)" Actually, no further work after SEDA has ever proved that event-based is better than process/thread. I confirmed this with people at Princeton (Vivek Pai himself) about a year ago.

As stated, we tried AnyEvent and POE and fiddled with the idea of implementing some kind of async mechanism to/from the client. It's a B2B server, so the syncing problems could have gotten very messy. We needed to block anywhere from 1 to 7 seconds, and we expected several thousand hits per app server to make it cost-effective. This is what we came up with using Catalyst: 50 Apache processes with 64 threads each, for a total of 3200 threads, using about 10GB of RAM (12GB is actually on each server: for the OS, the mutex, and to avoid swapping at all costs). The 3200-thread blocking load can be handled by 2 dual-core AMD 64-bit processors (4 are actually installed) - not a bad deal cost-wise, and it works. Each 64-thread process grows to about 200MB RSS (750MB VSZ), plus a base Apache root process of about 115MB. So we have 3200 actual blocking requests plus the Apache mutex, and the overall capacity of each app server is quite decent for the price.
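For reference, the 50-process × 64-thread layout described above would correspond to a worker-MPM configuration along these lines. This is a sketch: ServerLimit, ThreadsPerChild and MaxClients follow the numbers in the post, but the remaining directive values are illustrative, not taken from the thread.

```apache
# Apache 2.2-era worker MPM: 50 processes x 64 threads = 3200 serving threads.
<IfModule mpm_worker_module>
    ServerLimit           50
    StartServers          10
    ThreadsPerChild       64
    ThreadLimit           64
    MaxClients            3200    # ServerLimit * ThreadsPerChild
    MinSpareThreads       64
    MaxSpareThreads       256
    MaxRequestsPerChild   10000   # recycle processes to contain slow leaks
</IfModule>
```

MaxRequestsPerChild is what implements the process-recycling trick mentioned elsewhere in the thread: leaky modules never get the chance to grow a process without bound.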
At the moment we recycle each process after a set number of requests, with no observable or serious leaks or any other sort of problem. We take about 65ms per request in our own processing, but sadly we have to block for the amounts stated above. The app does have some complexity, because it does considerable XML processing with LibXML2 (DTD validations, XSL transformations, XPath queries and such), database interaction with MySQL, and talks to another server via LWP::UserAgent (this is where each thread blocks). We chose Catalyst precisely because of the flexibility to adapt to different scenarios (mod_perl, FastCGI, etc.) with little or no code change.

That said, I'm not trying to be disparaging, and I'm happy this works for you, and is a viable option :)

It did, and it works fine. This is why I wouldn't discourage people from using the worker MPM, although, as we have both agreed, it all depends on the actual application.

The one thing that worries me about this is that it uses threads, and threads and perl don't get on.. For example, Moose's string style type

I've done many
Re: [Catalyst] Scalable Catalyst
Hi all, just my $0.02 here. 100 requests per second is not all that impressive for a Catalyst app. I have tested Catalyst under mod_perl with the worker MPM (multi-threaded apache/perl) on Linux 2.6 and FreeBSD 6.4 and 7.x. With the worker MPM you can spawn processes with X threads each. Memory sharing under mod_worker/mod_perl is good, and you can play with the number of requests each process handles before it is killed and restarted; this way you don't have to worry about leaky modules/code. I have used relatively simple applications and others that use FormBuilder, and even some XS modules such as XML::LibXML, and in general the leaking is pretty manageable, but again, not to worry (see above). Compiling Apache and mod_perl for threads is not that hard, and there are many variables you can play with in the Apache conf.

Anyway, the message is that with mod_worker/mod_perl you can spawn _thousands_ of threads, getting impressive concurrency (without counting the mutex). We have tested Catalyst applications that handle _thousands_ of concurrent requests using off-the-shelf AMD 64-bit HW and 12Gb RAM, with a Catalyst app of about 20MB RSS.

Best,

Alejandro Imass

On Sat, Apr 18, 2009 at 11:22 AM, Juan Miguel Paredes juan.pare...@gmail.com wrote:
On Sat, Apr 18, 2009 at 2:42 AM, Graeme Lawton glaw...@alola.org wrote:
Yeah, I was reading this the other day. Does anyone know if they use DBIC?

Apparently, yes... "...The team which produces the web side server components for BBC iPlayer is expanding. We use Catalyst, DBIx::Class and TT to deliver a reliable and hugely popular website, as well as feeds to other teams within the BBC..."
http://london.pm.org/pipermail/jobs/2009-March/000189.html
Re: [Catalyst] Scalable Catalyst
On Apr 30, 2009, at 8:50 AM, Alejandro Imass wrote:
Anyway, the message is that with mod_worker/mod_perl you can spawn _thousands_ of threads, getting impressive concurrency (without counting the mutex). We have tested Catalyst applications that handle _thousands_ of concurrent requests using off the shelf AMD 64Bit HW and 12Gb RAM, with a Catalyst app of about 20MB RSS.

Hey, Alejandro! You should really write up the way you did it for the wiki or an article somewhere. Please!

-Ashley
Re: [Catalyst] Scalable Catalyst
My math brings it to about 100 hits per second, rather than 1000, unless I'm reading things wrong: 9 million page views a day = 9,000,000/(60*60*24) = 104.16/sec. Still an impressive feat for dynamically generated pages.

Peter Edwards wrote:
I was writing a blog entry http://dragonstaff.blogspot.com/ about the state of play of Catalyst and DBIC and came across this: BBC iPlayer 2 uses Catalyst to handle 1000 hits per second: http://www.bbc.co.uk/blogs/bbcinternet/2008/12/iplayer_day_performance_tricks.html

Cheers, Peter http://perl.dragonstaff.co.uk
Re: [Catalyst] Scalable Catalyst
Oops. Not paying attention: "gets up to nearly one thousand concurrent requests per second."

Joe Cooper wrote:
My math brings it to about 100 hits per second, rather than 1000, unless I'm reading things wrong: 9 million page views a day = 9,000,000/(60*60*24) = 104.16/sec. Still an impressive feat for dynamically generated pages.

Peter Edwards wrote:
I was writing a blog entry http://dragonstaff.blogspot.com/ about the state of play of Catalyst and DBIC and came across this: BBC iPlayer 2 uses Catalyst to handle 1000 hits per second: http://www.bbc.co.uk/blogs/bbcinternet/2008/12/iplayer_day_performance_tricks.html

Cheers, Peter http://perl.dragonstaff.co.uk
Re: [Catalyst] Scalable Catalyst
Yeah, I was reading this the other day. Does anyone know if they use DBIC? "For each query we get we build an abstraction we call our blocklist." Makes me wonder if they are using their own in-house db abstraction?

Graeme

2009/4/17 Joe Cooper j...@virtualmin.com:
Oops. Not paying attention: "gets up to nearly one thousand concurrent requests per second."

Joe Cooper wrote:
My math brings it to about 100 hits per second, rather than 1000, unless I'm reading things wrong: 9 million page views a day = 9,000,000/(60*60*24) = 104.16/sec. Still an impressive feat for dynamically generated pages.

Peter Edwards wrote:
I was writing a blog entry http://dragonstaff.blogspot.com/ about the state of play of Catalyst and DBIC and came across this: BBC iPlayer 2 uses Catalyst to handle 1000 hits per second: http://www.bbc.co.uk/blogs/bbcinternet/2008/12/iplayer_day_performance_tricks.html

Cheers, Peter http://perl.dragonstaff.co.uk
Re: [Catalyst] Scalable Catalyst
On Sat, Apr 18, 2009 at 2:42 AM, Graeme Lawton glaw...@alola.org wrote:
Yeah, I was reading this the other day. Does anyone know if they use DBIC?

Apparently, yes... "...The team which produces the web side server components for BBC iPlayer is expanding. We use Catalyst, DBIx::Class and TT to deliver a reliable and hugely popular website, as well as feeds to other teams within the BBC..."

http://london.pm.org/pipermail/jobs/2009-March/000189.html