Re: ANNOUNCE: Updated Hello World Web Application Benchmarks
> I think too that the OS/machine results at
> http://www.chamas.com/bench/hello_bycode.html could be more accurate
> in comparing results if the results are also grouped by tester,
> network connection type, and testing client, so each grouping would
> well reflect the relative speed differences of web applications on
> the same platform.

Agreed.

> I would argue that we should keep the code type grouping listed at
> http://www.chamas.com/bench/hello_bycode.html because it gives
> a good feel for how some operating systems & web servers are faster
> than others, i.e., Solaris slower than Linux, WinNT good for static
> HTML, Apache::ASP faster than IIS/ASP PerlScript, etc.

See, I don't think you can even make statements like that based on
these benchmarks.  Where is the test on Solaris x86 and Linux done by
the same person under the same conditions?  I don't see one.  Where is
the test of NT and Linux on the same machine by the same person?  Even
the Apache::ASP vs PerlScript comparisons you did seem to be using
different clients, network setups, and versions of NT.  I'm not
criticizing you for not being able to get lab-quality results, but I
think we have to be careful about what conclusions we draw from these.

> Finally, I would very much like to keep the fastest benchmark page
> as the first page, disclaiming it to death if necessary, the reason
> being that I would like to encourage future submissions, with
> new & faster hardware & OS configurations, and the best way to do
> that is to have something of a benchmark competition happening on the
> first page of the results.

I can understand that; I just don't want mod_perl users to get a
reputation as the Mindcraft of web application benchmarks.

> It seems that HTTP 1.1 submissions represent a small subset of
> skewed results, should these be dropped or presented separately?

I'd say they're as meaningful as any of the others if you consider
them independently of the other contributions.

> I also need to clarify some results, or back them up somehow.
> What should I do with results that seem skewed in general?
> Not post them until there is secondary confirmation ?

Your call.  Again, to my mind each person's contribution can only be
viewed in its own private context, so one is no more skewed than any
other.

- Perrin
Re: ANNOUNCE: Updated Hello World Web Application Benchmarks
Perrin Harkins wrote:
>
> I think we would need more numbers from the exact same people, on the
> same machines, with the same configuration, the same client, the same
> network, the same Linux kernel... In other words, controlled conditions.

I hear you, so how about a recommendation that people submit no fewer
than 2 benchmarks for listing eligibility: at least static HTML, plus
one other.  The static HTML result can be used as a rough control
against other systems.

> Ideally, I would get rid of every page except the one which lists the
> tests grouped by OS/machine.  Then I would put a big statement at the
> top saying that comparisons across different people's tests are
> meaningless.

I see where you are going: you feel that the summarized results are
misleading, and to some extent they are, in that they are not
"controlled", so people's various hardware, OS, and configuration come
into play very strongly in how the benchmark performed, and readers
aren't wise enough to digest all the info presented and what it all
really means.

I think too that the OS/machine results at
http://www.chamas.com/bench/hello_bycode.html could be more accurate
in comparing results if the results are also grouped by tester,
network connection type, and testing client, so each grouping would
well reflect the relative speed differences of web applications on
the same platform.

I would argue that we should keep the code type grouping listed at
http://www.chamas.com/bench/hello_bycode.html because it gives a good
feel for how some operating systems & web servers are faster than
others, i.e., Solaris slower than Linux, WinNT good for static HTML,
Apache::ASP faster than IIS/ASP PerlScript, etc.

I should drop the normalized results at
http://www.chamas.com/bench/hello_normalized.html as they are unfair
and could easily be read wrong.  You are not the first to complain
about this.  The other pages sort by Rate/MHz anyway, so someone can
get a rough idea on those pages of what's faster overall.

Finally, I would very much like to keep the fastest benchmark page as
the first page, disclaiming it to death if necessary, the reason being
that I would like to encourage future submissions, with new & faster
hardware & OS configurations, and the best way to do that is to have
something of a benchmark competition happening on the first page of
the results.

It seems that HTTP 1.1 submissions represent a small subset of skewed
results; should these be dropped or presented separately?  I already
exclude them from the "top 10" style list since they don't compare
well to HTTP 1.0 results, which are the majority.

I also need to clarify some results, or back them up somehow.  What
should I do with results that seem skewed in general?  Not post them
until there is secondary confirmation?

Thanks Perrin for your feedback.

-- Joshua

_
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks >> free web link monitoring   Huntington Beach, CA USA
http://www.nodeworks.com                1-714-625-4051
Strange problems.
This message was sent from Geocrawler.com by "Billow" <[EMAIL PROTECTED]>
Be sure to reply to that address.

I am a new user of mod_perl.  I found a strange problem.  I defined
some variables in the main script (using my ...), and I want to use
them directly in a subroutine.  But sometimes I can use the variables,
and sometimes they are null.  (I use reload in my browser.)  The
script is like:

    my (@a, @b) = ();
    @a = ...;
    @b = ...;

    sub function {
        print "@a";
        print "@b";
    }

Any hints?  If I call it as function(@a, @b) and inside the sub use
my ($a, $b) = @_;, it's OK.  Are there any differences between
mod_perl and CGI Perl here?

Geocrawler.com - The Knowledge Archive
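[This looks like the well-documented Apache::Registry closure effect:
the script body is compiled into a wrapper subroutine, so a named sub
that refers to a my() lexical becomes a closure over the variables
from the first compilation in that child process.  A sketch of the
failure mode and the usual fix follows; the helper names are made up:]

    # broken under Apache::Registry: function() captures the @a that
    # existed when this child first compiled the script, so later
    # requests may see stale or empty values
    my @a = get_items();
    sub function { print "@a" }

    # fix: pass the data in explicitly instead of relying on scope
    sub show { my @items = @_; print "@items" }
    show(@a);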
Re: squid performance
Leslie Mikesell <[EMAIL PROTECTED]> writes:

> The 'something happens' is the part I don't understand. On a unix
> server, nothing one httpd process does should affect another
> one's ability to serve up a static file quickly, mod_perl or
> not. (Well, almost anyway).

Welcome to the real world, however, where "something" can and does
happen.  Developers accidentally put untuned SQL code in a new page
that takes too long to run.  Database backups slow down normal
processing.  Disks crash, slowing down the RAID array (if you're
lucky).  Developers include dependencies on services like mail
directly in the web server instead of handling mail asynchronously,
and mail servers slow down for no reason at all.  Etc.

> > The proxy server continues to get up to 20 requests per second
> > for proxied pages, for each request it tries to connect to the mod_perl
> > server. The mod_perl server can now only handle 5 requests per second though.
> > So the proxy server processes quickly end up waiting in the backlog queue.
>
> If you are using squid or a caching proxy, those static requests
> would not be passed to the backend most of the time anyway.

Please reread the analysis more carefully.  I explained that.  That is
precisely the scenario I'm describing faults in.

-- greg
Re: squid performance
According to Greg Stark:
> > > 1) Netscape/IE won't intermix slow dynamic requests with fast static requests
> > >    on the same keep-alive connection
> >
> > I thought they just opened several connections in parallel without regard
> > for the type of content.
>
> Right, that's the problem. If the two types of content are coming from the
> same proxy server (as far as NS/IE is concerned) then they will intermix the
> requests and the slow page could hold up several images queued behind it. I
> actually suspect IE5 is cleverer about this, but you still know more than it
> does.

They have a maximum number of connections they will open at once, but
I don't think there is any concept of queueing involved.

> > > 2) static images won't be delayed when the proxy gets bogged down waiting on
> > >    the backend dynamic server.
>
> Picture the following situation: The dynamic server normally generates pages
> in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can
> handle 20 connections per second. The mod_proxy runs 200 processes and it
> handles static requests very quickly, so it can handle some huge number of
> static requests, but it can still only handle 20 proxied requests per second.
>
> Now something happens to your mod_perl server and it starts taking 2s to
> generate pages.

The 'something happens' is the part I don't understand.  On a unix
server, nothing one httpd process does should affect another one's
ability to serve up a static file quickly, mod_perl or not.  (Well,
almost anyway.)

> The proxy server continues to get up to 20 requests per second
> for proxied pages, for each request it tries to connect to the mod_perl
> server. The mod_perl server can now only handle 5 requests per second though.
> So the proxy server processes quickly end up waiting in the backlog queue.

If you are using squid or a caching proxy, those static requests
would not be passed to the backend most of the time anyway.

> Now *all* the mod_proxy processes are in "R" state and handling proxied
> requests. The result is that the static images -- which under normal
> conditions are handled quickly -- become delayed until a proxy process is
> available to handle the request. Eventually the backlog queue will fill up and
> the proxy server will hand out errors.

But only if it doesn't cache or know how to serve static content
itself.

> Use a separate hostname for your pictures, it's a pain on the html authors but
> it's worth it in the long run.

That depends on what happens in the long run.  If your domain name or
vhost changes, all of those non-relative links will have to be fixed
again.

  Les Mikesell
    [EMAIL PROTECTED]
Re: splitting mod_perl and sql over machines
According to Jeffrey W. Baker:
> I will address two points:
>
> There is a very high degree of parallelism in modern PC architecture.
> The I/O hardware is helpful here.  The machine can do many things while
> a SCSI subsystem is processing a command, or the network hardware is
> writing a buffer over the wire.

Yes, for performance it is going to boil down to contention for disk
and RAM and (rarely) CPU.  You just have to look at pricing for your
particular scale of machine to see whether it is cheaper to stuff more
in the same box or add another.  However, once you have multiple web
server boxes, the backend database becomes a single point of failure,
so I consider it a good idea to shield it from direct internet access.

  Les Mikesell
    [EMAIL PROTECTED]
Re: squid performance
Leslie Mikesell <[EMAIL PROTECTED]> writes:

> I agree that it is correct to serve images from a lightweight server
> but I don't quite understand how these points relate. A proxy should
> avoid the need to hit the backend server for static content if the
> cache copy is current unless the user hits the reload button and
> the browser sends the request with 'pragma: no-cache'.

I'll try to expand a bit on the details:

> > 1) Netscape/IE won't intermix slow dynamic requests with fast static requests
> >    on the same keep-alive connection
>
> I thought they just opened several connections in parallel without regard
> for the type of content.

Right, that's the problem.  If the two types of content are coming
from the same proxy server (as far as NS/IE is concerned) then they
will intermix the requests, and the slow page could hold up several
images queued behind it.  I actually suspect IE5 is cleverer about
this, but you still know more than it does.  By putting them on
different hostnames, the browser will open a second set of parallel
connections to that server and keep the two types of requests
separate.

> > 2) static images won't be delayed when the proxy gets bogged down waiting on
> >    the backend dynamic server.

Picture the following situation: the dynamic server normally generates
pages in about 500ms, or about 2/s; the mod_perl server runs 10
processes, so it can handle 20 connections per second.  The mod_proxy
runs 200 processes and handles static requests very quickly, so it can
handle some huge number of static requests, but it can still only
handle 20 proxied requests per second.

Now something happens to your mod_perl server and it starts taking 2s
to generate pages.  The proxy server continues to get up to 20
requests per second for proxied pages, and for each request it tries
to connect to the mod_perl server.  The mod_perl server can now only
handle 5 requests per second, though, so the proxy server processes
quickly end up waiting in the backlog queue.

Now *all* the mod_proxy processes are in "R" state and handling
proxied requests.  The result is that the static images -- which under
normal conditions are handled quickly -- become delayed until a proxy
process is available to handle the request.  Eventually the backlog
queue will fill up and the proxy server will hand out errors.

> This is a good idea because it is easy to move to a different machine
> if the load makes it necessary. However, a simple approach is to
> use a non-mod_perl apache as a non-caching proxy front end for the
> dynamic content and let it deliver the static pages directly. A
> short stack of RewriteRules can arrange this if you use the
> [L] or [PT] flags on the matches you want the front end to serve
> and the [P] flag on the matches to proxy.

That's what I thought.  I'm trying to help others avoid my mistake :)
Use a separate hostname for your pictures; it's a pain on the html
authors, but it's worth it in the long run.

-- greg
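[As an aside, a minimal sketch of the RewriteRule arrangement Les
describes: a front end serving /images/ itself and proxying the rest
to a mod_perl back end on port 8080.  The paths and port here are
hypothetical:]

    RewriteEngine On
    # serve static content directly from the front end and stop
    RewriteRule ^/images/ - [L]
    # hand everything else to the mod_perl back end
    RewriteRule ^/(.*)$ http://localhost:8080/$1 [P]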
Re: Novel technique for dynamic web page generation
On 28 Jan 2000, Randal L. Schwartz wrote:
> Have you looked at the new XS version of HTML::Parser?

Not previously, but I just did.

> It's a speedy little beasty.  I dare say probably faster than even
> expat-based XML::Parser because it doesn't do quite as much.

But still roughly five times slower than mine.  For a test, I
downloaded Yahoo!'s home page as a test HTML file and wrote the
following code:

- test code -

    #! /usr/local/bin/perl

    use Benchmark;
    use HTML::Parser;
    use HTML::Tree;

    @t = timethese( 1000, {
        'Parser' => '$p = HTML::Parser->new(); $p->parse_file( "/tmp/test.html" );',
        'Tree'   => '$html = HTML::Tree->new( "/tmp/test.html" );',
    } );

The results are:

- results -

    Benchmark: timing 1000 iterations of Parser, Tree...
        Parser: 37 secs (36.22 usr  0.15 sys = 36.37 cpu)
          Tree:  7 secs ( 7.40 usr  0.22 sys =  7.62 cpu)

---

One really can't compete against mmap(2), pointer arithmetic, and
dereferencing.

- Paul
Re: RegistryLoader
> If I use RegistryLoader to preload a script, should it
> show up in /perl-status?rgysubs (Apache::Status)?

Yes.  Make sure that RegistryLoader didn't fail.  (Hint: watch the
log.)

___
Stas Bekman    mailto:[EMAIL PROTECTED]    http://www.stason.org/stas
Perl,CGI,Apache,Linux,Web,Java,PC    http://www.stason.org/stas/TULARC
perl.apache.org    modperl.sourcegarden.org    perlmonth.com    perl.org
single o-> + single o-+ = singles heaven    http://www.singlesheaven.com
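[For reference, a minimal preloading sketch for a startup.pl,
following Apache::RegistryLoader's documented interface; the URI and
filename below are hypothetical:]

    use Apache::RegistryLoader ();

    my $rl = Apache::RegistryLoader->new;
    # map the URI to its filename explicitly, since no request
    # record exists at server startup to do the translation
    $rl->handler('/perl/test.pl', '/home/httpd/perl/test.pl');

After a restart, a successfully preloaded script should then show up
under /perl-status?rgysubs as discussed above.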
Re: ANNOUNCE: Updated Hello World Web Application Benchmarks
Joshua Chamas wrote:
> There is no way that people are going to benchmark
> 10+ different environments themselves, so this merely offers
> a quick fix to get people going with their own comparisons.

I agree that having the code snippets for running hello world on
different tools collected in one place is handy.

> Do you have any idea how much time it takes to do these?

Yes, I've done quite a few of them.  I never said they were easy.

> In order to improve the benchmarks, like the Resin & Velocigen
> ones that you cited where we have a very small sample, we simply
> need more numbers from more people.

I think we would need more numbers from the exact same people, on the
same machines, with the same configuration, the same client, the same
network, the same Linux kernel...  In other words, controlled
conditions.

> Also, any disclaimer modifications might be good if you feel
> there can be more work done there.

Ideally, I would get rid of every page except the one which lists the
tests grouped by OS/machine.  Then I would put a big statement at the
top saying that comparisons across different people's tests are
meaningless.

- Perrin
Re: ANNOUNCE: Updated Hello World Web Application Benchmarks
Ken Williams wrote:
> How about we come up with a "benchmark suite" that can be downloaded and run in
> one shot on various platforms?  Given the variety of things that are being
> tested, this might be too gargantuan a test, and perhaps a few things would
> have to be cut from the suite.  But there are a few things like EmbPerl, Mason,
> SSI, Registry, static HTML, etc. that could be compared pretty easily by
> several people on the modperl list.

Realistically, it takes a fair amount of work to get most of these up
and running with the proper tuning and with all of their dependencies
satisfied.  If it were easy, we would probably already see people
sending in more benchmarks.

- Perrin
ANNOUNCE: Apache::Filter 1.06
Hi,

The URL

    http://mathforum.com/~ken/modules/archive/Apache-Filter-1.006.tar.gz

has entered CPAN as

    file: $CPAN/authors/id/KWILLIAMS/Apache-Filter-1.006.tar.gz
    size: 14436 bytes
    md5:  345deb65da7317d1cea955574fd55be8

Changes:

  - Added a 'handle' parameter to filter_input(), which lets callers
    open the input filehandle themselves.
    [[EMAIL PROTECTED] (Vegard Vesterheim)]

  - If $r->filename can't be opened, we no longer abort halfway
    through filter_input().  Just return an undef filehandle at the
    end.
    [[EMAIL PROTECTED] (Philippe M. Chiasson)]

------
Ken Williams                        Last Bastion of Euclidity
[EMAIL PROTECTED]                   The Math Forum
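[For anyone new to the module, a minimal filter sketch based on
Apache::Filter's documented filter_register/filter_input interface;
the package name and the uppercasing transformation are made up:]

    package My::UpperCase;
    use strict;
    use Apache::Constants qw(OK);

    sub handler {
        my $r = shift;
        $r = $r->filter_register;              # join the filter chain
        my ($fh, $status) = $r->filter_input;  # read the previous filter's output
        return $status unless $status == OK;
        print uc while <$fh>;                  # uppercase each line and pass it on
        return OK;
    }
    1;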
Re: splitting mod_perl and sql over machines
Marko van der Puil wrote:
> so httpd 1 has just queried the database and httpd 2 is just executing...
> It also has to query the database, so it has to wait for httpd 1 to
> finish. (not actually how it works but close enough)
> Now httpd 1 has the results from the query and is preparing to read the
> template from disk.
> httpd 2 is now querying the database... Now httpd 1 has to wait for the
> httpd 2 query to finish, before it can fetch its template from disk.
> a.s.o. a.s.o. This, unfortunately, is (still) how pc's work. There's no
> such thing as parallel processing in PC architecture.
> This example is highly simplified. In practice it is a lot worse than I
> demonstrate here, because while waiting for the database query to
> finish, your application still gets its share of resources (CPU), so
> while the load on the machine is over 1.00 it's actually doing nothing
> for half the time... :( This is true, take a university course in
> information technology if ya want to know...

It would be overly difficult for me to address every falsehood in that
paragraph, so I will summarize by saying that I've never seen more
pseudo-technical bullshit concentrated in one place before.  I will
address two points:

There is a very high degree of parallelism in modern PC architecture.
The I/O hardware is helpful here.  The machine can do many things
while a SCSI subsystem is processing a command, or while the network
hardware is writing a buffer over the wire.

If a process is not runnable (that is, it is blocked waiting for I/O
or similar), it is not using significant CPU time.  The only CPU time
required to maintain a blocked process is the time it takes for the
operating system's scheduler to look at the process, decide that it is
still not runnable, and move on to the next process in the list.  This
is hardly any time at all.

In your example, if you have two processes and one of them is blocked
on I/O and the other is CPU bound, the blocked process is getting 0%
CPU time, the runnable process is getting 99.9% CPU time, and the
kernel scheduler is using the remainder.

-jwb
Re: splitting mod_perl and sql over machines
Hi,

Thanks for your reaction, but you seem to have misunderstood the point
I am trying to make, so I will post it again to the list.

There are different architectures that you can go with:

1. Put all your servers into one box: MySQL, mod_perl, httpd_docs
(static), squid, the lot.  When it bogs down, upgrade it to the limits
of modern technology.  Spend over $10,000 on server hardware... and
still get bad results.  People used to do this.  Then some very clever
people invented client/server architecture, which is what we use on
the web, and on unix too.  (Hell, even Bill Gates is claiming that
he's gonna do it, so it must be ancient.)

Then some even MORE clever people invented 3-tier architecture, which
is what I'm talking about.  It can seriously improve results, at much
smaller cost than the spend-cash-on-one-server-and-still-fail method
described above.  3-tier architecture means you split the several
processes of your computing solution between different machines:

1st you have the client, which will show the data on its screen and
can give instructions to modify or process the data -- in our case, an
internet browser.

2nd you have the application server, which does the actual processing
of the data and sends it back to the client -- in our case, a mod_perl
enabled apache server.

3rd you have the database server, which stores and retrieves all the
data for the application server.

Now you can try to stuff all this into one box, BUT the 3 tiers are
actually so different in nature that they produce better results when
run on 3 different machines, each configured for its particular task!

The client only has to catch input from the user and display results
on the screen.  You might want some fancy high-color graphics card in
that, so the output looks better.

The application server, in our case, needs to process requests for
information, fetch the data, process it, and serve it back to the
client.  What this needs is lots of RAM and a good network connection.
You don't need a fancy graphics card because you won't ever work on
this machine.

The database server, however, needs to store all these huge amounts of
data and be able to retrieve it quickly.  So it also needs a fair
amount of RAM (for sorting and stuff), but most important is to have a
very fast disk in this one.  You don't need a very high speed network
connection because you only fetch the data you need, and you do any
data processing on the database server.  You don't need a fancy
graphics card, and you don't need as much RAM and CPU power as you do
for the application server.

What that means is you can get away with:

    Client:              P 90  -  16 MB RAM - hires video card
    Application server:  P 300 - 256 MB RAM - IDE disk
    Database server:     P 300 - 128 MB RAM - UW2 SCSI disk

and still get better results than doing it all on a dual PIII 733 with
1 GB of memory and a UW2 SCSI RAID array.  Why?  Because the different
tiers are *NOT* competing for *RESOURCES*.

Tom Brown wrote:
> so you've overloaded the memory (10 meg * 20 daemons = 200 meg on a
> machine with 128 Meg) ...

No, you misread me; it said taking *UP TO* 10 megs each, not at least
10 megs each.  I had no swapping during this benchmark; I kill off the
processes when they get too large.  And YES, I overloaded the server.
I don't expect that kind of traffic on my machine within the next year
or so; it was just to prove a point.
(That is, if you get lots of traffic, you should add another box and
put the database on that, instead of upgrading the server.)

> what happens if you take out the 128 meg from
> your slow box and put it in the fast box? hhhmmm?

Like I said, there was no swapping during the test.  I added another
128 megs of RAM and got the same results.  (I could spawn more
children, but this is not relevant.)  RAM is the deciding factor when
deploying mod_perl apache -- nothing else.  You could dump in 1 GB of
memory; when the database is also on the same machine, each and every
httpd still has to wait for the database's I/O to finish.  (You know
that I/O also uses CPU?)  Of course you can add lots of high-tech
stuff, but the point is that while it might seem to you that a server
is doing over 20 things at the same time, it is actually doing them
one after the other.  (Even worse, it does a little piece of one
process and then a little piece of the other, and so on.)

So let's see what happens in a single-server scenario:

    Apache server gets hit...
    Spawns httpd
    Executes mod_perl script
    mod_perl script queries database server
    database processes query -> disk I/O
    mod_perl script processes data
    mod_perl script reads html template -> disk I/O
    mod_perl script sends data to client
    apache writes log entry -> disk I/O

Now imagine this happening 20 times simultaneously.  Identify the
bottlenecks in this setup.  (Easy: disk I/O is the slowest factor.)
So httpd 1 has just queried the database and httpd 2 is just
executing...  It also has to query the database, so it has to wait for
httpd 1 to finish.  (not actually how it works but close enough)
Re: ApacheDBI question
At 05:40 PM 1/28/00 -0500, Deepak Gupta wrote:
> How does connection pooling determine how many connections to keep open?
>
> The reason I ask is that I am afraid my non-modperl scripts are getting
> rejected by the db server b/c all (or most) connections are being
> dedicated to Apache activity.

Apache::DBI keeps one connection open per process per unique
connection string.  If you have 175 modperl processes running, be
prepared to cope with as many as 175 database connections.

The source code for Apache::DBI is worth a look -- it's very short and
easy to understand, and then you'll know all there is to know about
how it works.

---
Mark Cogan    [EMAIL PROTECTED]    +1-520-881-8101
ArtToday      www.arttoday.com
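[For reference, the usual arrangement, with a hypothetical MySQL DSN;
Apache::DBI caches one handle per process, keyed on the full connect
string:]

    # in startup.pl -- load before DBI so it can override DBI::connect
    use Apache::DBI ();

    # in scripts/handlers: connect as usual; identical connect
    # arguments in the same process reuse the cached handle
    use DBI ();
    my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'password');

Since each httpd process holds its own connection, capping MaxClients
on the mod_perl server is the usual way to keep the total within the
database's connection limit.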
Re: overriding document root dynamically?
On Fri, Jan 28, 2000 at 12:11:07PM -0800, Jonathan Swartz wrote:
> etc.  I could do this with a set of virtual servers, but then I have to
> change the httpd.conf and restart the server every time a user is
> added, which is undesirable.

Use mod_vhost_alias.  You do not have to touch your httpd.conf for a
new user; you just create a directory named after the server you want,
e.g.

    /home/httpd/www.joe.yourcompany.com/

Andre
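[A minimal httpd.conf fragment for this setup, assuming the
/home/httpd layout above; %0 expands to the entire requested
hostname:]

    # mod_vhost_alias maps each Host: header to its own document tree
    UseCanonicalName Off
    VirtualDocumentRoot /home/httpd/%0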