Re: ANNOUNCE: Updated Hello World Web Application Benchmarks

2000-01-29 Thread Perrin Harkins

> I think too that the OS/machine results at
> http://www.chamas.com/bench/hello_bycode.html could be more accurate
> in comparing results if the results are also grouped by tester,
> network connection type, and testing client, so each grouping would
> better reflect the relative speed differences between web applications
> on the same platform.

Agreed.

> I would argue that we should keep the code type grouping listed at
> http://www.chamas.com/bench/hello_bycode.html because it gives
> a good feel for how some operating systems & web servers are faster
> than others, i.e., Solaris slower than Linux, WinNT good for static
> HTML, Apache::ASP faster than IIS/ASP PerlScript, etc.

See, I don't think you can even make statements like that based on these
benchmarks.  Where is the test on Solaris x86 and Linux done by the same
person under the same conditions?  I don't see one.  Where is the test of NT
and Linux on the same machine by the same person?  Even the Apache::ASP vs
PerlScript comparisons you did seem to be using different clients, network
setups, and versions of NT.

I'm not criticizing you for not being able to get lab-quality results, but I
think we have to be careful what conclusions we draw from these.

> Finally, I would very much like to keep the fastest benchmark page
> as the first page, disclaiming it to death if necessary, the reason
> being that I would like to encourage future submissions, with
> new & faster hardware & OS configurations, and the best way to do
> that is to have something of a benchmark competition happening on the
> first page of the results.

I can understand that; I just don't want mod_perl users to get a reputation
as the Mindcraft of web application benchmarks.

> It seems that HTTP 1.1 submissions represent a small subset of
> skewed results; should these be dropped or presented separately?

I'd say they're as meaningful as any of the others if you consider them
independently of the other contributions.

> I also need to clarify some results, or back them up somehow.
> What should I do with results that seem skewed in general?
> Not post them until there is secondary confirmation?

Your call.  Again, to my mind each person's contribution can only be viewed
in its own private context, so one is no more skewed than any other.

- Perrin




Re: ANNOUNCE: Updated Hello World Web Application Benchmarks

2000-01-29 Thread Joshua Chamas

Perrin Harkins wrote:
> 
> I think we would need more numbers from the exact same people, on the
> same machines, with the same configuration, the same client, the same
> network, the same Linux kernel... In other words, controlled conditions.
> 

I hear you, so how about a recommendation that people submit
no fewer than 2 benchmarks for listing eligibility: at least
static HTML, plus one other.  The static HTML result can be used as a
rough control against other systems.
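
A quick script could then roll up each submission relative to its own
static control, something like this (the numbers below are made up):

    #!/usr/bin/perl
    # express each dynamic result as a fraction of the same tester's
    # static HTML rate (hypothetical figures)
    my %rate = (
        'static html' => 450,
        'Apache::ASP' => 85,
        'Registry'    => 210,
    );
    for my $test (sort grep { $_ ne 'static html' } keys %rate) {
        printf "%-12s %.2f of static\n",
            $test, $rate{$test} / $rate{'static html'};
    }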

> Ideally, I would get rid of every page except the one which lists the
> tests grouped by OS/machine.  Then I would put a big statement at the
> top saying that comparisons across different people's tests are
> meaningless.
> 

I see where you are going: you feel that the summarized results
are misleading, and to some extent they are, in that they are
not "controlled", so people's various hardware, OS, and
configuration choices weigh very strongly on how the benchmark
performs, and readers aren't necessarily equipped to digest all the
info presented and what it all really means.

I think too that the OS/machine results at 
http://www.chamas.com/bench/hello_bycode.html could be more accurate
in comparing results if the results are also grouped by tester, 
network connection type, and testing client, so each grouping would
better reflect the relative speed differences between web applications
on the same platform.

I would argue that we should keep the code type grouping listed at
http://www.chamas.com/bench/hello_bycode.html because it gives
a good feel for how some operating systems & web servers are faster 
than others, i.e., Solaris slower than Linux, WinNT good for static 
HTML, Apache::ASP faster than IIS/ASP PerlScript, etc.  

I should drop the normalized results at 
http://www.chamas.com/bench/hello_normalized.html as they are unfair, 
and could be easily read wrong.  You are not the first to complain 
about this.  The other pages sort by Rate/MHz anyway, so someone
can get a rough idea on those pages for what's faster overall.

Finally, I would very much like to keep the fastest benchmark page
as the first page, disclaiming it to death if necessary, the reason 
being that I would like to encourage future submissions, with 
new & faster hardware & OS configurations, and the best way to do 
that is to have something of a benchmark competition happening on the 
first page of the results.

It seems that HTTP 1.1 submissions represent a small subset of
skewed results; should these be dropped or presented separately?
I already exclude them from the "top 10" style list since they
don't compare well to HTTP 1.0 results, which are the majority.

I also need to clarify some results, or back them up somehow.
What should I do with results that seem skewed in general?
Not post them until there is secondary confirmation?

Thanks Perrin for your feedback.

-- Joshua
_
Joshua Chamas   Chamas Enterprises Inc.
NodeWorks >> free web link monitoring   Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051



Strange problems.

2000-01-29 Thread Billow

This message was sent from Geocrawler.com by "Billow" <[EMAIL PROTECTED]>
Be sure to reply to that address.

I am a new user of mod_perl.
I found a strange problem.
I defined some variables in the main script (use my
...),
and I want to use them directly in a subroutine.
But sometimes I can use the variables, and sometimes
they are null.  (I use reload in my browser.)

The script is like:
###
.
my (@a, @b) = ();
@a = ...
@b = ...

xxx();

sub xxx
{
 print "@a";
 print "@b";
}
##

Any hints?
If I use xxx(@a, @b),
and in the function I use my ($a, $b) = @_;,
it's OK.

Are there any differences between mod_perl and CGI Perl here?
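
To make the two cases concrete, here is a minimal standalone sketch
(the sub name xxx is from the script above; the values are just placeholders):

    #!/usr/bin/perl
    use strict;

    my @a = (1, 2, 3);
    my @b = (4, 5, 6);

    # case 1: the sub reads the file-scoped lexicals directly;
    # this is the pattern that sometimes shows empty values under mod_perl
    xxx();

    # case 2: pass the data in explicitly (as references); this behaves
    # the same under CGI and mod_perl
    xxx_args(\@a, \@b);

    sub xxx {
        print "@a\n";
        print "@b\n";
    }

    sub xxx_args {
        my ($a_ref, $b_ref) = @_;
        print "@$a_ref\n";
        print "@$b_ref\n";
    }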





Re: squid performance

2000-01-29 Thread Greg Stark

Leslie Mikesell <[EMAIL PROTECTED]> writes:

> The 'something happens' is the part I don't understand.  On a unix
> server, nothing one httpd process does should affect another
> one's ability to serve up a static file quickly, mod_perl or
> not.  (Well, almost anyway). 

Welcome to the real world, however, where "something" can and does happen.
Developers accidentally put untuned SQL code in a new page that takes too long
to run.  Database backups slow down normal processing.  Disks crash, slowing down
the RAID array (if you're lucky).  Developers include dependencies on services
like mail directly in the web server instead of handling mail asynchronously,
and mail servers slow down for no reason at all.  And so on.

> > The proxy server continues to get up to 20 requests per second
> > for proxied pages, for each request it tries to connect to the mod_perl
> > server. The mod_perl server can now only handle 5 requests per second though.
> > So the proxy server processes quickly end up waiting in the backlog queue. 
> 
> If you are using squid or a caching proxy, those static requests
> would not be passed to the backend most of the time anyway. 

Please reread the analysis more carefully. I explained that. That is
precisely the scenario I'm describing faults in.

-- 
greg



Re: squid performance

2000-01-29 Thread Leslie Mikesell

According to Greg Stark:

> > > 1) Netscape/IE won't intermix slow dynamic requests with fast static requests
> > >on the same keep-alive connection
> > 
> > I thought they just opened several connections in parallel without regard
> > for the type of content.
> 
> Right, that's the problem. If the two types of content are coming from the
> same proxy server (as far as NS/IE is concerned) then they will intermix the
> requests and the slow page could hold up several images queued behind it. I
> actually suspect IE5 is cleverer about this, but you still know more than it
> does.

They have a maximum number of connections they will open at once
but I don't think there is any concept of queueing involved. 

> > > 2) static images won't be delayed when the proxy gets bogged down waiting on
> > >the backend dynamic server.
> 
> Picture the following situation: The dynamic server normally generates pages
> in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can
> handle 20 connections per second. The mod_proxy runs 200 processes and it
> handles static requests very quickly, so it can handle some huge number of
> static requests, but it can still only handle 20 proxied requests per second.
> 
> Now something happens to your mod_perl server and it starts taking 2s to
> generate pages.

The 'something happens' is the part I don't understand.  On a unix
server, nothing one httpd process does should affect another
one's ability to serve up a static file quickly, mod_perl or
not.  (Well, almost anyway). 

> The proxy server continues to get up to 20 requests per second
> for proxied pages, for each request it tries to connect to the mod_perl
> server. The mod_perl server can now only handle 5 requests per second though.
> So the proxy server processes quickly end up waiting in the backlog queue. 

If you are using squid or a caching proxy, those static requests
would not be passed to the backend most of the time anyway. 

> Now *all* the mod_proxy processes are in "R" state and handling proxied
> requests. The result is that the static images -- which under normal
> conditions are handled quickly -- become delayed until a proxy process is
> available to handle the request. Eventually the backlog queue will fill up and
> the proxy server will hand out errors.

But only if it doesn't cache or know how to serve static content itself.

> Use a separate hostname for your pictures, it's a pain on the html authors but
> it's worth it in the long run.

That depends on what happens in the long run. If your domain name or
vhost changes, all of those non-relative links will have to be
fixed again.

  Les Mikesell
   [EMAIL PROTECTED]



Re: splitting mod_perl and sql over machines

2000-01-29 Thread Leslie Mikesell

According to Jeffrey W. Baker:

> I will address two points:
> 
> There is a very high degree of parallelism in modern PC architecture. 
> The I/O hardware is helpful here.  The machine can do many things while
> a SCSI subsystem is processing a command, or the network hardware is
> writing a buffer over the wire.

Yes, for performance it is going to boil down to contention for
disk and RAM and (rarely) CPU.  You just have to look at pricing
for your particular scale of machine to see whether it is cheaper
to stuff more in the same box or add another.  However, once you
have multiple web server boxes the backend database becomes a
single point of failure so I consider it a good idea to shield
it from direct internet access.

  Les Mikesell
   [EMAIL PROTECTED]



Re: squid performance

2000-01-29 Thread Greg Stark


Leslie Mikesell <[EMAIL PROTECTED]> writes:

> I agree that it is correct to serve images from a lightweight server
> but I don't quite understand how these points relate.  A proxy should
> avoid the need to hit the backend server for static content if the
> cache copy is current unless the user hits the reload button and
> the browser sends the request with 'pragma: no-cache'.

I'll try to expand a bit on the details:

> > 1) Netscape/IE won't intermix slow dynamic requests with fast static requests
> >on the same keep-alive connection
> 
> I thought they just opened several connections in parallel without regard
> for the type of content.

Right, that's the problem. If the two types of content are coming from the
same proxy server (as far as NS/IE is concerned) then they will intermix the
requests and the slow page could hold up several images queued behind it. I
actually suspect IE5 is cleverer about this, but you still know more than it
does.

By putting them on different hostnames the browser will open a second set of
parallel connections to that server and keep the two types of requests
separate.

> > 2) static images won't be delayed when the proxy gets bogged down waiting on
> >the backend dynamic server.

Picture the following situation: The dynamic server normally generates pages
in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can
handle 20 connections per second. The mod_proxy runs 200 processes and it
handles static requests very quickly, so it can handle some huge number of
static requests, but it can still only handle 20 proxied requests per second.

Now something happens to your mod_perl server and it starts taking 2s to
generate pages. The proxy server continues to get up to 20 requests per second
for proxied pages, for each request it tries to connect to the mod_perl
server. The mod_perl server can now only handle 5 requests per second though.
So the proxy server processes quickly end up waiting in the backlog queue. 

Now *all* the mod_proxy processes are in "R" state and handling proxied
requests. The result is that the static images -- which under normal
conditions are handled quickly -- become delayed until a proxy process is
available to handle the request. Eventually the backlog queue will fill up and
the proxy server will hand out errors.
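
To put rough numbers on it (using the figures above): requests keep arriving
at up to 20/s while the backend now clears only 5/s, so the backlog grows at
roughly

    20/s - 5/s = 15 waiting connections per second

which ties up a couple hundred proxy processes within seconds, with the
listen queue filling shortly after.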

> This is a good idea because it is easy to move to a different machine
> if the load makes it necessary.  However, a simple approach is to
> use a non-mod_perl apache as a non-caching proxy front end for the
> dynamic content and let it deliver the static pages directly.  A
> short stack of RewriteRules can arrange this if you use the 
> [L] or [PT] flags on the matches you want the front end to serve
> and the [P] flag on the matches to proxy.

That's what I thought. I'm trying to help others avoid my mistake :)

Use a separate hostname for your pictures, it's a pain on the html authors but
it's worth it in the long run.
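
For completeness, the front-end RewriteRule setup Les describes above looks
roughly like this (the port and paths are made up):

    # front-end httpd.conf (no mod_perl compiled in)
    RewriteEngine On
    # serve static files directly from this server
    RewriteRule ^/images/ - [L]
    # proxy everything else to the mod_perl server
    RewriteRule ^/(.*)$ http://localhost:8042/$1 [P]
    # make redirects from the backend point back at the front end
    ProxyPassReverse / http://localhost:8042/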
-- 
greg



Re: Novel technique for dynamic web page generation

2000-01-29 Thread Paul J. Lucas

On 28 Jan 2000, Randal L. Schwartz wrote:

> Have you looked at the new XS version of HTML::Parser?

Not previously, but I just did.

> It's a speedy little beasty.  I dare say probably faster than even
> expat-based XML::Parser because it doesn't do quite as much.

But still several times slower than mine.  For a test,
I downloaded Yahoo!'s home page as the input HTML file and wrote
the following code:

- test code -
#! /usr/local/bin/perl

use Benchmark;
use HTML::Parser;
use HTML::Tree;

@t = timethese( 1000, {
   'Parser' => '$p = HTML::Parser->new(); $p->parse_file( "/tmp/test.html" );',
   'Tree'   => '$html = HTML::Tree->new( "/tmp/test.html" );',
} );
-

The results are:

- results -
Benchmark: timing 1000 iterations of Parser, Tree...
Parser: 37 secs (36.22 usr  0.15 sys = 36.37 cpu)
  Tree:  7 secs ( 7.40 usr  0.22 sys =  7.62 cpu)
---

One really can't compete against mmap(2), pointer arithmetic,
and dereferencing.

- Paul



Re: RegistryLoader

2000-01-29 Thread Stas Bekman

> If I use RegistryLoader to preload a script, should it
> show up in /perl-status?rgysubs   (Apache::Status)??

Yes.

Make sure that RegistryLoader didn't fail. (hint: watch the log)
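
A minimal startup.pl sketch (the URI and path are just examples):

    use Apache::RegistryLoader ();

    my $rl = Apache::RegistryLoader->new;
    # first argument is the URI, second the file it translates to;
    # failures show up in the error_log
    $rl->handler('/perl/hello.pl', '/home/httpd/perl/hello.pl');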


___
Stas Bekman            mailto:[EMAIL PROTECTED]    http://www.stason.org/stas
Perl,CGI,Apache,Linux,Web,Java,PC        http://www.stason.org/stas/TULARC
perl.apache.org    modperl.sourcegarden.org    perlmonth.com    perl.org
single o-> + single o-+ = singles heaven    http://www.singlesheaven.com



Re: ANNOUNCE: Updated Hello World Web Application Benchmarks

2000-01-29 Thread Perrin Harkins

Joshua Chamas wrote:
> There is no way that people are going to benchmark
> 10+ different environments themselves, so this merely offers
> a quick fix to get people going with their own comparisons.

I agree that having the code snippets for running hello world on
different tools collected in one place is handy.

> Do you have any idea how much time it takes to do these?

Yes, I've done quite a few of them.  I never said they were easy.

> In order to improve the benchmarks, like the Resin & Velocigen
> ones that you cited where we have a very small sample, we simply
> need more numbers from more people.

I think we would need more numbers from the exact same people, on the
same machines, with the same configuration, the same client, the same
network, the same Linux kernel... In other words, controlled conditions.

> Also, any disclaimer modifications might be good if you feel
> there can be more work done there.

Ideally, I would get rid of every page except the one which lists the
tests grouped by OS/machine.  Then I would put a big statement at the
top saying that comparisons across different people's tests are
meaningless.

- Perrin



Re: ANNOUNCE: Updated Hello World Web Application Benchmarks

2000-01-29 Thread Perrin Harkins

Ken Williams wrote:
> How about we come up with a "benchmark suite" that can be downloaded and run in
> one shot on various platforms?  Given the variety of things that are being
> tested, this might be too gargantuan a test, and perhaps a few things would
> have to be cut from the suite.  But there are a few things like EmbPerl, Mason,
> SSI, Registry, static HTML, etc. that could be compared pretty easily by
> several people on the modperl list.

Realistically, it takes a fair amount of work to get most of these up
and running with the proper tuning and with all of their dependencies
satisfied.  If it was easy, we would probably already see people sending
in more benchmarks.

- Perrin



ANNOUNCE: Apache::Filter 1.06

2000-01-29 Thread Ken Williams

Hi,

The URL

http://mathforum.com/~ken/modules/archive/Apache-Filter-1.006.tar.gz

has entered CPAN as

  file: $CPAN/authors/id/KWILLIAMS/Apache-Filter-1.006.tar.gz
  size: 14436 bytes
   md5: 345deb65da7317d1cea955574fd55be8

Changes:

   - Added 'handle' parameter to filter_input(), which lets callers open the 
 input filehandle themselves. 
 [[EMAIL PROTECTED] (Vegard Vesterheim)]

   - If $r->filename can't be opened, we no longer abort halfway
 through filter_input().  Just return an undef filehandle at the end.
 [[EMAIL PROTECTED] (Philippe M. Chiasson)]


  ------
  Ken Williams                          Last Bastion of Euclidity
  [EMAIL PROTECTED]                     The Math Forum




Re: splitting mod_perl and sql over machines

2000-01-29 Thread Jeffrey W. Baker

Marko van der Puil wrote:
 
> so httpd 1 has just queried the database and httpd 2 is just executing...
> It also has to query the database, so it has to wait for httpd 1 to finish. (Not
> actually how it works, but close enough.)
> Now httpd 1 has the results from the query and is preparing to read the template
> from disk.
> httpd 2 is now querying the database... Now httpd 1 has to wait for the httpd 2
> query to finish before it can fetch its template from disk, and so on. This,
> unfortunately, is (still) how PCs work. There's no such thing as parallel processing
> in PC architecture.
> This example is highly simplified. In practice it is a lot worse than I demonstrate
> here, because while waiting for the database query to finish, your application
> still gets its share of resources (CPU), so while the load on the machine is over
> 1.00 it's actually doing nothing for half the time... :( This is true; take a
> university course in information technology if you want to know...

It would be overly difficult for me to address every falsehood in that
paragraph, so I will summarize by saying that I've never seen more
pseudo-technical bullshit concentrated in one place before.

I will address two points:

There is a very high degree of parallelism in modern PC architecture. 
The I/O hardware is helpful here.  The machine can do many things while
a SCSI subsystem is processing a command, or the network hardware is
writing a buffer over the wire.

If a process is not runnable (that is, it is blocked waiting for I/O or
similar), it is not using significant CPU time.  The only CPU time that
will be required to maintain a blocked process is the time it takes for
the operating system's scheduler to look at the process, decide that it
is still not runnable, and move on to the next process in the list. 
This is hardly any time at all.  In your example, if you have two
processes and one of them is blocked on I/O and the other is CPU bound,
the blocked process is getting 0% CPU time, the runnable process is
getting 99.9% CPU time, and the kernel scheduler is using the remainder.
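
If you want to see this for yourself, here is a rough sketch (not anyone's
production code): one child blocks on a pipe that is never written to while
another spins; top(1) will show the spinning child near 100% CPU and the
blocked one at essentially 0%.

    #!/usr/bin/perl
    # one blocked process, one CPU-bound process
    pipe(READER, WRITER) or die "pipe: $!";

    my $blocked = fork();
    if ($blocked == 0) {
        my $buf;
        sysread(READER, $buf, 1);   # blocks forever, costs ~0% CPU
        exit 0;
    }

    my $busy = fork();
    if ($busy == 0) {
        1 while 1;                  # pure CPU work
    }

    sleep 30;                       # watch both in top(1)
    kill 'TERM', $blocked, $busy;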

-jwb



Re: splitting mod_perl and sql over machines

2000-01-29 Thread Marko van der Puil

Hi,

Thanks for your reaction; still, you seem to have misunderstood the point I am
trying to make, so I will make it again here on the list.

There are different architectures you can go with.

1. Put all your servers into one box.
Mysql
mod_perl
httpd_docs (static)
squid

the lot.

When it bogs down, upgrade it to the limits of modern technology, spend over
$10,000 on server hardware... and still get bad results.

People used to do this.

Then some very clever people invented client-server architecture, which is what we
use on the web and on Unix too.  (Hell, even Bill Gates is claiming that he's gonna
do it... so it must be ancient.)

Then some even MORE clever people invented 3-tier architecture, which is what I'm
talking about.  It can seriously improve results, at much smaller cost than the
spend-cash-on-one-server-and-still-fail method described above.

3-tier architecture means you split the several processes of your computing
solution across different machines.

First, you have the client, which displays the data on its screen and lets the user
give instructions to modify or process the data:
in our case, an internet browser.
Second, you have the application server, which does the actual processing of the data
and sends it back to the client:
in our case, a mod_perl-enabled Apache server.
Third, you have the database server, which stores and retrieves all the data for the
application server.

Now you can try to stuff all this into one box, BUT the 3 tiers are actually so
different in nature that they produce better results when run on 3 different
machines, each configured for its particular task!
The client only has to catch input from the user and display results on the
screen; you might want some fancy high-color graphics card in it, so the output
looks better.
The application server, in our case, needs to process requests for information,
fetch the data, process it and serve it back to the client.  What it needs is
lots of RAM and a good network connection.  You don't need a fancy graphics card
because you won't ever work on this machine.
The database server, however, needs to store all these huge amounts of data and be
able to retrieve it quickly.  So it also needs a fair amount of RAM (for sorting and
such), but most important is a very fast disk in this one.  You don't need a very
high-speed network connection because you only fetch the data you need and do any
data processing on the database server.  You don't need a fancy graphics card, and
you don't need as much RAM and CPU power as you do for the application server.

What that means:

You can get away with:

Client:
P 90 - 16 MB Ram - HIRES Videocard
Application Server:
P 300 - 256 MB Ram - IDE disk
Database server:
P 300 - 128 MB Ram - UW2 SCSI disk

And still get better results than doing it all on a dual PIII 733 with 1 GB Mem and
a UW2 SCSI RAID array.

Why?

Because the different tiers are *NOT* competing for *RESOURCES*



Tom Brown wrote:

> so you've overloaded the memory (10 meg * 20 daemons = 200 meg on a
> machine with 128 Meg) ...

No, you misread me; it said taking *UP TO* 10 megs each, not at least 10 megs each.
I had no swapping during this benchmark; I kill off the processes when they get too
large.  And YES, I overloaded the server.  I don't expect that kind of traffic on my
machine within the next year or so; it was just to prove a point (that if you get
lots of traffic, you should add another box and put the database on that, instead of
upgrading the server).

> what happens if you take out the 128 meg from
> your slow box and put it in the fast box? hhhmmm?

Like I said, there was no swapping during the test; I added another 128 megs of
RAM and got the same results.  (I could spawn more children, but this is not
relevant.)  RAM is the deciding factor when deploying mod_perl Apache... nothing else.

You could dump in 1 GB of memory, but when the database is also on the same machine,
each and every httpd has to wait for the database's I/O to finish.  (You know that I/O
also uses CPU?)  Of course you can add lots of high-tech stuff, but the point is that
while it might seem to you that a server is doing over 20 things at the same time, it
is actually doing them one after the other (even worse, it does a little piece of one
process and then a little piece of the other, and so on).

So let's see what happens in a single-server scenario...

Apache server gets hit...
Spawns httpd
Executes mod_perl script
mod_perl script queries database server
database processes query -> disk I/O
mod_perl script processes data
mod_perl script reads HTML template -> disk I/O
mod_perl script sends data to client
Apache writes log entry -> disk I/O

Now imagine this happening 20 times simultaneously.

Identify the bottlenecks in this setup.  (Easy: disk I/O is the slowest factor.)

so httpd 1 has just queried the database and httpd 2 is just executing...
It also has to query the database, so it has to wait, for httpd 1 to finish. (not
actually how it work

Re: ApacheDBI question

2000-01-29 Thread Mark Cogan

At 05:40 PM 1/28/00 -0500, Deepak Gupta wrote:
>How does connection pooling determine how many connections to keep open?
>
>The reason I ask is that I am afraid my non-modperl scripts are getting
>rejected by the db server b/c all (or most) connections are being
>dedicated to Apache activity.

Apache::DBI keeps one connection open per process per unique connection 
string. If you have 175 modperl processes running, be prepared to cope with 
as many as 175 database connections.

The source code for Apache::DBI is worth a look -- it's very short and easy 
to understand, and then you'll know all there is to know about how it works.
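
For reference, the usual setup is just this (the connect string below is made up):

    # in startup.pl, loaded at server startup:
    use Apache::DBI ();   # must be loaded before anything else loads DBI
    use DBI ();

    # scripts keep calling DBI->connect as usual; Apache::DBI overrides
    # connect() and caches one handle per process per distinct connect
    # string, e.g.
    #   my $dbh = DBI->connect('dbi:mysql:testdb', $user, $pass);

So the practical ceiling is roughly MaxClients connections per connect string,
which is worth checking against the database server's own connection limit.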
---
Mark Cogan    [EMAIL PROTECTED]    +1-520-881-8101
ArtToday  www.arttoday.com



Re: overriding document root dynamically?

2000-01-29 Thread Andre Landwehr

On Fri, Jan 28, 2000 at 12:11:07PM -0800, Jonathan Swartz wrote:
> etc. I could do this with a set of virtual servers, but then I have to 
> change the httpd.conf and restart the server every time a user is
> added, which is undesirable.

Use mod_vhost_alias. You do not have to touch your httpd.conf for
a new user but just create a directory named like the server you
want, e.g. /home/httpd/www.joe.yourcompany.com/
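
Something like this in httpd.conf should do it (the path matches the example
above):

    UseCanonicalName Off
    VirtualDocumentRoot /home/httpd/%0

%0 expands to the whole requested server name, so adding a user is just a
matter of creating the matching directory.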

Andre