Re: CGI Script Source Code Disclosure Vulnerability in Apache for Windows

2006-08-19 Thread David Burry

Joshua Slive wrote:

<note><directive>ScriptAlias</directive> is used to
<strong>both</strong> map a URL to a directory <strong>and</strong>
mark requests for that URL as pointing to CGI scripts.  It should not
be used for directories that are already accessible from the web
because they are under the <directive
module="core">DocumentRoot</directive>, for example.  Instead, you can
use:
<example>
&lt;Directory /usr/local/apache2/htdocs/cgi-dir&gt;<br />
SetHandler cgi-script<br />
Options ExecCGI<br />
&lt;/Directory&gt;
</example></note>


I like the idea of this documentation addition, plus maybe an 
explanation about why it is recommended on the security tips page 
(something about the differences between URLs and paths in the 
configuration, and the security implications of the difference, using 
CGI as an example), with a reference to it in the ScriptAlias section.


This is important to me because, after reading this thread, I've realized
I never thought about these particular security hazards of referencing
something by its Location or Alias (which is always case sensitive,
and has different ways of referencing the same characters), versus by its
Directory or Files (which is case insensitive on some operating
systems, and normalizes all those character differences before trying to
match).  And now I need to go do an audit of my web servers to make sure...
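
To make the hazard concrete, here is a hypothetical Windows layout (paths
invented for illustration, not taken from the original report):

    # Script dir sits under DocumentRoot *and* is mapped with ScriptAlias:
    DocumentRoot "C:/Apache/htdocs"
    ScriptAlias /cgi-bin/ "C:/Apache/htdocs/cgi-bin/"

    # ScriptAlias matches the URL case-sensitively, but the Windows
    # filesystem is case-insensitive, so a request for /CGI-BIN/foo.pl
    # misses the alias, maps straight through DocumentRoot, and the
    # script's source is served as plain text.  The <Directory> plus
    # SetHandler form quoted above matches the normalized filesystem
    # path instead, so case tricks can't bypass it.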


Dave


Re: SSL enabled name virtual hosts

2006-03-06 Thread David Burry

Boyle Owen wrote:

- You're right that since apache can't see the host header, it uses the cert 
from the default VH to establish the SSL session. Thereafter, it *can* see the 
host header and so can route the requests successfully. This gives a lot of 
people the illusion that SSL-NBVH is possible. The big problem is that you 
don't get authentication because the default cert, generally, will not match 
the requested site. For professional SSL, authentication is every bit as 
essential as encryption so this won't do.
  
We use a wildcard cert to overcome this situation... the technical 
limitation is that all the SSL hosts have to end with the same domain 
(a wildcard cert is bound to our domain, not any individual host name), 
but otherwise we can and do indeed run hundreds (soon to be thousands) 
of customers on their own individual host names under SSL, all on port 
443 on one instance of apache.  Unfortunately we have to do funny 
mod_rewrite trickery to simulate NBVH instead of using real NBVH...  I 
suspect it would be a major change in Apache architecture to use real 
NBVH in our case (but otherwise, yes, it absolutely could be technically 
possible, given the all-hosts-must-share-one-domain and wildcard-cert 
limitations).
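
For the curious, a minimal sketch of the kind of rewrite trickery involved
(host names and paths here are hypothetical, not our real config):

    <VirtualHost *:443>
        ServerName customers.example.com
        SSLEngine on
        # One wildcard cert (*.example.com) covers every customer host
        SSLCertificateFile    conf/ssl/wildcard.example.com.crt
        SSLCertificateKeyFile conf/ssl/wildcard.example.com.key

        # Once the SSL session is up, apache can see the Host header,
        # so route on it: https://shop1.example.com/x -> /data/vhosts/shop1/x
        RewriteEngine on
        RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com$ [NC]
        RewriteRule ^/(.*)$ /data/vhosts/%1/$1 [L]
    </VirtualHost>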


Dave



Re: SSL enabled name virtual hosts

2006-03-06 Thread David Burry

Boyle Owen wrote:

-Original Message-
From: David Burry [mailto:[EMAIL PROTECTED]
We use a wildcard cert to overcome this situation...


How did you get your wildcard cert? Did you buy it? Apparently, the cert 
sellers (Thawte, Verisign etc.) are not too keen on selling wildcards for the 
simple reason that it reduces the number of certs required (and hence sold) - a 
bit like the everlasting light bulb... Was that your experience?
  
Yes, it was bought... So far the experience has been awesome (other than 
the previously mentioned apache architecture limitation workarounds).  
There are some cert sellers out there getting competitive in this 
respect, maybe not yet the original big guys such as the ones you 
mention, but it would be absolutely impossible for us to do what 
we're doing right now without this kind of cert: it's not feasible to 
buy a class B block of IP addresses, and we couldn't afford that many 
individual certs unless they were only a couple bucks apiece or less!


Dave



Re: Puzzling News

2005-03-02 Thread David Burry
Dependency on third party modules still prevents us from upgrading from 
1.3 to 2.0, or at least makes the drawbacks of upgrading outweigh 
the benefits...  That's the main holdup for us.

For instance, mod_perl (and a few custom scripts that use the API 
extensively), mod_php (all those non-thread-safe libraries everyone 
demands nullify the benefits of multi-threading), mod_dynamo (we're 
stuck with an older version until an enormous application we wrote on 
top of it is massively overhauled, by us), mod_webobjects (legacy code 
here too, should just go away someday, but it keeps working, 
unfortunately), etc...  Not all these third party modules are API-stable 
and/or released for 2.0 (mod_perl is *finally* getting kind of close, 
looks like: RC4, woo hoo), and we'd need to upgrade all our code that 
relies on them...

For what?  For a minimal amount of footprint improvement?  Seriously, in 
many cases, throwing a little more hardware at 1.3 is a lot cheaper than 
sinking so many engineers into all that.  1.3 still works pretty well!

A very clean fresh install, with no third party modules and no legacy 
code support necessary... sure, we'd use 2.0!  But that's not reality 
for us yet.  We'll migrate eventually, it just takes a while.

This is very much unlike that old piece of crap (Netscape server) we 
used before Apache 1.3 several years ago... we were dying with it... 1.3 
was so awesome back then, it was our savior!  And it still is pretty 
awesome!

Dave


buffer overflow in mod_proxy in 1.3.31?

2004-10-13 Thread David Burry
Has anyone checked this out yet?
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-0492
It was reported in CNET News a month or two ago, and my SOX security 
guys at work have been bugging me about it...  I need to tell them 
either that it's a false alarm or that it will be fixed soon.

Any current status on it?
Dave


Re: Removing the Experimental MPMs in 2.2?

2004-09-03 Thread David Burry
Whether labeled experimental or not, it's always been very confusing 
to me that the release (stable) branch has modules in it that 
developers know don't work at all, and that therefore should never be 
used by any ordinary user in any way whatsoever...

Therefore I agree: the stable branch's experimental directory shouldn't be 
a place for known completely hosed and unusable modules.  It should be 
a place for "seems to work fine for me, see how it works for you, but 
this is pretty new and not necessarily as well tested in production on 
every platform yet, so use at your own risk" modules.  The "broken and 
not yet finished enough for anyone to even think about using, even on 
an experimental basis" modules should only be available in the dev 
branch's experimental directory, at least until someone believes they 
work for at least some people on some platform.

Dave
Paul Querna wrote:
- leader
- perchild
- threadpool
My personal feeling is to *not* include them in the 2.2 branch at this
time.
 




Re: Reload CRL without re-starting Apache

2004-05-25 Thread David Burry
Perhaps we could use a new module that allows efficient on-the-fly 
config parameter changes without restarting any processes?  Kind of like 
a config server that you connect to and issue commands to add and 
remove apache directives, at least most of them if not all. 
There have been a few times when I wished for this feature, but so far I 
haven't taken the time to *write* a module to do it.  I know "apachectl 
graceful" works, but it's quite a load to do too often.  There is also 
the benefit that you always have a consistent view of what the 
configuration is when you use graceful, whereas it could be easy to 
become confused with on-the-fly changes, so apache administrators beware 
with such a module!

Dave


Re: strip rewritelog functionality per compiler option

2003-08-04 Thread David Burry
I agree entirely.  As the documentation says, rewrite rules
are voodoo, and it's often very hard to understand what's
going on and why a given ruleset isn't working as expected
(which is not the same as an error in the errorlog; it's
more of a user error).  Without the ability to trace through
what it's doing in the rewritelog, many of my past interesting
rulesets would have been impossible to create.  Taking it out
of production sites is a great optimization, but definitely
not a good thing for site development.
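
For reference, the dev-site setup in question looks like this (directives as
in 1.3/2.0; the log path is only illustrative):

    RewriteEngine on
    # Development only: trace every rule evaluation
    RewriteLog      logs/rewrite.log
    RewriteLogLevel 9    # 0 disables tracing; 9 is maximally verbose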

Dave


On Fri, 1 Aug 2003 22:55:31 +0100
 Thom May [EMAIL PROTECTED] wrote:
 * Justin Erenkrantz ([EMAIL PROTECTED]) wrote :
  I'd support removing RewriteLog entirely in 2.1.
  
 -1 ; As Mads says, RewriteLog is used for debugging only,
 not for day-to-day
 logging. This is why Andre proposed the patch, on the
 basis that production
 sites can remove the functionality entirely, but dev
 sites that need to know
 what the hell the module is doing can still work it out.
 Removing RewriteLog entirely would make life a living
 hell.
 -Thom



RE: Bug 18388: Set-Cookie header not honored on 304 (Not modified) status

2003-06-04 Thread David Burry
Since there seems to be some disagreement on this, and the RFC doesn't
really specify which it is but instead makes a point of leaving it open
for discussion, and there are many other popular servers out there that
behave differently than apache (so some people may come to expect, nay
even depend on, behavior different than apache's)...  What about just
making it configurable in httpd.conf, with the default being the current
behavior?  Why make it very difficult for some people to migrate to
apache on such an open-ended issue?

If we change the behavior, you are correct that the possibility
exists to get confused about the session id, but in some cases that
might even be desirable: for instance, if you're trying to cobble
together several different web apps with a common login or something
like that, without doing an invasive rewrite of the authentication part
of each app...  It would be nice as an apache administrator to have that
option in a case like that.  So making it configurable, with the default
being the current behavior, might make everyone happy...

Dave


-Original Message-
From: Graham Leggett [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 03, 2003 11:04 AM
To: [EMAIL PROTECTED]
Subject: Re: Bug 18388: Set-Cookie header not honored on 304 (Not
modified) status


Ryan Eberhard wrote:

 Despite the quote from Roy Fielding, I stand by my claim that Set-Cookie
 is a response-header and not an entity-header.

I would say a cookie is an entity header, in that in its typical use, 
the cookie value is bound somehow to the page that comes along with it.

For example, a cookie might (and often does) contain a session id, while
the content of the page is delivered unique to the session. If the 
cookie is considered a request header, then the possibility exists for a 
different session id to be delivered with the original content, which 
does not necessarily match that session id.

Regards,
Graham
-- 
-
[EMAIL PROTECTED]   There's a moon over Bourbon Street tonight...



Re: Removing Server: header

2003-03-27 Thread David Burry

- Original Message -
From: Graham Leggett
 Martin Kutschker wrote:

  Removing the server header won't hurt.

 Removing the server header is a protocol violation, and serves no purpose.

How is it a protocol violation?  I can't find anywhere in the HTTP 1.1
protocol where it says the server header is required...  In fact, it says
it's encouraged that this field be configurable for security reasons (but
doesn't specify whether that means only configuring the value, or possibly
configuring whether the header exists at all).

See:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.38

I'll agree that configuring it to not say apache doesn't serve very much
useful purpose (since it's only security by obscurity), but neither does our
current policy of allowing the apache version number to be hidden (since
that's security by obscurity too).  It's one and the same whether you hide
the product name or the version number... makes no difference.  The only
difference I can see is that it will make it nearly impossible for people to
accurately track the number of Apache servers out there in the world, so I
guess we keep it for vanity purposes?

Sorry if everyone's already talked this issue to death and is sick and
tired of it; I don't remember it coming up since I joined several months to
a year ago.

Dave



RE: Removing Server: header

2003-03-26 Thread David Burry
I don't see a good reason not to have a ServerTokens None option...  All
the ServerTokens options that hide version numbers are security by
obscurity anyway...  So it's not really anything new, just expanding
something that already exists to have a more complete complement of
similar options.
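
For context, the existing options already hide detail by degrees; the value
proposed here (hypothetical, it does not exist yet) would just complete the
set:

    ServerTokens Full    # e.g. Server: Apache/2.0.44 (Unix) PHP/4.2.3
    ServerTokens OS      # e.g. Server: Apache/2.0.44 (Unix)
    ServerTokens Min     # e.g. Server: Apache/2.0.44
    ServerTokens Prod    # e.g. Server: Apache
    # Proposed: suppress the Server header entirely
    # ServerTokens None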

Dave

-Original Message-
From: Brass, Phil (ISS Atlanta) [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 26, 2003 12:31 PM
To: [EMAIL PROTECTED]
Subject: RE: Removing Server: header


OK, so given that Date and Last-Modified are required response headers
and everybody pretty much hates the idea of removing them, and that
removing the Server header amounts to nothing more than security by
obscurity, is anybody still interested in seeing a patch that offers a
ServerTokens value of None and strictly prevents the addition of the
Server: header to the response?  If so I'd be happy to do it.

Thanks in advance!

Phil



Re: Advanced Mass Hosting Module

2003-03-14 Thread David Burry
You and someone else said the same thing.  I currently have a setup where we
run several hundred vhosts (all individually specified) without issue; I'll
have to remember this if it ever grows to thousands.  Thanks.  Lacking
a more powerful vhost-alias type thing, I'll probably have to vhost-alias
all the standard bare-bones configs and list out the anomalies
separately (sketched below)...
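
A sketch of what that split might look like (docroot layout hypothetical):

    # All standard bare-bones customers via mod_vhost_alias:
    UseCanonicalName Off
    VirtualDocumentRoot /data/web/%0/htdocs

    # ...and each anomaly still gets its own explicit block:
    <VirtualHost *>
        ServerName special.example.com
        DocumentRoot /data/web/special.example.com/htdocs
        Options +ExecCGI
    </VirtualHost>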

Dave

- Original Message -
From: Mads Toftum [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, March 14, 2003 12:55 AM
Subject: Re: Advanced Mass Hosting Module


 On Thu, Mar 13, 2003 at 04:55:19PM -0800, David Burry wrote:
  These are neat ideas.  At a few companies I've worked for we already do
  similar things but we have scripts that generate the httpd.conf files
  and distribute them out to the web servers and gracefully restart.
  Adding a new web server machine to the mix is as simple as adding the
  host name to the distribution script.
 
 This only works when you have a limited number of vhosts - if you were
 to run thousands of vhosts on each machine, then mod_vhost_alias
 (or mod_rewrite) is currently the only way to go. A module like this
 could provide a nice compromise between the flexibility of using
 httpd.conf to specify each vhost and the speed of vhost_alias.

 vh

 Mads Toftum
 --
 `Darn it, who spiked my coffee with water?!' - lwall




RE: Advanced Mass Hosting Module

2003-03-13 Thread David Burry
These are neat ideas.  At a few companies I've worked for we already do
similar things but we have scripts that generate the httpd.conf files
and distribute them out to the web servers and gracefully restart.
Adding a new web server machine to the mix is as simple as adding the
host name to the distribution script.

What you're talking about doing sounds like a lot more complexity to
achieve a similar thing, and more complexity means there's a lot more
that can go wrong.  For instance, what are you going to do if the LDAP
server is down; are many not-yet-cached virtual hosts just going to
fail?  In our scenario that's solved simply: the generation
script fails and nothing gets copied (but at least the web
servers keep working fine with the last config revision, so few if any
end user web surfers will notice the outage).

Dave

-Original Message-
From: Nathan Ollerenshaw [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 13, 2003 3:28 AM
To: [EMAIL PROTECTED]
Subject: Advanced Mass Hosting Module


Resending this to this list as I got no response on users list.

Currently, we are using flat config files generated by our website
provisioning software to support our mass hosted customers. The reason
for doing it this way, and not using the mod_vhost_alias module is
because we need to be able to turn on/off CGI, PHP, Java, shtml etc on a
per vhost basis. We need the power that having a distinct VirtualHost
directive for each site gives you.

Is there a better way?

What I have in mind is a module that fits in with our current LDAP based
infrastructure. Currently, LDAP services our mail users, and I would
like to see the Apache mass hosting configuration held in LDAP as well.
In this way, we can just scale by adding more apache servers, mounting
the shared docroot and pointing them to the LDAP server.

The LDAP entry would look something like this:

# www.example.com, base
dn: uid=www.example.com, o=base
siteGidNumber: 10045
siteUidNumber: 10045
objectClass: top
objectClass: apacheVhost
serverName: www.example.com
serverAlias: example.com
serverAlias: another.example.com
docRoot: /data/web/04/09/example.com/www
vhostStatus: enabled
phpStatus: enabled
shtmlStatus: enabled
cgiStatus: enabled
dataOutSoftLimit: 100 (in bytes per month)
dataOutHardLimit: 1000
dataInSoftLimit: 100
dataInHardLimit: 1000
dataThrottleRate: 100 (in bits/sec)

Then, as a request came in, the imaginary mod_advanced_masshosting
module would first check to see if it had the information about the
domain already cached in memory (to avoid hitting LDAP for every HTTP
request, which would be a Bad Idea) and then if not, it would grab the
entry from LDAP, cache it, and service the incoming requests.

The cache itself would need to be shared among the actual child apache
processes somehow.

In addition to these features, the module would keep track of the amount
of data transferred in & out for each vhost and apply a soft/hard limit
when the limits defined in the LDAP entry were reached. The amount of
actual data transferred would periodically be written to either a GDBM
file or even to an LDAP entry (not sure what is best - probably LDAP for
consistency) and the data would also need to be shared among any servers
in a cluster somehow.

This would enable ISPs to bill on a per vhost basis fairly accurately,
and limit abusive sites.

Now, I've looked around for something like this, and as far as I can
see, there isn't anything that does vhosting quite like this, except for
the commercial systems out there such as Zeus.

Do people think this is a good approach?

Will another method give me what I want? (LDAP is not a dependency, just
a nice-to-have)

Finally, I am thinking about starting an Open Source project to write
this module. My C is pretty primitive right now, though I have got
simple LDAP lookup code working already (just not in Apache, yet).

Would anyone else see this as a worthwhile project for Apache?

It certainly would solve our problems, but it sometimes feels like I'm
trying to fix a simple problem with something very heavy - though
implemented correctly, I don't think performance will be a problem.

Comments gratefully received :)

Regards,

Nathan.

-- 
Nathan Ollerenshaw - Systems Engineer - Shared Hosting ValueCommerce
Japan - http://www.valuecommerce.ne.jp

If you think nobody cares if you're alive, try missing a
couple of car payments.



Re: Proposal: Remove mod_imap from default list

2003-03-09 Thread David Burry
Are we talking about removing modules entirely, or just modifying what's
enabled by default?

Dave

- Original Message -
From: Justin Erenkrantz [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, March 09, 2003 8:39 PM
Subject: Re: Proposal: Remove mod_imap from default list


 --On Sunday, March 9, 2003 6:48 PM -0500 Joshua Slive [EMAIL PROTECTED]
wrote:

  2. If we want to keep our contract with the user about the stable series
  valid, this change should go into 2.1 only.  Otherwise, users doing a
  configure; make; make install or even a config.status could get a
  nasty surprise.

 +1.  2.1 is the right place to tweak module default values.  And, it's also
 the place to remove modules entirely...  1.3 and 2.0 shouldn't change.
 -- justin




Re: story posted

2003-02-11 Thread David Burry
2.1 was started so that 2.0 can remain stable from here on out, instead of
changing the 2.0 API with every minor release and requiring everyone to
re-port their modules each time...  So the fact that 2.1 exists is a very
good sign for 2.0!

Dave

- Original Message -
From: Harrie Hazewinkel [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, February 11, 2003 2:30 AM
Subject: Re: story posted


 Hi,

 Following this thread, I was wondering what the majority thinks
 is the best way forward. Currently, not many modules are ready
 for Apache 2.0, regardless of the reasons.

 But it seems that work is already being done for Apache 2.1, and
 people will have to port their modules again.
 OK, the module API is almost equal, but to improve adoption,
 2.0 first needs to be an established entity; with 2.1 out there,
 there will be a bad signal/noise ratio.


 Harrie
 --
 Author of MOD-SNMP, enabling SNMP management of Apache HTTP server





RE: Strange Behavior of Apache 2.0.43 on SPARC MP system

2003-02-11 Thread David Burry
  I am running the server/client on the
  same machine.
  
  You will not get reliable results by doing this.
 
 Can you elaborate why? Plus we were forced to do this,
 but would like to avoid in the future if it really affects
 our results.

Because the client will contend very heavily with the server for many
system resources.  It's indeterminate which one (client or server)
requires more resources, which one wins more, and how much more of which
resources.  Running both on the same machine will certainly stress the
machine pretty well, but you can't compare any measurement you get with
how the same machine will perform when Apache doesn't have to contend
with a client for its resources; it won't be the same result at all.  In
the real world apache doesn't have a client stealing its system
resources, so an accurate test of how apache would behave in the
real world can only be done if you set up a test with the same
situation.  This could be why apache performs better when you let
your client sleep a little (then again, it could be something else;
that's why I say it's indeterminate (unknown) how much of the
resources the client itself is stealing away from the server).  To
measure the effect of anything, you have to limit the number of
variables that can influence the result.

Dave




Re: Graceful shutdown in 2.0

2003-02-05 Thread David Burry
On our systems we just rename that alteoncheck.txt file to
alteoncheck_DOWN.txt when we're going to bring a server down (causing a
404 error for the health check, which stops all new requests); it
effectively does the same thing you describe without the hassle of writing a
handler.  And yes, it is very nice in that it's easily automated...

So, yes, it would be very nice to have a graceful shutdown, but it's not
necessarily high priority for those who have some sort of load balancer box
(not round robin DNS ;) because there are other relatively simple ways to
achieve the same effect...

Dave


- Original Message -
From: Andrew Ho [EMAIL PROTECTED]

 On a more load balancer specific note, Alteons (and some other load
 balancers) use the concept of a health check URL. Our Alteons are
 configured for example to check for a specific URL (for example, the
 Alteon might do a "GET /alteoncheck.txt HTTP/1.0" every 2 seconds).

 I had a plan originally to write a handler that accepts requests for this
 heartbeat check... on some signal (a particular request? an OS signal?) it
 would start returning an error for the heartbeat check case, but keep
 servicing all other requests as normal. Eventually, the Alteon would
 decide that that machine was bad, and the number of connections would fall
 to zero; it would then be safe to take the server out of rotation.

 The benefit of this scenario is that you don't have to touch the load
 balancer at all to get individual machines in and out of the load
 balancer. Also, this type of scenario is far more automatable (rather than
 telnetting into, say, a load balancer console interface and navigating
 menus, ugh).




Re: Apache q for covalent.net site.

2003-02-04 Thread David Burry
I've seen the exact same problem, and diagnosed it as the same issue.  It
would be very nice to have the default apache installation handle this
properly, to prevent the dumb "we-think-we're-smarter-than-you" browsers from
renaming files...  if not by monkeying with the mime.types file, for pureness
reasons, then at least by adding config params in the default httpd.conf.
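
One low-impact version of that config param fix (a sketch; the mime type
choice follows Bill's message below):

    # httpd.conf: give .gz a real content type so the *final* extension
    # wins and browsers stop "fixing" the names of .tar.gz downloads
    AddType application/x-gzip .gz .tgz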

Dave

- Original Message -
From: William A. Rowe, Jr. [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, February 04, 2003 10:20 AM
Subject: Re: Apache q for covalent.net site.


 An answer to a user question (and a gripe from me since I've hit this
 problem myself on apache.org):

  [the links] point to .tar.gz files, which exist.  When they are clicked on,
  however, they are being renamed to .tar.tar files. Any ideas?

 Renamed by the browser, not the server.

 Check in mime.types that we have content types for .tar *and* .gz.  The
 default defines only .tar, so we return application/x-tar.  If it included
 .gz, the *final* pathname element determines the mime type, which would
 be application/x-gzip.

 Almost makes you wish for application/x-gtar or something.  You can only
 count on gnu tar to support tar -z.

 Anyways, because the content type is application/x-tar, and the browser
 sees the *final* extension is .gz, it is choosing to replace .gz with .tar,
 or even adding another .tar (e.g., .tar.gz.tar, which I've seen also.)

 Seems like the ASF needs to choose between removing application/x-tar
 or adding application/x-gzip in the default mime types.  Sure, we have a
 general policy against adding x- extensions, but by adding one, we open
 ourselves up to problems. :-)

 Bill





Re: Graceful shutdown in 2.0

2003-02-04 Thread David Burry
The same effect is already possible by configuring your proxying machine to
stop forwarding new requests to that box first...  Of course, it's possible
that different people manage the proxying service vs. the back end apache
services, so I can see how it could be desirable to have this feature in
apache too, but still, those two people should always be working pretty
closely together anyway...

Dave

- Original Message -
From: Bill Stoddard [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, February 04, 2003 8:53 AM
Subject: Graceful shutdown in 2.0


 Has anyone ever thought about the best way to implement graceful
 shutdown (perhaps with a timeout) in the server?   This would be a
 useful function to have where there is a cluster of http servers (in a
 DMZ for instance) proxying requests to a farm backend application
 servers. If you need to take an http server completely offline, you
 could issue something like apachectl shutdown_graceful [timeout] and
 prevent that server from accepting any new connections but allow the
 existing requests to be completed w/o disruption (subject to a timeout).

 Bill





Re: Graceful shutdown in 2.0

2003-02-04 Thread David Burry
Um, but if you're talking about shutting down the proxy itself (i.e. the
whole service, cutting off all load balanced machines behind it), that's
hardly graceful to begin with, so why bother to make it graceful...

I assumed you meant just gracefully shutting down one single load balanced
machine behind the proxy machine... you can do that already now by a)
configuring the proxy machine to stop routing (new) requests to it, b)
gracefully restarting the proxy machine to make the new config take effect,
c) waiting till the existing connections to that behind-the-proxy machine
are finished, with a timeout if necessary (sort of part of the graceful
restart process in the proxy machine), then d) shutting down the machine
behind the proxy, in that order.  External users should not notice anything
at all in this scenario.


- Original Message -
From: Bill Stoddard [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, February 04, 2003 12:25 PM
Subject: Re: Graceful shutdown in 2.0


 David Burry wrote:
  The same effect is already possible by configuring your proxying machine
  to stop forwarding new requests to that box first

 Yep, that's the idea. In the scenario I'm interested in, Apache httpd
 -is- the proxy machine!

 Bill







Re: mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM)

2003-01-02 Thread David Burry
 Random thoughts:
 - Did the content have short expiration times (or recent change dates
which
 would result in the cache making agressive expiration estimates). That
could
 churn the cache.

No.  Files literally never change; when updates appear they are always new
files, and web pages just point to the new ones each update.  In this
application these are all executable downloadable files; think FTP
repository over HTTP.

 - Was CacheMaxStreamingBuffer set appropriately? (it may not be needed at
 all if the content length header is included on all replies).

C-L is included (SSIs and all similar dynamic stuff is disabled), not sure
about CacheMaxStreamingBuffer, I'd need to go check.

 - Did you try caching open file descriptors? I am rather curious if caching
 open fds will be useful/practical on Unix systems.  Oh..., but this probably
 will not help your disk throughput... nevermind.

:-D

 - It's probably worth noting in the doc that -each- child process will cache
 up to MCacheSize KBytes.  If you have 10 child processes, then you need
 10xMCacheSize KBytes RAM available just for the cache (the same files could
 be cached in each process). I wonder if we should, at startup, allocate
 MCacheSize KB of shared storage and have mod_mem_cache allocate out of the
 shared pool.  Each child process would have its own unique reference to the
 object, but the object itself would only be cached once for all processes
 to access.

I suspected that's where our memory exhaustion was coming from, but it
would have been helpful to have confirmation of my suspicion in the docs,
yes.  The problem is our cache needed to be quite large to cache very many
of those large files, and we needed to run a lot of processes due to the
mutex contentions with too many threads in one process (see the "[PATCH]
remove some mutex locks in the worker MPM" thread)... so we kind of gave up
on mod_mem_cache.  This is how this discussion branched off of that
thread; sorry I didn't state that clearly earlier.

It would be nice if there were some kind of cache shared between
processes.  With large files like this, the disk only needs to be read until
the cache is primed with the most popular files... and which files are the
most popular doesn't change that often, since we only make new releases
every couple of months... mod_file_cache works OK for this, but we need to
develop something that guesses what will be most popular, generates the
httpd.conf list, and restarts apache before each new release is publicly
linked on web pages but after the files are put live, to avoid our servers
falling over with each new release... it's quite a pain, and quite scary
what will happen if those steps aren't followed correctly; that's why I was
hoping Apache could manage it automatically with mod_mem_cache.
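
For reference, the manual step we script looks roughly like this (file names
hypothetical); mod_file_cache pins the files at startup, which is exactly why
the list has to be regenerated and apache restarted around each release:

    # Generated before each release is linked, then: apachectl graceful
    CacheFile /data/releases/current/product-1.2.3-win32.exe
    CacheFile /data/releases/current/product-1.2.3-sparc.tar.gz
    MMapFile  /data/releases/current/RELEASE-NOTES.txt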

Dave


 Bill

  Apache 2.0.43, Solaris 8, Sun E220R, 4 gig memory, gig ethernet.  We tried
  both Sun forte and gcc compilers.  The problem was mod_mem_cache was just
  way too resource intensive when pounding on a machine that hard, trying to
  see if everything would fit into the cache... cpu/mutexes were very high,
  especially memory was out of control (we had many very large files, ranging
  from half dozen to two dozen megs, the most popular of those were what we
  really wanted cached), and we were running several hundred concurrent
  connections at once.  Maybe a new cache loading/hit/removal algorithm that
  works better for many hits to very large files would solve it I dunno.
 
  We finally settled on listing out some of the most popular files out in the
  httpd.conf file for mod_file_cache, but that presents a management problem
  as what's most popular changes.  It would have been nicer if apache could
  auto-sense the most popular files.  Also it seems mod_file_cache has a file
  size limit but at least we got enough in there the disk wasn't bottlenecking
  anymore...
 
  Dave
 
  - Original Message -
  From: Bill Stoddard [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  Sent: Wednesday, January 01, 2003 6:38 AM
  Subject: RE: [PATCH] remove some mutex locks in the worker MPM
 
 
   it may also have to do with caching we were doing (mod_mem_cache crashed
   and burned,
   What version were you running?  What was the failure? If you can give me
   enough info to debug the problem, I'll work on it.
  
   Bill
  





Re: mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM)

2003-01-02 Thread David Burry

- Original Message -
From: Brian Pane [EMAIL PROTECTED]
Sent: Thursday, January 02, 2003 2:19 PM

 For large files, I'd anticipate that mod_cache wouldn't provide much
 benefit at all.  If you characterize the cost of delivering a file as

time_to_stat_and_open_and_close +
time_to_transfer_from_memory_to_network

 mod_mem_cache can help reduce the first term but not the second.  For
 small files, the first term is significant, so it makes sense to try to
 optimize away the stat/open/close with an in-httpd cache.  But for large
 files, where the second term is much larger than the first, mod_mem_cache
 doesn't necessarily have an advantage.

Unless... of course, you're requesting the same file dozens of times per
second (i.e. high hundreds of concurrent downloads per machine, because it
takes a few minutes for most people to get the file); then caching it in
memory can help, because your disk drive would sit there thrashing
otherwise.  If you don't have gig ethernet, don't even worry; you won't
really see the problem, because ethernet will be your bottleneck.  What
we're trying to do is get close to maxing out a gig ethernet with these
large files without the machine dying...

  And it has at least three disadvantages that I can
 think of:
   1. With mod_mem_cache, you can't use sendfile(2) to send the content.
  If your kernel does zero-copy on sendfile but not on writev, it
  could be faster to deliver a file instead of a cached copy.

Memory is always faster than a spinning disk.  It should be possible to make
a memory cache that's faster than the disk, and that uses the same amount of
space as the disk copy, too.

   2. And as long as mod_mem_cache maintains a separate cache per worker
  process, it will use memory less efficiently than the filesystem
  cache.

Yes, that is definitely a problem.  It's good that mod_file_cache does not
have this problem, but it has other file-list maintainability problems.

   3. On a cache miss, mod_mem_cache needs to read the file in order to
  cache it.  By default, it uses mmap/munmap to do this.  We've seen
  mutex contention problems in munmap on high-volume Solaris servers.

sounds familiar...

 What sort of results do you get if you bypass mod_cache and just rely on
 the Unix filesystem cache to keep large files in memory?

Not sure how to configure that so that it will use a few hundred megs to
cache often-accessed large files... but I could ask around here among more
Solaris-knowledgeable people...

Dave




Re: mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM)

2003-01-02 Thread David Burry
Interesting... so then why did using mod_file_cache to specify caching a
couple dozen known-most-often-accessed files decrease disk I/O significantly?
I'll try the test you mention next time I get a chance.

Dave

- Original Message -
From: Brian Pane [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, January 02, 2003 9:43 PM
Subject: Re: mod_mem_cache bad for large/busy files (Was: [PATCH] remove
some mutex locks in the worker MPM)


 On Thu, 2003-01-02 at 21:21, David Burry wrote:
  - Original Message -
  From: Brian Pane [EMAIL PROTECTED]
  Sent: Thursday, January 02, 2003 2:19 PM
  
   For large files, I'd anticipate that mod_cache wouldn't provide much
   benefit at all.  If you characterize the cost of delivering a file as
  
  time_to_stat_and_open_and_close +
  time_to_transfer_from_memory_to_network
  
   mod_mem_cache can help reduce the first term but not the second.  For
   small files, the first term is significant, so it makes sense to try to
   optimize away the stat/open/close with an in-httpd cache.  But for large
   files, where the second term is much larger than the first, mod_mem_cache
   doesn't necessarily have an advantage.
 
  Unless... of course, you're requesting the same file dozens of times per
  second (i.e. high hundreds of concurrent downloads per machine, because it
  takes a few minutes for most people to get the file) then caching it in
  memory can help, because your disk drive would sit there thrashing
  otherwise.  If you don't have gig ethernet don't even worry you won't see
  the problem really, ethernet will be your bottleneck.  What we're trying
  to do is get close to maxing out a gig ethernet with these large files
  without the machine dying...

 Definitely, caching the file in memory will help in this scenario.
 But that's happening already; the filesystem cache is sitting
 between the httpd and the disk, so you're getting the benefits
 of block caching for oft-used files by default.


   What sort of results do you get if you bypass mod_cache and just rely on
   the Unix filesystem cache to keep large files in memory?
 
  Not sure how to configure that so that it will use a few hundred megs to
  cache often-accessed large files... but I could ask around here to more
  solaris-knowledgable people...

 In my experience with Solaris, the OS is pretty proactive about
 using all available memory for the filesystem cache by default.
 One low-tech way you could check is:
   - Reboot
   - Run something to monitor free memory (top works fine)
   - Run something to read a bunch of your large files
 (e.g., cksum [file]).
 In the third step, you should see the free memory decrease by
 roughly the total size of the files you've read.

 Brian






mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM)

2003-01-01 Thread David Burry
Apache 2.0.43, Solaris 8, Sun E220R, 4 gig memory, gig ethernet.  We tried
both Sun forte and gcc compilers.  The problem was mod_mem_cache was just
way too resource intensive when pounding on a machine that hard, trying to
see if everything would fit into the cache... cpu/mutexes were very high,
and especially memory was out of control (we had many very large files,
ranging from half a dozen to two dozen megs, and the most popular of those
were what we really wanted cached), and we were running several hundred
concurrent connections at once.  Maybe a new cache loading/hit/removal
algorithm that works better for many hits to very large files would solve
it, I dunno.

We finally settled on listing out some of the most popular files in the
httpd.conf file for mod_file_cache, but that presents a management problem
as what's most popular changes.  It would have been nicer if apache could
auto-sense the most popular files.  Also it seems mod_file_cache has a file
size limit, but at least we got enough in there that the disk wasn't
bottlenecking anymore...

Dave

- Original Message -
From: Bill Stoddard [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, January 01, 2003 6:38 AM
Subject: RE: [PATCH] remove some mutex locks in the worker MPM


  it may also have to do with caching we were doing (mod_mem_cache crashed
  and burned,
 What version were you running?  What was the failure? If you can give me
 enough info to debug the problem, I'll work on it.

 Bill





Re: [PATCH] remove some mutex locks in the worker MPM

2002-12-31 Thread David Burry
Oh, I should have mentioned: our mutex issues lessened a lot when we made
more processes with fewer threads each, but that kind of started defeating
the purpose of using the worker mpm after a while...  Your optimizations
sound like they may help fix this issue... thanks again.

Dave

- Original Message -
From: David Burry [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, December 31, 2002 5:54 PM
Subject: Re: [PATCH] remove some mutex locks in the worker MPM


 Ohh this sounds like an awesome optimization... I noticed mutex contentions
 were extremely high on a very high traffic machine (say.. high enough to get
 close to maxing out a gig ethernet card) using the worker mpm on solaris
 8...  it may also have to do with caching we were doing (mod_mem_cache
 crashed and burned, we had to use mod_file_cache to get it to work but it
 was still quite the exercise).

 Dave

 - Original Message -
 From: Brian Pane [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Tuesday, December 31, 2002 5:30 PM
 Subject: [PATCH] remove some mutex locks in the worker MPM


  I'm working on replacing some mutex locks with atomic-compare-and-swap
  based algorithms in the worker MPM, in order to get better concurrency
  and lower overhead.
 
  Here's the first change: take the pool recycling code out of the
  mutex-protected critical region in the queue_info code.  Comments
  welcome...
 
  Next on my list is the code that synchronizes the idle worker count.
  I think I can eliminate the need to lock a mutex except in the
  special case where all the workers are busy.
 
  Brian
 
 





2.0.44 release?

2002-12-19 Thread David Burry
What does everyone think about releasing 2.0.44 soon?  My company kind of
needs the logio fixes but we're hesitant to run our own special patched
version of 2.0.43 when we have so much riding on this project...

Dave




dynamically change max client value

2002-11-04 Thread David Burry
Recently there has been a little discussion about an API in apache for
controlling starts, stops, restarts, etc...

I have an idea that may help me solve a problem I've been having.  The
problem is limiting the number of processes that will run on a machine to
somewhere below where the machine will keel over and die, while still being
close to the maximum the machine will handle.  The issue is that, depending
on what the majority of those processes are doing, the maximum number a
given machine can handle changes by a few orders of magnitude, so a
multi-purpose machine that serves, say, static content and cgi scripts (or
other things that vary greatly in machine resource usage) cannot be properly
tuned for maximum performance while guaranteeing the machine won't die under
heavy load.

The solution I've thought of is... what if Apache had an API that could be
used to say "no more processes, whatever you have NOW is the max!", or
otherwise to dynamically raise or lower the max number (perhaps "oh, there's
too many, reduce a bit")...  You see, an external monitoring system could
monitor cpu and memory and whatnot, and dynamically adjust apache depending
on what it's doing.  This kind of system could really increase the
stability of any large Apache server farm, and help keep large traffic
spikes from killing apache so badly that nobody gets served anything at all.

In fact this idea could be extended someday to dynamically change all sorts
of apache configuration things, but all I really need that I know of right
now is the max client value...
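
For reference, the value in question is fixed at startup today (the number is
only an illustration); the proposal is an interface for nudging it at runtime:

    # httpd.conf today: one static ceiling, whatever the mix of work
    MaxClients 256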

What do you all think of this idea?  Does this already exist perhaps?

Dave




Re: dynamically change max client value

2002-11-04 Thread David Burry
I realize that allowing _everything_ to be dynamically configured via SNMP
(or signal or something) would probably be too substantial of an API change
to be considered for the current code base, but it would be nice to consider
it for some future major revision of Apache

And it would be more than just nice if at least the max client value thing
could be somehow worked into the current versions of Apache...  There is a
current very real and very large problem that could be solved by this; it's
not just a nice-to-have feature.  This is what I meant to emphasize in my
original email...

Dave

- Original Message -
From: Dirk-Willem van Gulik [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, November 04, 2002 9:35 AM
Subject: Re: dynamically change max client value



 In my ideal world every config directive would be able to advertise or
 register an optional 'has changed' hook. Which, if present, would be
 called in context whenever a value is somehow updated (through snmp, a
 configd, a signal, whatever). If there is no such hook, the old -update- on
 graceful restart is the default (though it sure would be nice to have some
 values also advertise that they need a full shutdown and restart).

 Of course one could also argue for not just a put but also for a 'get'
 interface in context :-)

 Dw

 On Mon, 4 Nov 2002, David Burry wrote:

  Recently there has been a little discussion about an API in apache for
  controlling starts, stops, restarts, etc...
 
  I have an idea that may help me solve a problem I've been having.  The
  problem is in limiting the number of processes that will run on a machine
  to somewhere below where the machine will keel over and die, while still
  being close to the maximum the machine will handle.  The issue is depending
  on what the majority of those processes are doing it changes the maximum
  number a given machine can handle by a few orders of magnitude, so a
  multi-purpose machine that serves, say, static content and cgi scripts (or
  other things that vary greatly in machine resource usage) cannot be
  properly tuned for maximum performance while guaranteeing the machine
  won't die under heavy load.
 
  The solution I've thought of is... what if Apache had an API that could
  be used to say no more processes, whatever you have NOW is the max!  or
  otherwise to dynamically raise or lower the max number (perhaps oh
  there's too many, reduce a bit)  You see, an external monitoring
  system could monitor cpu and memory and whatnot and dynamically adjust
  apache depending on what it's doing.  This kind of system could really
  increase the stability of any large Apache server farm, and help keep
  large traffic spikes from killing apache so bad that nobody gets served
  anything at all.
 
  In fact this idea could be extended someday to dynamically change all
  sorts of apache configuration things, but all I really need that I know
  of right now is the max client value...
 
  What do you all think of this idea?  Does this already exist perhaps?
 
  Dave
 
 





Re: dynamically change max client value

2002-11-04 Thread David Burry
Interesting comments, thanks.  You obviously speak from experience.

The idea I was having is that no matter how overloaded a machine becomes, it
should never run so far out of resources that it dies; there should be some
kind of limit in place... I thought this was what MaxClients was for;
however, the optimum value varies too greatly depending on what's getting
hit, so I don't know what to do about it...

Your idea of a reverse proxy in front (or a module) turning away or
redirecting some requests in a very light fashion has merit...  I know that
Excite.com used to slap up a very light static home page during peak loads
for this very reason; I'll investigate something like that as the solution
to my problem instead of what I originally suggested, thanks.
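
One cheap way to wire that up with stock mod_rewrite (flag path and busy page
are hypothetical): an external monitor creates a flag file when the box is
overloaded, and apache redirects everything to a tiny static page while it
exists:

    RewriteEngine on
    # If the monitor has raised the flag and this isn't already the busy
    # page, answer with a very cheap redirect:
    RewriteCond /var/run/httpd-busy.flag -f
    RewriteCond %{REQUEST_URI} !^/busy\.html$
    RewriteRule .* /busy.html [R=302,L]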

Dave


At 12:01 PM 11/4/2002 -0800, Scott Hess wrote:
Based on my experience, this wouldn't be a high-quality solution, it would
be a hack.  I've seen very few cases where load spiked enough to be an
issue, but was transient enough that a solution like this would work - and
in those cases, plain old Unix multitasking normally suffices.

What happens if you implement the solution anyhow is that you get a bunch
of users stuck in the ListenBacklog.  So they'll wait a couple minutes
before their request even starts.  If you have a deep backlog, requests
just pile up so that the machine never gets its head above water.  In the
worst case, clients will timeout while their request is in the backlog,
but since you don't find that out until you send a response which writes
out to the network, you can very easily do work that can never be
delivered.  Beyond all that, the user experience simply _sucks_.

[Yes, I've done what you suggest, just not using the implementation you
suggest.  It's integrated into an existing custom module, you could also
probably do it with a reverse proxy.  In the end, it was not a productive
solution.]

What I think you really want is a module that will intercept all requests,
and send back The server is really busy, try again in five minutes if
the server is too busy by some measure.  You generally want this to be a
super-low-cost option, so that you can spin through requests very quickly.  
Optimally, no externally-blockable pieces (no database connections, no
locking filesystem access, etc).  One relatively simple option might be to
use a Squid, and an URL redirector which implements the magic check.  If
the machine is not busy, send through to the real server, if the machine
is busy, redirect to an URL which will deliver your message.

[Again, yes, I've done this in Apache1.3, but in code targetted to our
custom modules.  You could certainly do it more generically, I just
haven't had the need.  You might check mod_backhand.]

Later,
scott


On Mon, 4 Nov 2002, David Burry wrote:
 I realize that allowing _everything_ to be dynamically configured via
 SNMP (or signal or something) would probably be too substantial of an
 API change to be considered for the current code base, but it would be
 nice to consider it for some future major revision of Apache
 
 And it would be more than just nice if at least the max client value
 thing could be somehow worked into the current versions of Apache...  
 There is a current very real and very large problem that could be solved
 by this, not just a nice to have feature.  This is what I meant to
 emphasize in my original email...
 
 Dave
 
 - Original Message -
 From: Dirk-Willem van Gulik [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Monday, November 04, 2002 9:35 AM
 Subject: Re: dynamically change max client value
 
 
 
  In my ideal world every config directive would be able to advertise or
  register an optional 'has changed' hook. Which, if present, would be
  called in context whenever a value is somehow updated (through snmp, a
  configd, a signal, whatever). If there is no such hook, the old -update- on
  graceful restart is the default (though it sure would be nice to have some
  values also advertise that they need a full shutdown and restart).
 
  Of course one could also argue for not just a put but also for a 'get'
  interface in context :-)
 
  Dw
 
  On Mon, 4 Nov 2002, David Burry wrote:
 
   Recently there has been a little discussion about an API in apache for
   controlling starts, stops, restarts, etc...
  
   I have an idea that may help me solve a problem I've been having.  The
   problem is in limiting the number of processes that will run on a
   machine to somewhere below where the machine will keel over and die,
   while still being close to the maximum the machine will handle.  The
   issue is depending on what the majority of those processes are doing it
   changes the maximum number a given machine can handle by a few orders of
   magnitude, so a multi-purpose machine that serves, say, static content
   and cgi scripts (or other things that vary greatly in machine resource
   usage) cannot be properly tuned

Re: new download page

2002-10-27 Thread David Burry
Awesome script...  I hadn't thought of doing it this way; this is better
than what I was thinking...  It seems to address everyone's concerns, too,
in the best way that's still within our resources.
Dave

- Original Message -
From: Justin Erenkrantz [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, October 27, 2002 1:50 PM
Subject: Re: new download page


 --On Sunday, October 27, 2002 9:39 AM -0800 Justin Erenkrantz
 [EMAIL PROTECTED] wrote:

  I'm trying to write it up now.  I'm also cleaning up closer.cgi
  while I'm at it.  -- justin

 Well, that took *way* longer than I wanted to.  Anyway, a rough
 sketch of what I'm thinking of is here:

 http://www.apache.org/dyn/mirrors/httpd.cgi

 And, to prove that this new system isn't any worse than the old one:

 http://www.apache.org/dyn/mirrors/list.cgi

 This is a python-based CGI script that uses Greg Stein's EZT library
 (much kudos to Greg for this awesome tool).  It allows for the
 separation of the layout from the mirroring data.  Therefore, it
 makes it really easy to do the above with only one CGI script
 (httpd.cgi and list.cgi are symlinked to the same file) that has
 multiple 'views' and templates.

 We would probably have to work a bit on the layout and flesh it out
 some, but this is the idea that I had.

 Source at:

 http://www.apache.org/~jerenkrantz/mirrors.tar.gz

 If I could run CGI scripts from my home dir, I wouldn't have stuck
 this in www.apache.org's docroot, but CGI scripts are not allowed
 from user directories.  ISTR mentioning this before and getting no
 response from Greg or Jeff.  -- justin





Re: new download page

2002-10-26 Thread David Burry
Excellent little utility... however, "closer" network-wise is often
significantly different than "closer" geographically; for instance,
California is likely a lot closer to Peru than Chile is (as an extreme
example), if you go by how the packets fly instead of how the crow flies...
Also, when a closer server is overloaded you will get a download quicker
from a more distant server (regardless of how you define "closer").  So a
good balancing algorithm really shouldn't care about geographic distance,
but about traceroute hops and ping times and server loads...

Dave

- Original Message -
From: Joshua Slive [EMAIL PROTECTED]
 See: http://maxmind.com/geoip/

 If someone wants a little project, it shouldn't be too hard to integrate
 this into the existing closer.cgi script.

 Joshua.






Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002

2002-10-24 Thread David Burry
Is it possible to get some of the fixes to mod_logio committed?  Wouldn't
everyone agree that the current logging of the outgoing bytes is incorrect
behavior?  Currently it logs the full file size (plus headers) even if the
transfer gets cut off in the middle, instead of the actual number of bytes
sent.  I've seen several patches to fix this but very little comment on
them...  I've seen lots of comments that it can't be done without major
rearchitecting, but Bojan seems to have done it without that, by breaking
pipelining; am I correct?

I also wish that %b would be fixed in a similar manner, but I haven't seen
any patches for that (or comments about it).  Wouldn't everyone agree that
it too should log actual bytes sent, not just the full file size every time?
Apache 2.0 should do everything that 1.3 did, so this logging issue really
should be considered a bug, right?
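
For comparison, here is how the two counters sit side by side in a log format
(%I and %O are mod_logio's actual-bytes counters, headers included; %b is the
nominal body size):

    LogFormat "%h %l %u %t \"%r\" %>s %b %I %O" combinedio
    CustomLog logs/access_log combinedio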

Since I depend on correct outgoing byte count logging to see how many people
successfully download files, I can live with broken pipelining for now in
2.0; currently I've had to roll back to Apache 1.3 and put in 3 times as
many machines (12 instead of 4)...  I'd really like to return those 8
borrowed machines someday and be able to upgrade to 2.0... but I can't do
that in the current broken log state.

Dave




Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002

2002-10-24 Thread David Burry
- Original Message -
From: Bojan Smojver [EMAIL PROTECTED]


 On Fri, 2002-10-25 at 03:31, David Burry wrote:
  Is it possible to get some of the fixes to mod_logio committed?  Wouldn't
  everyone agree that the current logging of the outgoing bytes is incorrect
  behavior?  Currently it logs the full file size (plus headers) even if it
  gets cut off in the middle, instead of the actual number of bytes sent.
  I've seen several patches to fix this but very little comment on it...
  I've seen lots of comments that it can't be done without major
  rearchitecting, but Bojan seems to have done it without that by breaking
  pipelining, am I correct?

 Actually, the last patch I sent contains one snag I'm still working on.
 It breaks the core's connection configuration structure, which gets
 attached to c->conn_config. However, I think I can get around that by
 using an optional function. As a matter of fact, I'm working on it
 right now.

I see.. ok, I'll keep waiting patiently...

  Since I depend on correct outgoing byte count logging to see how many
  people successfully download files, I can live with broken pipelining for
  now in 2.0, currently I've had to roll back to Apache 1.3 and put in 3
  times as many machines (12 instead of 4)  I'd really like to return
  those 8 borrowed machines someday and be able to upgrade to 2.0... but
  can't do that in the current broken log state.

 Glad to hear Apache 2.0 makes a huge performance difference. Not so glad
 to hear you had to resort to going back to 1.3. The only thing I can
 promise is a patch using an optional function (this should guarantee
 compatibility of the core between 2.0.43 and 2.0.44 and no MMN bump)
 during the day (Sydney time). It's up to the committers to review and,
 if they like, commit.

The memory savings is quite significant, and I'll admit that the 8 extra
machines are smaller than the original 4, so it's not exactly 3 times better
CPU-wise...  The memory caching is where the largest savings comes in, on
disk IO; we have a very high traffic site--usually 3 terabytes transferred
per day, though the last few days have been more like 5 terabytes due to a
new release.

 PS. By the number of messages on the list I'm guessing committers must
 be rather busy on their real jobs these days. Unfortunately there is no
 way of speeding things up, given this is volunteer effort. Unless, of
 course, you decide to bribe some of them ;-)

Not exactly the same thing as bribing someone for a commit, but this could
get similar results: my manager's manager is actually not opposed to hiring
a contractor to fix this... anyone want a temporary job?  I don't know if
this is the right place to say such things; tell me if there's a better
place.  When you've got millions of dollars worth of sales depending on an
open source project, throwing a little at that project isn't such a big
deal...  I'd gladly do it myself with my company's blessing (on the clock,
not volunteer) but I'm not a very experienced C programmer yet, more of a
Perl hacker and applications architect so far.  This little paragraph had
better not get me too flooded with resumes or flames or I'm going to feel
dumb; whatever you do, don't spam this list with personal replies!   ;o)

Dave




Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002

2002-10-24 Thread David Burry
Excellent!  I will perform some tests with that when I get a chance!  You
managed to get it working without even breaking pipelining?  That's awesome!

Not meaning to belittle Bojan's hard work, but for my purposes mod_logio
values are not as good as %b would be if %b worked properly... what I
ideally need is the bytes-sent count without the headers.  Using Bojan's
module I can get approximate results, but they will be a hair off because
they include headers...  My main purpose is to detect if and when
several-meg files have been downloaded all the way vs. cut off in the
middle, including when a given user uses some byte-ranging download manager
that lets you pause and restart...  We also use it for chargebacks to the
various departments for bandwidth usage (in this case mod_logio would of
course be more accurate than %b, though)...  We actually had to fudge some
of our statistics (duplicated nearby days' data with similar overall
throughputs) because we didn't catch the problem with Apache 2.0 soon
enough...
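
One way to make that analysis easier is to log the Range request header
next to the byte counts; a minimal sketch (real format codes, made-up log
name):

# Summing %b per client and file, together with the requested Range,
# lets an offline script judge whether a download ever completed.
LogFormat "%h %t \"%r\" %>s %b \"%{Range}i\"" dlcheck
CustomLog logs/download_log dlcheck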

Dave


At 09:15 AM 10/25/2002 +1000, Bojan Smojver wrote:
On Fri, 2002-10-25 at 07:42, David Burry wrote:

 I see.. ok, I'll keep waiting patiently...

The patch for 2.0.43 is here:

ftp://ftp.rexursive.com/pub/apache/counting_io_flush-2.0.43.patch

You need to apply mod_logio patch for 2.0.43 first.

Bojan




Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002

2002-10-24 Thread David Burry
At 09:38 PM 10/24/2002 -0400, Glenn wrote:
Have you looked at the %...X directive in Apache2?


That's an interesting idea I hadn't thought of...  it doesn't solve the chargeback 
issue but it's worth investigating for detecting successful downloads...
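
A minimal sketch of Glenn's suggestion using mod_log_config's
connection-status code (%X is a real 2.0 format code; the nickname is made
up):

# %X logs 'X' if the connection was aborted before the response completed,
# '+' if it may be kept alive, and '-' if it will be closed.
LogFormat "%h %t \"%r\" %>s %b %X" connstatus
CustomLog logs/access_log connstatus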

Dave




Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002

2002-10-24 Thread David Burry
At 08:45 PM 10/24/2002 -0500, William A. Rowe, Jr. wrote:
At 08:40 PM 10/24/2002, Bojan Smojver wrote:
Quoting David Burry [EMAIL PROTECTED]:

 Excellent!  I will perform some tests with that when I get a chance!  You
 managed to get it working without breaking pipelining even?  That's awesome!

That's what I *think*, which has been known to deviate from the truth, from time
to time. However, I appreciate all input, especially results of the actual tests.

  I recall you had tested a ton of 'little files' pipelined.

  What might be more interesting is a 100MB download (over a fast pipe)
which is entirely 'sendfile'd out.  Apache would consider itself done with
the request long before it was finished with the connection.


In case someone else wants to independently verify it...

The exact test I was doing was with a 70+ meg .tar.gz file, both over
100mbit ethernet and a 1.5mbit DSL line, starting and canceling it multiple
times in Windoze Internet Explorer 5 or 6 (which appears to effectively use
byte range requests for subsequent tries, by the way) and monitoring what
was logged each time.  This test isn't super precise on the byte count (I
did not bother to go comb my IE cache), but it sure is obvious when it
consistently logs the whole file size and I only received a small fraction
according to the IE progress bar...  Also, looking at the byte range
requests with %{Range}i makes it obvious how much IE received previously on
each subsequent try (IE appears to only request the part of the file it
hasn't cached yet).

I was thinking of writing a script that did this in a more automated fashion... 
perhaps contributing a test to the apache test suite when I figure that thing out... 
;o)

Dave





Re: Counting of I/O bytes

2002-10-16 Thread David Burry

I know that I'm ultimately wanting to log bytes per request pretty
accurately, to add up and tell whether the whole file was transferred even
if it was broken up into several byte range requests.  I have given up on
the possibility of this happening in 2.0 for a long time to come, due to
the architecture changes required to make it happen; we're back on 1.3 with
three times as many servers because of the lack of threading... :-(

Dave

- Original Message -
From: Bojan Smojver [EMAIL PROTECTED]

 Is something like this possible? If not, I think we should be pretty much
 OK, as the whole point of mod_logio is to log the traffic, most likely
 per virtual host.





Re: url finishing by / are declined by cache

2002-10-16 Thread David Burry

- Original Message -
From: Thomas Eibner [EMAIL PROTECTED]
 On Wed, Oct 16, 2002 at 11:20:07AM -0400, Bill Stoddard wrote:
   Hi,
  
   Why are urls ending in / not cacheable?
 
  muddled thinking ? :-) I was unsure how to handle caching default index
  pages but I see no reason why we can't just delete this check.

 Not for negotiation reasons? (I seem to remember discussions where
 that was brought up).

But any file could be negotiated the same way, not only files that end with
/.  Negotiated pages need to inform caches that they are un-cacheable by
setting the correct headers; that would take care of this case as well.
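
A minimal sketch of the kind of header meant here, using mod_headers; the
FilesMatch pattern (type-map files) is an assumption about how negotiation
is set up:

# Explicitly mark negotiated resources un-cacheable so shared caches
# never serve the wrong variant.
<FilesMatch "\.var$">
    Header set Cache-Control "no-cache"
</FilesMatch>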

Dave





Re: apache test suite?

2002-10-13 Thread David Burry

sorry folks, I knew it would be that easy, I should have looked more closely
at the web site too  ;o)

Dave


- Original Message -
From: Cliff Woolley [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, October 12, 2002 9:50 PM
Subject: Re: apache test suite?


 On Sat, 12 Oct 2002, David Burry wrote:

  Has anyone worked on an Apache test suite?  You know, like how many
  things have a make test that runs all sorts of tests... or perhaps a
  separate package that runs tests...  I might be interested in starting
  one but would rather build upon others' work if some of it has already
  been done...

 See the httpd-test repository.  :)





apache test suite?

2002-10-13 Thread David Burry

Has anyone worked on an Apache test suite?  You know, like how many things
have a make test that runs all sorts of tests... or perhaps a separate
package that runs tests...  I might be interested in starting one but would
rather build upon others' work if some of it has already been done...

Dave




Re: apache 2.0.43: %b not showing bytes sent but bytes requested

2002-10-11 Thread David Burry

ok serious problems still...  I've successfully installed mod_logio
(finally), and %O is STILL not logging the actual bytes transferred, but the
bytes apache is INTENDING to eventually transfer... if I stop it in the
middle by canceling or disconnecting, this number is horribly too big.
Of course it's a couple bytes larger than %b because of the headers
included, but this is still totally useless for the purpose of figuring out
if/when/whether the whole file was actually downloaded.  Anyone have any
ideas why this is so and how to fix it?  Is the logging happening completely
before the request is finished?  Could that be the problem?

This should also be a concern for anyone who's using mod_logio to charge for
bandwidth, because customers should be concerned about some serious
overcharging going on here!

I had to install libtool 1.4 and m4 1.4 and this was my configure line:
./configure --prefix=/usr/local/apache_2.0.43+logio --enable-logio \
    --enable-usertrack --enable-file-cache --enable-cache \
    --enable-mem-cache --enable-static-support --disable-cgid \
    --disable-cgi --enable-rewrite --with-mpm=worker --disable-userdir


Dave

- Original Message -
From: David Burry [EMAIL PROTECTED]
 Yes, I've been thinking of experimenting with mod_logio, but I'm a bit
 hesitant to hack out a patch from (or use whole-hog) CVS HEAD into a
 production site that gets 3 terabytes of traffic per day and I'm
 embarrassed to admit how much revenue depends on... ;o)  Thanks for the
 link, I'll try that.  It won't be as accurate as getting the byte count
 without the headers, but at least it should be something better than
 nothing if it works as described...

 If we're not going to fix %b, shouldn't we at least fix the documentation
 to be more accurate?  2.0 and 1.3 really are quite different here.

 Dave

 - Original Message -
 From: Bojan Smojver [EMAIL PROTECTED]
  Have you tried using mod_logio? It won't only give you the body bytes,
  but also the header bytes. It reports the number of input bytes too and
  should understand encryption and compression. You can either check it
  out from Apache CVS (HEAD), or download the patch for 2.0.43 here:
  http://www.rexursive.com/software.html.

  You'd use it with %I (for input) and %O (for output). It would be
  interesting to know if it reports accurately in the case you described...





Re: apache 2.0.43: %b not showing bytes sent but bytes requested

2002-10-11 Thread David Burry

Being off by a little is better than being off by the whole thing... My
tests show that was Apache 1.3 behavior...  At least that way the value is
close, so if you're charging for bandwidth you're not overcharging so much,
and you can still tell if the whole file got there or not.

Dave

- Original Message -
From: Bennett, Tony - CNF [EMAIL PROTECTED]
 I believe the %b gets its value from the request_rec's bytes_sent
 member.  This field is filled in by the content_length output
 filter.  It is not filled in during the actual send operation.
 Even if it was filled in following each send, that doesn't
 mean it was received by the client... in the event of a disconnect,
 I believe the bytes_sent will always be off.

 -tony
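
For readers following along in the source, a paraphrased sketch of the
field Tony refers to (condensed from Apache 2.0's httpd.h, not a verbatim
excerpt):

struct request_rec {
    /* ... many members omitted ... */
    apr_off_t bytes_sent;   /* body byte count as computed by the
                             * content-length output filter, not the
                             * bytes actually delivered to the client */
    /* ... */
};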




apache 2.0.43: %b not showing bytes sent but bytes requested

2002-10-10 Thread David Burry

The documentation for Apache 2.0.43 for mod_log_config states:

%...b: Bytes sent, excluding HTTP headers. In CLF format i.e. a '-' rather than a 0 
when no bytes are sent.

However, in testing I clearly see it's logging the number of bytes
_requested_ (that is, what apache intended to send), not the actual bytes
_sent_!  If a user presses the cancel button on their browser or they're
cut off in the middle, this number is not accurate at all, because it makes
it appear the entire file was sent when it was not.

We're running a site that serves many large files (a dozen megs or so
typically) for download.  It depends on this bytes-sent number for
statistics and monitoring, to see if and when a download has completed,
including with a 206 byte-range response...  Typical throughput is 600
Mbit/sec, 3 terabytes/day, running on Solaris 8 on 4 Sun E280Rs with 4 gigs
of RAM each...  We're seriously considering rolling back to old hardware
with Apache 1.3.x (which seems to log actual bytes sent, by the way) and
CacheFlow machines because of this issue...  Is there a patch out for this
problem instead?  Or is someone already working on it?  Or does anyone have
an idea where the root of the problem is, so I might take a stab at
patching it myself?

As a side note, we tried but were unable to use mod_mem_cache on this
setup, we suspect due to mutex issues; it might be possible if we spread
this out over more machines (but what's the point of memory caching if you
have to do that).  mod_file_cache works OK, though, for a static list of
the most-often-used files, although mmap does have its limitations on how
large a file can be stuck into memory...
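
A minimal mod_file_cache sketch of the static-list setup described here;
the file paths are made up:

# MMapFile maps each listed file into memory at startup; CacheFile
# (where supported) caches an open file descriptor for sendfile instead.
# Both lists are fixed at server start.
MMapFile  /www/downloads/release-5.1.tar.gz
CacheFile /www/downloads/release-5.0.tar.gz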

We have sendfile installed and I configured with the following:

./configure --prefix=/usr/local/apache_2.0.43 --enable-usertrack \
    --enable-file-cache --enable-cache --enable-mem-cache \
    --enable-static-support --disable-cgid --disable-cgi \
    --enable-rewrite --with-mpm=worker --disable-userdir

Dave




Re: mod_blanks

2002-09-26 Thread David Burry

I agree that mod_gzip does a lot better job as far as compression goes, and
it likely doesn't even use more CPU.

However, it's still important to remove HTML and JavaScript comments
sometimes for security reasons, but I suspect this could probably be better
done as part of the publishing process, not on the fly as pages are served
(even gzip compression could be done this way, actually, come to think of
it).
Dave

- Original Message -
From: Peter J. Cranstone [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, September 26, 2002 12:54 PM
Subject: RE: mod_blanks


 Fabio,

 Mod_gzip for Apache is a better solution.  Prior to its release, both
 Kevin and I looked at what we call poor man's compression, i.e. just
 removing the blank spaces, lines and other garbage in a served page.

 Here is what we learned.

 No one was interested.  It didn't save much on the overall page, and
 people really don't like their HTML etc. being messed with.

 Also, if you are going to spend the CPU cycles, it's easier to simply use
 gzip compression to squeeze the page by upwards of 80%+ and preserve all
 the formatting of the author's HTML.

 Mod_gzip already saves a ton of bandwidth, and with a current browser
 there is no need to install a client-side decoder.

 Regards,


 Peter J. Cranstone


 -Original Message-
 From: fabio rohrich [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, September 26, 2002 6:38 AM
 To: [EMAIL PROTECTED]
 Subject: mod_blanks

 I'm going to develop this topic for my thesis.
 Does anybody have any suggestions for it?  Something to
 add in the development (like compression of the string)
 or some feature to implement?

 And, the last thing: what do you think about it?

 Thanks a lot,
 Fabio

 - mod_blanks: a module for the Apache web server which would on-the-fly
 remove unnecessary blank space, comments and other non-interesting
 things from the served page.  Skills needed: the C language, a bit of
 text parsing techniques, HTML, learn the Apache API.  Complexity: low to
 moderate (after learning the API).  Usefulness: moderate to low (but
 maybe better than that; it's a kind of nice toy topic that could be
 shown to save a lot of bandwidth on the Internet :-).








Re: [PATCH] add simple ${ENV} substitution during config file read

2002-09-25 Thread David Burry

This may be confusing, because people may begin to expect it to do the
substitution at request time in certain cases instead of only at server
startup time...  Admittedly that would be almost like turning every
directive into mod_rewrite, but... an env var is an env var, and many
things are handled at request time, so...
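
A minimal sketch of what the proposed patch would allow (hypothetical
usage of the patch under discussion, not a released feature; substitution
happens once, when the config file is read at startup):

# httpd.conf -- ${WWW_ROOT} is taken from httpd's environment at startup,
# e.g. WWW_ROOT=/usr/local/apache2/htdocs ./bin/apachectl start
DocumentRoot ${WWW_ROOT}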

Dave


At 03:55 AM 9/26/2002 +0200, Dirk-Willem van Gulik wrote:


On Thu, 26 Sep 2002, André Malo wrote:

 I'm not sure, but I'd guess this may cause conflicts with mod_rewrite.

Mod rewrite uses % rather than $ for variable names.

It does use $1, $2... for back references, which is not a problem as it is
not followed by a {.

It also uses the dollar for the end-of-string match, which is thus very
unlikely to be followed by a { too.

But...  It also uses the $ for ${mapname:key|default} constructs - which
may cause an issue now. We can make sure that such is ignored, or amend
the patch to allow a '\' or an extra $ to escape either the $ or the {.

 Otherwise...hmm, the feature probably leads to some weird effects, if
 you forget to set or to remove some env variables...

Aye - this is all about giving a competent admin enough rope to hang
him/herself - while making sure that a normal user (who does not have any
${} constructs in his/her file) is unaffected.

Dw




ForceLanguagePriority in Apache 1.3

2002-09-18 Thread David Burry

By the way, I love the idea of backporting the Apache 2.0
ForceLanguagePriority into Apache 1.3...  This directive completely solves a
looot of problems I've been having with stupid non-standards-conformant IE
and content negotiation.  Many thanks to whoever posted the patch.  If only
it could be included with 1.3.27, I'd be one very happy camper!  ;o)

Dave




Re: 2.0.40 (UNIX), mysterious SSL connections to self

2002-08-29 Thread David Burry

You may want to try actually grabbing a couple-byte monitor page from the
load balancers in your case, and parsing it to look for a special string
inside it; that's what we do, and it works well.  This method not only
tests whether a connection is being opened, but more thoroughly tests
server internals (i.e. Tomcat, if you're using mod_jk for instance).  And
since they're all valid requests, we can just do a simple SetEnv var (or
SetEnvIf var) and CustomLog env=!var to filter out the noise; that way
there's no Perl overhead.  BTW, we're doing our health checks every 5
seconds, not every 1 second.
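
A minimal sketch of that log filtering (real directives; the check URL and
variable name are made up):

# Tag the load balancer's health-check hits, then keep them out of the
# main access log.
SetEnvIf Request_URI "^/lb-health\.html$" monitor
CustomLog logs/access_log combined env=!monitor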

Dave

At 10:31 PM 8/29/2002 -0400, [EMAIL PROTECTED] wrote:
My solution to the complaints was to use piped error logs and have a
simple Perl script as the first script in a pipeline.  The Perl script's
only job was to remove those error messages and then pass the log line
on to cronolog.

The reason for taking such a measure was because the server farm was
behind a pair of commercial load balancers which made TCP connections to
port 80 and port 443 ** every second ** to health check that the servers
were alive and accepting connections within a reasonable response time.
Then, it shut down the connection, without attempting any SSL negotiation.
So every second, every server was logging two SSL failure messages (from
each of the redundant load balancers).  Talk about noise in the logs!

It would be Real Nice (tm) if these sort of SSL error messages weren't
reported unless some sort of data was exchanged above the connection
level.  Only in those cases would the SSL error message be correct that
SSL negotiation failed, as opposed to not even having started.

-Glenn




Re: Command line argument inconsistency...

2002-07-23 Thread David Burry

What about using something else, external to apache, to detect output under
netware?  Is it possible to pipe the output to something else that detects
whether there is any, and holds the window open only if there is?  Wouldn't
that solve the problem in a more consistent way than altering the return
codes?  In my opinion, return codes should just be consistent as to whether
there is an error or not (and perhaps, if so, what kind of error), and not
indicate whether there's output, since we could simply check the output
itself for that.

Dave

- Original Message -
From: Brad Nicholes [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, July 23, 2002 3:07 PM
Subject: Re: Command line argument inconsistency...


I understand and that is the reason why I asked.  As far as the
 return value is concerned on NetWare, it doesn't matter.  But I need
 some indication that the screen needs to stay open so that the user can
 see what happened.  If an error occurs and an error message is printed
 to the screen, the exit code will most likely not be 0, so we are OK.
 And I agree that returning an error code on a normal exit such as with
 -h or -v command line arguments would also be inconsistent.  But I have
 no way of telling the difference between a normal exit (ie. Apache
 exited normally on a shutdown) or a normal exit with messages (ie. -v or
 -h ).  Furthermore, this is all in common crossplatform code so I can't
 really #ifdef it for NetWare and fix the problem.  So I'm stuck unless I
 can come up with a better idea.  BTW, I'm open to ideas. :)

 thanks,
 Brad

 Brad Nicholes
 Senior Software Engineer
 Novell, Inc., the leading provider of Net business solutions
 http://www.novell.com

  [EMAIL PROTECTED] Tuesday, July 23, 2002 3:43:03 PM 
 On Tue, Jul 23, 2002 at 02:03:44PM -0600, Brad Nicholes wrote:
  [...]to be inconsistent.  For example, if I start Apache2 with a -h
  option, it displays the help screen and then calls
  destroy_and_exit_process() with an exit code of 1.
  [...] Is there any reason why we can't switch the -v, -V,
  -l, -L options to exit with a 1 instead of a 0 like the -h option?

 Yes. The unix philosophy.
 You are absolutely right: it IS inconsistent, and should be fixed.
 But rather than changing all exit codes to 1, I would prefer to see
 all these exit codes being changed to EX_OK:
   #define EX_OK   0   /* successful termination */
 because all of them indicate that the request for information has been
 processed successfully.
 In comparison, on unix, the return code of ls -l is always zero
 if all files could be listed successfully, even if the command
 produced output via stdout.
 If Netware has a problem with anything being displayed, IMHO it is
 Netware's problem to fix it.

 Sorry, I don't want to sound harsh, but also I do not intend to
 pervert the Unix philosophy here.

Martin
 --
 [EMAIL PROTECTED] | Fujitsu Siemens
 Fon: +49-89-636-46021, FAX: +49-89-636-47655 | 81730  Munich,  Germany





Re: cvs commit: apache-1.3/src/main http_protocol.c

2002-07-09 Thread David Burry

While we are debating the best way to accomplish this Content-Length fix for the next 
release, I kind of need to have it working for me right now in the current released 
version.  Therefore I've implemented this partial fix against 1.3.26 on my system:

root@nueva[391] pwd
/usr/share/src/packages/apache_1.3.26/src/main
root@nueva[392] diff -c http_protocol.original.c http_protocol.c
*** http_protocol.original.c    Tue Jul  9 13:35:54 2002
--- http_protocol.c     Tue Jul  9 12:35:59 2002
***************
*** 1991,1997 ****
  
          r->read_chunked = 1;
      }
!     else if (lenp) {
          const char *pos = lenp;
          int conversion_error = 0;
  
--- 1991,1997 ----
  
          r->read_chunked = 1;
      }
!     else if (lenp && *lenp != '\0') {
          const char *pos = lenp;
          int conversion_error = 0;

Admittedly it only makes empty-string (blank) Content-Length values default
to 0, not whitespace-only ones, but I think that's all I really need to get
me going until the next release.  I believe this may be the simple check of
*lenp that Roy was talking about.  Since I'm brand new to the Apache source
code in general, and not really a C expert either, any comments or
criticisms are welcome.

Dave



At 11:18 AM 7/9/2002 -0700, Roy T. Fielding wrote:
WTF?  -1   Jim, that code is doing an error check prior to the
strtol.  It is not looking for the start of the number, but
ensuring that the number is non-negative and all digits prior
to calling the library routine.  A simple check of *lenp would
have been sufficient for the blank case.




recent chunked encoding fix -vs- mod_proxy...

2002-07-09 Thread David Burry

I have a situation where I have an external-facing apache server proxying
to another apache server inside a firewall.  I've updated the proxying one
to Apache 1.3.26 so that it won't get hacked via the chunked encoding bug,
but I'm not able to upgrade the one behind the firewall for quite some time
(a few months, since it's integrated with another product).  I've been
trying to figure out whether I'm vulnerable externally in this situation.

It appears to me that I'm not, because it looks like the mod_proxy handler
calls the same core chunked-reading functionality that the rest of Apache
uses (i.e. from main/http_protocol.c), and that appears to be where all the
fixes were made.

However, I thought I'd run this by you good folks here, since you're a lot
more experienced with the Apache code than I am (just 2 days for me so
far)...

Dave




1.3.26: Content-Length header too strict now

2002-07-08 Thread David Burry

Hi!

I'm having a problem since upgrading from Apache 1.3.23 to 1.3.26.  It appears that 
the Content-Length header field is much more strict in what it will accept from http 
clients than it was before, and this is causing me biiig problems.

A certain http client (which shall remain nameless due to embarrassment) is generating 
a request header like this:

GET /some/file HTTP/1.0
Host: some.place.com
Content-Type:
Last-Modified:
Accept: */*
User-Agent: foo
Content-Length:

Technically it's a very big no-no to have blank header fields like this, I
know.  Content-Length, for instance, should either specify 0 (zero) or not
be listed at all.  But the client is already out there in millions of
users' hands, embedded into several popular products (as part of an
auto-update sort of mechanism)... so ideally, what I'd like to have is an
environment variable flag that disables some of the strictness of the
Content-Length header checking, to allow this aberrant behavior in some
cases without producing errors.  Perhaps this could be made part of Apache
1.3.27?

Please let me know what you all think of this idea.
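
To make the request concrete, a purely hypothetical sketch of how such an
opt-in might look (no such variable exists in 1.3.26; the SetEnvIf
directive itself is real, the variable name and its effect are invented):

# Hypothetical: relax Content-Length parsing only for the known-broken
# client, keeping strict checking for everyone else.
SetEnvIf User-Agent "^foo$" lenient_content_length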

Dave