Re: CGI Script Source Code Disclosure Vulnerability in Apache for Windows
Joshua Slive wrote: ScriptAlias is used to *both* map a URL to a directory *and* mark requests for that URL as pointing to CGI scripts. It should not be used for directories that are already accessible from the web because they are under the DocumentRoot, for example. Instead, you can use:

  <Directory /usr/local/apache2/htdocs/cgi-dir>
  SetHandler cgi-script
  Options ExecCGI
  </Directory>

I like the idea of this documentation addition, plus maybe an explanation on the security tips page of why it is recommended (something about the differences between URLs and paths in the configuration, and the security implications of that difference, using CGI as an example), with a reference to it in the ScriptAlias section. This is important to me because, after reading this thread, I've realized I had never thought about these particular security hazards of referencing something by its Location or Alias (which is always case sensitive and has different ways of referencing the same characters) versus by its Directory or Files (which is case insensitive on some operating systems, and normalizes all those character differences before trying to match). And now I need to go audit my web servers to make sure... Dave
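A minimal sketch of the hazard being discussed (the paths and filenames here are hypothetical): on a case-insensitive filesystem such as Windows, a CGI directory that sits under the DocumentRoot and relies only on ScriptAlias can leak script source, because the URL-space match is case sensitive while the filesystem is not.

  # Risky layout: the same directory is reachable two ways
  DocumentRoot "C:/apache/htdocs"
  ScriptAlias /cgi-dir/ "C:/apache/htdocs/cgi-dir/"
  # A request for /CGI-DIR/test.pl misses the case-sensitive alias,
  # falls through to the DocumentRoot mapping, and the script is
  # served as plain text instead of executed.

  # Safer: mark the filesystem directory itself, which is matched
  # after the path has been normalized
  <Directory "C:/apache/htdocs/cgi-dir">
  SetHandler cgi-script
  Options ExecCGI
  </Directory>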
Re: SSL enabled name virtual hosts
Boyle Owen wrote: - You're right that since apache can't see the host header, it uses the cert from the default VH to establish the SSL session. Thereafter, it *can* see the host header and so can route the requests successfully. This gives a lot of people the illusion that SSL-NBVH is possible. The big problem is that you don't get authentication, because the default cert generally will not match the requested site. For professional SSL, authentication is every bit as essential as encryption, so this won't do. We use a wildcard cert to overcome this situation... the technical limitation is that all the SSL hosts have to end with the same domain (a wildcard cert is bound to our domain, not any individual host name), but otherwise we can and do indeed run hundreds (soon to be thousands) of customers on their own individual host names under SSL, all on port 443 on one instance of apache. Unfortunately we have to do funny mod_rewrite trickery to simulate NBVH instead of using real NBVH. I suspect it would be a major change in Apache architecture to use real NBVH in our case (but otherwise, yes, it absolutely could be technically possible, given the all-must-be-in-the-same-domain and must-use-a-wildcard-cert limitations). Dave
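A minimal sketch of the kind of mod_rewrite trickery described (the domain, paths, and cert filenames are hypothetical; the actual ruleset isn't shown in the thread): one SSL virtual host holds the wildcard cert, and a rewrite rule fans requests out to per-customer document trees based on the Host header.

  <VirtualHost _default_:443>
  SSLEngine on
  SSLCertificateFile conf/ssl/wildcard.example.com.crt
  SSLCertificateKeyFile conf/ssl/wildcard.example.com.key
  RewriteEngine on
  # route each customer host name to its own document tree
  RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.com$ [NC]
  RewriteRule ^/(.*)$ /data/customers/%1/$1 [L]
  </VirtualHost>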
Re: SSL enabled name virtual hosts
Boyle Owen wrote: -Original Message- From: David Burry [mailto:[EMAIL PROTECTED] We use a wildcard cert to overcome this situation... How did you get your wildcard cert? Did you buy it? Apparently, the cert sellers (Thawte, Verisign etc.) are not too keen on selling wildcards, for the simple reason that it reduces the number of certs required (and hence sold) - a bit like the everlasting light bulb... Was that your experience? Yes, it was bought... So far the experience has been awesome (other than the previously mentioned apache architecture limitation workarounds). There are some cert sellers out there who are getting competitive in this respect - maybe not the original big guys such as the ones you mention yet - but it would be absolutely impossible for us to do what we're doing right now without this kind of cert; it's not feasible to buy a class B block of IP addresses, and we couldn't afford that many individual certs unless they were only a couple of bucks apiece or less! Dave
Re: Puzzling News
Dependency on third-party modules still prevents us from upgrading from 1.3 to 2.0, or at least makes the drawbacks of upgrading outweigh the benefits. That's the main holdup for us. For instance, mod_perl (and a few custom scripts that use the API extensively), mod_php (all those non-thread-safe libraries everyone demands nullify the benefits of multi-threading), mod_dynamo (we're stuck with an older version until an enormous application we wrote on top of it is massively overhauled, by us), mod_webobjects (legacy code here too; it should just go away someday, but it keeps working, unfortunately), etc... Not all these third-party modules are API-stable and/or released for 2.0 (mod_perl is *finally* getting kind of close, looks like - RC4, woo hoo), and we'd need to upgrade all our code that relies on them. For what? For a minimal footprint improvement? Seriously, in many cases, throwing a little more hardware at 1.3 is a lot cheaper than sinking so many engineers into all that. 1.3 still works pretty well! A very clean fresh install, with no third-party modules and no legacy code support necessary? Sure, we'd use 2.0! But that's not reality for us yet. We'll migrate eventually; it just takes a while. This is very much unlike that old piece of crap (Netscape server) we used before Apache 1.3 several years ago... we were dying with it... 1.3 was so awesome back then, it was our savior! And it still is pretty awesome! Dave
buffer overflow in mod_proxy in 1.3.31?
Has anyone checked this out yet? http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-0492 It was reported in CNET News a month or two ago, and my SOX security guys at work have been bugging me about it... I need to tell them either that it's a false alarm or that it will be fixed soon. Any current status on it? Dave
Re: Removing the Experimental MPMs in 2.2?
Whether labeled experimental or not, it's always been very confusing to me that the release (stable) branch has modules in it that developers know don't work at all and that therefore should never be used by any ordinary user in any way whatsoever... So I agree: the stable branch's experimental directory shouldn't be a place for known completely hosed and unusable modules. It should be a place for "seems to work fine for me, see how it works for you, but this is pretty new and not necessarily as well tested in production on every platform yet, so use at your own risk" modules. The "broken and not yet finished enough for anyone to ever think about using, even on an experimental basis" modules should only be available in the dev branch's experimental directory, at least until someone believes they work for at least some people on some platform. Dave Paul Querna wrote: - leader - perchild - threadpool My personal feeling is to *not* include them in the 2.2 branch at this time.
Re: Reload CRL without re-starting Apache
Perhaps we could use a new module that allows efficient on-the-fly config parameter changes without restarting any processes? Kind of like a config server that you connect to and issue commands that add and remove apache directives, at least most of them if not all of them. There have been a few times when I wished for this feature, but so far haven't taken the time to *write* a module to do this. I know apachectl graceful works, but it's quite a load to do too often. Also there is the benefit that you always have a consistent view of what the configuration is when you use graceful, whereas it could be easy to become confused with on-the-fly changes, so apache administrators beware with such a module! Dave
Re: strip rewritelog functionality per compiler option
I agree entirely. As the documentation says, rewrite rules are voodoo, and it's often very hard to understand what's going on and why a given ruleset isn't working as expected (which is not the same as an error in the error log; it's more of a user error). The inability to trace through what it's doing in the rewrite log would have made many of my past interesting rulesets impossible to create. Taking it out of production sites is a great optimization, but definitely not a good thing for site development. Dave On Fri, 1 Aug 2003 22:55:31 +0100 Thom May [EMAIL PROTECTED] wrote: * Justin Erenkrantz ([EMAIL PROTECTED]) wrote : I'd support removing RewriteLog entirely in 2.1. -1; as Mads says, RewriteLog is used for debugging only, not for day-to-day logging. This is why Andre proposed the patch: production sites can remove the functionality entirely, but dev sites that need to know what the hell the module is doing can still work it out. Removing RewriteLog entirely would make life a living hell. -Thom
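For reference, the debugging setup under discussion (log path illustrative): during development the rewrite log is dialed all the way up, and on production sites it's either set to zero or, with Andre's proposed patch, compiled out entirely.

  # development only -- level 9 traces every rule evaluation
  RewriteLog logs/rewrite.log
  RewriteLogLevel 9
  # production: RewriteLogLevel 0, or a build with the logging stripped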
RE: Bug 18388: Set-Cookie header not honored on 304 (Not modified) status
Since there seems to be some disagreement on this, and the RFC doesn't really specify which it is but instead makes a point of leaving it open for discussion, and there are many other popular servers out there that behave differently than apache (so some people may come to expect, nay even depend on, behavior different from apache's)... what about just making it configurable in httpd.conf, with the default being the current behavior? Why make it very difficult for some people to migrate to apache over such an open-ended issue? If we change the behavior, you are correct that the possibility exists to get confused about the session id, but in some cases that might even be desirable; for instance, if you're trying to cobble together several different web apps with a common login, or something like that, without doing an invasive rewrite of the authentication part of each app... It would be nice as an apache administrator to have that option in a case like that. So making it configurable, with the default being the current behavior, might make everyone happy. Dave -Original Message- From: Graham Leggett [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 03, 2003 11:04 AM To: [EMAIL PROTECTED] Subject: Re: Bug 18388: Set-Cookie header not honored on 304 (Not modified) status Ryan Eberhard wrote: Despite the quote from Roy Fielding, I stand by my claim that Set-Cookie is a response-header and not an entity-header. I would say a cookie is an entity header, in that in its typical use, the cookie value is bound somehow to the page that comes along with it. For example, a cookie might (and often does) contain a session id, while the content of the page is delivered unique to the session. If the cookie is considered a request header, then the possibility exists for a different session id to be delivered with the original content, which does not necessarily match that session id. Regards, Graham -- - [EMAIL PROTECTED] There's a moon over Bourbon Street tonight...
Re: Removing Server: header
- Original Message - From: Graham Leggett Martin Kutschker wrote: Removing the server header won't hurt. Removing the server header is a protocol violation, and serves no purpose. How is it a protocol violation? I can't find anywhere in the HTTP 1.1 protocol where it says the server header is required. In fact, it says it's encouraged that this field be configurable for security reasons (but doesn't specify whether that means only configuring the value, or possibly configuring whether the header exists at all). See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.38 I'll agree that configuring it not to say apache doesn't serve very much useful purpose (since it's only security by obscurity), but neither does our current policy of allowing hiding of the apache version number (since that's security by obscurity too). It's one and the same whether you hide the product name or the version number... makes no difference. The only difference I can see is that it will make it nearly impossible for people to accurately track the number of Apache servers out there in the world, so I guess we keep it for vanity purposes? Sorry if everyone's already talked this issue to death and is sick and tired of it; I don't remember it happening since I joined several months to a year ago. Dave
RE: Removing Server: header
I don't see a good reason not to have a ServerTokens None option... All the ServerTokens options that hide version numbers are security by obscurity anyway. So it's not really anything new, just expanding something that already exists to have a more complete complement of similar options. Dave -Original Message- From: Brass, Phil (ISS Atlanta) [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 26, 2003 12:31 PM To: [EMAIL PROTECTED] Subject: RE: Removing Server: header OK, so given that Date and Last-Modified are required response headers and everybody pretty much hates the idea of removing them, and that removing the Server header amounts to nothing more than security by obscurity, is anybody still interested in seeing a patch that offers a ServerTokens value of None and strictly prevents the addition of the Server: header to the response? If so I'd be happy to do it. Thanks in advance! Phil
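For context, the ServerTokens settings that already exist in 2.0, plus the proposed one (the example banner strings are illustrative):

  ServerTokens Full   # Server: Apache/2.0.43 (Unix) mod_perl/1.99 ...
  ServerTokens OS     # Server: Apache/2.0.43 (Unix)
  ServerTokens Min    # Server: Apache/2.0.43
  ServerTokens Prod   # Server: Apache
  # proposed, hypothetical: ServerTokens None -- no Server header at all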
Re: Advanced Mass Hosting Module
You and someone else said the same thing. I currently have a setup where we run several hundred vhosts (all individually specified) without issue; I'll have to remember this if it ever grows to thousands. Thanks. Lacking a more powerful vhost-alias type thing, I'll probably have to vhost-alias all the standard bare-bones configs and list out the anomalies separately, as sketched below. Dave - Original Message - From: Mads Toftum [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Friday, March 14, 2003 12:55 AM Subject: Re: Advanced Mass Hosting Module On Thu, Mar 13, 2003 at 04:55:19PM -0800, David Burry wrote: These are neat ideas. At a few companies I've worked for we already do similar things, but we have scripts that generate the httpd.conf files and distribute them out to the web servers and gracefully restart. Adding a new web server machine to the mix is as simple as adding the host name to the distribution script. This only works when you have a limited number of vhosts - if you were to run thousands of vhosts on each machine, then mod_vhost_alias (or mod_rewrite) is currently the only way to go. A module like this could provide a nice compromise between the flexibility of using httpd.conf to specify each vhost and the speed of vhost_alias. vh Mads Toftum -- `Darn it, who spiked my coffee with water?!' - lwall
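A sketch of that compromise with the stock module (paths hypothetical): mod_vhost_alias handles all the standard bare-bones sites from one directive, while the handful of anomalous sites keep their own explicit VirtualHost blocks elsewhere in the config.

  UseCanonicalName Off
  # %0 expands to the full requested host name
  VirtualDocumentRoot /data/web/%0/htdocs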
RE: Advanced Mass Hosting Module
These are neat ideas. At a few companies I've worked for, we already do similar things, but we have scripts that generate the httpd.conf files, distribute them out to the web servers, and gracefully restart. Adding a new web server machine to the mix is as simple as adding the host name to the distribution script. What you're talking about doing sounds like a lot more complexity to achieve a similar thing, and more complexity means there's a lot more that can go wrong. For instance, what are you going to do if the LDAP server is down? Are many not-yet-cached virtual hosts just going to fail? In our scenario it's solved simply and easily by the generation script failing and nothing being copied (but at least the web servers keep working fine with the last config revision, so few if any end-user web surfers will notice the outage). Dave -Original Message- From: Nathan Ollerenshaw [mailto:[EMAIL PROTECTED] Sent: Thursday, March 13, 2003 3:28 AM To: [EMAIL PROTECTED] Subject: Advanced Mass Hosting Module Resending this to this list as I got no response on the users list. Currently, we are using flat config files generated by our website provisioning software to support our mass hosted customers. The reason for doing it this way, and not using the mod_vhost_alias module, is because we need to be able to turn on/off CGI, PHP, Java, shtml etc. on a per-vhost basis. We need the power that having a distinct VirtualHost directive for each site gives you. Is there a better way? What I have in mind is a module that fits in with our current LDAP based infrastructure. Currently, LDAP services our mail users, and I would like to see the Apache mass hosting configuration held in LDAP as well. In this way, we can just scale by adding more apache servers, mounting the shared docroot and pointing them at the LDAP server. The LDAP entry would look something like this:

  # www.example.com, base
  dn: uid=www.example.com, o=base
  siteGidNumber: 10045
  siteUidNumber: 10045
  objectClass: top
  objectClass: apacheVhost
  serverName: www.example.com
  serverAlias: example.com
  serverAlias: another.example.com
  docRoot: /data/web/04/09/example.com/www
  vhostStatus: enabled
  phpStatus: enabled
  shtmlStatus: enabled
  cgiStatus: enabled
  dataOutSoftLimit: 100 (in bytes per month)
  dataOutHardLimit: 1000
  dataInSoftLimit: 100
  dataInHardLimit: 1000
  dataThrottleRate: 100 (in bits/sec)

Then, as a request came in, the imaginary mod_advanced_masshosting module would first check to see if it had the information about the domain already cached in memory (to avoid hitting LDAP for every HTTP request, which would be a Bad Idea); if not, it would grab the entry from LDAP, cache it, and service the incoming requests. The cache itself would need to be shared among the actual child apache processes somehow. In addition to these features, the module would keep track of the amount of data transferred in/out for each vhost and apply a soft/hard limit when the limits defined in the LDAP entry were reached. The amount of actual data transferred would periodically be written to either a GDBM file or even to an LDAP entry (not sure what is best - probably LDAP for consistency), and the data would also need to be shared among any servers in a cluster somehow. This would enable ISPs to bill on a per-vhost basis fairly accurately, and limit abusive sites. Now, I've looked around for something like this, and as far as I can see, there isn't anything that does vhosting quite like this, except for commercial systems such as Zeus.
Do people think this is a good approach? Will another method give me what I want? (LDAP is not a dependency, just a nice-to-have) Finally, I am thinking about starting an Open Source project to write this module. My C is pretty primitive right now, though I have got simple LDAP lookup code working already (just not in Apache, yet). Would anyone else see this as a worthwhile project for Apache? It certainly would solve our problems, but it sometimes feels like I'm trying to fix a simple problem with something very heavy - though implemented correctly, I don't think performance will be a problem. Comments gratefully received :) Regards, Nathan. -- Nathan Ollerenshaw - Systems Engineer - Shared Hosting ValueCommerce Japan - http://www.valuecommerce.ne.jp If you think nobody cares if you're alive, try missing a couple of car payments.
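In the meantime, part of this can be approximated with stock mod_rewrite (a sketch; the helper script and its lookup protocol are hypothetical): an external program map looks each Host header up in LDAP and returns the docroot, with no new module to write. It can't do the per-feature toggles or the byte accounting, though; that's the part the proposed module would add.

  RewriteEngine on
  # long-running helper: reads one hostname per line on stdin,
  # prints the matching docroot (or NULL) on stdout
  RewriteMap ldapdoc prg:/usr/local/apache/bin/ldap-docroot.pl
  RewriteCond ${ldapdoc:%{HTTP_HOST}} ^(/.+)$
  RewriteRule ^/(.*)$ %1/$1 [L]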
Re: Proposal: Remove mod_imap from default list
Are we talking about removing modules entirely, or just modifying what's enabled by default? Dave - Original Message - From: Justin Erenkrantz [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, March 09, 2003 8:39 PM Subject: Re: Proposal: Remove mod_imap from default list --On Sunday, March 9, 2003 6:48 PM -0500 Joshua Slive [EMAIL PROTECTED] wrote: 2. If we want to keep our contract with the user about the stable series valid, this change should go into 2.1 only. Otherwise, users doing a configure; make; make install or even a config.status could get a nasty surprise. +1. 2.1 is the right place to tweak module default values. And, it's also the place to remove modules entirely... 1.3 and 2.0 shouldn't change. -- justin
Re: story posted
2.1 was started so that 2.0 can remain stable from here on out, instead of the 2.0 API changing with every minor release and requiring everyone to re-port their modules each time... so the fact that 2.1 exists is a very good sign for 2.0! Dave - Original Message - From: Harrie Hazewinkel [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, February 11, 2003 2:30 AM Subject: Re: story posted Hi, Following this thread, I was wondering what the majority thinks is the best way forward. Currently, not many modules are ready for Apache 2.0, regardless of the reasons. But it seems work is already being done on Apache 2.1, and people will have to port their modules again. OK, the module API is almost equal, but to improve adoption, 2.0 first needs to be an established entity, and with 2.1 there will be a bad signal/noise ratio. Harrie -- Author of MOD-SNMP, enabling SNMP management of the Apache HTTP server
RE: Strange Behavior of Apache 2.0.43 on SPARC MP system
I am running the server/client on the same machine. You will not get reliable results by doing this. Can you elaborate why? Plus, we were forced to do this, but would like to avoid it in the future if it really affects our results. Because the client will contend very heavily with the server for many system resources. It's indeterminate which one (client or server) requires more resources, which one wins more, and how much more of which resources. Running both on the same machine will certainly stress the machine pretty well, but you can't compare any measurement you get with how the same machine would perform if Apache didn't have to contend with a client for its resources; it won't be the same result at all. In the real world apache doesn't have a client stealing its system resources, so an accurate test of how apache would behave in the real world can only be done by setting up a test with the same situation. This could be why apache performs better when you let your client sleep a little (then again, it could be something else; that's why I say it's indeterminate how much of the resources the client itself is stealing from the server). To measure the effect of anything, you have to limit the number of variables that can influence the result. Dave
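In practice that means driving the load from a second box, e.g. with Apache's own ab benchmarking tool (the hostname and numbers are illustrative):

  # run this on a separate client machine, never on the server under test
  ab -n 100000 -c 100 http://server-under-test/index.html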
Re: Graceful shutdown in 2.0
On our systems we just rename that alteoncheck.txt file to alteoncheck_DOWN.txt when we're going to bring a server down (causing a 404 error for the health check, which stops all new requests), it effectively does the same thing you describe without the hassle of writing a handler. And yes it is very nice in that it's easily automated... So, yes, it would be very nice to have a graceful shutdown, but it's not necessarily high priority for those who have some sort of load balancer box (not round robin DNS ;) because there are other relatively simple ways to achieve the same effect... Dave - Original Message - From: Andrew Ho [EMAIL PROTECTED] On a more load balancer specific note, Alteons (and some other load balancers) use the concept of a health check URL. Our Alteons are configured for example to check for a specific URL (for example, the Alteon might do a GET alteoncheck.txt HTTP/1.0 every 2 seconds). I had a plan originally to write a handler that accepts requests for this heartbeat check... on some signal (a particular request? an OS signal?) it would start returning an error for the heartbeat check case, but keep servicing all other requests as normal. Eventually, the Alteon would decide that that machine was bad, and the number of connections would fall to zero; it would then be safe to take the server out of rotation. The benefit of this scenario is that you don't have to touch the load balancer at all to get individual machines in and out of the load balancer. Also, this type of scenario is far more automatable (rather than telnetting into, say, a load balancer console interface and navigating menus, ugh).
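With the rename trick, the whole drain-and-stop sequence is scriptable (the paths and wait period are hypothetical):

  # make the health check 404 so the balancer stops sending new requests
  mv /www/htdocs/alteoncheck.txt /www/htdocs/alteoncheck_DOWN.txt
  # give the balancer time to notice and existing connections time to drain
  sleep 120
  apachectl stop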
Re: Apache q for covalent.net site.
I've seen the exact same problem, and diagnosed it as the same issue. It would be very nice to have the default apache installation handle this properly, to prevent the dumb we-think-we're-smarter-than-you browsers from renaming files... if not by monkeying with the mime.types file for purity reasons, then at least by adding config params to the default httpd.conf. Dave - Original Message - From: William A. Rowe, Jr. [EMAIL PROTECTED] To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Tuesday, February 04, 2003 10:20 AM Subject: Re: Apache q for covalent.net site. An answer to a user question (and a gripe from me, since I've hit this problem myself on apache.org): [the links] point to .tar.gz files, which exist. When they are clicked on, however, they are being renamed to .tar.tar files. Any ideas? Renamed by the browser, not the server. Check in mime.types that we have content types for .tar *and* .gz. The default defines only .tar, so we return application/x-tar. If it included .gz, the *final* pathname element would determine the mime type, which would be application/x-gzip. Almost makes you wish for application/x-gtar or something. You can only count on gnu tar to support tar -z. Anyway, because the content type is application/x-tar and the browser sees the *final* extension is .gz, it is choosing to replace .gz with .tar, or even adding another .tar (e.g., .tar.gz.tar, which I've seen also.) Seems like the ASF needs to choose between removing application/x-tar or adding application/x-gzip in the default mime types. Sure, we have a general policy against adding x- extensions, but by adding one, we open ourselves up to problems. :-) Bill
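The httpd.conf-level fix alluded to above is a one-liner with the standard AddType directive (whether to ship it by default was the open question):

  # make the final .gz extension win, so browsers see application/x-gzip
  AddType application/x-gzip .gz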
Re: Graceful shutdown in 2.0
The same effect is already possible by configuring your proxying machine to stop forwarding new requests to that box first. Of course, it's possible that different people manage the proxying service vs. the back-end apache services, so I can see how it could be desirable to have this feature in apache too, but still, those two people should always be working pretty closely together anyway... Dave - Original Message - From: Bill Stoddard [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, February 04, 2003 8:53 AM Subject: Graceful shutdown in 2.0 Has anyone ever thought about the best way to implement graceful shutdown (perhaps with a timeout) in the server? This would be a useful function to have where there is a cluster of http servers (in a DMZ, for instance) proxying requests to a farm of backend application servers. If you need to take an http server completely offline, you could issue something like apachectl shutdown_graceful [timeout] and prevent that server from accepting any new connections but allow the existing requests to be completed w/o disruption (subject to a timeout). Bill
Re: Graceful shutdown in 2.0
Um, but if you're talking about shutting down the proxy itself (i.e. the whole service, cutting off all load-balanced machines behind it), that's hardly graceful to begin with, so why bother making it graceful... I assumed you meant just gracefully shutting down one single load-balanced machine behind the proxy machine... you can do that already now by:
a) configuring the proxy machine to stop routing (new) requests to it,
b) gracefully restarting the proxy machine to make the new config take effect,
c) waiting till the existing connections to that behind-the-proxy machine are finished, with a timeout if necessary (sort of part of the graceful restart process on the proxy machine), then
d) shutting down the machine behind the proxy,
in that order. External users should not notice anything at all in this scenario. - Original Message - From: Bill Stoddard [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, February 04, 2003 12:25 PM Subject: Re: Graceful shutdown in 2.0 David Burry wrote: The same effect is already possible by configuring your proxying machine to stop forwarding new requests to that box first Yep, that's the idea. In the scenario I'm interested in, Apache httpd -is- the proxy machine! Bill
Re: mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM)
Random thoughts: - Did the content have short expiration times (or recent change dates, which would result in the cache making aggressive expiration estimates)? That could churn the cache.

No. Files literally never change; when updates appear they are always new files, and web pages just point to the new ones each update. In this application these are all executable downloadable files; think FTP repository over HTTP.

- Was CacheMaxStreamingBuffer set appropriately? (It may not be needed at all if the content length header is included on all replies.)

C-L is included (SSIs and all similar dynamic stuff is disabled); not sure about CacheMaxStreamingBuffer, I'd need to go check.

- Did you try caching open file descriptors? I am rather curious if caching open fds will be useful/practical on Unix systems. Oh..., but this probably will not help your disk throughput... nevermind. :-D

- It's probably worth noting in the doc that -each- child process will cache up to MCacheSize KBytes. If you have 10 child processes, then you need 10 x MCacheSize KBytes of RAM available just for the cache (the same files could be cached in each process). I wonder if we should, at startup, allocate MCacheSize KB of shared storage and have mod_mem_cache allocate out of the shared pool. Each child process would have its own unique reference to the object, but the object itself would only be cached once for all processes to access.

I suspected that's where our running out of memory was coming from, but it would have been helpful to have confirmation of my suspicion in the docs, yes. The problem is our cache needed to be quite large to hold very many of those large files, and we needed to run a lot of processes due to the mutex contention with too many threads in one process (see the [PATCH] remove some mutex locks in the worker MPM thread)... so we kind of gave up on mod_mem_cache. This is kind of how this discussion branched off of that thread; sorry I didn't state that clearly earlier.

It would be nice if there were some kind of cache shared between processes. With large files like this it only needs to be read once it's primed with the most popular files... and which files are the most popular doesn't change that often, since we only make new releases every couple of months... mod_file_cache works OK for this, but we'd need to develop something that guesses what will be most popular, generates the httpd.conf list, and restarts apache before each new release is publicly linked on web pages but after the files are put live, to avoid our servers falling over with each new release... it's quite a pain, and quite scary what will happen if those steps aren't followed correctly; that's why I was hoping Apache could manage it automatically with mod_mem_cache.

Dave

Bill

Apache 2.0.43, Solaris 8, Sun E220R, 4 gig memory, gig ethernet. We tried both Sun Forte and gcc compilers. The problem was that mod_mem_cache was just way too resource intensive when pounding on a machine that hard, trying to see if everything would fit into the cache... cpu/mutex use was very high, and memory especially was out of control (we had many very large files, ranging from half a dozen to two dozen megs; the most popular of those were what we really wanted cached), and we were running several hundred concurrent connections at once. Maybe a new cache loading/hit/removal algorithm that works better for many hits to very large files would solve it, I dunno.
We finally settled on listing some of the most popular files in the httpd.conf file for mod_file_cache, but that presents a management problem as what's most popular changes. It would have been nicer if apache could auto-sense the most popular files. Also, it seems mod_file_cache has a file size limit, but at least we got enough in there that the disk wasn't bottlenecking anymore... Dave - Original Message - From: Bill Stoddard [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, January 01, 2003 6:38 AM Subject: RE: [PATCH] remove some mutex locks in the worker MPM it may also have to do with caching we were doing (mod_mem_cache crashed and burned, What version were you running? What was the failure? If you can give me enough info to debug the problem, I'll work on it. Bill
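To make the two approaches concrete (these are the real 2.0 directives; the values and paths are illustrative): mod_mem_cache sizes its cache per child process, which is what multiplies the memory use, while mod_file_cache pins a fixed list of files at startup.

  # mod_mem_cache: note MCacheSize applies to *each* child process
  CacheEnable mem /downloads
  MCacheSize 512000              # KBytes, times the number of children!
  MCacheMaxObjectSize 25000000   # bytes, to admit the two-dozen-meg files

  # mod_file_cache: fixed list, but the list must be regenerated and
  # apache restarted as popularity shifts
  MMapFile /data/downloads/release-2.1/installer.tar.gz
  MMapFile /data/downloads/release-2.1/updater.tar.gz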
Re: mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM)
- Original Message - From: Brian Pane [EMAIL PROTECTED] Sent: Thursday, January 02, 2003 2:19 PM For large files, I'd anticipate that mod_cache wouldn't provide much benefit at all. If you characterize the cost of delivering a file as time_to_stat_and_open_and_close + time_to_transfer_from_memory_to_network, mod_mem_cache can help reduce the first term but not the second. For small files, the first term is significant, so it makes sense to try to optimize away the stat/open/close with an in-httpd cache. But for large files, where the second term is much larger than the first, mod_mem_cache doesn't necessarily have an advantage.

Unless... of course, you're requesting the same file dozens of times per second (i.e. high hundreds of concurrent downloads per machine, because it takes a few minutes for most people to get the file); then caching it in memory can help, because your disk drive would sit there thrashing otherwise. If you don't have gig ethernet, don't even worry, you won't really see the problem; ethernet will be your bottleneck. What we're trying to do is get close to maxing out a gig ethernet with these large files without the machine dying...

And it has at least three disadvantages that I can think of: 1. With mod_mem_cache, you can't use sendfile(2) to send the content. If your kernel does zero-copy on sendfile but not on writev, it could be faster to deliver a file instead of a cached copy.

Memory is always faster than a spinning disk. It should be possible to make a memory cache that's faster than the disk and uses the same amount of space, too.

2. And as long as mod_mem_cache maintains a separate cache per worker process, it will use memory less efficiently than the filesystem cache.

Yes, that is definitely a problem. It's good that mod_file_cache does not have this problem, but it has other file-list maintainability problems.

3. On a cache miss, mod_mem_cache needs to read the file in order to cache it. By default, it uses mmap/munmap to do this. We've seen mutex contention problems in munmap on high-volume Solaris servers.

Sounds familiar...

What sort of results do you get if you bypass mod_cache and just rely on the Unix filesystem cache to keep large files in memory?

Not sure how to configure that so that it will use a few hundred megs to cache often-accessed large files... but I could ask around here among more solaris-knowledgeable people... Dave
Re: mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM)
Interesting... so then why did using mod_file_cache to specify caching a couple dozen known-most-often-accessed files decrease disk io significantly? I'll try the test you mention next time I get a chance. Dave - Original Message - From: Brian Pane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, January 02, 2003 9:43 PM Subject: Re: mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM) On Thu, 2003-01-02 at 21:21, David Burry wrote: - Original Message - From: Brian Pane [EMAIL PROTECTED] Sent: Thursday, January 02, 2003 2:19 PM For large files, I'd anticipate that mod_cache wouldn't provide much benefit at all. If you characterize the cost of delivering a file as time_to_stat_and_open_and_close + time_to_transfer_from_memory_to_network, mod_mem_cache can help reduce the first term but not the second. For small files, the first term is significant, so it makes sense to try to optimize away the stat/open/close with an in-httpd cache. But for large files, where the second term is much larger than the first, mod_mem_cache doesn't necessarily have an advantage. Unless... of course, you're requesting the same file dozens of times per second (i.e. high hundreds of concurrent downloads per machine, because it takes a few minutes for most people to get the file); then caching it in memory can help, because your disk drive would sit there thrashing otherwise. If you don't have gig ethernet, don't even worry, you won't really see the problem; ethernet will be your bottleneck. What we're trying to do is get close to maxing out a gig ethernet with these large files without the machine dying... Definitely, caching the file in memory will help in this scenario. But that's happening already; the filesystem cache is sitting between the httpd and the disk, so you're getting the benefits of block caching for oft-used files by default. What sort of results do you get if you bypass mod_cache and just rely on the Unix filesystem cache to keep large files in memory? Not sure how to configure that so that it will use a few hundred megs to cache often-accessed large files... but I could ask around here among more solaris-knowledgeable people... In my experience with Solaris, the OS is pretty proactive about using all available memory for the filesystem cache by default. One low-tech way you could check is:
- Reboot
- Run something to monitor free memory (top works fine)
- Run something to read a bunch of your large files (e.g., cksum [file])
In the third step, you should see the free memory decrease by roughly the total size of the files you've read. Brian
mod_mem_cache bad for large/busy files (Was: [PATCH] remove some mutex locks in the worker MPM)
Apache 2.0.43, Solaris 8, Sun E220R, 4 gig memory, gig ethernet. We tried both Sun Forte and gcc compilers. The problem was that mod_mem_cache was just way too resource intensive when pounding on a machine that hard, trying to see if everything would fit into the cache... cpu/mutex use was very high, and memory especially was out of control (we had many very large files, ranging from half a dozen to two dozen megs; the most popular of those were what we really wanted cached), and we were running several hundred concurrent connections at once. Maybe a new cache loading/hit/removal algorithm that works better for many hits to very large files would solve it, I dunno. We finally settled on listing some of the most popular files in the httpd.conf file for mod_file_cache, but that presents a management problem as what's most popular changes. It would have been nicer if apache could auto-sense the most popular files. Also, it seems mod_file_cache has a file size limit, but at least we got enough in there that the disk wasn't bottlenecking anymore... Dave - Original Message - From: Bill Stoddard [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, January 01, 2003 6:38 AM Subject: RE: [PATCH] remove some mutex locks in the worker MPM it may also have to do with caching we were doing (mod_mem_cache crashed and burned, What version were you running? What was the failure? If you can give me enough info to debug the problem, I'll work on it. Bill
Re: [PATCH] remove some mutex locks in the worker MPM
Oh, I should have mentioned: our mutex issues lessened a lot when we made more processes with fewer threads each, but that kind of started defeating the purpose of using the worker MPM after a while... your optimizations sound like they may help fix this issue... thanks again. Dave - Original Message - From: David Burry [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, December 31, 2002 5:54 PM Subject: Re: [PATCH] remove some mutex locks in the worker MPM Ohh, this sounds like an awesome optimization... I noticed mutex contention was extremely high on a very high traffic machine (say, high enough to get close to maxing out a gig ethernet card) using the worker mpm on solaris 8... it may also have to do with the caching we were doing (mod_mem_cache crashed and burned; we had to use mod_file_cache to get it to work, but it was still quite the exercise). Dave - Original Message - From: Brian Pane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, December 31, 2002 5:30 PM Subject: [PATCH] remove some mutex locks in the worker MPM I'm working on replacing some mutex locks with atomic-compare-and-swap based algorithms in the worker MPM, in order to get better concurrency and lower overhead. Here's the first change: take the pool recycling code out of the mutex-protected critical region in the queue_info code. Comments welcome... Next on my list is the code that synchronizes the idle worker count. I think I can eliminate the need to lock a mutex except in the special case where all the workers are busy. Brian
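The stopgap described above looks like this in worker MPM terms (values illustrative; the right ratio depends on the box): trading threads for processes dilutes the per-process mutex contention, at the cost of memory.

  <IfModule worker.c>
  ServerLimit      32
  ThreadsPerChild  16   # fewer threads per process than the default 25
  MaxClients      512   # ServerLimit x ThreadsPerChild
  </IfModule>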
2.0.44 release?
What does everyone think about releasing 2.0.44 soon? My company kind of needs the logio fixes but we're hesitant to run our own special patched version of 2.0.43 when we have so much riding on this project... Dave
dynamically change max client value
Recently there has been a little discussion about an API in apache for controlling starts, stops, restarts, etc... I have an idea that may help me solve a problem I've been having. The problem is limiting the number of processes that will run on a machine to somewhere below where the machine will keel over and die, while still being close to the maximum the machine can handle. The issue is that, depending on what the majority of those processes are doing, the maximum number a given machine can handle changes by a few orders of magnitude, so a multi-purpose machine that serves, say, static content and cgi scripts (or other things that vary greatly in machine resource usage) cannot be properly tuned for maximum performance while guaranteeing the machine won't die under heavy load. The solution I've thought of is... what if Apache had an API that could be used to say "no more processes, whatever you have NOW is the max!" or otherwise to dynamically raise or lower the max number (perhaps "oh, there's too many, reduce a bit")? You see, an external monitoring system could monitor cpu and memory and whatnot and dynamically adjust apache depending on what it's doing. This kind of system could really increase the stability of any large Apache server farm, and help keep large traffic spikes from killing apache so badly that nobody gets served anything at all. In fact this idea could be extended someday to dynamically change all sorts of apache configuration things, but all I really need right now is the max client value... What do you all think of this idea? Does this already exist perhaps? Dave
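To illustrate the orders-of-magnitude point (all numbers hypothetical): if a child serving static files needs around 2 MB and a CGI-heavy child around 100 MB, a 2 GB box can safely run roughly a thousand of the former but only about twenty of the latter, yet today there is just one knob, fixed at startup:

  # whatever static value you pick is wrong for one of the two workloads
  MaxClients 256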
Re: dynamically change max client value
I realize that allowing _everything_ to be dynamically configured via SNMP (or a signal or something) would probably be too substantial an API change to be considered for the current code base, but it would be nice to consider it for some future major revision of Apache. And it would be more than just nice if at least the max client value could somehow be worked into the current versions of Apache... There is a current, very real, and very large problem that could be solved by this; it's not just a nice-to-have feature. This is what I meant to emphasize in my original email... Dave - Original Message - From: Dirk-Willem van Gulik [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, November 04, 2002 9:35 AM Subject: Re: dynamically change max client value In my ideal world every config directive would be able to advertise or register an optional 'has changed' hook, which, if present, would be called in context whenever a value is somehow updated (through snmp, a configd, a signal, whatever). If there is no such hook, the old update-on-graceful-restart is the default (though it sure would be nice to have some values also advertise that they need a full shutdown and restart). Of course one could also argue for not just a 'put' but also a 'get' interface in context :-) Dw On Mon, 4 Nov 2002, David Burry wrote: Recently there has been a little discussion about an API in apache for controlling starts, stops, restarts, etc... I have an idea that may help me solve a problem I've been having. The problem is limiting the number of processes that will run on a machine to somewhere below where the machine will keel over and die, while still being close to the maximum the machine can handle. The issue is that, depending on what the majority of those processes are doing, the maximum number a given machine can handle changes by a few orders of magnitude, so a multi-purpose machine that serves, say, static content and cgi scripts (or other things that vary greatly in machine resource usage) cannot be properly tuned for maximum performance while guaranteeing the machine won't die under heavy load. The solution I've thought of is... what if Apache had an API that could be used to say "no more processes, whatever you have NOW is the max!" or otherwise to dynamically raise or lower the max number (perhaps "oh, there's too many, reduce a bit")? You see, an external monitoring system could monitor cpu and memory and whatnot and dynamically adjust apache depending on what it's doing. This kind of system could really increase the stability of any large Apache server farm, and help keep large traffic spikes from killing apache so badly that nobody gets served anything at all. In fact this idea could be extended someday to dynamically change all sorts of apache configuration things, but all I really need right now is the max client value... What do you all think of this idea? Does this already exist perhaps? Dave
Re: dynamically change max client value
Interesting comments, thanks. You obviously speak from experience. The idea I was having is that no matter how overloaded a machine becomes, it should never run so far out of resources that it dies; there should be some kind of limit in place... I thought this was what MaxClients was for, but the optimum value varies too greatly depending on what's getting hit, so I don't know what to do about it... Your idea of a reverse proxy in front (or a module) turning away or redirecting some requests in a very light fashion has merit... I know that Excite.com used to have a very light static home page they'd slap up during peak loads for this very reason; I'll investigate something like that as the solution to my problem instead of what I originally suggested, thanks. Dave At 12:01 PM 11/4/2002 -0800, Scott Hess wrote: Based on my experience, this wouldn't be a high-quality solution, it would be a hack. I've seen very few cases where load spiked enough to be an issue but was transient enough that a solution like this would work - and in those cases, plain old Unix multitasking normally suffices. What happens if you implement the solution anyhow is that you get a bunch of users stuck in the ListenBacklog, so they'll wait a couple of minutes before their request even starts. If you have a deep backlog, requests just pile up so that the machine never gets its head above water. In the worst case, clients will time out while their request is in the backlog, but since you don't find that out until you send a response which writes out to the network, you can very easily do work that can never be delivered. Beyond all that, the user experience simply _sucks_. [Yes, I've done what you suggest, just not using the implementation you suggest. It's integrated into an existing custom module; you could also probably do it with a reverse proxy. In the end, it was not a productive solution.] What I think you really want is a module that will intercept all requests and send back "The server is really busy, try again in five minutes" if the server is too busy by some measure. You generally want this to be a super-low-cost option, so that you can spin through requests very quickly. Optimally, no externally-blockable pieces (no database connections, no locking filesystem access, etc). One relatively simple option might be to use a Squid, and an URL redirector which implements the magic check. If the machine is not busy, send through to the real server; if the machine is busy, redirect to an URL which will deliver your message. [Again, yes, I've done this in Apache 1.3, but in code targeted at our custom modules. You could certainly do it more generically, I just haven't had the need. You might check mod_backhand.] Later, scott On Mon, 4 Nov 2002, David Burry wrote: I realize that allowing _everything_ to be dynamically configured via SNMP (or a signal or something) would probably be too substantial an API change to be considered for the current code base, but it would be nice to consider it for some future major revision of Apache. And it would be more than just nice if at least the max client value could somehow be worked into the current versions of Apache... There is a current, very real, and very large problem that could be solved by this; it's not just a nice-to-have feature. This is what I meant to emphasize in my original email...
Dave - Original Message - From: Dirk-Willem van Gulik [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, November 04, 2002 9:35 AM Subject: Re: dynamically change max client value In my ideal world every config directive would be able to advertise or register an optional 'has changed' hook, which, if present, would be called in context whenever a value is somehow updated (through snmp, a configd, a signal, whatever). If there is no such hook, the old update-on-graceful-restart is the default (though it sure would be nice to have some values also advertise that they need a full shutdown and restart). Of course one could also argue for not just a 'put' but also a 'get' interface in context :-) Dw On Mon, 4 Nov 2002, David Burry wrote: Recently there has been a little discussion about an API in apache for controlling starts, stops, restarts, etc... I have an idea that may help me solve a problem I've been having. The problem is limiting the number of processes that will run on a machine to somewhere below where the machine will keel over and die, while still being close to the maximum the machine can handle. The issue is that, depending on what the majority of those processes are doing, the maximum number a given machine can handle changes by a few orders of magnitude, so a multi-purpose machine that serves, say, static content and cgi scripts (or other things that vary greatly in machine resource usage) cannot be properly tuned
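A very small mod_rewrite version of the turn-them-away idea from the previous message (a sketch; the flag-file path and busy page are hypothetical): an external load monitor creates a file when the box is overloaded, and every request bounces to a tiny static page until the file is removed.

  RewriteEngine on
  # flag file created/removed by the external load monitor
  RewriteCond /var/run/httpd-overloaded -f
  RewriteRule !^/busy\.html$ /busy.html [R,L]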
Re: new download page
Awesome script... I hadn't thought of doing it this way; this is better than what I was thinking... it seems to address everyone's concerns, too, in the best way that's still within our resources. Dave - Original Message - From: Justin Erenkrantz [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, October 27, 2002 1:50 PM Subject: Re: new download page --On Sunday, October 27, 2002 9:39 AM -0800 Justin Erenkrantz [EMAIL PROTECTED] wrote: I'm trying to write it up now. I'm also cleaning up closer.cgi while I'm at it. -- justin Well, that took *way* longer than I wanted it to. Anyway, a rough sketch of what I'm thinking of is here: http://www.apache.org/dyn/mirrors/httpd.cgi And, to prove that this new system isn't any worse than the old one: http://www.apache.org/dyn/mirrors/list.cgi This is a python-based CGI script that uses Greg Stein's EZT library (much kudos to Greg for this awesome tool). It allows for the separation of the layout from the mirroring data. Therefore, it makes it really easy to do the above with only one CGI script (httpd.cgi and list.cgi are symlinked to the same file) that has multiple 'views' and templates. We would probably have to work a bit on the layout and flesh it out some, but this is the idea that I had. Source at: http://www.apache.org/~jerenkrantz/mirrors.tar.gz If I could run CGI scripts from my home dir, I wouldn't have stuck this in www.apache.org's docroot, but CGI scripts are not allowed from user directories. ISTR mentioning this before and getting no response from Greg or Jeff. -- justin
Re: new download page
Excellent little utility... however, closer network-wise is often significantly different from closer geographically; for instance, California is likely a lot closer to Peru than Chile is (as an extreme example) if you go by how the packets fly instead of how the crow flies... Also, when a closer server is overloaded you will get a download quicker from a more distant server (regardless of how you define closer). So a good balancing algorithm really shouldn't care about geographic distance, but about traceroute hops, ping times, and server loads... Dave - Original Message - From: Joshua Slive [EMAIL PROTECTED] See: http://maxmind.com/geoip/ If someone wants a little project, it shouldn't be too hard to integrate this into the existing closer.cgi script. Joshua.
Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002
Is it possible to get some of the fixes to mod_logio committed? Wouldn't everyone agree that the current logging of outgoing bytes is incorrect behavior? Currently it logs the full file size (plus headers) even if the transfer gets cut off in the middle, instead of the actual number of bytes sent. I've seen several patches to fix this but very little comment on them... I've seen lots of comments that it can't be done without major rearchitecting, but Bojan seems to have done it without that, by breaking pipelining; am I correct? I also wish that %b would be fixed in a similar manner, but I haven't seen any patches for that (or comments about it). Wouldn't everyone agree that it too should log actual bytes sent, not just the full file size every time? Apache 2.0 should do everything that 1.3 did, so this logging issue really should be considered a bug, right? Since I depend on correct outgoing byte count logging to see how many people successfully download files, I can live with broken pipelining for now in 2.0. Currently I've had to roll back to Apache 1.3 and put in three times as many machines (12 instead of 4); I'd really like to return those 8 borrowed machines someday and be able to upgrade to 2.0... but can't do that in the current broken log state. Dave
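For reference, the two counters being compared (format string illustrative): %b is the standard response-body byte count, and %O is mod_logio's outgoing count, which includes headers.

  # log both, side by side, to compare behavior on aborted downloads
  LogFormat "%h %l %u %t \"%r\" %>s %b %O" countcheck
  CustomLog logs/access_log countcheck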
Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002
- Original Message - From: Bojan Smojver [EMAIL PROTECTED] On Fri, 2002-10-25 at 03:31, David Burry wrote: Is it possible to get some of the fixes to mod_logio committed? Wouldn't everyone agree that the current logging of outgoing bytes is incorrect behavior? Currently it logs the full file size (plus headers) even if the transfer gets cut off in the middle, instead of the actual number of bytes sent. I've seen several patches to fix this but very little comment on them... I've seen lots of comments that it can't be done without major rearchitecting, but Bojan seems to have done it without that, by breaking pipelining; am I correct? Actually, the last patch I sent contains one snag I'm still working on. It breaks the core's connection configuration structure, which gets attached to c->conn_config. However, I think I can get around that by using an optional function. As a matter of fact, I'm working on it right now. I see... OK, I'll keep waiting patiently... Since I depend on correct outgoing byte count logging to see how many people successfully download files, I can live with broken pipelining for now in 2.0; currently I've had to roll back to Apache 1.3 and put in three times as many machines (12 instead of 4). I'd really like to return those 8 borrowed machines someday and be able to upgrade to 2.0... but can't do that in the current broken log state. Glad to hear Apache 2.0 makes a huge performance difference. Not so glad to hear you had to resort to going back to 1.3. The only thing I can promise is a patch using an optional function (this should guarantee compatibility of the core between 43 and 44 and no MMN bump) during the day (Sydney time). It's up to the committers to review and, if they like, commit. The memory savings are quite significant, and I'll admit that the 8 extra machines are smaller than the original 4, so it's not exactly three times better cpu-wise... the memory caching is where the largest savings come in, along with disk IO; we have a very high traffic site - usually 3 terabytes transferred per day, though the last few days have been more like 5 terabytes due to a new release. PS. By the number of messages on the list I'm guessing the committers must be rather busy with their real jobs these days. Unfortunately there is no way of speeding things up, given this is a volunteer effort. Unless, of course, you decide to bribe some of them ;-) Not exactly the same thing as bribing a commit, but this could get similar results: my manager's manager is actually not opposed to hiring a contractor to fix this... anyone want a temporary job? I don't know if this is the right place to say such things; tell me if there's a better place. When you've got millions of dollars worth of sales depending on an open source project, throwing a little at the open source project isn't such a big deal... I'd gladly do it myself with my company's blessing (on the clock, not volunteer) but I'm not a very experienced C programmer yet; more of a Perl hacker and applications architect so far. This little paragraph had better not get me flooded with resumes or flames or I'm going to feel dumb; whatever you do, don't spam this list with personal replies! ;o) Dave
Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002
Excellent! I will perform some tests with that when I get a chance! You managed to get it working without even breaking pipelining? That's awesome! Not meaning to belittle Bojan's hard work, but for my purposes the mod_logio values are not as good as %b would be if %b worked properly... what I ideally need is the bytes-sent count without the headers... using Bojan's module I can get approximate results, but they will be a hair off because they include headers... My main purpose is to detect when several-meg files have been downloaded all the way vs. cut off in the middle, including when a given user uses some byte-ranging download manager that lets you pause and restart... We also use it for chargebacks to the various departments for bandwidth usage (in this case mod_logio would of course be more accurate than %b, though)... We actually had to fudge some of our statistics (duplicated nearby days' data with similar overall throughputs) because we didn't catch the problem with Apache 2.0 soon enough... Dave At 09:15 AM 10/25/2002 +1000, Bojan Smojver wrote: On Fri, 2002-10-25 at 07:42, David Burry wrote: I see... OK, I'll keep waiting patiently... The patch for 2.0.43 is here: ftp://ftp.rexursive.com/pub/apache/counting_io_flush-2.0.43.patch You need to apply the mod_logio patch for 2.0.43 first. Bojan
Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002
At 09:38 PM 10/24/2002 -0400, Glenn wrote:

Have you looked at the %...X directive in Apache2?

That's an interesting idea I hadn't thought of... it doesn't solve the chargeback issue, but it's worth investigating for detecting successful downloads...

Dave
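In 2.0, %X logs the connection status when the response is finished: 'X' if the connection was aborted before the response completed, '+' if it may be kept alive, '-' if it will be closed. A minimal sketch of using it to flag cut-off downloads (format name arbitrary):

    LogFormat "%h %t \"%r\" %>s %b %X" abortcheck
    CustomLog logs/download_log abortcheck

An 'X' next to a full-file-size %b is at least a hint that the number shouldn't be trusted as "delivered".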
Re: [STATUS] (httpd-2.0) Wed Oct 23 23:45:19 EDT 2002
At 08:45 PM 10/24/2002 -0500, William A. Rowe, Jr. wrote:

At 08:40 PM 10/24/2002, Bojan Smojver wrote:

Quoting David Burry [EMAIL PROTECTED]:

Excellent! I will perform some tests with that when I get a chance! You managed to get it working without breaking pipelining even? That's awesome!

That's what I *think*, which has been known to deviate from the truth from time to time. However, I appreciate all input, especially results of the actual tests.

I recall you had tested a ton of 'little files' pipelined. What might be more interesting is a 100MB download (over a fast pipe) which is entirely 'sendfile'd out. Apache would consider itself done with the request long before it was finished with the connection.

In case someone else wants to independently verify it... The exact test I was doing was with a 70+ meg .tar.gz file, over both 100 Mbit Ethernet and 1.5 Mbit DSL, starting and canceling it multiple times in Windoze Internet Explorer 5 or 6 (which, by the way, appears to make effective use of byte-range requests on subsequent tries) and monitoring what was logged each time. This test isn't super precise on the byte count (I did not bother to go comb my IE cache), but it sure is obvious when it consistently logs the whole file size and I only received a small fraction according to the IE progress bar... Also, looking at the byte-range requests with %{Range}i makes it obvious how much IE received previously on each subsequent try (IE appears to request only the part of the file it hasn't cached yet). I was thinking of writing a script that did this in a more automated fashion... perhaps contributing a test to the apache test suite when I figure that thing out... ;o)

Dave
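Incidentally, logging the Range request header alongside the byte counts makes that retry pattern easy to read straight out of the access log; a hedged sketch (format name arbitrary):

    LogFormat "%h %t \"%r\" %>s %b \"%{Range}i\"" rangecheck
    CustomLog logs/range_log rangecheck

Each resumed attempt then shows its bytes=... offset next to what was logged as sent.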
Re: Counting of I/O bytes
Ultimately I want to log bytes per request accurately enough to add them up and tell whether the whole file was transferred, even when the download was broken up into several byte-range requests. I have given up on this happening in 2.0 for a long time to come, due to the architecture changes required to make it happen; we're back on 1.3 with three times as many servers because of the lack of threading... :-(

Dave

- Original Message - From: Bojan Smojver [EMAIL PROTECTED]

Is something like this possible? If not, I think we should be pretty much OK, as the whole point of mod_logio is to log the traffic, most likely per virtual host.
Re: url finishing by / are declined by cache
- Original Message - From: Thomas Eibner [EMAIL PROTECTED]

On Wed, Oct 16, 2002 at 11:20:07AM -0400, Bill Stoddard wrote:

Hi, why are URLs ending in / not cacheable?

Muddled thinking? :-) I was unsure how to handle caching default index pages, but I see no reason why we can't just delete this check.

Not for negotiation reasons? (I seem to remember discussions where that was brought up.)

But any file could be negotiated the same way, not only files that end with /. Negotiated pages need to inform caches that they are un-cacheable by setting the correct response headers; that would take care of this case as well.

Dave
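Strictly speaking, HTTP/1.1 even gives negotiated responses a middle ground between "cacheable" and "not": the Vary header. A sketch of the relevant response headers (values illustrative) is:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Language: en
    Vary: accept-language

A cache that honors Vary can store the page per variant (here, per Accept-Language value) rather than refusing to cache it at all.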
Re: apache test suite?
sorry folks, I knew it would be that easy; I should have looked more closely at the web site too ;o)

Dave

- Original Message - From: Cliff Woolley [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, October 12, 2002 9:50 PM
Subject: Re: apache test suite?

On Sat, 12 Oct 2002, David Burry wrote:

Has anyone worked on an Apache test suite? You know, like how many packages have a make test that runs all sorts of tests... or perhaps a separate package that runs tests... I might be interested in starting one but would rather build upon others' work if some of it has already been done...

See the httpd-test repository. :)
apache test suite?
Has anyone worked on an Apache test suite? You know, like how many packages have a make test that runs all sorts of tests... or perhaps a separate package that runs tests... I might be interested in starting one but would rather build upon others' work if some of it has already been done...

Dave
Re: apache 2.0.43: %b not showing bytes sent but bytes requested
ok, serious problems still... I've successfully installed mod_logio (finally), and %O is STILL not logging the actual bytes transferred, but the bytes apache is INTENDING to eventually transfer... if I stop the transfer in the middle by canceling or disconnecting, this number is far too big. Of course it's a couple bytes larger than %b because of the headers included, but it's still totally useless for the purpose of figuring out if/when/whether the whole file was actually downloaded. Anyone have any ideas why this is so and how to fix it? Is the logging happening before the request is actually finished? Could that be the problem? This should also be a concern for anyone who's using mod_logio to charge for bandwidth, because there could be some serious overcharging going on here!

I had to install libtool 1.4 and m4 1.4, and this was my configure line:

./configure --prefix=/usr/local/apache_2.0.43+logio --enable-logio --enable-usertrack --enable-file-cache --enable-cache --enable-mem-cache --enable-static-support --disable-cgid --disable-cgi --enable-rewrite --with-mpm=worker --disable-userdir

Dave

- Original Message - From: David Burry [EMAIL PROTECTED]

Yes, I've been thinking of experimenting with mod_logio, but I'm a bit hesitant to hack out a patch from (or use whole-hog) CVS HEAD on a production site that gets 3 terabytes of traffic per day and that I'm embarrassed to admit how much revenue depends on... ;o) Thanks for the link, I'll try that. It won't be as accurate as getting the byte count without the headers, but at least it should be better than nothing if it works as described... If we're not going to fix %b, shouldn't we at least fix the documentation to be more accurate? 2.0 and 1.3 really are quite different here.

Dave

- Original Message - From: Bojan Smojver [EMAIL PROTECTED]

Have you tried using mod_logio? It won't only give you the body bytes, but also the header bytes. It reports the number of input bytes too, and it should understand encryption and compression. You can either check it out from Apache CVS (HEAD), or download the patch for 2.0.43 here: http://www.rexursive.com/software.html. You'd use it with %I (for input) and %O (for output). It would be interesting to know if it reports accurately in the case you described...
Re: apache 2.0.43: %b not showing bytes sent but bytes requested
Being off by a little is better than being off by the whole thing... My tests show that's how Apache 1.3 behaved... At least that way the value is close, so if you're charging for bandwidth you're not overcharging by much, and you can still tell whether the whole file got there or not.

Dave

- Original Message - From: Bennett, Tony - CNF [EMAIL PROTECTED]

I believe %b gets its value from the request_rec's bytes_sent member. This field is filled in by the content_length output filter. It is not filled in during the actual send operation. Even if it were filled in following each send, that doesn't mean the data was received by the client... in the event of a disconnect, I believe bytes_sent will always be off.

-tony
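For context, the counting side of any fix has to live below the content_length filter, at a connection-level output filter, so it sees what is actually handed toward the network rather than what the handler intended to send. A minimal sketch of such a filter for 2.0 (names hypothetical; note it still counts bytes passed down, not bytes the client acknowledged -- handling a failed pass-down is exactly the hard part this thread is about):

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    typedef struct {
        apr_off_t bytes_out;
    } count_ctx;

    static apr_status_t bytecount_out_filter(ap_filter_t *f,
                                             apr_bucket_brigade *bb)
    {
        count_ctx *ctx = f->ctx;
        apr_bucket *b;

        if (!ctx) {
            ctx = f->ctx = apr_pcalloc(f->c->pool, sizeof(*ctx));
        }
        for (b = APR_BRIGADE_FIRST(bb);
             b != APR_BRIGADE_SENTINEL(bb);
             b = APR_BUCKET_NEXT(b)) {
            /* buckets of indeterminate length (e.g. unread pipe or
             * socket buckets) report -1; skip those */
            if (b->length != (apr_size_t)-1) {
                ctx->bytes_out += b->length;
            }
        }
        return ap_pass_brigade(f->next, bb);
    }

    static void bytecount_register_hooks(apr_pool_t *p)
    {
        /* just above the core network filter, below everything else */
        ap_register_output_filter("BYTECOUNT_OUT", bytecount_out_filter,
                                  NULL, AP_FTYPE_NETWORK - 1);
    }

Even here, a sendfile'd brigade can be "counted" long before the kernel finishes writing it, which is why the cut-off case is so hard to get right.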
apache 2.0.43: %b not showing bytes sent but bytes requested
The documentation for Apache 2.0.43's mod_log_config states:

%...b: Bytes sent, excluding HTTP headers. In CLF format i.e. a '-' rather than a 0 when no bytes are sent.

However, in testing I clearly see it's logging the number of bytes _requested_ (that is, what apache intended to send), not the actual bytes _sent_! If a user presses the cancel button on their browser, or they're cut off in the middle, this number is not accurate at all, because it makes it appear the entire file was sent when it was not.

We're running a site that serves many large files (a dozen megs or so, typically) for download. It depends on this bytes-sent number for statistics and monitoring, to see if and when a download has completed, including with a 206 byte-range response... Typical throughput is 600 Mbit/sec, 3 terabytes/day, running on Solaris 8 on 4 Sun E280Rs with 4 gigs of RAM each... We're seriously considering rolling back to old hardware with Apache 1.3.x (which seems to log actual bytes sent, by the way) and CacheFlow machines because of this issue...

Is there a patch out for this problem? Is someone already working on it? Or does anyone have an idea where the root of this problem is, so I might take a stab at patching it myself?

As a side note, we tried but were unable to use mod_mem_cache on this setup, we suspect due to mutex issues; it might be possible if we spread the load over more machines (but what's the point of memory caching if you have to do that). mod_file_cache works OK, though, for a static list of the most-often-used files, although mmap does have its limitations on how large a file can be mapped into memory... We have sendfile installed, and I configured with the following:

./configure --prefix=/usr/local/apache_2.0.43 --enable-usertrack --enable-file-cache --enable-cache --enable-mem-cache --enable-static-support --disable-cgid --disable-cgi --enable-rewrite --with-mpm=worker --disable-userdir

Dave
Re: mod_blanks
I agree that mod_gzip does a lot better job as far as compression goes, and it likely doesn't even use more CPU. However, it's still sometimes important to remove HTML and JavaScript comments for security reasons, but I suspect that could be better done as part of the publishing process, not on the fly as pages are served. (Even gzip compression could be done this way, actually, come to think of it.)

Dave

- Original Message - From: Peter J. Cranstone [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, September 26, 2002 12:54 PM
Subject: RE: mod_blanks

Fabio,

mod_gzip for Apache is a better solution. Prior to its release, both Kevin and I looked at what we call poor man's compression, i.e. just removing the blank spaces, lines and other garbage in a served page. Here is what we learned: no one was interested. It didn't save much on the overall page, and people really don't like their HTML etc. being messed with. Also, if you're going to spend the CPU cycles, it's easier to simply use gzip compression to squeeze the page by upwards of 80% and preserve all the formatting of the author's HTML. mod_gzip already saves a ton of bandwidth, and with a current browser there is no need to install a client-side decoder.

Regards,

Peter J. Cranstone

-Original Message-
From: fabio rohrich [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 26, 2002 6:38 AM
To: [EMAIL PROTECTED]
Subject: mod_blanks

I'm going to develop this topic for my thesis. Does anybody have any suggestions for it? Something to add to the development (like compression of the strings), or some feature to implement? And, one last thing, what do you think about it?

Thanks a lot,
Fabio

- mod_blanks: a module for the Apache web server which would on-the-fly remove unnecessary blank space, comments and other non-interesting things from the served page. Skills needed: the C language, a bit of text parsing techniques, HTML, learning the Apache API. Complexity: low to moderate (after learning the API). Usefulness: moderate to low (but maybe better than that; it's a kind of nice toy topic that could be shown to save a lot of bandwidth on the Internet :-).
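Done at publish time, the comment stripping doesn't need to be an Apache module at all; a minimal standalone sketch in C (it slurps stdin, drops <!-- ... --> spans wherever they fall, and assumes the input has no embedded NUL bytes -- note it would also strip SSI directives and IE conditional comments, which a real publishing step would need to special-case):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t cap = 65536, len = 0, n;
        char *buf = malloc(cap + 1), *p, *cbeg, *cend;

        if (!buf) return 1;
        /* slurp stdin into one buffer so comments spanning lines
         * are handled naturally */
        while ((n = fread(buf + len, 1, cap - len, stdin)) > 0) {
            len += n;
            if (len == cap) {
                cap *= 2;
                buf = realloc(buf, cap + 1);
                if (!buf) return 1;
            }
        }
        buf[len] = '\0';

        /* copy through, skipping <!-- ... --> spans */
        p = buf;
        for (;;) {
            cbeg = strstr(p, "<!--");
            if (!cbeg) {
                fputs(p, stdout);
                break;
            }
            fwrite(p, 1, (size_t)(cbeg - p), stdout);
            cend = strstr(cbeg + 4, "-->");
            if (!cend)
                break;          /* unterminated comment: drop the rest */
            p = cend + 3;
        }
        return 0;
    }

Used as a pipeline stage (stripper < page.html > page.out) in the publish step, the served files never carry the comments in the first place.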
Re: [PATCH] add simple ${ENV} substitution during config file read
This may be confusing, because people may begin to expect it to do the substitution at request time in certain cases, instead of only at server startup time. Admittedly that would be almost like turning every directive into mod_rewrite, but... an env var is an env var, and many things are handled at request time, so...

Dave

At 03:55 AM 9/26/2002 +0200, Dirk-Willem van Gulik wrote:

On Thu, 26 Sep 2002, André Malo wrote:

I'm not sure, but I'd guess this may cause conflicts with mod_rewrite.

mod_rewrite uses % rather than $ for variable names. It does use $1, $2.. for back references, which is not a problem, as those are not followed by a {. It also uses the dollar for the end-of-string match, which is thus very unlikely to be followed by a { either. But... it also uses $ for ${mapname:key|default} constructs, which may cause an issue now. We can make sure that such constructs are ignored, or amend the patch to allow a '\' or an extra $ to escape either the $ or the {.

Otherwise... hmm, the feature probably leads to some weird effects if you forget to set or to remove some env variables...

Aye - this is all about giving a competent admin enough rope to hang him/herself, while making sure that a normal user (who does not have any ${} constructs in his/her file) is unaffected.

Dw
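To make the startup-time-only behavior concrete, the sort of use the patch is aimed at would look like this (a hedged sketch; WEB_ROOT is a hypothetical environment variable exported in the shell or envvars file before httpd starts):

    DocumentRoot "${WEB_ROOT}/htdocs"
    <Directory "${WEB_ROOT}/htdocs">
        Options None
        AllowOverride None
    </Directory>

The substitution happens exactly once, when the config file is read, which is why any request-time expectation would be misleading.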
ForceLanguagePriority in Apache 1.3
By the way, I love the idea of backporting the Apache 2.0 ForceLanguagePriority directive into Apache 1.3... This directive completely solves a looot of problems I've been having with stupid non-standards-conformant IE and content negotiation. Many thanks to whoever posted the patch. If only it could be included in 1.3.27, I'd be one very happy camper! ;o)

Dave
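For anyone who hasn't met the 2.0 directive, a hedged example of the configuration in question (language list illustrative):

    LanguagePriority en fr de
    ForceLanguagePriority Prefer Fallback

Prefer breaks ties in the client's Accept-Language using LanguagePriority instead of returning 300 Multiple Choices; Fallback serves the first available language instead of 406 Not Acceptable when nothing matches.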
Re: 2.0.40 (UNIX), mysterious SSL connections to self
You may want to try having the load balancers actually grab a small (couple-byte) monitor page and parse it for a special string; that's what we do, and it works well. This method not only tests whether a connection can be opened, it also exercises the server internals more thoroughly (i.e. Tomcat, if you're using mod_jk, for instance). And since they're all valid requests, we can just set an env var with SetEnv inside a container (or with SetEnvIf) and use a CustomLog conditional on that var to filter out the noise; that way there's no Perl overhead. BTW, we're doing our health checks every 5 seconds, not every second.

Dave

At 10:31 PM 8/29/2002 -0400, [EMAIL PROTECTED] wrote:

My solution to the complaints was to use piped error logs and have a simple Perl script as the first script in a pipeline. The Perl script's only job was to remove those error messages and then pass the log line on to cronolog. The reason for taking such a measure was that the server farm was behind a pair of commercial load balancers which made TCP connections to port 80 and port 443 ** every second ** to health-check that the servers were alive and accepting connections within a reasonable response time. Then each balancer shut down the connection without attempting any SSL negotiation. So every second, every server was logging two SSL failure messages (one from each of the redundant load balancers). Talk about noise in the logs!

It would be Real Nice (tm) if these sorts of SSL error messages weren't reported unless some data was exchanged above the connection level. Only in those cases would the SSL error message be correct that SSL negotiation failed, as opposed to never having started.

-Glenn
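A minimal sketch of that filtering, assuming the health check fetches /monitor.html (the path and variable name are hypothetical, and the "combined" format nickname is the stock one from the default config):

    SetEnvIf Request_URI "^/monitor\.html$" healthcheck
    CustomLog logs/access_log combined env=!healthcheck

Requests flagged healthcheck never reach the access log, and everything else is logged normally.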
Re: Command line argument inconsistency...
What about using something else, external to apache, to detect output under NetWare? Is it possible to pipe the output to something else that detects whether there is any, and holds the window open only if there is? Wouldn't that solve the problem in a more consistent way than altering the return codes? In my opinion, return codes should just be consistent as to whether there is an error or not (and perhaps, if so, what kind of error), and not indicate whether there's output, since we could simply check the output itself for that.

Dave

- Original Message - From: Brad Nicholes [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, July 23, 2002 3:07 PM
Subject: Re: Command line argument inconsistency...

I understand, and that is the reason why I asked. As far as the return value is concerned on NetWare, it doesn't matter. But I need some indication that the screen needs to stay open so that the user can see what happened. If an error occurs and an error message is printed to the screen, the exit code will most likely not be 0, so we are OK. And I agree that returning an error code on a normal exit, such as with the -h or -v command line arguments, would also be inconsistent. But I have no way of telling the difference between a normal exit (i.e. Apache exited normally on a shutdown) and a normal exit with messages (i.e. -v or -h). Furthermore, this is all in common cross-platform code, so I can't really #ifdef it for NetWare and fix the problem. So I'm stuck unless I can come up with a better idea. BTW, I'm open to ideas. :)

thanks,
Brad

Brad Nicholes
Senior Software Engineer
Novell, Inc., the leading provider of Net business solutions
http://www.novell.com

[EMAIL PROTECTED] Tuesday, July 23, 2002 3:43:03 PM

On Tue, Jul 23, 2002 at 02:03:44PM -0600, Brad Nicholes wrote:

[...] to be inconsistent. For example, if I start Apache2 with a -h option, it displays the help screen and then calls destroy_and_exit_process() with an exit code of 1. [...] Is there any reason why we can't switch the -v, -V, -l, -L options to exit with a 1 instead of a 0 like the -h option?

Yes. The unix philosophy. You are absolutely right: it IS inconsistent, and should be fixed. But rather than changing all exit codes to 1, I would prefer to see all these exit codes changed to EX_OK:

#define EX_OK 0 /* successful termination */

because all of them indicate that the request for information has been processed successfully. In comparison, on unix, the return code of ls -l is always zero if all files could be listed successfully, even if the command produced output via stdout. If NetWare has a problem with anything being displayed, IMHO it is NetWare's problem to fix. Sorry, I don't want to sound harsh, but I do not intend to pervert the Unix philosophy here.

Martin

--
[EMAIL PROTECTED] | Fujitsu Siemens
Fon: +49-89-636-46021, FAX: +49-89-636-47655 | 81730 Munich, Germany
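Martin's convention, spelled out as a minimal sketch (sysexits.h is the BSD header his EX_OK quote comes from; the version string and option handling are purely illustrative, not httpd's actual code):

    #include <stdio.h>
    #include <string.h>
    #include <sysexits.h>   /* EX_OK, EX_USAGE */

    int main(int argc, char **argv)
    {
        if (argc > 1 && strcmp(argv[1], "-v") == 0) {
            puts("Server version: Apache/2.0.43");
            return EX_OK;   /* info was requested and delivered: success */
        }
        if (argc > 1) {
            fprintf(stderr, "unknown option: %s\n", argv[1]);
            return EX_USAGE; /* only a genuine usage error is nonzero */
        }
        return EX_OK;
    }

Under this convention, "produced output" and "failed" stay independent, which is exactly why a caller (or NetWare's console) has to inspect the output itself to decide whether there is anything to show.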
Re: cvs commit: apache-1.3/src/main http_protocol.c
While we are debating the best way to accomplish this Content-Length fix for the next release, I kind of need to have it working for me right now in the current released version. Therefore I've implemented this partial fix against 1.3.26 on my system:

root@nueva[391] pwd
/usr/share/src/packages/apache_1.3.26/src/main
root@nueva[392] diff -c http_protocol.original.c http_protocol.c
*** http_protocol.original.c    Tue Jul  9 13:35:54 2002
--- http_protocol.c     Tue Jul  9 12:35:59 2002
***************
*** 1991,1997 ****
          r->read_chunked = 1;
      }
!     else if (lenp) {
          const char *pos = lenp;
          int conversion_error = 0;
--- 1991,1997 ----
          r->read_chunked = 1;
      }
!     else if (lenp && *lenp != '\0') {
          const char *pos = lenp;
          int conversion_error = 0;

Admittedly it only allows empty-string (blank) Content-Length values to default to 0, not whitespace-only ones, but I think that's all I really need to get me going until the next release. I believe this may be the simple check of *lenp that Roy was talking about. Since I'm brand new to the Apache source code in general, and not really a C expert either, any comments or criticisms are welcome.

Dave

At 11:18 AM 7/9/2002 -0700, Roy T. Fielding wrote:

WTF? -1 Jim, that code is doing an error check prior to the strtol. It is not looking for the start of the number, but ensuring that the number is non-negative and all digits prior to calling the library routine. A simple check of *lenp would have been sufficient for the blank case.
recent chunked encoding fix -vs- mod_proxy...
I have a situation where an external-facing apache server proxies to another apache server inside a firewall. I've upgraded the proxying one to Apache 1.3.26 so that it won't get hacked via the chunked encoding bug, but I won't be able to upgrade the one behind the firewall for quite some time (a few months, since it's integrated with another product). I've been trying to figure out whether I'm vulnerable externally in this situation. It appears to me that I'm not, because it looks like the mod_proxy handler calls the same core chunked-reading functionality that the rest of Apache uses (i.e. from main/http_protocol.c), and that appears to be where all the fixes were made. However, I thought I'd run this by you good folks here, since you're a lot more experienced with the Apache code than I am (just 2 days for me so far).

Dave
1.3.26: Content-Length header too strict now
Hi! I'm having a problem since upgrading from Apache 1.3.23 to 1.3.26. It appears that the Content-Length header handling is much more strict about what it will accept from http clients than it was before, and this is causing me biiig problems. A certain http client (which shall remain nameless due to embarrassment) is generating a request header like this:

GET /some/file HTTP/1.0
Host: some.place.com
Content-Type:
Last-Modified:
Accept: */*
User-Agent: foo
Content-Length:

Technically it's a very big no-no to have blank header fields like this, I know. Content-Length, for instance, should either specify 0 (zero) or not be listed at all. But the client is already out there in millions of users' hands, embedded into several popular products (as part of an auto-update sort of mechanism)... So ideally, what I'd like is an environment variable flag that disables some of the strictness of the Content-Length header checking, to allow this aberrant behavior in some cases without producing errors. Perhaps this could be made part of Apache 1.3.27? Please let me know what you all think of this idea.

Dave
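Since Apache already has a convention of special-purpose environment variables set via BrowserMatch (downgrade-1.0, force-response-1.0, and friends), the proposed flag might look something like this -- purely hypothetical, as the variable name is invented and no such flag exists:

    BrowserMatch "^foo" lax-content-length

That would scope the looser parsing to the one known-broken client instead of relaxing the check globally.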