Re: filtering huge request bodies (like 650MB files)
On Wed, 10 Dec 2003, William A. Rowe, Jr. wrote:

> Is it Chris's own filter or one of ours? Whichever it is, it would be
> nice to get this fixed.

Can I suggest Chris insert mod_diagnostics at different points in his
chain to identify exactly where it's buffering (if indeed that's where
his memory is going)?

I had a very similar situation to this, when a bug in a third-party
library caused everything to be buffered in my filter. mod_diagnostics
rapidly tracked that down, for a 300x performance improvement.

-- Nick Kew
Re: Thread terminations?
On Thu, 11 Dec 2003, Nasko wrote:

(can you please fix your mailer to post text and use a sensible line length?)

> Hello everyone. I have an Apache 2 module running under Windows. In
> each of Apache's threads I have a separate ODBC connection to a
> database.

Why every thread? Wouldn't it be more efficient to share a connection
pool between your threads? You might want to look at mod_pg_pool (at
http://apache.webthing.com/ ) as a template for that.

> My questions are :

Can't answer them. ICBW, but AIUI threads in your MPM are merely an
implementation detail of Apache's abstract architecture, so if you tie
something to them, you're more-or-less fighting against the
architecture. That means your module is likely to be non-portable and
at risk of breaking on future Apache updates.

-- Nick Kew
In urgent need of paying work - see http://www.webthing.com/~nick/cv.html
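The shared-pool idea can be sketched in plain C. This is an illustrative, hedged sketch only — mod_pg_pool manages real PostgreSQL connections on APR pools, whereas here the "connections" are just slot indices guarded by a pthread mutex, and `pool_acquire`/`pool_release` are names invented for this example:

```c
#include <pthread.h>

/* Minimal sketch of a shared connection pool: worker threads check
 * handles out of one mutex-protected pool instead of each thread
 * owning a private connection. Slot indices stand in for real
 * ODBC/database connection handles. */

#define POOL_SIZE 4

static int in_use[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* returns a free slot index, or -1 if the pool is exhausted */
int pool_acquire(void)
{
    int i, got = -1;
    pthread_mutex_lock(&pool_lock);
    for (i = 0; i < POOL_SIZE; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            got = i;
            break;
        }
    }
    pthread_mutex_unlock(&pool_lock);
    return got;
}

void pool_release(int i)
{
    pthread_mutex_lock(&pool_lock);
    in_use[i] = 0;
    pthread_mutex_unlock(&pool_lock);
}
```

A real pool would also handle exhaustion by blocking on a condition variable rather than returning -1, and validate connections before reuse.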
Re: Save brigade and buckets
On Wed, 7 Jan 2004, Brian Akins wrote:

> This may not be apache-dev related, but I do not know where else to
> ask it.

apache-modules, maybe?

> Is it possible to save an entire bucket brigade (including its
> buckets) across requests? I looked at ap_save_brigade, but I'm not
> sure that will work. It seems that the brigades are always tied to a
> connection.

ICBW here, but ... brigades are created on a pool. When the pool dies,
so does the brigade. Most brigades are created on the request or
connection pool, and so die with the request or connection.
ap_save_brigade lets you save a brigade into another brigade. To make
that work across requests, you should be able to save to a brigade on
the server pool.

However, that still doesn't help if you want the saved brigade to be
seen by a subsequent request, because that request will be handled by a
random server, likely not the one whose pool you used. Your best bet
may be to try and use a persistent connection, and handle breaking the
connection as an error.

-- Nick Kew
Re: Philosphical help - module or CGI
On Fri, 9 Jan 2004, Kean Johnston wrote:

> Good morning all. For a (private) project I am working on, I would
> appreciate a little advice. The system produces mostly dynamic
> content, very little static stuff. A lot of the data that will be
> served up by httpd comes from a daemon running on the same host, and
> all of the required data is in a shared memory segment.

Shared memory is not easy with Apache. If you implement a pool /
pointers in shared memory, then you're significantly advancing it. If
you need shared memory pointers, you might be better off writing a
separate daemon, and connecting to it from your module.

> I can either produce the content with a CGI that attaches to the
> shared segment, gets the data and renders it in HTML, [...]

I'd suggest a persistent daemon. Perhaps prototype it as a single
program, then separate off the shm via RPC. Of course, if your shm use
doesn't involve pointers then it's all much simpler.

> [...] frequently and almost always in core. Of course, I can hack the
> web server to my heart's content, and possibly even have the main
> httpd create the semaphores and shared memory segments, so that when
> it preforks, all of the children already have all of those set up and
> simply need to use the semaphore for the read lock, render the data
> from the shared segment and be done with it.

Take a look at the DB pool modules at apache.webthing.com. A similar
approach might be what you need.

> Do you think it would be overkill to write this as a module, or would
> the simplicity gained by writing a normal CGI be worth it? I've not
> written an Apache module before, so it would be a bit of a learning
> curve, but a worthwhile one I think.

I agree with that. Once you're up the curve it becomes just as simple
as CGI, and gives you more flexibility and modularity.

> I intend to implement this using httpd 2.0, if that makes any
> difference.

Yes: that's a far more powerful development platform than 1.x.

-- Nick Kew
Re: ReplaceModule directive!?
On Fri, 16 Jan 2004, Lars Eilebrecht wrote:

> According to Gerardo Reynaga:
>> Is there a way to pass directives to httpd once the server is
>> running?
>
> How about using a graceful restart? Would that be feasible in your
> case?

Graceful restart (along with HUP restart and even stop) fails horribly
when an installed module has been updated. We could do with a mechanism
for that, and if Gerardo is going to implement it then great (though I
can't help thinking it ought to be simpler: an unload-all-modules
thing). Perhaps this wants a bug report.

-- Nick Kew
Re: Capabilities to provide UDP services with Apache
On Wed, 28 Jan 2004, Matthew Gress wrote:

> In any case, I have not found a reference to how to configure apache
> to do this and need to know where I should start to create or adapt
> for this functionality.

I think you just have to write it. The nearest thing Apache has to a
utility library is the APR.

> Another question I have is, can we create a module that services UDP
> connections without hitting the central apache server code?

Are you sure this prospective module wants to live within Apache? What
are you expecting this to gain for you over (say) an RPC-based daemon
sitting alongside Apache?

-- Nick Kew
Re: Help in Writing Apache Modules in C
On Wed, 28 Jan 2004, Will Lowe wrote:

> If you're looking for Apache 1.x (not 2.x) [...]

Given that he's writing C, and that 2.x is a vastly richer development
environment than 1.3, why should he even consider that? In 1.3 days,
application developers had to resort to all kinds of add-ons, none of
which are in C. If 1.3 had had the power of the 2.0 API, we'd probably
still have CGI and PHP at the bottom end of add-ons, but the need for
backends like Tomcat would probably never have been felt.

> Awaiting for a Helping Hand.

I suggest looking at existing modules, and reading the nicely-documented
Apache header files.

-- Nick Kew
Re: mod_gcj project at sourceforge
On Sat, 20 Mar 2004, Hannes Wallnoefer wrote:

> Hi there, just wanted to drop a note that I've started a sourceforge
> project to create a module to run natively compiled Java inside
> Apache using the GNU compiler for Java (GCJ).

Erm, why does that need a module? Surely all it needs is to deal with
the linkage from C. I've done that experimentally, building the W3C
CSS validator with gcj, but considered it just too huge and unwieldy to
contemplate for operational use.

-- Nick Kew
Re: mod_gcj project at sourceforge
On Sat, 20 Mar 2004, Hannes Wallnoefer wrote:

>>> just wanted to drop a note that I've started a sourceforge project
>>> to create a module to run natively compiled Java inside Apache
>>> using the GNU compiler for Java (GCJ).
>>
>> Erm, why does that need a module? Surely all it needs is to deal
>> with the linkage from C. I've done that experimentally, building the
>> W3C CSS validator with gcj, but considered it just too huge and
>> unwieldy to contemplate for operational use.
>
> The idea is to have the module load and execute arbitrary Java byte
> code. In other words, mod_gcj will act as a bridge between Apache and
> user-provided .class and .jar files.

Aha! That's different - more akin to mod_perl or mod_python.

> And if you thought linking libgcj was unwieldy you probably haven't
> tried to run Apache + mod_jk + Tomcat recently.

Indeed I haven't, and when I last did it (about three years ago) it was
under protest :-)

No, it's not linking libgcj that's the issue. It was loading a library
that had not only compiled to a 7Mb .so itself, but also had two huge
dependencies. Your module sounds as if it should be able to offer a
more satisfactory alternative to that. And it might also be very
useful for one of my wishlist projects, if that ever happens.

Thanks for clarifying. I'm just off to your sourceforge page to learn
more :-)

-- Nick Kew
Re: to the non-committer folks in our communities...
On Wed, 24 Mar 2004, Jeff Trawick wrote:

> Sometimes people report bugs and/or post patches on these lists and
> for whatever reason they are never properly addressed. Discussion on
> the list is great, but it is all too easy for the e-mails to move out
> of sight. The mail arrives all too quickly. The best action you can
> take to avoid the bit bucket for your bug reports and patches is to
> open a problem report at http://nagoya.apache.org/bugzilla/. If a
> patch is associated with it, once you create the bug report go back
> to the report to attach the patch and add PatchAvailable to the
> keywords field.

Jeff, thanks for that. Having seen a couple of patches fall into a
black hole - and one recently get committed - it had been in the back
of my mind to ask about attaching patches to a bug report. Now you've
answered for me, I'll do that in future.

Perhaps that should go into the developer docs?

-- Nick Kew
Re: RequestHeader directive cannot be made conditional on env vars
On Thu, 25 Mar 2004, Vincent Deffontaines wrote:

> As this seems quite simple to implement, here is my question: would a
> patch implementing env vars in RequestHeader be accepted?

I would support that patch. Since you're new to this list, you'll have
missed Jeff Trawick's recent post about third-party patches: it's most
likely to get adopted if you raise it as a bug report, then attach the
patch to that.

-- Nick Kew
Re: mod_deflate updates
On Wed, 14 Apr 2004, Justin Erenkrantz wrote:

> Your changes sound fair enough in concept, but I won't really review
> until it becomes a patch. ;-)  -- justin

OK, I'll turn it into a patch. But maybe not just now, after a second
glass of wine :-)

I'm thinking: my use of r->notes works well when another module is
setting it, but I should implement a configuration directive as an
alternative. Make existing behaviour the default, but let httpd.conf
override it and force or suppress output compression.

-- Nick Kew
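For reference, mod_deflate already offers an environment-variable mechanism for suppressing compression per-request, per its 2.0 documentation — the proposed directive would be a more direct alternative. A sketch of the existing knobs (this is standard documented usage, not the proposed new directive):

```apache
# Compress output, but suppress it for content that is already
# compressed, via the no-gzip env var that mod_deflate honours:
SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
```

A dedicated directive could force compression as well as suppress it, which the env-var mechanism alone does not cleanly express.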
[PATCH] Re: mod_deflate updates
As discussed previously, here are my updates as a patch against 2.0.49.
They serve to enable working with compressed data coming from a proxy
(or other backend) and processing content in the output filter chain.

-- Nick Kew
Nick's manifesto: http://www.htmlhelp.com/~nick/

--- mod_deflate.c.old	2004-04-15 23:35:38.000000000 +0100
+++ mod_deflate.c	2004-04-15 23:38:02.000000000 +0100
@@ -23,6 +23,12 @@
  *
  * Written by Ian Holsman
  *
+ * Modified by Nick Kew, April 2004
+ *
+ * FIX: deflate html content based on a NOTE from the html-parse filter.
+ * ADD: inflate_out_filter to decompress content for output filters
+ *      in a proxy with compressed backend.  I don't understand the
+ *      zlib stuff, so it's modified blind from the input filter.
  */

 #include "httpd.h"
@@ -339,6 +345,11 @@

     /* if they don't have the line, then they can't play */
     accepts = apr_table_get(r->headers_in, "Accept-Encoding");
+
+    /* NRK: accept it if we removed Accept-Encoding earlier */
+    if (accepts == NULL) {
+        accepts = apr_table_get(r->notes, "Accept-Encoding");
+    }
     if (accepts == NULL) {
         ap_remove_output_filter(f);
         return ap_pass_brigade(f->next, bb);
@@ -834,10 +845,224 @@
     return APR_SUCCESS;
 }

+
+/* Filter to inflate for a content-transforming proxy. */
+static apr_status_t inflate_out_filter(ap_filter_t *f,
+                                       apr_bucket_brigade *bb)
+{
+    int deflate_init = 1;
+    apr_bucket *bkt;
+    request_rec *r = f->r;
+    deflate_ctx *ctx = f->ctx;
+    int zRC;
+    apr_status_t rv;
+    deflate_filter_config *c;
+
+    c = ap_get_module_config(r->server->module_config, &deflate_module);
+
+    if (!ctx) {
+        int found = 0;
+        char *token, deflate_hdr[10];
+        const char *encoding;
+        apr_size_t len;
+
+        /* only work on main request/no subrequests */
+        if (r->main) {
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, bb);
+        }
+
+        /* Let's see what our current Content-Encoding is.
+         * If gzip is present, don't gzip again.  (We could, but let's not.)
+         */
+        encoding = apr_table_get(r->headers_out, "Content-Encoding");
+        if (encoding) {
+            const char *tmp = encoding;
+
+            token = ap_get_token(r->pool, &tmp, 0);
+            while (token && token[0]) {
+                if (!strcasecmp(token, "gzip")) {
+                    found = 1;
+                    break;
+                }
+                /* Otherwise, skip token */
+                tmp++;
+                token = ap_get_token(r->pool, &tmp, 0);
+            }
+        }
+
+        if (found == 0) {
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, bb);
+        }
+
+        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
+        ctx->proc_bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
+        ctx->buffer = apr_palloc(r->pool, c->bufferSize);
+
+        zRC = inflateInit2(&ctx->stream, c->windowSize);
+
+        if (zRC != Z_OK) {
+            f->ctx = NULL;
+            inflateEnd(&ctx->stream);
+            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+                          "unable to init Zlib: "
+                          "inflateInit2 returned %d: URL %s",
+                          zRC, r->uri);
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, bb);
+        }
+
+        /* initialize deflate output buffer */
+        ctx->stream.next_out = ctx->buffer;
+        ctx->stream.avail_out = c->bufferSize;
+
+        deflate_init = 0;
+    }
+
+    APR_BRIGADE_FOREACH(bkt, bb) {
+        const char *data;
+        apr_size_t len;
+
+        /* If we actually see the EOS, that means we screwed up! */
+        if (APR_BUCKET_IS_EOS(bkt)) {
+            inflateEnd(&ctx->stream);
+            return APR_EGENERAL;
+        }
+
+        if (APR_BUCKET_IS_FLUSH(bkt)) {
+            apr_bucket *tmp_heap;
+            zRC = inflate(&(ctx->stream), Z_SYNC_FLUSH);
+            if (zRC != Z_OK) {
+                inflateEnd(&ctx->stream);
+                return APR_EGENERAL;
+            }

+            ctx->stream.next_out = ctx->buffer;
+            len = c->bufferSize - ctx->stream.avail_out;
+
+            ctx->crc = crc32(ctx->crc, (const Bytef *)ctx->buffer, len);
+            tmp_heap = apr_bucket_heap_create((char *)ctx->buffer, len,
+                                              NULL, f->c->bucket_alloc);
+            APR_BRIGADE_INSERT_TAIL(ctx->proc_bb, tmp_heap);
+            ctx->stream.avail_out = c->bufferSize;
+
+            /* Move everything to the returning brigade. */
+            APR_BUCKET_REMOVE(bkt);
+            break;
+        }
+
+        /* read */
+        apr_bucket_read(bkt, &data, &len, APR_BLOCK_READ);
+
+        /* first bucket
mod_deflate update
Attached: a one-line bugfix to my recent patch. The inflate output
filter needs to unset the Content-Encoding header when it unsets the
content encoding.

Also a question: when I create a bucket brigade in a module, I always
explicitly apr_brigade_destroy() it. None of the filters in mod_deflate
destroy their brigades. A look at apr_brigade.c shows that it's not in
fact necessary, but maybe a note to that effect would be in order?

-- Nick Kew
Nick's manifesto: http://www.htmlhelp.com/~nick/

--- mod_deflate.c	2004-04-18 13:06:13.000000000 +0100
+++ mod_deflate.c.old	2004-04-18 13:07:44.000000000 +0100
@@ -895,6 +895,7 @@
         ap_remove_output_filter(f);
         return ap_pass_brigade(f->next, bb);
     }
+    apr_table_unset(r->headers_out, "Content-Encoding");
     f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
     ctx->proc_bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
Proposal: AP_FTYPE_PREPROCESS
Content-transforming filters are a major and increasingly popular
application of Apache, and serve many purposes. Implementing major
functionality in an output filter rather than a handler has the great
advantage of making it re-usable with different handlers, including
mod_proxy. The place for such filters is of course AP_FTYPE_RESOURCE.

But they may often require a pre-processing step. My recent update to
mod_deflate provides a filter to decompress gzipped content for
manipulation by a content-transforming filter. In the context of a
proxy, this pre-processing can only happen in an output filter. I
hacked this in mod_deflate by declaring the gunzip filter as
AP_FTYPE_RESOURCE-1.

There are many similar situations. My own work in progress includes
decoding image formats for an image-processing filter, and is likely to
include an error-recovering iconv filter to ensure graceful recovery
when proxying content containing bogus characters through a markup
filter.

Rather than use hacks like AP_FTYPE_RESOURCE-1, would it not be better
to introduce a new output filter type AP_FTYPE_PREPROCESS, below
RESOURCE, for this kind of application?

-- Nick Kew
Nick's manifesto: http://www.htmlhelp.com/~nick/
Exception handling and MPMs
I'm writing a jpeg module, using libjpeg to implement cjpeg/djpeg
filters. Looking at error handling, I find that libjpeg by default
exits on fatal error. This can be overridden with a setjmp/longjmp
construct. However, I seriously doubt setjmp/longjmp is safe with
threaded MPMs, and there's no apr_setjmp. So that's not an attractive
option.

It seems that other libraries inherit this behaviour. For example, gd
does both of the above, and is harder to override than libjpeg, so that
doesn't help.

An alternative might be to use C++ try/catch, with a throw() in the
fatal-error handler. This seems to offer the compiler more scope for
generating thread-safe code than setjmp/longjmp, but I really don't
know if that's wishful thinking ...

Where do I stand using either setjmp/longjmp or try/throw/catch with
different MPMs?

-- Nick Kew
Nick's manifesto: http://www.htmlhelp.com/~nick/
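The setjmp/longjmp construct in question can be sketched in plain C. This is a hedged illustration, not libjpeg's actual API: `err_ctx`, `fatal_handler` and `risky_decode` are stand-ins invented here. The one point worth making explicit for the threading question is that the jmp_buf lives in per-call state, not a global, which is the minimum needed for this to have any chance under a threaded MPM:

```c
#include <setjmp.h>
#include <string.h>

/* Per-call error context: the jmp_buf must not be shared between
 * threads, so it lives on the caller's stack. */
struct err_ctx {
    jmp_buf trap;       /* where the fatal handler jumps back to */
    char message[64];   /* saved error text */
};

/* Stand-in for a library's fatal-error callback: never returns.
 * (libjpeg's equivalent would be an overridden error_exit.) */
static void fatal_handler(struct err_ctx *ctx, const char *msg)
{
    strncpy(ctx->message, msg, sizeof(ctx->message) - 1);
    ctx->message[sizeof(ctx->message) - 1] = '\0';
    longjmp(ctx->trap, 1);
}

/* Stand-in for a library call that may hit a fatal error. */
static void risky_decode(struct err_ctx *ctx, int bad_input)
{
    if (bad_input)
        fatal_handler(ctx, "bogus JPEG marker");
}

/* Returns 0 on success, -1 if the library aborted. */
int process_image(int bad_input)
{
    struct err_ctx ctx;
    if (setjmp(ctx.trap))
        return -1;      /* landed here via longjmp from the handler */
    risky_decode(&ctx, bad_input);
    return 0;
}
```

Note the jump here stays within one thread's own stack, which C does define; the open question in the post is whether a given MPM's threading model adds hazards beyond that.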
Re: is it possible to mark buckets to be copied only when to be set-aside?
On Tue, 18 May 2004, Stas Bekman wrote:

> extra allocation happens). But just now one user has reported that it
> breaks the mod_xslt filter, which sets aside the buckets sent from
> the modperl handler, and then uses them after seeing EOS.

That seems to me an unnecessarily complex and inefficient XSLT
implementation. What XSLT needs to do with its data is to parse it to a
DOM. By using libxml2/libxslt we can use the parseChunk API, and thus
feed every bucket to the parser as soon as it reaches the filter. No
need at all to buffer or set it aside.

We have at least one implementation that works like that (originally
mine, but now more actively developed by others as mod_transform).
Perhaps mod_perl users might benefit from switching?

-- Nick Kew
Re: is it possible to mark buckets to be copied only when to be set-aside?
On Tue, 18 May 2004, Stas Bekman wrote:

> Frankly, I have no idea who the author of mod_xslt even is; it's not
> part of the mod_perl project.

There are several modules with that name. When you raised it as a
problem you had encountered with mod_perl, I thought maybe mod_perl had
specific hooks for it - although there would seem to be no reason to do
so, other than particular system optimisations.

-- Nick Kew
Re: [PATCH] mod_deflate + mod_proxy bug
On Wed, 9 Jun 2004, Allan Edwards wrote:

> Running ProxyPass with mod_deflate results in an extraneous 20 bytes
> being tacked onto 304 responses from the backend. The problem is that
> mod_deflate doesn't handle the zero-byte body: it adds the gzip
> header and tries to compress 0 bytes. This patch detects the fact
> that there was no data to compress and removes the gzip header from
> the bucket brigade. Any comments before I commit to head?

This is part of a slightly broader problem with proxying and
mod_deflate: it'll also waste time gzipping already-compressed data
from the backend in those cases where the compression is not explicitly
indicated in the Content-Encoding header. Obvious examples are all the
main image formats. I'm currently running a hack that works around
this, and planning a better review when time permits (i.e. when I've
caught up with things after
http://www.theatreroyal.com/showpage.php?dd=1&theid=2578 which now has
three nights left to run).

More interesting is the entire subject of filtering in a dynamic
context such as a proxy. The directives available to control filtering
are simply not up to it. Watch this space :-)

-- Nick Kew
Re: [PATCH] mod_unique_id: Keep running w/ Broken DNS
On Tue, 15 Jun 2004, Paul Querna wrote:

> I see three ways to solve this issue:
> 1) Make the error we spit out more verbose when DNS is broken.

+1 on that. The error message should suggest that the luser disables
the module.

> 2) Continue running, turning off mod_unique_id.

Violates KISS (slightly), and keeps the module loaded as deadweight.

> 3) Complain to upstream vendors. (don't enable mod_unique_id by
> default!)

Good idea, but we don't really control that unless we adopt DJB-style
licensing.

> No matter what is done, it would be nice to have some sort of change
> in 2.0.50 before a release is made.

Indeedie.

-- Nick Kew
RE: Aborting a filter.
On Tue, 22 Jun 2004, Peter J. Cranstone wrote:

> Thanks... we're currently testing a new version of mod_gzip called
> mod_gzip64i

For the record, I've fixed the problem. It was a failure to support
some of the compression flags. Now I'll have to (side?)port it into a
CVS version of mod_deflate ...

<grumble>Why isn't this documented in the manpages or in
zlib.h?</grumble>

-- Nick Kew
Proxy Cookie Support (Bug #10722)
I recently patched bug 10722. My patch was against 2.0.49, for a client
who needed it in a hurry. I'm just porting it to 2.1-HEAD.

In doing so, I find there's an existing patch that saves and restores
cookies without rewriting the Domain and Path components. AFAICS my
patch would/should supersede that one. But I'm reluctant to do so
without understanding the purpose of the other patch.

The other patch is at
http://cvs.apache.org/viewcvs.cgi/httpd-2.0/modules/proxy/proxy_http.c?r1=1.184&r2=1.185

It appears just to merge Set-Cookie headers in r->err_headers_out with
those in r->headers_out. The latter have just come from the backend
(the server proxied). But how/why should there be (any) cookies in
r->err_headers_out at this point? Presumably they'd be from the proxy
rather than the backend? And why merge them into a normal 2xx response?

-- Nick Kew
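For illustration, the Path-rewriting half of the cookie problem can be sketched in plain C. This is not the actual 2.0.49 patch: `rewrite_cookie_path` is a hypothetical standalone helper, ignoring case-insensitive attribute matching and APR pool allocation for brevity. The job is to map a backend cookie path onto the frontend URL-space so the browser will send the cookie back through the proxy:

```c
#include <stdlib.h>
#include <string.h>

/* Rewrite the path= attribute of a backend Set-Cookie value so the
 * cookie matches the frontend URL-space. Caller frees the result.
 * Returns an unchanged copy if there is no matching path= attribute. */
char *rewrite_cookie_path(const char *cookie,
                          const char *backend, const char *frontend)
{
    const char *p = strstr(cookie, "path=");
    size_t blen = strlen(backend);
    size_t head;
    const char *tail;
    char *out;

    if (p == NULL || strncmp(p + 5, backend, blen) != 0)
        return strdup(cookie);

    head = (size_t)(p + 5 - cookie);   /* up to and including "path=" */
    tail = p + 5 + blen;               /* remainder after the backend path */
    out = malloc(head + strlen(frontend) + strlen(tail) + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, cookie, head);
    strcpy(out + head, frontend);
    strcat(out, tail);
    return out;
}
```

The Domain attribute needs the analogous treatment, mapping the backend hostname to the frontend one.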
Re: Proxy Cookie Support (Bug #10722)
On 25 Jun 2004, Joe Schaefer wrote:

> Jim Jagielski writes:
>
>> Nick Kew wrote:
>>
>>> [...] It appears just to merge Set-Cookie headers in
>>> r->err_headers_out with those in r->headers_out. The latter have
>>> just come from the backend (the server proxied). But how/why should
>>> there be (any) cookies in r->err_headers_out at this point?
>>> Presumably they'd be from the proxy rather than the backend? And
>>> why merge them into a normal 2xx response?
>>
>> This is so cookies added by the local proxy server (Apache) via
>> internal custom modules are not lost when it is also used as a
>> proxy. If a module has added cookie information we should honor that
>> and maintain it as is. Roy and I talked about this and both agreed
>> that it made sense, hence the patch.

OK, on further consideration after posting, I've reached the view that
the 2.0.49 patch should be modified. I'd propose to run the same
transformations on Set-Cookie headers from the backend, but move the
rewriting - along with the rewriting of Date and Location headers
(ap_proxy_date_canon and ap_proxy_location_reverse_map) - into
ap_proxy_read_headers. That both cleans up the code (reduces
adhockery) and makes it more efficient (reduces table operations), as
well as making my patch orthogonal to yours.

But is the err_headers_out logic in proxy_http.c HEAD really OK? That I
still find puzzling. Perhaps there's a URL into the mail archives from
your previous discussion that would explain it?

-- Nick Kew
Re: Proxy Cookie Support (Bug #10722)
On Fri, 25 Jun 2004, Jim Jagielski wrote:

> So the
>     apr_table_do(addit_dammit, save_table, r->err_headers_out,
>                  "Set-Cookie", NULL);
> line should be removed.

OK, that's what I needed to know. I'll still have to modify my patch
slightly to work with yours, but at least it's now clear what's going
on.

-- Nick Kew
Re: URI lossage with ProxyPass
On Thu, 17 Jun 2004, Francois-Rene Rideau wrote:

[ message quoted in full and crossposted to [EMAIL PROTECTED] ]

> I have experienced quite some trouble due to design bugs in
> ProxyPass, and have proposed a patch for apache 1.3. The very same
> bugs are present in apache 2.0, and a similar fix could be used.
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=29554

I've reviewed this in the context of httpd-2.1, and it looks good to me
with essentially the same patch. It works on your testcase, and I'm 99%
satisfied that it doesn't break anything. Ready to commit if we can
answer the remaining question: should proxy_fixup be removed
altogether?

> Can you tell me if you'll fix the official mod_proxy, either using my
> patch or otherwise?
>
> The bug symptoms are that
> (1) when a request to a ProxyPass host contains %3A, the %3A is
> expanded to a colon, which yields an incorrect HTTP URL that confuses
> the remote host.
> (2) when a request to a ProxyPass host contains %2F, apache rejects
> the request with a 404 without even contacting the remote host.
>
> The bug causes are that
> (1) function modules/proxy/mod_proxy.c:proxy_fixup() makes a
> misguided attempt at URI canonicalization. It should definitely not
> try to when using PROXY_PASS, and probably not in STD_PROXY mode
> either. Since I don't understand all the ins and outs, my patch only
> adds a bypass in the case of PROXY_PASS, but I believe the whole
> function should be scrapped altogether (whoever checks in the patch
> should ponder that).

Graham Leggett's reply seems to support that, and having figured out
what you are talking about, I agree. Can anyone see why proxy_fixup
should not be removed altogether?

> (2) r->proxyreq=PROXY_PASS is declared too late, only in
> modules/proxy/mod_proxy.c:proxy_trans(), so that
> main/http_request.c:process_request_internal() has already messed
> with the URL, not realizing there is a proxy request going on.
> Consequently, the ProxyPass alias detection MUST happen not in
> modules/proxy/mod_proxy.c:proxy_trans() but in
> modules/proxy/mod_proxy.c:proxy_detect(). This may or may not
> interfere with funky rewrites that some people may want to do before
> or after a ProxyPass is used. Someone who understands such issues
> should step in and tell. Maybe my change introduces some subtle
> incompatibilities in *actually deployed* setups, but I would bet not,
> and some mechanism could be devised to restore proper behaviour for
> those who would need such a feature.
>
> I hope my patch doesn't break any expected behaviour, but I can't be
> sure. What I'm certain of is that ProxyPass is quite broken without
> my patch. Please consider merging this patch into apache, and tell me
> when it's done.
>
> Cheers,
>
> [ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
> [ TUNES project for a Free Reflective Computing System | http://tunes.org ]
> The last good thing written in C was Franz Schubert's Symphony number 9.
>         -- Erwin Dieterich [EMAIL PROTECTED]

-- Nick Kew
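The information loss behind symptom (2) is easy to demonstrate with a toy decoder. This is a hypothetical standalone sketch, not httpd's ap_unescape_url: the point is simply that once the core has decoded the URI, %2F and a literal "/" are indistinguishable, so a path-segment boundary the client never sent has been invented before mod_proxy gets a look:

```c
#include <ctype.h>
#include <string.h>

/* Hex digit value, or -1 if not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode %xx escapes in-place-style from in to out.
 * out must be at least as large as in. */
void percent_decode(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '%' && hexval(in[1]) >= 0 && hexval(in[2]) >= 0) {
            *out++ = (char)(hexval(in[1]) * 16 + hexval(in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

After decoding, "/a%2Fb" (one path segment containing a slash) and "/a/b" (two segments) are the same string: exactly the ambiguity that makes decoding-before-proxying lossy.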
PATCH: various mod_proxy issues
I've rolled a fairly extensive mod_proxy patch, which seems rather big
to commit without review. Comments please:

(1) Bug #10722 - cookie paths and domains in reverse proxy

Following my patch to 2.0.49, I've adapted it for 2.1, taking into
account Jim's patch. In doing so, I made some organisational changes:
 * moved rewriting of headers that need it (ap_proxy_date_canon and
   ap_proxy_location_reverse_map) to a new function
   process_proxy_header, called from ap_proxy_read_headers
 * removed the same from ap_proxy_http_process_response
 * moved ap_proxy_read_headers from proxy_utils to proxy_http
 * retained Jim's patch, but removed the line merging err_headers_out

(2) Bug #29554 - URL munging

I've ported Francois-Rene Rideau's patch to 2.1, subject to the
question over proxy_fixup discussed in my last post.

Any problems with committing this?

-- Nick Kew

diff -u proxy-old/mod_proxy.c proxy/mod_proxy.c
--- proxy-old/mod_proxy.c	2004-06-26 07:18:10.000000000 +0100
+++ proxy/mod_proxy.c	2004-06-26 07:13:46.000000000 +0100
@@ -94,9 +94,10 @@
 static int proxy_detect(request_rec *r)
 {
     void *sconf = r->server->module_config;
-    proxy_server_conf *conf;
-
-    conf = (proxy_server_conf *) ap_get_module_config(sconf, &proxy_module);
+    proxy_server_conf *conf =
+        (proxy_server_conf *) ap_get_module_config(sconf, &proxy_module);
+    int i, len;
+    struct proxy_alias *ent = (struct proxy_alias *)conf->aliases->elts;

     /* Ick... msvc (perhaps others) promotes ternary short results to int */
@@ -121,6 +122,19 @@
         r->uri = r->unparsed_uri;
         r->filename = apr_pstrcat(r->pool, "proxy:", r->uri, NULL);
         r->handler = "proxy-server";
+    } else {
+        /* test for a ProxyPass */
+        for (i = 0; i < conf->aliases->nelts; i++) {
+            len = alias_match(r->unparsed_uri, ent[i].fake);
+            if (len > 0) {
+                r->filename = apr_pstrcat(r->pool, "proxy:", ent[i].real,
+                                          r->unparsed_uri + len, NULL);
+                r->handler = "proxy-server";
+                r->proxyreq = PROXYREQ_REVERSE;
+                r->uri = r->unparsed_uri;
+                break;
+            }
+        }
     }
     return DECLINED;
 }
@@ -140,26 +154,6 @@
         return OK;
     }

-    /* XXX: since r->uri has been manipulated already we're not really
-     * compliant with RFC1945 at this point. But this probably isn't
-     * an issue because this is a hybrid proxy/origin server.
-     */
-
-    for (i = 0; i < conf->aliases->nelts; i++) {
-        len = alias_match(r->uri, ent[i].fake);
-
-        if (len > 0) {
-            if ((ent[i].real[0] == '!') && (ent[i].real[1] == 0)) {
-                return DECLINED;
-            }
-
-            r->filename = apr_pstrcat(r->pool, "proxy:", ent[i].real,
-                                      (r->uri + len), NULL);
-            r->handler = "proxy-server";
-            r->proxyreq = PROXYREQ_REVERSE;
-            return OK;
-        }
-    }
     return DECLINED;
 }

@@ -221,7 +215,7 @@
     return OK;
 }

-
+#if 0
 /* -------------------------------------------------------------------- */
 /* Fixup the filename */

@@ -236,6 +230,13 @@
     if (!r->proxyreq || !r->filename || strncmp(r->filename, "proxy:", 6) != 0)
         return DECLINED;

+    /* We definitely shouldn't canonicalize a proxy_pass.
+     * But should we really canonicalize a STD_PROXY??? -- Fahree
+     */
+    if (r->proxyreq == PROXYREQ_REVERSE) {
+        return OK;
+    }
+
     /* XXX: Shouldn't we try this before we run the proxy_walk? */
     url = &r->filename[6];

@@ -250,7 +251,7 @@
     return OK;		/* otherwise; we've done the best we can */
 }

-
+#endif
 /* Send a redirection if the request contains a hostname which is not */
 /* fully qualified, i.e. doesn't have a domain name appended. Some proxy */
 /* servers like Netscape's allow this and access hosts from the local */
@@ -439,6 +440,10 @@
     ps->proxies = apr_array_make(p, 10, sizeof(struct proxy_remote));
     ps->aliases = apr_array_make(p, 10, sizeof(struct proxy_alias));
     ps->raliases = apr_array_make(p, 10, sizeof(struct proxy_alias));
+    ps->cookie_paths = apr_array_make(p, 10, sizeof(struct proxy_alias));
+    ps->cookie_domains = apr_array_make(p, 10, sizeof(struct proxy_alias));
+    ps->cookie_path_str = apr_strmatch_precompile(p, "path=", 0);
+    ps->cookie_domain_str = apr_strmatch_precompile(p, "domain=", 0);
     ps->noproxies = apr_array_make(p, 10, sizeof(struct noproxy_entry));
     ps->dirconn = apr_array_make(p, 10, sizeof(struct dirconn_entry));
     ps->allowed_connect_ports = apr_array_make(p, 10, sizeof(int));
@@ -474,6 +479,12 @@
     ps->sec_proxy = apr_array_append(p, base->sec_proxy, overrides->sec_proxy);
     ps->aliases = apr_array_append(p, base->aliases, overrides->aliases);
     ps->raliases = apr_array_append(p, base->raliases, overrides->raliases);
+    ps->cookie_paths
+        = apr_array_append(p, base->cookie_paths, overrides->cookie_paths);
+    ps->cookie_domains
Re: URI lossage with ProxyPass
On Sat, 26 Jun 2004, Graham Leggett wrote:

> Nick Kew wrote:
>> Can anyone see why proxy_fixup should not be removed altogether?
>
> Proxy fixup seems to do the job of making sure the URL /%41%42%43
> matches "ProxyPass /ABC http://xxx/ABC", so I don't think it should
> be removed altogether.

I don't think that's right. Both proxy_detect and proxy_trans happen
before proxy_fixup, and the comment in proxy_fixup refers to its
relationship with mod_rewrite. The patched apache fails that test, but
simply reinstating proxy_fixup makes no difference to that. Now I'm
confused.

I think you're right in your other post: separate patches for separate
bugs. And not necessarily at 4 a.m. But having come this far, I want to
see both fixed :-)

And a trawl of bugzilla tells me that the URI lossage is bug #15207 and
probably others, while bug #16812 is a trivial corollary to the cookie
patch.

-- Nick Kew
Re: URI lossage with ProxyPass
On Sat, 26 Jun 2004, Graham Leggett wrote:

> Nick Kew wrote:
>> Can anyone see why proxy_fixup should not be removed altogether?
>
> Proxy fixup seems to do the job of making sure the URL /%41%42%43
> matches "ProxyPass /ABC http://xxx/ABC", so I don't think it should
> be removed altogether.

OK, the reason for that is that the patch moved ProxyPass-ing from
proxy_trans to proxy_detect. The latter happens before
canonicalisation, which is both why the patch works and why it breaks
the above.

A fix is for alias_match() to recognise %xx sequences. I've now
implemented that, but also separated out the URI-trouble stuff with
#ifdef FIX_15207, on the grounds that it's still subject to debate.

That still leaves us a proxy_fixup with no purpose I can see. Perhaps
someone who uses it with mod_rewrite can say if it does anything for
you?

-- Nick Kew
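The %xx-aware matching described above might look something like this sketch (hypothetical standalone code, not the actual alias_match patch): prefix-match the raw, still-encoded request URI against a configured ProxyPass prefix, treating %xx in the URI as equivalent to the literal byte in the prefix, so /%41%42%43 matches a "/ABC" alias even before any canonicalisation has run.

```c
#include <ctype.h>

/* Hex digit value, or -1 if not a hex digit. */
static int xval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Returns the number of URI bytes consumed by the match, or 0 on
 * no match. */
int alias_match_pct(const char *uri, const char *fake)
{
    const char *u = uri;
    while (*fake) {
        if (u[0] == '%' && xval(u[1]) >= 0 && xval(u[2]) >= 0) {
            int c = xval(u[1]) * 16 + xval(u[2]);
            if (c != (unsigned char)*fake)
                return 0;
            u += 3;                 /* %xx matched one literal byte */
        } else if (*u == *fake) {
            u++;
        } else {
            return 0;
        }
        fake++;
    }
    return (int)(u - uri);
}
```

Returning a byte count (rather than a boolean) lets the caller append the unmatched tail of the raw URI to the ProxyPass target, as proxy_detect needs to.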
Re: 2.2 Roadmap?
On Sun, 27 Jun 2004, Paul Querna wrote: The 2.0 branch was made over 18 months ago, it is time to make another stable branch. I believe many people want the AAA changes, and it brings even more features to encourage people to upgrade from 1.3. There's another consideration that could be relevant here: people who never touch an N.0 software release. Bump it to 2.3? :-) This is only a list from my initial thoughts, please comment and make suggestions. I will take the resulting thread and rewrite the ROADMAP file. Smart filtering. We need much better dynamic configuration of the filter chain, with processing depending on the headers. Think an AddOutputFilterByType that isn't a hackish afterthought, and extend that to work with more than just Content-Type. It also fixes the awkwardness currently involved in ordering a nontrivial filter chain. I've got this working with some minor hacks to 2.0. Need time to generalise/abstract it into a proposal. -- Nick Kew
PROPOSAL: Enhance mod_headers as a debug/test tool
(If this gets the thumbs up, I'll be happy to do the work:-) In testing new code, it's often helpful to simulate different browser requests and responses. For handlers and filters, mod_headers enables us to set up testcases very easily, with the Header and (especially) RequestHeader directives. But that's in a fixups hook, so it's no use for any hooks running in earlier phases of a request. My proposal is to introduce an additional DEBUG keyword to the Header and RequestHeader directives. Headers marked as DEBUG will be set in post_read_request, so they are available to other modules. Without DEBUG, it will default to current (fixups) behaviour. Of course, DEBUG won't work with conditional (Request)Header directives. In addition to documenting this, attempts to do so will log a warning. -- Nick Kew
Re: PROPOSAL: Enhance mod_headers as a debug/test tool
On Sun, 4 Jul 2004, Nick Kew wrote: (If this gets the thumbs up, I'll be happy to do the work:-) Since reaction seemed broadly positive, I've checked it in to HEAD. Following suggestions in the replies, it's invoked by the keyword early which takes the place of the env clause. -- Nick Kew
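For the record, a minimal sketch of what the committed syntax looks like in httpd.conf (the header names and values here are invented testcase examples, not anything shipped):

```apache
# "early" takes the place of the env= clause: these headers are set in
# post_read_request rather than fixups, so they are visible to modules
# running in the earlier phases of a request.
RequestHeader set X-Test-UA "TestBrowser/1.0" early
Header set X-Test-Note "simulated response testcase" early
```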
Re: The Byterange filter -- a new design -- feel free to rip it to shreds
On Mon, 12 Jul 2004, Ian Holsman wrote: ok, now before I start this let me say one thing, this is not for *ALL* requests, it will only work for ones which don't have content-length modifiable filters (like gzip) applied to the request, and it would be left to the webserver admin to figure out what they were, and if you could use this. But that's not an issue if the byterange filter comes after any filters that modify content (CONTENT_SET). ok.. at the moment when a byterange request goes to a dynamic module, the dynamic module can not use any tricks to only serve the bytes requested, it *HAS* to serve the entire content up as buckets. Indeed. That only becomes a problem when a filter breaks pipelining. what I am proposing is something like: 1. the filter keeps an ordered list of range requests that the person requests. 2. it keeps state on how far it has processed in the file, thanks to knowing the length of the buckets processed so far. Q: when do the actual headers get put in.. I think they are after no? ITYM data, not the file. The case of a single file is trivial, and can more efficiently be handled in a separate optimised execution path. And some bucket types have to be read to get their length. 3. it then examines the bucket + bucket length to see which range requests match this range, if some do it grabs that range (possibly splitting/copying if it meets multiple ranges) and puts it on the right bits of each range request. 4. if the top range request is finished, it passes those buckets through. 5. repeat until EOS/Sentinel, flushing the ordered list at the end. This doesn't completely address the issue that this might cause excessive memory usage; particularly if we have to serve ranges in a perverse order. I would propose two admin-configurable limits: (1) Total data buffered in memory by the byterange filter. This can be computed in advance from the request headers. 
If this is exceeded, the filter should create a file bucket to store the data, and the ordered list then references offsets into the file. (2) A limit above which byteranges won't be served at all: most of us have neither the memory nor the /tmp space for a gigabyte. now.. this assumes that splitting a bucket (and copying) is a zero cost operation which doesn't actually *read* the bucket, is this true for most bucket types? would this kind of thing work? As I said, the trivial cases should (transparently) be treated separately and more simply. Otherwise ... well, as discussed on IRC. -- Nick Kew
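As a sketch, the two limits might surface as directives something like these (the directive names, syntax and values are invented here for illustration; nothing like them is implemented):

```apache
# (1) Above this, buffered range data spills from memory into a temp
#     file bucket, and the ordered list references offsets into it.
ByteRangeMemBuffer 4M
# (2) Above this, don't serve ranges at all: fall back to a full 200.
ByteRangeMaxSize 64M
```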
Re: The Byterange filter -- a new design -- feel free to rip it to shreds
On Mon, 12 Jul 2004, Graham Leggett wrote: at the moment when a byterange request goes to a dynamic module, the dynamic module can not use any tricks to only serve the bytes requested, it *HAS* to serve the entire content up as buckets. In theory, if mod_proxy (for example) gets a byte range request, it should only serve that byte range - ideally modules/filters should not prop up other modules/filters. That will not always be practicable. mod_proxy should be configurable to pass byteranges headers straight through to the backend or strip them and assume the proxy will handle the ranges. If a filter somewhere in the filter stack is going to break the byte range request in any way (for example something like mod_include) then that filter should be responsible for removing the Range header from the request before mod_proxy gets a chance to service the request. Doesn't that break modularity rather badly? mod_include is concerned with simple content modifications, not HTTP. It doesn't need more complexity. In theory, as the byte range filter should be one of the topmost filters run, it would have seen the Range header and noted what range it should have been returning, so a downstream filter removing the Range header should not cause a problem for the byte range filter. But if you adopt that approach, then *every* filter has to faff about with range headers (just in case), the first one strips it out, and the others run in blissful ignorance. Makes more sense if only the byterange filter concerns itself with the header. In turn, if a downstream filter/content handler has returned a 206 Partial Content response, the byte range filter should know what to do (has my job already been done by a downstream filter? Yes, quietly remove itself from the chain. In fact thinking about this some more - mod_include might look at the byte range, and then intelligently decide to either include / not include certain included content based on the byte range. 
This could improve performance on some sites. For mod_include to do that is an order of magnitude extra complexity (even if you solve the problem of measuring the length of each include without actually executing it). For modules that generate entirely new data - such as those based on a markup processor (accessibility, xmlns, xinclude/xslt, proxy_html, annot - to name but a few) it becomes even bigger: we'd have to count every byte we write! So to sum up: - Teach the byte range filter that it might receive content from a content handler that already has the byte range applied, and to react intelligently when this happens. A content handler will indicate this by returning a 206 Partial Content and/or a Content-Range header, which is easily parsed by the byte range filter - no need for special flags or buckets. That has to be configurable, as some filters can only run on a complete datastream. - Teach certain content handlers (such as mod_proxy or mod_cache) to handle byte range requests themselves, using the standard RFC2616 headers and responses to indicate whether ranges have been applied. Which content handlers will be taught this will depend on whether there is a performance gain to be had by getting the content handler to know about byte ranges. mod_proxy needs only two modes: transparent (leave it to the backend) or opaque (get the entire document and leave it to the byteranges filter). The latter would be appropriate when cacheing and/or content-filtering. mod_cache in quick-handler mode is a special case. But since that's only serving from a complete document in-memory or on-disc, it's straightforward. - Teach certain problem modules (mod_gzip if appropriate) to react That'll be mod_byteranges. -- Nick Kew
Re: The Byterange filter -- a new design -- feel free to rip it to shreds
On Tue, 13 Jul 2004, Joe Orton wrote: On Mon, Jul 12, 2004 at 03:35:12AM +0100, Nick Kew wrote: This doesn't completely address the issue that this might cause excessive memory usage; particularly if we have to serve ranges in a perverse order. I would propose two admin-configurable limits: (1) Total data buffered in memory by the byterange filter. This can be computed in advance from the request headers. If this is exceeded, the filter should create a file bucket to store the data, and the ordered list then references offsets into the file. Buffering responses into temporary files so that the byterange filter can do its job sounds extremely messy. Not as messy as buffering it in memory regardless of size. But I agree it's got to be configurable, and probably not a default. Being able to send byteranges of arbitrary dynamically generated content doesn't seem like an essential feature; the filter is just saying I can't efficiently process a byterange request for this content. Clients must handle the 200 response fallback already. Indeed. The question is: can we offer admins alternatives that might be better in some situations, without messing with memory. I've put forward suggestions, amplifying Ian's, on how I believe we can. -- Nick Kew
Re: The Byterange filter -- a new design -- feel free to rip it to shreds
On Tue, 13 Jul 2004, Graham Leggett wrote: But in the case of mod_proxy, mod_jk, etc it is quite valid and very desirable for a range request to be passed all the way to the backend, in the hope that the backend sends just that range back to mod_proxy, which in turn sends it up a filter stack that isn't going to fall over because it received a 206 Partial Content response. Indeed. In a straight-through proxy that's right. But in the case of a cacheing proxy, it may be better for it to retrieve the entire document and manage byteranges locally. And in the case of a content-transforming proxy, the filters may need the entire content to function at all. Bottom line: this has to be controlled by the server admin. We offer the options of passthrough, process locally, or ignore ranges. The above is still true - there is (and should be) very little for the content handler to worry about when it comes to HTTP compliance, and content handlers should have the option to just generate content, as they do now. Agreed. That applies both to content handlers and content filters. The problem though is not with the content handlers but with the filters - filters must not make the assumption that all content handlers only serve content and not HTTP headers. When a content handler decides that it wants to handle more of the HTTP spec so as to improve performance, it should be free to do so, and should not be stopped from doing so due to limitations in the output filters. Indeed, historically (possibly still) content length has been a problem for filters. Simply removing the header may not be sufficient if the content-length filter reinserts it erroneously. Ranges are more complex. Basically a proxy or other content generator that takes care of byteranges itself is going to be incompatible with certain output filters. 
That has to be documented, and there has to be an easy way for filters to detect when they're not wanted, or for Apache to mark them inapplicable and refuse to run them at all. A situation where filters have to get their hands dirty with partial responses would be a serious problem. In other words if mod_proxy is taught how to pass Range requests to the backend server, the output filter stack should not stop proxy from doing so by removing Range headers unless it is absolutely necessary. Indeed. So in httpd.conf we have options for the proxy to pass range requests through or not. -- Nick Kew
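A sketch of what that could look like in httpd.conf (the directive name and its values are invented here for the sake of discussion, not an implemented option):

```apache
# pass   - forward the Range header to the backend untouched
# local  - fetch the full entity; the byterange filter slices it
# ignore - strip Range and always serve the complete response
ProxyRangeHandling local
```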
Re: The Byterange filter -- a new design -- feel free to rip it to shreds
On Tue, 13 Jul 2004, Graham Leggett wrote: Nick Kew wrote: Indeed. In a straight-through proxy that's right. But in the case of a cacheing proxy, it may be better for it to retrieve the entire document and manage byteranges locally. And in the case of a content-transforming proxy, the filters may need the entire content to function at all. Remember that in our case there is no such thing as a caching proxy. Of course there is! It's apache with mod_proxy and mod_cache. Just as a content-transforming proxy is apache with mod_proxy and one or more content filter modules. Bottom line: this has to be controlled by the server admin. We offer the options of passthrough, process locally, or ignore ranges. I think it's better to avoid adding extra directives, or giving the admin the power to override RFC2616. How to handle ranges is described fully in the HTTP/1.1 spec, the admin shouldn't really have the option to fiddle with it. Just adding more ways to get it wrong. AFAICS RFC2616 sanctions any of the three behaviours I'm proposing. Firstly, no one is required to support ranges at all. Secondly the following passage from #14.35 seems to sum up the other options: If a proxy that supports ranges receives a Range request, forwards the request to an inbound server, and receives an entire entity in reply, it SHOULD only return the requested range to its client. It SHOULD store the entire received response in its cache if that is consistent with its cache allocation policies. That's not explicit about whether the Range was forwarded, but neither AFAICS is anything else in the RFC. Any filter that could get its hands dirty (mod_include springs to mind) should just strip the Range header from the request, leaving the byte range filter to do the dirty work for it on the full response. That is dirtying the filter API further. If filter modules are to be responsible for that, we should at least provide a higher-level API for them, ideally in a declarative form. 
Maybe something along the lines of an AP_IS_TRANSFORM flag or flags, that will transparently deal with Content-Length, Range and Warning headers on behalf of a filter when it is inserted. If we can abstract out the common processing required of every content-transforming filter into a simple magic-API, I'll be happy. -- Nick Kew
Re: The Byterange filter -- a new design -- feel free to rip it to shreds
On Tue, 13 Jul 2004, William A. Rowe, Jr. wrote: It would be nice in apache 2.2 to finally clean up this contract, with two simple metadata elements to pass through the filter chain:
. this request is unfiltered
. this request has a 1:1 filter (stateless)
. this request has an arbitrary content transformation
Each filter in the stack could promote the complexity but should never set it to a lower state. This would allow http/proxy modules to negotiate less complex transformations in more efficient ways. Nicely put. Thank you! +1 -- Nick Kew
Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c
On Sat, 17 Jul 2004, [ISO-8859-15] André Malo wrote: * [EMAIL PROTECTED] wrote: + f->ctx = ctx = (void*)-1; I personally consider defining arbitrary pointer values as bad style, though I'm not sure what the general opinion here is (if any). I'd suggest to use a static pointer, like a global static char foo_sentinel; /* choose a speaking name ;-) */ /* and later */ f->ctx = ctx = &foo_sentinel; Additionally - afair - the use of arbitrary pointer values can even lead to bus errors on not-so-usual systems (loading undefined bits into an address register...). Yes, you're right. Actually this patch has a deeper problem, as does the patch it fixes. Setting the headers at this point depends entirely on the behaviour of the headers filter. With current behaviour, the previous mod_deflate was broken (because it could delay setting headers until after the headers have been sent down the wire). With my patch it might still risk minor breakage (repeated gzip header) if the headers filter changes sometime in future. Any more issues with this? If not I'll make nd's fix and leave it. -- Nick Kew
Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c
On Mon, 19 Jul 2004, Joe Orton wrote: On Sat, Jul 17, 2004 at 03:22:35PM -, [EMAIL PROTECTED] wrote: niq 2004/07/17 08:22:35 Modified:modules/filters mod_deflate.c Log: Fix previous patch to deal correctly with multiple empty brigades before we know if there's any content, and not re-process the headers. Is there no simpler fix for this e.g. first thing the filter does is if (APR_BRIGADE_EMPTY(bb)) return APR_SUCCESS;. And to avoid the Yes, that should work (but leaves us to reprocess the whole thing next time round). Is that better or worse? Or come to think of it, we can both return APR_SUCCESS and set a flag. re-process issue just ap_remove_output_filter(f) if finding an EOS-only brigade? Do you recollect the discussion around when that patch went in? I don't in full, but I had a nagging recollection of someone having proposed a simpler solution but found it didn't work. Just enough to persuade me to preserve the loop-over-buckets test. -- Nick Kew
setjmp/longjmp vs try/throw/catch
I have a couple of modules using third-party libraries that require me to supply an abort function (or they'll abort by exiting). For example, libjpeg in my mod_jpeg. My preferred approach to this situation is usually to resort to C++, put my code in a try/catch loop, and provide an abort handler that throws an exception. However, this doesn't play well with Apache, and when I run it in gdb, the throw appears to generate an Abort. Switching to setjmp/longjmp does appear to work well with apache and gcc. But that leaves me wondering if I need to worry about thread-safety. Is using setjmp/longjmp with Worker or Windoze MPM asking for trouble? And if so, is there an alternative approach I could try? -- Nick Kew
Re: Invitation to HTTPD commiters in tomcat-dev
On Tue, 20 Jul 2004, Henri Gomez wrote: We're discussing on tomcat-dev about a new Apache to Tomcat Apache 2.x module. We'd like to see some of the core HTTPD developers join the discussion about the post JK/JK2 module. As a starting point, how about telling us what tomcat needs that mod_proxy and friends don't provide? -- Nick Kew
Re: Invitation to HTTPD commiters in tomcat-dev
On Tue, 20 Jul 2004, Henri Gomez wrote: [ chopped tomcat-dev because that bounces my mail ] As a starting point, how about telling us what tomcat needs that mod_proxy and friends don't provide? In mod_jk/jk2, there is support for load-balancing and fault-tolerance and it's a key feature. Good start. I'm guessing you're ahead of me here, and your reason for posting to [EMAIL PROTECTED] is that you can see that implementing these capabilities will be of general interest to more than just tomcat users? My gut feeling would be to keep this properly modular. Let mod_proxy be the core of it, and implement load-balancing and fault-tolerance in additional modules. As a matter of fact, one of my wishlist-projects is a connection-pooling module for backend HTTP connections in a proxy. That might actually be the same as your project. -- Nick Kew
Re: Invitation to HTTPD commiters in tomcat-dev
On Tue, 20 Jul 2004, Henri Gomez wrote: We agree and I wonder if a mod_ajp could be used in conjunction with mod_proxy ? A sort of alternative way to route requests to tomcat. We have proxy_http and proxy_ftp protocol modules. That begs the question: can't proxy_ajp live alongside them? Well let see my suggestion : Makes sense. With the caveat that proxying plain HTTP can do much more than some posts in this thread seem to think. So the motivation has to be people want AJP, not HTTP can't do things. -- Nick Kew
Re: setjmp/longjmp vs try/throw/catch
On Tue, 20 Jul 2004, William A. Rowe, Jr. wrote: IIRC - all setjmp and other usually-thread-agnostic calls in a normal clib were redesigned to use TLS in the Win32 msvcrt lib, long before most Unixes considered implementing threads :) I believe on win32 you will be fine, I'd be more worried about the thread implementations. I have it on credible authority (in IRC from someone I believe, after I asked) that POSIX requires it to be thread-safe. That's good enough for me: tells me I don't need to advise the Client to use prefork. This sure sounds like an abstraction we should assist with using apr. Agreed. But I don't have APR karma to introduce the idea there. -- Nick Kew
Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c
On Mon, 19 Jul 2004, Joe Orton wrote: Nothing like that was posted to the list, at least. Patch below is still sufficient to fix the proxy+304 case; does it work for you too? Yes, mostly (it fixes the important bug that was previously a showstopper). And it's an improvement on my hack by virtue of simplicity. But it should still set the Content-Encoding header on a HEAD request that would normally be deflated (and unset content-length if present). So your:
+/* Deflating a zero-length response would make it longer; the
+ * proxy may pass through an empty response for a 304 too. */
+if (APR_BUCKET_IS_EOS(APR_BRIGADE_FIRST(bb))) {
+    ap_remove_output_filter(f);
+    return ap_pass_brigade(f->next, bb);
+}
+
should move after the if (!force-gzip) block, and then if we reach the EOS-only test we should fix up the headers. That test also seems to lose the pathological case of a brigade with no data but one or more FLUSH buckets followed by EOS. Could that ever happen in a HEAD or a 204/304? Investigating this has revealed a similar bug with HEAD requests in inflate_out_filter, which I shall now have to fix:-( -- Nick Kew
Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c
On Thu, 22 Jul 2004, Nick Kew wrote: On Mon, 19 Jul 2004, Joe Orton wrote: Nothing like that was posted to the list, at least. Patch below is still sufficient to fix the proxy+304 case; does it work for you too? Yes, mostly (it fixes the important bug that was previously a showstopper). I attach a new patch based on yours. It fixes my testcases, including headers for HEAD requests. Look OK to you? -- Nick Kew

--- mod_deflate.c.bak	2004-07-22 11:12:53.000000000 +0100
+++ mod_deflate.c	2004-07-22 12:17:13.000000000 +0100
@@ -247,7 +247,6 @@
     apr_bucket_brigade *bb, *proc_bb;
 } deflate_ctx;
 
-static void* const deflate_yes = (void*)"YES";
 static apr_status_t deflate_out_filter(ap_filter_t *f,
                                        apr_bucket_brigade *bb)
 {
@@ -255,14 +254,14 @@
     request_rec *r = f->r;
     deflate_ctx *ctx = f->ctx;
     int zRC;
-    char* buf;
-    int eos_only = 1;
-    apr_bucket *bkt;
-    char *token;
-    const char *encoding = NULL;
     deflate_filter_config *c = ap_get_module_config(r->server->module_config,
                                                     &deflate_module);
 
+    /* Do nothing if asked to filter nothing. */
+    if (APR_BRIGADE_EMPTY(bb)) {
+        return APR_SUCCESS;
+    }
+
     /* If we don't have a context, we need to ensure that it is okay to send
      * the deflated content.  If we have a context, that means we've done
      * this before and we liked it.
@@ -270,6 +269,8 @@
      * we're in better shape.
      */
     if (!ctx) {
+        char *buf, *token;
+        const char *encoding;
 
         /* only work on main request/no subrequests */
         if (r->main) {
@@ -349,7 +350,6 @@
          */
         apr_table_setn(r->headers_out, "Vary", "Accept-Encoding");
 
-
         /* force-gzip will just force it out regardless if the browser
          * can actually do anything with it.
          */
@@ -384,39 +384,22 @@
             }
         }
 
-        /* don't deflate responses with zero length e.g. proxied 304's but
-         * we do set the header on eos_only at this point for headers_filter
-         *
-         * if we get eos_only and come round again, we want to avoid redoing
-         * what we've already done, so set f->ctx to a flag here
+        /* Deflating a zero-length response would make it longer; the
+         * proxy may pass through an empty response for a 304 too.
+         * So we just need to fix up the headers as if we had a body.
          */
-        f->ctx = ctx = deflate_yes;
-    }
-    if (ctx == deflate_yes) {
-        /* deal with the pathological case of lots of empty brigades and
-         * no knowledge of whether content will follow
-         */
-        for (bkt = APR_BRIGADE_FIRST(bb);
-             bkt != APR_BRIGADE_SENTINEL(bb);
-             bkt = APR_BUCKET_NEXT(bkt))
-        {
-            if (!APR_BUCKET_IS_EOS(bkt)) {
-                eos_only = 0;
-                break;
-            }
-        }
-        if (eos_only) {
-            if (!encoding || !strcasecmp(encoding, "identity")) {
+        if (APR_BUCKET_IS_EOS(APR_BRIGADE_FIRST(bb))) {
+            if (!encoding || !strcasecmp(encoding, "identity")) {
                 apr_table_set(r->headers_out, "Content-Encoding", "gzip");
             }
             else {
                 apr_table_merge(r->headers_out, "Content-Encoding", "gzip");
             }
             apr_table_unset(r->headers_out, "Content-Length");
+
+            ap_remove_output_filter(f);
             return ap_pass_brigade(f->next, bb);
         }
-    }
-    if (!ctx || (ctx==deflate_yes)) {
 
         /* We're cool with filtering this. */
         ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
@@ -912,6 +895,11 @@
     apr_status_t rv;
     deflate_filter_config *c;
 
+    /* Do nothing if asked to filter nothing. */
+    if (APR_BRIGADE_EMPTY(bb)) {
+        return APR_SUCCESS;
+    }
+
     c = ap_get_module_config(r->server->module_config, &deflate_module);
 
     if (!ctx) {
@@ -950,6 +938,13 @@
         }
 
         apr_table_unset(r->headers_out, "Content-Encoding");
+
+        /* No need to inflate HEAD or 204/304 */
+        if (APR_BUCKET_IS_EOS(APR_BRIGADE_FIRST(bb))) {
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, bb);
+        }
+
+
         f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
         ctx->proc_bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
         ctx->buffer = apr_palloc(r->pool, c->bufferSize);
@@ -983,9 +978,10 @@
             apr_size_t len;
 
             /* If we actually see the EOS, that means we screwed up! */
+            /* no it doesn't - not in a HEAD or 204/304 */
             if (APR_BUCKET_IS_EOS(bkt)) {
                 inflateEnd(&ctx->stream);
-                return APR_EGENERAL;
+                return ap_pass_brigade(f->next, bb);
             }
 
             if (APR_BUCKET_IS_FLUSH(bkt)) {
Ideas for Smart Filtering
The filter architecture periodically gets discussed here, and I've been meaning to write up my thoughts for some time. I'm using a module that implements a slightly different filter API, primarily for filtering in a proxy context. I've now written a brief discussion document on the subject. It's mostly an abstraction of what I'm currently using, although it does propose some additional improvements, primarily with regard to protocol handling (reflecting the recent byteranges thread here). It generated a lively discussion, including some interesting alternative ideas, last night on IRC. Perhaps it can lead to a general-purpose module for 2.0 and an architecture update for 2.2? http://www.apachetutor.org/dev/smart-filter -- Nick Kew
Re: Ideas for Smart Filtering
On Fri, 30 Jul 2004, Joe Schaefer wrote: Um, could you please explain that bit about the need to remove filter-init from the API? That hook plays a pivotal role in libapreq2's input filter mod_apreq. mod_apreq needs to examine the entire input filter stack and modify it under certain conditions. This cannot be done in-flight, during ap_get_brigade. Ah. Two answers to that one: (1) I'm only really considering output filters. Input filters can't depend on the handler, so the dynamic configuration discussed is not relevant to them. (2) I propose getting rid of it because I cannot see any circumstance in which it's necessary in an output filter. But it's not a requirement of the proposed architecture, except insofar as it saves the overhead of a filter initialising when it's not going to be run. -- Nick Kew
Re: Ideas for Smart Filtering
On Fri, 30 Jul 2004, Joe Schaefer wrote: Nick Kew [EMAIL PROTECTED] writes: [...] (2) I propose getting rid of it because I cannot see any circumstance in which it's necessary in an output filter. But you still need a simple way for an output filter to run some code before the content handler gets invoked. Fair enough. Then you register it unconditionally using the old API. FWIW I've been advising output filter authors that want to get at libapreq2's post data to use filter_init for that: http://cvs.apache.org/~joes/libapreq2-2.04-dev/docs/html/apreq_faq.html If there's a better approach, I'd be glad to update those docs. Wouldn't an ap_hook_insert_filter() handler be the ideal spot for that? But anyway, if there is a valid need for filter_init in its present form in some output filters, we can still use it, provided we document why it might be inefficient to use it with dynamic configuration. -- Nick Kew
Re: Ideas for Smart Filtering
On Fri, 30 Jul 2004, Joe Schaefer wrote: Nick Kew [EMAIL PROTECTED] writes: [...] If there's a better approach, I'd be glad to update those docs. Wouldn't an ap_hook_insert_filter() handler be the ideal spot for that? I thought such hooks were run on *every* request, not just the ones which require a particular output filter? If so, for performance reasons that's not a suitable solution for folks writing output filters with mod_perl. Fair enough. So we have a reason not to dispense with that handler. It just means we need to document when not to use it. So the next question is, is it sufficient to maintain the two APIs (the established 2.0 + my proposal)? I'm thinking about that one: it's not really a problem to provide a combined API if there's anything to gain by it. But in any case, you can rest assured I'm not suggesting we abolish the old API :-) A filter can register itself as an output filter (old API) and as a smart filter provider (new API). Unless of course someone can radically improve on my proposals ... -- Nick Kew
Re: Ideas for Smart Filtering
On Sat, 31 Jul 2004, Justin Erenkrantz wrote: Yet, I'm not sure I understand the intent of your proposal. Is it that you don't like the fact that each filter has to make a decision on whether it should stick around? Essentially, yes. We already have a double-digit number of content filters in one application, and that's growing. So, what you are proposing to do is to abstract those two decisions into separate functions - i.e. decide whether to accept, and another to perform the filter? I'm currently running something like this using the ap_provider API. The reason I'm not proposing just to use that is that we want more flexibility. For example Content-Type: text/html;charset=latin1 is two different keys we might wish to dispatch on, while a Cookie could enumerate an arbitrary number. ap_register_smart_filter(name, match, filter_func, ctx, protocol_flags) Now when the harness name is inserted in the filter chain, and there is a match with match, lookup_handler (referenced above) will return our filter_func for filter name. I'm not sure what 'match' is in this context. In the above case, it could be text/html or latin1. ap_register_smart_filter("transcode", "latin1", charset_filter, ctx, flags); ap_register_smart_filter("process", "text/html", html_filter, ctx, flags); But that really needs the flexibility of a regexp, so latin1 becomes latin[-_]?1|iso[-_]?8859_?1 or might expand to include other close relatives like iso-8859-15 What is the point of protocol_flags? C.f. the recent thread on handling byteranges. Bill Rowe expressed the problem rather well in that thread. In view of your request not to cite URLs for substantive discussion, I'll quote from his post: The confusion results because mod_proxy isn't implemented as a content handler, it's a protocol handler in its own right. Rather than insist on the mod_http mod_proxy agreeing to streamline the response, we've put it on every content module author to:
. remove output C-L header if the size is transformed
. remove input range headers if the content isn't 1:1 transformed
This is very kludgy and more an example of where mod_http mod_proxy didn't quite get it right, and made it a little more difficult for folks who are just trying to transform content bodies. It would be nice in apache 2.2 to finally clean up this contract, with two simple metadata elements to pass through the filter chain:
. this request is unfiltered
. this request has a 1:1 filter (stateless)
. this request has an arbitrary content transformation
Why should the filter be forced to pre-declare these decisions? Why can't I determine that dynamically? No one forces it. A filter that wants to take charge of protocol decisions is free to do so. But requiring every filter to do so is a burden on filter writers, and is bug-prone (c.f. the number of ways to generate a bogus Content-Length on a HEAD request). I think it'd be a bad idea to key such HTTP/1.1 protocol issues in the filter API. I think we should maintain protocol-agnosticism where possible. I have to disagree there. There are certain wheels I don't want to have to redesign every time I implement a content filter. A filter that wants to take full responsibility itself should be able to do so, but bearing in mind that whatever one filter does may be overridden by another. So, to sum up: splitting out the decision whether the filter should run from its filter function sounds fine. But, I think the Filter* directives abstract too much in this particular case. Let the filter itself decide. -- justin You're right that these are two separable tasks, and in fact the filter dispatcher is the part I have implemented, whereas the protocol handling is merely a proposal. I'd be interested to hear other views on the subject. Are you disagreeing with my quote from Bill Rowe above, or merely with my proposed solution to that problem? -- Nick Kew
Re: Ideas for Smart Filtering
On Sun, 1 Aug 2004, Justin Erenkrantz wrote: --On Sunday, August 1, 2004 8:24 AM +0100 Nick Kew [EMAIL PROTECTED] wrote: I'm not sure what 'match' is in this context. In the above case, it could be text/html or latin1. ap_register_smart_filter(transcode, latin1, charset_filter, ctx, flags); ap_register_smart_filter(process, text/html, html_filter, ctx, flags); But that really needs the flexibility of a regexp, so latin1 becomes latin[-_]?1|iso[-_]?8859_?1 or might expand to include other close relatives like iso-8859-15 Having an overhead of regexp's by default in our filter code would seem to be a severe bottleneck. Hmmm, how many configurations don't use any LocationMatch-family containers, AliasMatch or Rewrite rules? But anyway, fair point. Regex vs simple strcasecmp should be a flag. I'd rather avoid that or push it on those few specific modules that want the power of regexp and are willing to pay the ridiculous cost penalties. The other significant thing you are missing in your API is what to match against. (I think you are assuming Content-Type, but there are a lot of cases where you want to match against something other than Content-Type.) That's part of the proposed configuration, when we declare the name for the filter harness. FilterDeclare transcode AP_FTYPE_RESOURCE FilterDispatcher transcode Content-Type [charset=([^;]+)] FilterProvider transcode latin[-_]?1|iso[-_]?8859[-_]1 latin_1_filter FilterProvider transcode [other providers for other matches] (that's maybe a bit contrived - I don't have a real-life case where we want multiple filters other than on/off for different charsets) (btw, if you think AP_FTYPE_RESOURCE should be AP_FTYPE_CONTENT_SET, that's another weakness of the architecture. If we need to transcode *before* a content filter, then we can't use CONTENT_SET. Solution: this needs to be configurable). Remember that the content-length doesn't even need to be set *before* we go into the filter. 
(The fact that default_handler does it is more of an accident than anything else.) The content-length header is *not* normative and should almost always be ignored. (Of course, this is internally to httpd Yes of course. The point is that content-length *is* set by many handlers, and has to be unset by filters. The second point is that there *are* a bunch of bugs arising from that (e.g. mod_deflate in 2.0.x vs recent fixes in 2.1-HEAD). The KISS principle tells us that simplifying the task of filtering content will reduce the bug count. and brigades. It is not efficient to constantly compute the length as we push data through the filters. No, but it is efficient simply to *unset* the length if we have one or more filters that are going to change it. Likewise, we need to handle byteranges and Warning headers. And unset a Last-Modified header when a filter invalidates it (or make it configurable - c.f. XBitHack). Instead of requiring every filter to worry about that, we let filters simply declare their behaviour. So, if a filter is relying upon the content-length HTTP metadata header and not the brigades it sees, then it's severely broken. Trying to restrict filters to pre-declare what they will do is, IMHO, silly and pointless. I don't see how a solution for pre-declaring the intention of a filter is going to provide any real benefits. Nothing can make use of that knowledge anyway because they have to account for all cases! So, any benefit for corner-case optimization is lost by the increase in complexity just added. No, the whole point is to *reduce* complexity! -- Nick Kew
Re: Ideas for Smart Filtering
On Sun, 1 Aug 2004, Eli Marmor wrote: Great idea, Nick. By the way: Is it possible to integrate it with mod_rewrite, of course after extending mod_rewrite a little? This may save us the need to invent new directives (e.g. FilterProvider, FilterDispatcher, etc.). After all, mod_rewrite has a very sophisticated system to define conditions. That's something that's been floating around the back of my mind. I wouldn't want it to be dependent on mod_rewrite. But if dispatch were managed internally from the env table, then mod_rewrite/setenvif could be used to configure it, by those who need that level of flexibility. I haven't figured out how I'd implement that ... -- Nick Kew
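For illustration only, if dispatch were driven from the env table as mooted above, a configuration might look like the sketch below. The SetEnvIf line is real syntax; the Filter* lines follow the directives proposed earlier in this thread and are hypothetical, not implemented syntax:

```apache
# Real directive: set an env variable from request properties
SetEnvIfNoCase Request_URI \.html$ want-strip

# Hypothetical, per the proposal in this thread: dispatch a filter
# harness on that env variable rather than on a header
FilterDeclare    stripper
FilterDispatcher stripper subprocess_env want-strip
FilterProvider   stripper STRIP defined
```

Anything that can set an env variable early enough - mod_setenvif, mod_rewrite - could then drive the filter chain without mod_filter depending on either.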
Protocol handling review (proxy/etc)
As part of my smart filtering proposal, I'm looking to tidy up protocol handling. We have several loose ends to deal with:

* Zero-length responses setting bogus Content-Length or unsetting it (e.g. bug 18757, mod_deflate compressing empty bodies).
* Failing to respect no-transform in a proxy. No-transform should preclude not only content-transforming filters, but also those like the content_length filter that affect headers.
* Different paths between cached responses and origin responses.

Fixes to the above should not break mod_cache. Is this a real risk? Regarding proxying, several bug reports speak of Windows Update not working through an Apache proxy, but offer vague or conflicting diagnoses of the exact cause. Can anyone who uses Windows Update - or knows how to find it - check whether it sets no-transform? If it does then the breakage is our fault, and the fix should fall out of a general review of this. -- Nick Kew
Re: cvs commit: httpd-2.0/docs/conf httpd-std.conf.in
On Mon, 2 Aug 2004, André Malo wrote: Now we have your additional charsets twice... Erk! So we do. I guess the best fix is just another update to chop the duplicates? -- Nick Kew
AddDefaultCharset and Bug 23421
Our shipping with AddDefaultCharset preconfigured is causing lots of pages to be served with a bogus charset, typically where authors rely on <meta http-equiv=...> and either don't know how to fix it or lack permission. Bundling AddDefaultCharsets help users fix this. But really we need to do two more things. One is to update the documentation - perhaps a tutorial on the subject. The other is to turn MultiViews on by default, so authors whose sysops stick with defaults and offer no privileges can deal with it without having to hardwire <a href="foo.html.gb2312">my chinese page</a> into their HTML. Does this make sense? Or could we simply drop the AddDefaultCharset from the installation default as suggested by Duerst and others? -- Nick Kew
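To make the MultiViews suggestion concrete, here is a sketch of the configuration it implies. The directives are real; the .gb2312 extension mirrors the example above:

```apache
# With MultiViews, a request for /foo.html can negotiate to
# foo.html.gb2312, so authors need not hardwire the suffix in links.
Options +MultiViews
AddCharset GB2312 .gb2312

# ... and the preconfigured default dropped, so negotiation (or the
# de-facto meta http-equiv hack) can take effect:
# AddDefaultCharset ISO-8859-1
```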
Re: cvs commit: httpd-2.0/docs/conf httpd-std.conf.in
On Mon, 2 Aug 2004, André Malo wrote:

-# The set below does not map to a specific (iso) standard
-# but works on a fairly wide range of browsers. Note that
-# capitalization actually matters (it should not, but it
                  ^^^
-# does for some browsers).

Aaargh! Charsets are case-insensitive, so why should anyone worry about case in an AddCharset? If some of those were in fact browser bug workarounds, shouldn't they be accompanied by *at least* a URL referencing a report/discussion of the bug concerned? BTW, this is supposed to be working on Bug 23421 - see my other post. BTW2, there seems to have already been a duplicate entry for Big5 :-) -- Nick Kew
Re: AddDefaultCharset and Bug 23421
On Mon, 2 Aug 2004, André Malo wrote: * Nick Kew [EMAIL PROTECTED] wrote: Our shipping with AddDefaultCharset preconfigured is causing lots of pages to be served with a bogus charset, typically where authors rely on <meta http-equiv=...> and either don't know how to fix it or lack permission. *shrug*, removing AddDefaultCharset creates the same kind of problem, just the other way 'round. No, because it enables the de-facto <meta http-equiv=...> hack. Bundling AddDefaultCharsets help users fix this. What does that mean? It means I mistyped AddCharset. But really we need to do two more things. One is to update the documentation - perhaps a tutorial on the subject. This doesn't solve the problem you've described above. If it's a lack of permission, the tutorial won't help. If it's missing knowledge, the brand new tutorial won't even be read (same as documentation before). If there's a way to deal with this without .htaccess, then lack of permissions is less likely to be an issue. And the more clueful users will find a tutorial. The other is to turn MultiViews on by default, so authors whose sysops stick with defaults and offer no privileges can deal with it without having to hardwire <a href="foo.html.gb2312">my chinese page</a> into their HTML. The purpose of the shipped default config is not to administer all the boxes out there. It's just a goodie that you can see, that your apache is running. In theory ... I'm very -1 on turning on MultiViews, since it's very annoying (and expensive) if you don't want it (and you have a lazy sysadmin, or lack of permissions as described above). OK, fair point. Actually I think, it would be way better to shorten the default config to something very small, which just shows the indexpage and let the people configure their server themselves. And hey, suddenly the bug reports go to the admins (where they belong) and not to us. Sounds to me like wishful thinking there. 
We still get expected to sort out bugs arising in rpm, deb, emerge, etc. packages that bear little or no resemblance to httpd-std.conf. Does this make sense? Or could we simply drop the AddDefaultCharset from the installation default as suggested by Duerst and others? Pragmatically, I think, let's just drop it and we're fine :) Sounds good to me:-) -- Nick Kew
Re: POST without Content-Length
On Sat, 7 Aug 2004, Justin Erenkrantz wrote: That's a slightly different story. 2.1 has the fix for this (proxy_http.c r1.166), but it never got back ported to 2.0. We have a lot of proxy updates in 2.1, which are presumably getting test-driven over time. How would one go about proposing a wholesale backport? 2.0's STATUS says: * Rewrite how proxy sends its request to allow input bodies to morph the request bodies. Previously, if an input filter changed the request body, the original C-L would be sent which would be incorrect. This is basically the same as an output filter changing the content-length. In the 2.0 architecture, the filter must take responsibility for not sending a bogus length. The only difference is that Connection: close is an option in output. Due to HTTP compliance, we must either send the body T-E: chunked or include a C-L for the request body. Connection: Close is not an option. [jerenkrantz2002/12/08 21:37:27] +1: stoddard, striker, jim -1: brianp (we need a more robust solution than what's in 2.1 right now) jerenkrantz (should be fixed, but I don't have time to do this) At this date (about 20 months later), I have no earthly idea what was wrong. But, I'd suggest trying httpd-2.0 HEAD (aka httpd-2.1) and see if that fixes it. Perhaps someone can remember why I agreed with Brian and what I never fixed... ;-) -- justin Hmmm, did your fix merely chunk content, or compute C-L, or was it smart enough to do the Right Thing according to whether the backend is HTTP/1.1[1], whether the content is short enough to fit in one heap bucket, or whatever other criteria might be applied? [1] Presumably we can only assume HTTP/1.1 backend in a controlled - reverse proxy - case, and where the admin has configured it? -- Nick Kew
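For reference, the T-E: chunked framing being weighed against C-L above looks like this on the wire. A toy encoder follows, purely illustrative and not httpd's implementation (which streams bucket by bucket rather than framing a whole buffer at once):

```c
/* Illustrative sketch, not httpd code: frame one buffer as a single
 * HTTP/1.1 chunk followed by the terminating zero-length chunk. */
#include <stdio.h>

static int chunk_encode(const char *body, size_t len,
                        char *out, size_t outsize)
{
    /* chunk-size in hex, CRLF, chunk-data, CRLF, then "0" CRLF CRLF */
    int n = snprintf(out, outsize, "%zx\r\n%.*s\r\n0\r\n\r\n",
                     len, (int)len, body);
    return (n > 0 && (size_t)n < outsize) ? n : -1;
}
```

The appeal is that each chunk is framed independently, so the sender never needs to know the total length up front - which is precisely what a proxy lacks once an input filter may have changed the body.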
Re: POST without Content-Length
On Sat, 7 Aug 2004, Jan Kratochvil wrote: Hi, Thanks for the great support - httpd-2.0 HEAD 2004-08-07 really fixes it. It even provides env variable proxy-sendchunks to select between compatible Content-Length (default) and performance-wise chunked. Sounds pretty complete to me. Of course you'd need to stick to C-L unless you *know* the backend accepts chunks. It occurs to me that a similar situation arises with CGI and chunked input. The CGI spec guarantees a content-length header, so presumably(?) the code for dealing with that is already there somewhere, and will figure in the AP_CHUNKED_DECHUNK option to the old handler-read functions. We have a lot of proxy updates in 2.1, which are presumably getting test-driven over time. How would one go about proposing a wholesale backport? FYI Fedora Core 2 httpd already backports the httpd-2.1 version of proxy_http.c, although the snapshot was not new enough to resolve my issues. I Bugzilled the current CVS snapshot as https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=129391 FYI backport of current mod_proxy is technically trivial - just copying raw mod_proxy.c mod_proxy.h proxy_http.c It would also need proxy_util.c. Not sure about ftp or connect. although it brings new domain-remapping functionality there. Indeed. Although the proxy is OK now there still remains one problem: I think an HTTP server MUST accept the request: POST ... HTTP/1.0 or HTTP/1.1 [ no Content-Length ] [ no Transfer-Encoding ] Connection: close [ or even no Connection header at all] \r\n DATA according to RFC2616 section 4.4. Even httpd-2.1/CVS just assumes an empty body. squid up to squid/2.5.STABLE5 at least responds with 411 Length Required. Surely that only applies if the server can infer there's a request body. How does it do that with neither C-L nor T-E to indicate a body? 
If we could infer a body in such a case, then AIUI the following applies: If a request contains a message-body and a Content-Length is not given, the server SHOULD respond with 400 (bad request) if it cannot determine the length of the message, or with 411 (length required) if it wishes to insist on receiving a valid Content-Length. Maybe we should infer a body (and hence apply the above logic) in any POST or PUT request? If we do that, it begs the question of how to treat unknown HTTP/extension methods (cf DAV), and suggests perhaps RequireRequestBody should be made a configuration directive. -- Nick Kew
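The decision space discussed above (RFC 2616 section 4.4, plus the infer-a-body-for-POST/PUT idea) can be written out as a table of cases. A toy classifier, purely illustrative with names of my own choosing:

```c
/* Sketch of the cases discussed in this thread (RFC 2616 section 4.4);
 * not httpd code, and the names are mine.  httpd today takes the
 * NO_BODY branch; squid answers 411 for body-bearing methods; the
 * suggestion here is to do likewise only when the method implies a body. */
enum body_verdict { NO_BODY, READ_BY_LENGTH, READ_CHUNKED,
                    SEND_411_LENGTH_REQUIRED };

static enum body_verdict classify(int has_content_length,
                                  int has_te_chunked,
                                  int method_implies_body) /* POST/PUT */
{
    if (has_te_chunked)
        return READ_CHUNKED;        /* T-E takes precedence over C-L */
    if (has_content_length)
        return READ_BY_LENGTH;
    /* No length information at all: either assume an empty body (what
     * httpd does) or insist on one for POST/PUT, as a hypothetical
     * RequireRequestBody directive might make configurable. */
    return method_implies_body ? SEND_411_LENGTH_REQUIRED : NO_BODY;
}
```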
Re: POST without Content-Length
On Sat, 7 Aug 2004, Roy T. Fielding wrote: Thanks for the great support - httpd-2.0 HEAD 2004-08-07 really fixes it. It even provides env variable proxy-sendchunks to select between compatible Content-Length (default) and performance-wise chunked. Sounds pretty complete to me. Of course you'd need to stick to C-L unless you *know* the backend accepts chunks. If the client sent chunks, then it is safe to assume that the proxy can send chunks as well. Generally speaking, user agents only send chunks to applications that they know will accept chunks. The client could be sending chunks precisely because it's designed to work with a proxy that is known to accept them. That doesn't imply any knowledge of the backend(s) proxied, which might be anything up to and including the 'net in general. Also bear in mind that we were discussing (also) the case where the request came with C-L but an input filter invalidated it. -- Nick Kew
Re: POST without Content-Length
On Sat, 7 Aug 2004, Jan Kratochvil wrote: What would happen in this case httpd would infer a body while no body would be found there? Just consider a bog-standard GET. Therefore it should be safe to assume if no Content-Length and no chunked headers are present there MUST follow an optional body with the connection-close afterwards as 'persistent connection' MUST NOT be present. Nope. GET requests routinely have keep-alive, but don't have bodies. -- Nick Kew
Re: POST without Content-Length
On Sun, 8 Aug 2004, André Malo wrote: A CGI script therefore should never trust Content-Length, but just read stdin until it meets an EOF. That is well-known to fail in CGI. A CGI must use Content-Length. -- Nick Kew
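The point can be shown in a few lines. A sketch of the body-reading loop a CGI needs (illustrative; the function name and buffer handling are mine): reading stdin to EOF fails because on a persistent connection EOF may never arrive before the timeout, so the script must read exactly CONTENT_LENGTH bytes.

```c
/* Sketch of the CGI-side rule: trust CONTENT_LENGTH, never read stdin
 * to EOF - on a persistent connection the EOF may never come. */
#include <stdio.h>
#include <stdlib.h>

static long read_cgi_body(char *buf, size_t bufsize)
{
    const char *cl = getenv("CONTENT_LENGTH");
    long want, got = 0;

    if (cl == NULL)
        return 0;                      /* no body advertised */
    want = strtol(cl, NULL, 10);
    if (want < 0 || (size_t)want >= bufsize)
        return -1;                     /* refuse oversized bodies */
    while (got < want) {
        size_t n = fread(buf + got, 1, (size_t)(want - got), stdin);
        if (n == 0)
            break;                     /* short read: client went away */
        got += (long)n;
    }
    buf[got] = '\0';
    return got;
}
```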
Re: POST without Content-Length
On Sat, 7 Aug 2004, Roy T. Fielding wrote: If the client sent chunks, then it is safe to assume that the proxy can send chunks as well. Generally speaking, user agents only send chunks to applications that they know will accept chunks. The client could be sending chunks precisely because it's designed to work with a proxy that is known to accept them. That doesn't imply any knowledge of the backend(s) proxied, which might be anything up to and including the 'net in general. Theoretically, yes. However, in practice, that is never the case. On the contrary! I myself have done a great deal of work on a proxy for mobile devices, for a household-name Client. The client software makes certain assumptions of the proxy that would not be valid on the Web at large. But the backend *is* the web at large. Also bear in mind that we were discussing (also) the case where the request came with C-L but an input filter invalidated it. I was not discussing that case. The answer to that case is don't do that. Fix the input filter if it is doing something stupid. That was one of the cases that started this thread. I don't have an example of this, but someone did. -- Nick Kew
Proxy Load Balancer
I've just looked at the new code - thanks folks. My own interest in proxying is with HTTP backends, both in forward and reverse contexts, and doing so efficiently. Couple of questions: (1) The proxy balancer directives are implemented in mod_proxy.c, not proxy_balancer.c. Was this necessary? (2) ISTR some discussion of generic connection pooling, but I don't see it in the code. Am I missing something, or is this still TBD? -- Nick Kew
Re: Proxy Load Balancer
On Tue, 17 Aug 2004, Graham Leggett wrote: (1) The proxy balancer directives are implemented in mod_proxy.c, not proxy_balancer.c. Was this necessary? proxy_balancer should in theory provide the algorithm to do the balancing, while the generic directives to specify the members of the cluster could be generically specified. Indeed. But not all of us have a cluster or want clustering code. (2) ISTR some discussion of generic connection pooling, but I don't see it in the code. Am I missing something, or is this still TBD? The connection pool is there, it's implemented using apr_reslist. /me kicks himself for not looking inside proxy_util.c :-( Thanks. -- Nick Kew
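For anyone unfamiliar with apr_reslist, the acquire/release idea it implements looks roughly like the toy, APR-free sketch below (apr_reslist adds min/soft-max/hard-max sizing, TTL expiry, and blocking when the pool is exhausted, all of which this omits):

```c
/* Toy sketch of the acquire/release idea behind a connection pool.
 * Fixed slots; acquire returns a free slot index or -1.  Not APR code. */
#define POOL_SIZE 4

struct pool { int in_use[POOL_SIZE]; };

static int pool_acquire(struct pool *p)
{
    int i;
    for (i = 0; i < POOL_SIZE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return i;                  /* caller now owns slot i */
        }
    }
    return -1;                         /* pool exhausted */
}

static void pool_release(struct pool *p, int slot)
{
    if (slot >= 0 && slot < POOL_SIZE)
        p->in_use[slot] = 0;           /* back in circulation */
}
```

A real pool would of course also need locking for a threaded MPM, which apr_reslist provides.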
Re: mod_deflate and no-gzip
On Wed, 18 Aug 2004, Brian Akins wrote: Shouldn't we still set Vary: Accept-Encoding if no-gzip is set? Hmmm, makes sense. +1 Should we be looking at a wholesale backport from 2.1-head? There are a number of minor bugfixes, as well as the inflate output filter. -- Nick Kew
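For the record, the recipe along these lines in the mod_deflate documentation (quoted here from memory, so check the docs before relying on it) pairs the no-gzip/dont-vary variables with a conditional Header directive:

```apache
# Suppress compression for images, but still emit a correct Vary
# header for everything that might be served differently per client.
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
Header append Vary User-Agent env=!dont-vary
```

Setting Vary automatically whenever no-gzip fires would remove the need for the second line in most configs.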
Re: [PROPOSAL] HTTPD Website Suggestion
On Sun, 22 Aug 2004, Shaun Evans wrote: This isn't about the source code but I thought this could be useful to somebody on here. Have you read the section at the site about contributing? Please find attached a number of files (in tar.gz) that I have made to help improve the Apache HTTP Server website. Please don't do that. A URL for a tar.gz file is much friendlier on people's inboxes. -- Nick Kew
Re: AddOutputFilterByType oddness
On Tue, 24 Aug 2004, Graham Leggett wrote: I have just set up the most recent httpd v2.0.51-dev tree, and have configured a filter that strips leading whitespace from HTML: AddOutputFilterByType STRIP text/html The content is served by mod_proxy. As it stands, that can't work. It's a manifestation of the problem I'm addressing by reviewing the filter architecture: see http://www.apachetutor.org/dev/smart-filter and the Ideas for smart filtering thread here. I actually have an implementation based on the discussion document and addressing the concerns people raised in the thread. I hope to find time to finish the accompanying documentation and post it here round about this coming weekend. http://httpd.apache.org/docs-2.0/mod/core.html#addoutputfilterbytype it says that filters are not applied to proxied requests (It does not give a reason why not). The URL above makes it clear what's happening there. -- Nick Kew
Re: AddOutputFilterByType oddness
On Tue, 24 Aug 2004, Nick Kew wrote: I actually have an implementation based on the discussion document and addressing the concerns people raised in the thread. I hope to find time to finish the accompanying documentation and post it here round about this coming weekend. OK, since you seem to have a real-life use for it, here goes. As I said before, I wasn't planning to post without a little more testing and accompanying documents and discussion, but what the ? I'm sure I'll regret this premature posting.

Mini-Synopsis:

# 1. Declare a smart filter that dispatches on Content-Type
FilterDeclare myfilter Content-Type

# 2. Declare your filter as a Provider, to run whenever Content-Type
#    includes the string text/html
FilterProvider myfilter STRIP $text/html

# 3. Set the smart filter chain to this filter where you want to apply it
<Location scope-of-your-proxy>
FilterChain =myfilter
</Location>

-- Nick Kew

/* Copyright (C) 2004 Nick Kew

   This is experimental code.  It may be copied and used only for
   evaluation and testing purposes.

   The copyright holder offers to the Apache Software Foundation
   permission to re-license this code under the ASF license.  This offer
   applies if and when the ASF accepts this code or any derived work for
   inclusion in a future release of HTTPD.

   Regardless of the above, the author undertakes to release the work
   under a recognised open-source license in due course.

   Information will be available at http://apache.webthing.com/ and/or
   http://dev.apache.org/~niq/
*/

#include <ctype.h>
#include <string.h>

/* apache */
#include <httpd.h>
#include <http_config.h>
#include <http_log.h>
#include <apr_strings.h>
#include <util_filter.h>
#include <apr_hash.h>

module AP_MODULE_DECLARE_DATA filter_module ;

#ifndef NO_PROTOCOL
#define PROTO_CHANGE 0x1
#define PROTO_CHANGE_LENGTH 0x2
#define PROTO_NO_BYTERANGE 0x4
#define PROTO_NO_PROXY 0x8
#define PROTO_NO_CACHE 0x10
#define PROTO_TRANSFORM 0x20
#endif

typedef apr_status_t (*filter_func_t)(ap_filter_t*, apr_bucket_brigade*) ;

typedef struct {
    const char* name ;
    filter_func_t func ;
    void* fctx ;
} harness_ctx ;

typedef struct mod_filter_provider {
    enum { STRING_MATCH, STRING_CONTAINS, REGEX_MATCH,
           INT_EQ, INT_LE, INT_GE, DEFINED } match_type ;
    union {
        const char* c ;
        regex_t* r ;
        int i ;
    } match ;
    ap_filter_rec_t* frec ;
    struct mod_filter_provider* next ;
#ifndef NO_PROTOCOL
    unsigned int proto_flags ;
#endif
} mod_filter_provider ;

typedef struct {
    ap_filter_rec_t frec ;
    enum { REQUEST_HEADERS, RESPONSE_HEADERS,
           SUBPROCESS_ENV, CONTENT_TYPE } dispatch ;
    const char* value ;
    mod_filter_provider* providers ;
#ifndef NO_PROTOCOL
    unsigned int proto_flags ;
    const char* range ;
#endif
} mod_filter_rec ;

typedef struct mod_filter_chain {
    const char* fname ;
    struct mod_filter_chain* next ;
} mod_filter_chain ;

typedef struct {
    apr_hash_t* live_filters ;
    mod_filter_chain* chain ;
} mod_filter_cfg ;

static int filter_init(ap_filter_t* f) {
    mod_filter_provider* p ;
    int err = OK ;
    harness_ctx* ctx = f->ctx ;
    mod_filter_cfg* cfg = ap_get_module_config(f->r->per_dir_config,
                                               &filter_module) ;
    mod_filter_rec* filter = apr_hash_get(cfg->live_filters, ctx->name,
                                          APR_HASH_KEY_STRING) ;
    for ( p = filter->providers ; p ; p = p->next ) {
        if ( p->frec->filter_init_func ) {
            if ( err = p->frec->filter_init_func(f), err != OK ) {
                break ; /* if anyone errors out here, so do we */
            }
        }
    }
    return err ;
}

static filter_func_t filter_lookup(request_rec* r,
                                   mod_filter_rec* filter) {
    mod_filter_provider* provider ;
    const char* str ;
    const char* cachecontrol ;
    int match ;
    unsigned int proto_flags ;

    /* Check registered providers in order */
    for ( provider = filter->providers; provider; provider = provider->next) {
        match = 1 ;
        switch ( filter->dispatch ) {
        case REQUEST_HEADERS:
            str = apr_table_get(r->headers_in, filter->value) ;
            break ;
        case RESPONSE_HEADERS:
            str = apr_table_get(r->headers_out, filter->value) ;
            break ;
        case SUBPROCESS_ENV:
            str = apr_table_get(r->subprocess_env, filter->value) ;
            break ;
        case CONTENT_TYPE:
            str = r->content_type ;
            break ;
        }
        /* treat nulls so we don't have to check every strcmp individually
           Not sure if there's anything better to do with them */
        if ( str == NULL ) {
            if ( provider->match_type == DEFINED ) {
                if ( provider->match.c != NULL ) {
                    match = 0 ;
                }
            }
        }
        else if ( provider->match.c == NULL ) {
            match = 0 ;
        }
        else {
            /* Now we have no nulls, so we can do string and regexp matching */
            switch ( provider->match_type ) {
            case STRING_MATCH:
                if ( strcasecmp(str, provider
Smart filtering Module
I posted my proposed smart filter module a few days ago, in response to a post here identifying a situation where it is relevant. I have now completed a first version of the accompanying manual page. I attach: mod_filter.xml mod_filter.xml.meta mod_filter.html Two images used to illustrate the module I've also uploaded the HTML to http://www.apache.org/~niq/ . I believe I have addressed the concerns raised when I mooted the idea of this some weeks ago:

* Existing filters are binary-compatible with the new module
* I've restored the filter_init handlers to the architecture
* I've retained my proposal to enable dealing with aspects of the HTTP protocol on behalf of filters. However, the default is always for the filter harness to do nothing, and leave the filter provider (a content filter module) to deal with that as before.

Working code and documentation (modulo bugs and TODOs) should help demonstrate the purpose and utility of the proposal, and move the discussion forward. I'd like to offer this as a contribution to the core httpd distribution, to be included as standard in 2.2. What is currently implemented is the basic architecture as described before. Configuration is fully dynamic, with my proposed set of configuration directives now implemented. Note that the module only applies to output filters and will only work with AP_FTYPE_RESOURCE or CONTENT_SET filters. I don't see a need for this functionality elsewhere (but I'm open to persuasion:-) The main TBD is an ap_filter... API interface for other modules to work actively with it. To implement that, I will need to merge the ap_filter_rec_t structure into the mod_filter_rec. This will be binary back-compatible (the new fields go on to the end of the ap_filter_rec_t), but will of course require commits to code outside the module, specifically util_filter. A second TODO is to enable mod_filter to run as a provider for itself. 
The purpose of this is to enable chaining of configuration rules beyond what we can already do by setting an environment variable with mod_rewrite and dispatching on an env= variable (example: insert DEFLATE depending on both Accept-Encoding request header and Content-Type response header. mod_rewrite can't do that because it runs too early to be sure to have the response headers). -- Nick Kew

<?xml version="1.0"?>
<!DOCTYPE modulesynopsis SYSTEM "../style/modulesynopsis.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
<!-- $Revision: 1.18 $ -->
<!--
 Copyright 2004 The Apache Software Foundation

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<modulesynopsis metafile="mod_filter.xml.meta">

<name>mod_filter</name>
<description>Context-sensitive smart filter configuration module</description>
<status>Extension</status>
<sourcefile>mod_filter.c</sourcefile>
<identifier>filter_module</identifier>
<compatibility>Apache 2.0 and higher</compatibility>

<summary>
<p>This module enables smart, context-sensitive configuration of output
content filters.  For example, apache can be configured to process
different content-types through different filters, even when the
content-type is not known in advance (e.g. in a proxy).</p>
</summary>

<section id="smart"><title>Smart Filtering</title>
<p>In the traditional filtering model, filters are inserted unconditionally
using <directive module="core">AddOutputFilter</directive> and family.
Each filter then needs to determine whether to run, and there is little
flexibility available for server admins to allow the chain to be
configured dynamically.</p>

<p>mod_filter by contrast gives server administrators a great deal of
flexibility in configuring the filter chain.  In fact, filters can be
inserted based on any Request Header, Response Header or Environment
Variable.  This generalises the limited flexibility offered by
<directive module="core">AddOutputFilterByType</directive>, and fixes it
to work correctly with dynamic content, regardless of the content
generator.  The ability to dispatch based on Environment Variables offers
the full flexibility of configuration with <module>mod_rewrite</module>
to anyone who needs it.</p>
</section>

<section id="terms"><title>Filter Declarations, Providers and Chains</title>
<img src="oldfilter.gif" alt=""/>
<p>In the traditional model, output filters are a simple chain from the
content generator (handler) to the client.  This works well provided the
filter chain can be correctly configured, but presents problems when the
filters need to be configured dynamically based
Re: Smart filtering Module
On Sat, 28 Aug 2004, Graham Leggett wrote: Nick Kew wrote: I posted my proposed smart filter module a few days ago, in response a post here identifying a situation where it is relevant. Ideally there should be one way of loading modules, not two - if it is practical to fix AddOutputFilter, then that should be done, otherwise if mod_filter works then I would suggest scrapping AddOutputFilter in favour of the mod_filter module. Thanks for the comment. I agree in principle. In practice, there are a couple of issues to deal with. As it stands, mod_filter is only applicable to content filters, so protocol, connection and network filters have to be dealt with separately. That may be irrelevant: are there real-life examples of non-content filters being configured in httpd.conf? Secondly, removing existing directives will of course break configs for users. Do we want to do that without deprecating them first? Finally, what I've implemented is a 2.0 module. There's a fair bit of integration work to do in util_filter to eliminate duplication of functionality between the old and the new. And I expect breakage while that's work-in-progress. -- Nick Kew
Bug 18388: cookies
Trawling through a few bugs, this one looks valid to me: namely, Set-Cookie headers should be enabled on 304 responses. The current behaviour has a rationale, but I believe it's incorrectly applied. Set-Cookie is a response header and does not affect a cached entity body, so there's no reason to suppress it. The patch is a one-liner. Unless anyone can come up with a reason why it might open a security hole, I'll apply it. -- Nick Kew
Re: Bug 18388: cookies
On Sun, 29 Aug 2004, Jim Jagielski wrote: I myself would define the cookie header as an entity header, since it *is* meta data about the body, but I can also see it as a more traditional response header as well. But wouldn't adding new info about the response (either as a response header or entity header) invalidate it actually *being* 304 (Not Modified)? Would it? A cookie is not data about the body. The nearest analogy amongst headers explicitly discussed in rfc2616 is authentication, and the relevant authentication headers *are* returned with a 304. So are Content-Location, ETag and Vary: surely headers that would invalidate the 304-ness if they were to change between requests? Perhaps a better approach to 304 headers would be to explicitly exclude entity headers as enumerated in rfc2616, rather than explicitly include non-entity headers? That means the default for proprietary extensions (which HTTP explicitly permits) becomes to allow them in a 304. -- Nick Kew
Re: FYI: bug statistics httpd-1.3/httpd-2.0
On Mon, 30 Aug 2004, Erik Abele wrote: I'm building some simple (but nice) statistics based on the weekly bug reports mailed out to the dev- and bug-list: http://www.apache.org/~erikabele/httpd/bugstats/ The stats are updated every sunday just after the reports are mailed out. The script which produces the PNGs is also available at the above URL. We've just been discussing this on IRC. We have a gradual but inexorable accumulation of bugs. Some of them have become kind-of permanent fixtures, perhaps because no one can reproduce them or no one feels willing or able to deal with them. Rather a lot of them are simply too vague to deal with. Perhaps what we need is more clearly defined responsibilities in dealing with them. Each component has a bugmaster tasked with dealing with bugs attached to the component. That could mean fixing it, discussing it on this list (including asking for someone else to fix it), and crucially also taking responsibility for closing bugs with INVALID, WONTFIX or WORKSFORME where no satisfactory resolution is/seems feasible. -- Nick Kew
Smart filtering and mod_filter
I've made some further updates both to the code and documentation since posting. One change is to support inserting a harness anywhere in the filter chain. This addresses the point Graham raised about having two separate mechanisms: it means the old mechanism can be entirely replaced (provided of course I supply easy-to-follow upgrade documentation). I would find this easier if it were under CVS, and I'd like to put it under httpd CURRENT, at modules/experimental/mod_filter.c (plus the corresponding documentation pages, of course). That should help with integration ahead of 2.2, including test-driving with existing filter modules, and will make it easier to coordinate the API updates in util_filter. Is this something that wants a vote? Anyone else have strong feelings for or against putting mod_filter under CVS? -- Nick Kew
Re: Bug 18388: cookies
On Mon, 30 Aug 2004, Geoffrey Young wrote: [replying to my words - largely chopped] Perhaps a better approach to 304 headers would be to explicitly exclude entity headers as enumerated in rfc2616, rather than explicitly include non-entity headers? That means the default for proprietary extensions (which HTTP explicitly permits) becomes to allow them in a 304. fwiw, this was discussed a few times in the archives. the one that comes to mind for me is this from doug: http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=99298523417784&w=2 That thread seems to be the same basic issue, but with reference to RFC1945. 2616 includes additional explanation, and seems more clearly to support the view that not only cookies but arbitrary unknown headers (if any) should be allowed. In bug report 18388, Ryan J Eberhard wrote: It is also important to note that all other major web servers (IIS, iPlanet, and Domino) will return Set-Cookie headers on a 304 status. I'm in no position to confirm or deny that, but it tends to support the proposition, and suggests that if it caused trouble in the Real World then we could expect to know about it. personally, I tend to see it more from doug and nick's perspective and would be inclined to fix a long-standing issue that never made sense to me, but roy wrote the book and has unique insight here, so... Hmm. Would proposing it in STATUS for a vote be appropriate here? I think if anyone wants to veto it, we should have a reason that addresses Doug's and Ryan's arguments on the record. -- Nick Kew
Re: cvs commit: httpd-2.0/server util.c
On Wed, 1 Sep 2004, Jeff Trawick wrote: I can't see how this ever worked before :( Any comments from the crowd? FWIW, I fixed that one in the proxy context about two months ago. But I haven't looked at it in the general case. -- Nick Kew
Re: cvs commit: httpd-2.0/server util.c
On Wed, 1 Sep 2004, Jeff Trawick wrote: On Wed, 1 Sep 2004 20:36:07 +0100 (BST), Nick Kew [EMAIL PROTECTED] wrote: FWIW, I fixed that one in the proxy context about two months ago. But I haven't looked at it in the general case. was that this change entry? *) mod_proxy: multiple bugfixes, principally support cookies in ProxyPassReverse, and don't canonicalise URL passed to backend. Documentation correspondingly updated. [Nick Kew nick webthing.com] Yes, that sounds right. Though I think the CHANGES entry may have lagged the actual update. A quick look at CVS shows a datestamp of Tue Jun 29 06:37:21 2004 UTC -- Nick Kew
Re: a simple question
On Thu, 2 Sep 2004, Manos Moschous wrote: (a dumb subject line) I have a file opened FILE *fcp; fcp = fopen(file_to_save, "wb"); using the apr_ file API is preferred. //I want to save the data to the file //How can I do that The tmpfile_filter in mod_upload does that. Feel free to look at the source. -- Nick Kew
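For illustration, the APR equivalent is short. A sketch only (the function and file name are invented for the example; the apr_file_* calls are the real API), writing a buffer to a file on the request pool:

```c
#include "httpd.h"
#include "apr_file_io.h"

/* Sketch: save a buffer to a file with the apr_file_* API instead of
 * stdio.  'fname' is illustrative; the file handle is registered on
 * r->pool, so it is cleaned up with the request even on error paths. */
static apr_status_t save_data(request_rec *r, const char *fname,
                              const char *buf, apr_size_t len)
{
    apr_file_t *f = NULL;
    apr_size_t written;
    apr_status_t rv = apr_file_open(&f, fname,
                                    APR_WRITE | APR_CREATE | APR_TRUNCATE,
                                    APR_OS_DEFAULT, r->pool);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    rv = apr_file_write_full(f, buf, len, &written);
    apr_file_close(f);
    return rv;
}
```

Besides portability, the win over stdio is that pool cleanup closes the file for you if the request dies mid-way.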
Re: Time for 2.0.51 and 2.1.0
On Thu, 2 Sep 2004, Henri Gomez wrote: Bad news for me and many others since without AJP support included in 2.0.x, users will still require to have mod_jk to link their HTTPD to Tomcats. Could we hope the dev team to relax the situation for mod_proxy/ajp in future 2.0.x release, since Graham, Mladen and Jean-Frederic works hard to make mod_proxy as stable as possible even now with AJP support ? Have you tried running the new proxy code with 2.0.x? It worked fine last time I tested it seriously (following my updates to mod_proxy at the end of June). That way you're testing the new code in a stable harness. -- Nick Kew
Re: Removing the Experimental MPMs in 2.2?
On Thu, 2 Sep 2004, Paul Querna wrote: Any other opinions about not including these MPMs? Basically agree. But modules are on a sliding scale between fully-working and broken. We have modules/experimental that includes pre-stable stuff that may or may not get fixed within a reasonable timescale: what should their status be? I wouldn't suggest removing them, but perhaps we could flash up a prominent WARNING when you configure/build them? -- Nick Kew
Re: HTTP proxy working for folks on 2.1-dev?
On Thu, 9 Sep 2004, Mladen Turk wrote: Q: Is it possible to have forward and reverse proxies mixed together on the same box? Of course! I have that defined in different virtual hosts, but AFAICS it should also work fine simply using <Location> for the reverse proxies and <Proxy> for the forward. -- Nick Kew
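A configuration along those lines might look like this - a sketch only, with hostnames and addresses invented, using the 2.0-era access-control directives:

```apacheconf
# Forward proxy, restricted to the local network
# (never leave a forward proxy open to the world).
ProxyRequests On
<Proxy *>
    Order deny,allow
    Deny from all
    Allow from 192.168.0.0/24
</Proxy>

# Reverse proxy for one path; independent of ProxyRequests.
<Location /app/>
    ProxyPass http://backend.example.com/app/
    ProxyPassReverse http://backend.example.com/app/
</Location>
```

The two modes coexist because ProxyPass mappings are applied per-location, while ProxyRequests only governs whether client-specified proxy requests are honoured.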
ap_log_perror behaviour and LogLevel?
This has been nagging me for a while, first with reference to mod_diagnostics, and now with mod_filter. log_error_core takes a server_rec argument. If that argument is NULL, it returns without logging anything unless the level is APLOG_NOTICE or is no more verbose than ap_default_loglevel. ap_log_perror calls log_error_core with server==NULL, so verbose LogLevels fail. Later in log_error_core is another test: if ((level & APLOG_STARTUP) != APLOG_STARTUP) { ... } which looks more appropriate. Is there a reason for this behaviour? I'd like to be able to use ap_log_perror with LogLevel debug or info. -- Nick Kew
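To make the two code paths concrete, here is a small standalone model - constant values and function names are invented for the sketch, this is not the httpd source - of the check ap_log_perror currently hits versus the startup-flag test further down:

```c
#include <assert.h>

/* Invented values for the sketch; only the comparisons matter. */
#define APLOG_NOTICE    5
#define APLOG_INFO      6
#define APLOG_DEBUG     7
#define APLOG_STARTUP   0x8000
#define APLOG_LEVELMASK 0x00ff

static int ap_default_loglevel = APLOG_NOTICE;

/* Model of the server==NULL branch: anything more verbose than the
 * compiled-in default level is silently dropped, so LogLevel debug or
 * info never reaches the log via ap_log_perror. */
static int null_server_would_log(int level)
{
    return (level & APLOG_LEVELMASK) <= ap_default_loglevel;
}

/* Model of the later test, which only special-cases startup messages
 * and so looks more appropriate for pool-based (per-request) logging. */
static int startup_test_would_log(int level)
{
    return (level & APLOG_STARTUP) == APLOG_STARTUP
        || (level & APLOG_LEVELMASK) <= ap_default_loglevel;
}
```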
Re: Smart filtering Module
OK, following on from a couple of weeks ago, I've committed mod_filter to cvs. That includes mod_filter.c and relevant documentation, which are more-or-less in sync. Please review. Assuming the work gets and survives wider review, the next stage in this work is closer integration with util_filter (protocol should be configured in ap_ calls rather than httpd.conf), and to investigate whether it can be harnessed to fix architectural bugs like PR#17629. -- Nick Kew
Re: Smart filtering Module
On Sat, 11 Sep 2004, NormW wrote: ### mwccnlm Compiler: #File: mod_filter.c # - # 118: { apr_bucket_type_mmap, MMAP } , # Error: # undefined identifier 'apr_bucket_type_mmap' Can this be bracketed with #if APR_HAS_MMAP or is MMAP 'mandatory'? Thanks for the feedback. Yes of course, I've just patched that (and grepped apr_buckets.h for any other APR_HAS_* that might bite on some other platform). The whole function it's in exists purely to support reporting bucket types for the FilterDebug option. -- Nick Kew
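The guard in question would look something like this - a sketch reconstructed from the error report above (the table's type name is hypothetical; only the entry and the APR_HAS_MMAP guard are from the report):

```c
/* Table mapping bucket types to printable names for FilterDebug.
 * MMAP buckets only exist where APR provides mmap support, so that
 * entry must be guarded on platforms like NetWare: */
#if APR_HAS_MMAP
    { &apr_bucket_type_mmap, "MMAP" },
#endif
```

Any other entry depending on an APR_HAS_* feature macro would want the same treatment.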
Re: Smart filtering Module
On Sat, 11 Sep 2004, NormW wrote: Good evening still :-) Got your update and it all now glues together nicely. I assume you will wait for a 'vote' to commit a hooked-in build file for mod_filter? Well, I'm more waiting for more feedback. So far I got quite a lot of comments when I first floated the concept, less when I posted a first- pass implementation, and only yours on introducing it to CVS. Maybe what it needs now is an updated roadmap to stimulate discussion? As for a hooked-in build file, I have yet to RTFM for what that involves. -- Nick Kew
Re: Bundling APR in 2.2
On Thu, 16 Sep 2004, Paul Querna wrote: In most of the Apache 2.0.XX releases, we have been using a CVS snapshot of APR and APR-Util. I would like to make it an official policy that for the 2.2 cycle, we will never use a CVS snapshot of APR. That makes httpd releases (relatively frequent) hostage to APR releases (extremely infrequent) when we need a bugfix in CVS. Is that acceptable? I believe we should still bundle APR and APR-Util with HTTPd, but we should only use the released versions of each. Release-version ABI: yes. Release versions only: only if that dependency can be fixed (i.e. APR folks can be hurried along where necessary). It will also make life much easier for System Packagers. If we only use released versions, APR and APR-Util can be easily placed into separate packages. This will become more important as more standalone applications use APR. Keeping binary-compatibility (ABI) is sufficient for that, innit? -- Nick Kew
Re: Shorten the default config and the distribution (was: IfModule in the Default Config)
On Tue, 14 Sep 2004, André Malo wrote: * Paul Querna [EMAIL PROTECTED] wrote: (chop) Using the Source File name seems completely non-intuitive to me. Agreed. I'm rather for removing the whole crap from the default config and simplify as much as possible. I'd be cautious about that. The default httpd.conf contains a fair chunk of documentation that isn't available elsewhere. We need to work carefully on making sure this isn't lost. A 30 KB default config, which nobody outside this circle here really understands, isn't helpful - especially for beginners. I disagree. Think about a situation where you're on the learning curve for working with a big package. A big and well-commented config file is the most useful thing available. I'm thinking of compiling kernels, and contrasting Linux (where make menuconfig is very nice but hides what's really happening) with FreeBSD, where keeping the LINT config open in another window while editing my config is the absolute best documentation I could wish for. If the default is shortened, we should package a long and highly-commented file in the manner of LINT. It would be nice also to integrate the documentation in httpd.conf into the main docs as and when round tuits can be sourced. In the same cycle we could remove the docs from the default distribution and start distributing them officially as separate packages. (But we could distribute a separate config snippet for the multilingual docs, which can be included in the httpd.conf). The more translations we add, the less practical it is to include the whole doc tree. Hmmm, does that risk generating a higher volume of dumb-newbie questions in all the public fora? And perhaps also apache-is-hard articles in the press? -- Nick Kew
Re: AddOutputFilterByType oddness
On Sat, 18 Sep 2004, Justin Erenkrantz wrote: But ap_add_output_filters_by_type() explicitly does nothing for a proxied request. Anyone know why? AddOutputFilterByType DEFLATE text/plain text/html seems to work as expected here for a forward proxy with this applied: maybe I'm missing something fundamental... My recollection is initially it didn't have the proxy check, then FirstBill had a reason why proxied requests shouldn't work with AddOutputFilterByType. I've said it before and I'll say it again: AddOutputFilterByType is fundamentally unsatisfactory. This confusion is an effect, not cause. * Configuration is inconsistent with other filter directives. The relationship with [Set|Add|Remove]OutputFilter is utterly unintuitive and, from a user POV, broken. * Tying it to ap_set_content_type is, to say the least, hairy. IMO we shouldn't *require* modules to call this, and it's utterly unreasonable to expect that it will never be called more than once for a request, given the number of modules that might take an interest. Especially when subrequests and internal redirects may be involved. * It's a complexity just waiting for modules to break on it. I've made some more updates to mod_filter since I last posted on the subject, and I'm getting some very positive feedback from real users. For 2.2 I'd like to remove AddOutputFilterByType entirely, replacing it with mod_filter. mod_filter can also obsolete [Set|Add|Remove]OutputFilter, though I'm in no hurry to do that. What I can also do is re-implement all the outputfilter directives within mod_filter and its updated framework. -- Nick Kew
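For illustration, a mod_filter configuration along these lines could replace AddOutputFilterByType. This is a sketch only - the directive syntax follows the experimental module as it stood at the time, and the declared name "compress" is invented:

```apacheconf
# Hypothetical mod_filter equivalent of:
#   AddOutputFilterByType DEFLATE text/plain text/html
FilterDeclare compress Content-Type CONTENT_SET
FilterProvider compress DEFLATE $text/
FilterChain compress
```

The dispatch happens at the point in the chain where the filter runs, so a content-type set (or changed) by an earlier filter is what gets matched - unlike AddOutputFilterByType, which binds at insertion time.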
Reviewing the Filtering API
The 2.0 filter chain is a great tool: for me it's _the_ major innovation that turns httpd-2.0 from a (mere) webserver to a powerful applications platform. But extensive work with it highlights weaknesses. The introduction of AddOutputFilterByType sought to address one of the weaknesses, but it's a bolt-on that doesn't really fit, and is problematic. And even if fully successful, it's limited. As you know, I'm proposing a new filtering framework, and have implemented (modulo bugs) the main functionality in mod_filter. Until yesterday, this was implemented purely as a module, suitable for use with httpd-2.0 and its filters. That meant some inevitable duplication of data structures and inefficiency. It now has several users running the module with 2.0, and I propose to maintain a version that can be used with 2.0 without patching or recompiling anything. But the main thrust is towards tighter integration for 2.2. Yesterday I made the first move towards integration, by merging the most important data structs with util_filter and adding a couple of new API calls (on which more below). I got some useful feedback on this list when I first mooted the idea that is now mod_filter, and more recently from users of mod_filter (my filter_init is badly broken - fix to come). But I'd like to broaden that into a wider review of filtering. *** A few issues with util_filter in 2.0: ap_filter_type == Making this an enum and then using values like AP_FTYPE_[anything] + 5 (as is done in, for example, mod_ssl) makes no sense. An int with a set of #defined values makes more sense. ap_filter_t === This includes both request_rec and conn_rec fields, but the request_rec is invalid in connection-level filters, while the conn_rec is of course available from the request_rec where valid. So, shouldn't that be a union?
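The union suggestion might look like this - a sketch only, with the member names invented, not a proposal for the actual struct layout:

```c
/* Sketch of the suggested ap_filter_t change: content-level filters
 * get a request_rec (from which r->connection is reachable), while
 * connection-level filters only have a conn_rec, so the two pointer
 * fields could share storage instead of coexisting. */
struct ap_filter_t {
    ap_filter_rec_t *frec;     /* the registered filter */
    void *ctx;                 /* per-instance state */
    struct ap_filter_t *next;  /* next filter in the chain */
    union {
        request_rec *r;        /* valid in content-level filters */
        conn_rec    *c;        /* valid in connection-level filters */
    } owner;                   /* hypothetical member name */
};
```

The cost would be source compatibility: every existing filter touching f->r or f->c would need updating, which is why it only makes sense in a major-version API review.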
Documentation = I recently fixed PR:19688, but there are other less critical issues outstanding, such as * @param ftype The type of filter function, either ::AP_FTYPE_CONTENT or * ::AP_FTYPE_CONNECTION * Simplifying Filtering Yesterday I introduced two new API functions in util_filter: ap_register_output_filter_protocol ap_filter_protocol together with a set of associated #defines The first function is ap_register_output_filter with an additional argument proto_flags. The second sets proto_flags during a request. The purpose of these is to enable filters to 'opt out' of concerning themselves with the lower-level details of supporting HTTP. Example: mod_include mod_include is a typical output content filter, in that it changes the data passing through, including changing the byte count. It's almost certainly the most widely known and used such filter. As it stands, it correctly unsets content length, and it deals with caching/Last-Modified in its own way based on configuration (XBitHack). But it also has some bugs: for example: * if a Content-MD5 is set, it doesn't unset it. Likewise an ETag. * it won't work correctly if served partial contents, but it does nothing to prevent that happening (vide discussion on handling ranges a couple of months ago). For mod_include to deal fully with these is a significant burden on the module's authors. The new API calls offer mod_include the opportunity to be simplified at the same time as fixing edge-case bugs such as those I've discussed. A simple way is to replace the existing ap_register_output_filter(INCLUDES, includes_filter, includes_setup, AP_FTYPE_RESOURCE); with the new variant ap_register_output_filter_protocol(INCLUDES, includes_filter, includes_setup, AP_FTYPE_RESOURCE, AP_FILTER_PROTO_CHANGE | AP_FILTER_PROTO_CHANGE_LENGTH | AP_FILTER_PROTO_NO_BYTERANGE ); This causes mod_filter to unset all headers that are invalidated by the module's content transformation, and prevents it from getting byteranges from the backend.
With this, mod_include still has to process XBitHack and caching headers itself - these are very specific to SSI and don't generalise to other filters - but mod_filter does everything else. As with any other filter, mod_include will run unchanged within the new framework by simply ignoring the additional API calls. I need review on this, and I need to fix my existing code. But looking ahead, any problems with a wider-ranging review of util_filter, including but not limited to fixing the problems identified above? -- Nick Kew
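Pulling the before/after registration out of the prose above and laying it out as code (the calls and flags are exactly as described in the post; nothing else is added):

```c
/* Old registration: a plain content filter. */
ap_register_output_filter("INCLUDES", includes_filter, includes_setup,
                          AP_FTYPE_RESOURCE);

/* Proposed registration: the same, plus a declaration of what the
 * filter does to the HTTP protocol, so the framework can unset the
 * headers the transformation invalidates (Content-Length,
 * Content-MD5, ETag) and suppress byteranges to the backend. */
ap_register_output_filter_protocol("INCLUDES", includes_filter,
                                   includes_setup, AP_FTYPE_RESOURCE,
                                   AP_FILTER_PROTO_CHANGE
                                   | AP_FILTER_PROTO_CHANGE_LENGTH
                                   | AP_FILTER_PROTO_NO_BYTERANGE);
```

The point of the flags-based design is that a filter author declares *what* the filter does, once, at registration time, rather than re-implementing the protocol consequences in every filter.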
Re: AddOutputFilterByType oddness
On Wed, 22 Sep 2004, Justin Erenkrantz wrote: --On Wednesday, September 22, 2004 5:01 PM +0100 Nick Kew [EMAIL PROTECTED] wrote: I've said it before and I'll say it again: AddOutputFilterByType is fundamentally unsatisfactory. This confusion is an effect, not cause. Suffice to say, I disagree. * Configuration is inconsistent with other filter directives. The relationship with [Set|Add|Remove]OutputFilter is utterly unintuitive and, from a user POV, broken. I think it's really clear from the user's perspective. I think the problem comes in on the developer's side. It seems to me heavily counterintuitive that mixing ByType directives with anything else means that the ByType filters *always* come last. And that Remove won't affect them, but will affect others. * Tying it to ap_set_content_type is, to say the least, hairy. IMO we shouldn't *require* modules to call this, and it's utterly unreasonable to expect that it will never be called more than once for a request, given the number of modules that might take an interest. Especially when subrequests and internal redirects may be involved. We have *always* mandated that ap_set_content_type() should be called rather than setting r->content_type. (I wish we could remove content_type from request_rec instead.) Indeed. But that doesn't prevent it being called multiple times, perhaps from different modules. So using it to insert filters leaves lots of potential for trouble. * It's a complexity just waiting for modules to break on it. Anything that depends upon content-type like this is going to be hairy because there may be several 'right' answers during the course of the request. Indeed. mod_filter addresses this by configuring at the last moment, so any earlier set_content_type()s are irrelevant. I don't suppose it's a panacea for everything, but I do think it's a significant improvement on what we have. 
I've made some more updates to mod_filter since I last posted on the subject, and I'm getting some very positive feedback from real users. For 2.2 I'd like to remove AddOutputFilterByType entirely, replacing it with mod_filter. I've yet to see a clear and concise statement as to how mod_filter will solve this problem in a better and more efficient way. (Especially from a user's perspective, but also from a developer's perspective.) From the user's perspective, it's simply more powerful and flexible. Works with any request or response headers (not just content-type) or environment variables. Gets rid of constraints on ordering, like the AddOutputFilterByType filter always coming after other filters regardless of ordering in httpd.conf. Example: I have a user who wants to insert mod_deflate in a reverse proxy, but only for selected content-types AND not if the content length is below a threshold. How would he do that with the old filter framework? From a developer's perspective, I wrote it for myself, and have at least two other developers using it operationally in their product. Time will tell what others may use it for. I will also comment that I looked in the mod_filter code the other day and was disappointed that it doesn't follow our coding style at all or even have comments that help people understand what it is trying to do inside the .c file. When was that? I made quite a lot of updates to the style towards conforming (like eliminating tabs and realigning some braces) before committing to CVS, but I'm willing to believe I need to look more carefully. -- Nick Kew
Re: AddOutputFilterByType oddness
On Wed, 22 Sep 2004, Justin Erenkrantz wrote: --On Wednesday, September 22, 2004 6:17 PM +0100 Nick Kew [EMAIL PROTECTED] wrote: It seems to me heavily counterintuitive that mixing ByType directives with anything else means that the ByType filters *always* come last. And that Remove won't affect them, but will affect others. I think we could get Remove*Filter to also delete the content-type filters. Indeed. mod_filter addresses this by configuring at the last moment, so any earlier set_content_type()s are irrelevant. I don't suppose it's a panacea for everything, but I do think it's a significant improvement on what we have. I'm concerned about the overhead of mod_filter having to check all of its rules each time a filter is invoked. This is why I started to look through the code last night to see how it worked and how invasive it is. It's improving with time (except when I introduce bugs...). Merging in the structs with util_filter saves on having to do superfluous lookups. Basically it does the lookup/dispatch once per filter in the filterchain per request. It checks that filter's providers until it finds a match. So for anything you could do with an [Add|Set]OutputFilter[ByType] that's one lookup per request. How would you handle the situation when filter #1 sets C-T to be text/plain and then filter #2 sets C-T to be text/html? mod_filter takes the content-type as it is at that point in the chain. Isn't the real nightmare where a filter calls ap_set_content_type and some AddOutputFilterByTypes are in effect? I guess what *really* bothers me is the idea of adding filters *as a side-effect*. And, then mod_deflate needs to be conditionally added (sub-case #1: it needs to be added for 'text/plain'; sub-case #2: it needs to be added for 'text/html'). How and where is it added? Are you inserting dummy filters? I'm not sure I follow. It will dispatch to deflate based on the content-type (or other dispatch criterion) as it is at that point in the chain. 
So if the handler sets application/xml but that goes through an XSLT filter which sets it to text/html, then mod_filter sees application/xml if it's before the XSLT filter in the chain, or text/html after it. How can AddOutputFilterByType expect to cope with that? From the user's perspective, it's simply more powerful and flexible. Works with any request or response headers (not just content-type) or environment variables. Gets rid of constraints on ordering, like AddOutputFilterByType filter always coming after other filters regardless of ordering in httpd.conf. Example: I have a user who wants to insert mod_deflate in a reverse proxy, but only for selected content-types AND not if the content length is below a threshold. How would he do that with the old filter framework? I guess I'm not clear what the syntax is (I guess I should go read the docs). That particular scenario is complex, and requires mod_filter to be used as its own provider. The point is, we *can* now support complex setups (or will be - that chaining is still broken in CVS). But FWIW I have that working locally with FilterDeclare filter1 Content-Type CONTENT_SET FilterDeclare filter2 Content-Length CONTENT_SET FilterProvider filter1 filter2 $text FilterProvider filter2 DEFLATE 4000 FilterChain filter1 to deflate all text/* documents of 4k or greater. I definitely don't want to see the filters be configured like mod_rewrite. It needs to be fairly straightforward, but still fairly simplistic. I don't want to have users have to read a complicated manual or docs to set up filters. KISS. Indeed. Do you think the examples in the manual page are too complex? Bear in mind that the third example is no more complex than the first two, yet suddenly enables a frequently-requested capability that simply isn't possible with the old filtering. Well, the point by you committing it into our tree is that the rest of us are now responsible for it. That's why I brought up the code style issue: OK, OKOK! 
I promise to look harder at the code style guidelines! And I _did_ ask on the list a couple of weeks before introducing to CVS. I looked yesterday afternoon (and haven't seen any commits since then). That'll be the latest version. Which FWIW was committed prematurely because it introduced a new feature demanded by a user. Only that feature turned out to be broken, which is why I'm re-hacking it now. -- Nick Kew
Re: Bug 17629: SSI, CGI, and mod_deflate
On Mon, 11 Oct 2004, André Malo wrote: It seems that calling an internal redirect from anywhere in an output filter is completely wrong. Nope. The real problem is that it's a *redirect within a subrequest*. Erm - it seems to me you're both right. Surely the underlying problem - of which both the above are instances - is an internal redirect too late in the request processing cycle. An internal redirect after anything could possibly have been sent down the [output filter chain|wire] is broken. The filterchain suddenly gets disconnected. What we need is kind of a glue filter which connects a subrequest (at connection level (a subrequest doesn't own a connection)) with the main one. I'm struggling with how that should work, within the constraints of the architecture we have. I actually raised the question with Paul on IRC in the hope that a solution would fall straight out of his Capturing a Subrequest. But it seems we're all stuck on partial insights. I'm provisionally +1 on Paul's proposed fix, but I wonder if it should be conditional on ap_is_initial_req, to leave untouched the 'normal' CGI case. -- Nick Kew
Re: [RFC] Patch for mod_log_config to allow conditioning on status code
On Fri, 15 Oct 2004, Luc Pardon wrote: I patched mod_log_config.c (from the 2.0.51 distro) to allow conditional logging on HTTP status code, like so: CustomLog king-size.log common status=414 The patch also supports not and lists (like the %.. syntax) and wildcards, e.g.: CustomLog ungood.log common status=!20x,3xx The changes are non-intrusive and the patch is of course backward compatible. Sounds somewhat interesting, and (as you note) there's quite a lot of demand from people who don't like 414 crap in their logs. So that's a good start. But how does it work with piped log programs? If I were implementing this functionality, I'd probably hack rotatelogs rather than httpd. I already patched the docs and am willing to go the extra mile(s) to make it all nice, but the guidelines for contributing a patch say you're a conservative lot when it comes to new functionality. Indeed, that's true. But that's very minor functionality and clearly tied in to an established core module, so unlikely to fall down on that. The usual fate of patches in bugzilla is that, even if they are appropriate for inclusion, they need a committer to take sufficient interest to review and incorporate them. A chronic shortage of round tuits means this is rather hit-and-miss. One more thing: I became aware that the flexible interface for mod_log_config patch (# 25014) also allows conditioning on status code(s), and there are three other contributed patches against mod_log_config waiting for a decision (# 28037, 29449 and 31311). I am willing to ensure compatibility with any or all of them if desired. If you can fix a whole bunch of related bugs on bugzilla without your patch becoming big and complex, that adds value but still doesn't guarantee anything. Ask yourself: is your code sufficiently different to anything we already have to merit releasing separately as a third-party module? If yes, then do that. If no, then it's probably appropriate to offer a patch. My guess would be no. -- Nick Kew
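As a sketch of the piped-log alternative: filtering can happen in the log pipeline rather than in mod_log_config. Everything here is assumption - the script name is invented, and the status code sits in field 9 only for the standard Common Log Format:

```shell
# Hypothetical piped-log filter: pass Common Log Format lines through
# unless the status code (field 9 in CLF) is 414.  In httpd.conf it
# would be wired up as something like:
#   CustomLog "|/usr/local/bin/drop414.sh" common
drop414() {
    awk '$9 != 414'
}

# Demo on two canned CLF lines: only the 200 line survives.
kept=$(printf '%s\n%s\n' \
  'host - - [10/Oct/2004:13:55:36 -0700] "GET /big HTTP/1.0" 414 0' \
  'host - - [10/Oct/2004:13:55:36 -0700] "GET /ok HTTP/1.0" 200 99' \
  | drop414)
echo "$kept"
```

This keeps httpd itself untouched, at the cost of one extra process - which is the trade-off behind the "hack rotatelogs rather than httpd" suggestion.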
Re: [RFC] Patch for mod_log_config to allow conditioning on status code
On Sat, 16 Oct 2004, Glenn Strauss wrote: I don't want to discourage Luc, but there's a steep uphill battle to getting anything into Apache 1.3. Of course. Apache 1.3 is an old, legacy application, and vastly less capable than current versions. It's still maintained, but no one is in the business of adding new *features*. 2.1 is where interesting things happen, while 2.0 is intermediate: new features may be added, but stability and binary-compatibility are more important. I might review and incorporate a third-party patch for 2.x, but certainly wouldn't for 1.x unless someone was paying. diff -ruN apache_1.3.31/src/main/http_log.c apache_1.3.31-new/src/main/http_log.c --- apache_1.3.31/src/main/http_log.c 2004-02-16 17:29:33.0 -0500 +++ apache_1.3.31-new/src/main/http_log.c 2004-05-24 12:26:06.0 -0400 Bugzilla is a good place for patches like that. People who want it can help themselves, without compromising stability. -- Nick Kew
Re: [Bug 31759] - default handler returns output filter apr_status_t value
On 12 Sep 2006, at 22:27, [EMAIL PROTECTED] wrote: --- Additional Comments From [EMAIL PROTECTED] 2006-09-12 21:27 --- The PUT handler is a small 10 line script. It absolutely doesn't return a code 70007 or anything other than 0 no matter how it finishes. This is not resolved nor fixed. The bug is fixed, because it refers explicitly to the default handler. However, mod_cgi at line 840 and mod_cgid at line 1390 have the same issue when the input filters return an error. I think the easy fix is to return 500 there, unless we can blame the client and return 400. -- Nick Kew
Re: svn commit: r442758 - in /httpd/httpd/trunk/modules/generators: mod_cgi.c mod_cgid.c
On Wednesday 13 September 2006 20:33, Ruediger Pluem wrote: Wouldn't it make sense to return OK even if rv != APR_SUCCESS in the case that c->aborted is set, just like in the default handler? I'm not sure. Presumably if c->aborted is set, then we have no client to respond to, so this is just about housekeeping and what ends up in the logs. Do we want to log a successful POST or PUT when it wasn't? -- Nick Kew
Re: svn commit: r442758 - in /httpd/httpd/trunk/modules/generators: mod_cgi.c mod_cgid.c
On Wednesday 13 September 2006 22:31, Jeff Trawick wrote: On 9/13/06, Nick Kew [EMAIL PROTECTED] wrote: On Wednesday 13 September 2006 20:33, Ruediger Pluem wrote: Wouldn't it make sense to return OK even if rv != APR_SUCCESS in the case that c->aborted is set, just like in the default handler? I'm not sure. Presumably if c->aborted is set, then we have no client to respond to, so this is just about housekeeping and what ends up in the logs. Do we want to log a successful POST or PUT when it wasn't? Here is my understanding: The connection status (%c) is what the admin should check to confirm that there were no network I/O issues (at least none that caused TCP to give us an error up through the point when the request was complete). In many cases, an HTTP status code has already been written to the client before the I/O problem occurs anyway so changing the status code doesn't make sense. A failure to read a request body would be prior to the point where we could write a status code, but I don't see why the log analysis heuristic should be different. So we should log an error, not a success. 500 won't always be the ideal error, but I don't really see how we can do better within the current API. 500 implies that there could be an action to take to resolve a problem (e.g., screwy filters bungled the return codes; screwy configuration; out of memory; ???). It doesn't apply when somebody bored with an upload hit the Stop button. So are you supporting Rüdiger's proposition? I can accept that if it's the popular view. -- Nick Kew Application Development with Apache - the Apache Modules Book http://www.prenhallprofessional.com/title/0132409674
Widespread confusion of apr_status_t and int values
PR#31759 identified a bug with the default handler returning apr_status_t values. That was fixed. But we have people reporting that the bug is not fixed. What they're seeing is the same bug elsewhere. I just hacked up a fix in mod_cgi and mod_cgid, which we've been discussing here over the last couple of days. It's also in mod_proxy (specifically, proxy_http - I didn't look elsewhere), in both 2.0.x and trunk. I wouldn't be at all surprised to find it in other content generators. I'm wondering if this would be worth working around one level up in the core. If a handler returns a value that's not OK/DECLINED and is out of range for HTTP, then return 500 to the client, and log a buggy content generator message to error_log. Thoughts? -- Nick Kew
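The proposed core-level guard could be modelled like this - a sketch, not an actual patch: the function name is invented, and the constants simply mirror httpd's values:

```c
/* Model of the proposed guard in the core: handler return values that
 * are neither OK/DECLINED nor a plausible HTTP status are treated as
 * a bug in the content generator and turned into a 500. */
#define OK        0
#define DECLINED -1
#define HTTP_INTERNAL_SERVER_ERROR 500

static int sanitize_handler_status(int rc)
{
    if (rc == OK || rc == DECLINED) {
        return rc;
    }
    if (rc >= 100 && rc < 600) {    /* in range for an HTTP status */
        return rc;
    }
    /* Out of range: almost certainly an apr_status_t (e.g. 70007,
     * APR_TIMEUP) leaking through.  This is where the core would log
     * a "buggy content generator" message to error_log. */
    return HTTP_INTERNAL_SERVER_ERROR;
}
```

The appeal of doing this once in the core is that it covers mod_cgi, mod_cgid, proxy_http and any third-party generator with the same confusion, instead of patching each in turn.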