Re: filtering huge request bodies (like 650MB files)

2003-12-10 Thread Nick Kew
On Wed, 10 Dec 2003, William A. Rowe, Jr. wrote:

 Is it Chris's own filter or one of ours?  Whichever it is, it would be nice to
 get this fixed.

Can I suggest Chris insert mod_diagnostics at different points in his
chain to identify exactly where it's buffering (if indeed that's where
his memory is going)?

I had a very similar situation to this, when a bug in a third-party
library caused it to buffer everything in my filter.  mod_diagnostics
rapidly tracked that down, for a 300x performance improvement.

-- 
Nick Kew




Re: Thread terminations?

2003-12-11 Thread Nick Kew

On Thu, 11 Dec 2003, Nasko wrote:

(can you please fix your mailer to post text and use a sensible
line length?)

 Hello everyone.
 I have an Apache 2 module running under Windows.  In each Apache
  thread I have a separate ODBC connection to a database.

Why every thread?  Wouldn't it be more efficient to share a connection
pool between your threads?  You might possibly want to look at mod_pg_pool
(at http://apache.webthing.com/ ) as a template for that.

 My questions are :

Can't answer them.

ICBW, but AIUI threads in your MPM are merely an implementation of
Apache's abstract architecture, so if you tie something to them,
you're more-or-less fighting against the architecture.  That means
your module is likely to be non-portable and at risk of breaking on
future Apache updates.

-- 
Nick Kew

In urgent need of paying work - see http://www.webthing.com/~nick/cv.html




Re: Save brigade and buckets

2004-01-07 Thread Nick Kew
On Wed, 7 Jan 2004, Brian Akins wrote:

 This may not be apache-dev related, but I do not know where else to ask it.

apache-modules maybe?

 Is it possible to save an entire bucket brigade (including its buckets)
 across requests?  I looked at ap_save_brigade, but I'm not sure that will
 work.  It seems that the brigades are always tied to a connection.

ICBW here, but ...

Brigades are created on a pool.  When the pool dies, so does the
brigade.  Most brigades are created on the Request or Connection
pool, so die with the request or connection.

ap_save_brigade lets you save a brigade into another brigade.
To make that work across requests, you should be able to save
to a brigade on the server pool.
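
For illustration, here is a minimal sketch of that set-aside step (the
context struct, filter name and choice of long-lived pool are my
assumptions, not code from any particular module):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

typedef struct {
    apr_bucket_brigade *saved;   /* hypothetical per-filter context */
} save_ctx;

/* Sketch: set aside incoming buckets into a brigade created on a pool
 * that outlives the request (the connection pool here, purely as an
 * example; use a longer-lived pool if it must outlive the connection). */
static apr_status_t save_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    save_ctx *ctx = f->ctx;
    apr_pool_t *longpool = f->c->pool;

    if (ctx == NULL) {
        f->ctx = ctx = apr_pcalloc(longpool, sizeof(*ctx));
    }
    return ap_save_brigade(f, &ctx->saved, &bb, longpool);
}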

However, that still doesn't help if you want the saved brigade to
be seen by a subsequent request, because that'll be handled by
a random server, likely not the one whose pool you used.
Your best bet may be to try and use a persistent connection,
and handle breaking the connection as an error.

-- 
Nick Kew




Re: Philosophical help - module or CGI

2004-01-09 Thread Nick Kew
On Fri, 9 Jan 2004, Kean Johnston wrote:

 Good morning all,

 For a (private) project I am working on, I would appreciate a little
 advice. The system produces mostly dynamic content, very little static
 stuff. A lot of the data that will be served up by httpd comes from a
 daemon running on the same host, and all of the required data is in a
 shared memory segment.

Shared memory is not easy with Apache.  If you implement a pool and
pointers in shared memory, you're significantly advancing the state of
the art.  If you need shared-memory pointers, you might be better off
writing a separate daemon and connecting it to your module.

 I can either produce the content with a CGI that attaches to the shared
 segment, gets the data and renders it in HTML,

I'd suggest a persistent daemon.  Perhaps prototype it as a single
program, then separate off the shm via RPC.  Of course if your shm
use doesn't involve pointers then it's all much simpler.

 frequently and almost always in core. Of course, I can hack the web
 server to my hearts content, and possibly even have the main httpd
 create the semaphores and shared memory segments, so that when it
 preforks, all of the children already have all of those set up and
 simply need to use the semaphore for the read lock, render the data from
 the shared segment and be done with it.

Take a look at the DB Pool modules at apache.webthing.com.
A similar approach might be what you need.

 Do you think it would be overkill to write this as a module, or would
 the simplicity gained by writing a normal CGI be worth it? I've not
 written an Apache module before, so it would be a bit of a learning
 curve, but a worthwhile one I think.


I agree with that.  Once you're up the curve it becomes just as
simple as CGI, and gives you more flexibility and modularity.

 I intend to implement this using httpd 2.0, if that makes any difference.

Yes, that's a far more powerful development platform than 1.x.

-- 
Nick Kew



Re: ReplaceModule directive!?

2004-01-16 Thread Nick Kew
On Fri, 16 Jan 2004, Lars Eilebrecht wrote:

 According to Gerardo Reynaga:

  Is there a way to pass directives to httpd
  once the server is running?

 How about using a graceful restart?
 Would that be feasible in your case?

Graceful restart (along with HUP restart and even stop) fails horribly
when an installed module has been updated.  We could do with a mechanism
for that, and if Gerardo is going to implement it then great (though
I can't help thinking it ought to be simpler: an unload-all-modules
thing).

Perhaps this wants a bug report.

-- 
Nick Kew



Re: Capabilities to provide UDP services with Apache

2004-01-28 Thread Nick Kew
On Wed, 28 Jan 2004, Matthew Gress wrote:

 In any case, I have not found a reference to how to configure apache to
 do this and need to know where I should start to create or adapt for
 this functionality.

I think you just have to write it.  The nearest thing Apache has to
a utility library is the APR.

 Another question I have is: can we create a module that services UDP
 connections without hitting the central Apache server code?

Are you sure this prospective module wants to live within Apache?
What are you expecting this to gain for you over (say) an RPC-based
daemon sitting alongside Apache?

-- 
Nick Kew


Re: Help in Writing Apache Modules in C

2004-01-28 Thread Nick Kew
On Wed, 28 Jan 2004, Will Lowe wrote:

 If you're looking for Apache 1.x (not 2.x)

Given that he's writing C, and that 2.x is a vastly richer development
environment than 1.3, why should he even consider that?

In 1.3 days, application developers had to resort to all kinds of
add-ons, none of which are in C.  If it had had the power of the
2.0 API, we'd probably still have CGI and PHP at the bottom end of
the range of add-ons, but the need for backends like Tomcat would probably
never have been felt.

  Awaiting for a Helping Hand.

Suggest looking at existing modules, and reading the nicely-documented
Apache header files.

-- 
Nick Kew


Re: mod_gcj project at sourceforge

2004-03-19 Thread Nick Kew
On Sat, 20 Mar 2004, Hannes Wallnoefer wrote:

 Hi there,

 just wanted to drop a note that I've started a sourceforge project to
 create a module to run natively compiled Java inside Apache using the
 Gnu compiler for Java (GCJ).

Erm, why does that need a module?  Surely all it needs is to deal with
the linkage from C.  I've done that experimentally, building the W3C
CSS validator with gcj, but considered this just too huge and unwieldy
to contemplate for operational use.

-- 
Nick Kew


Re: mod_gcj project at sourceforge

2004-03-20 Thread Nick Kew
On Sat, 20 Mar 2004, Hannes Wallnoefer wrote:

 just wanted to drop a note that I've started a sourceforge project to
 create a module to run natively compiled Java inside Apache using the
 Gnu compiler for Java (GCJ).
 
 
 
 Erm, why does that need a module?  Surely all it needs is to deal with
 the linkage from C.  I've done that experimentally, building the W3C
 CSS validator with gcj, but considered this just too huge and unwieldy
 to contemplate for operational use.
 
 

 The idea is to have the module load and execute arbitrary Java byte
 code. In other words, mod_gcj will act as bridge between Apache and
 user-provided .class and .jar files.

Aha!  That's different - more akin to mod_perl or mod_python.

 And if you thought linking libgcj was unwieldy you probably haven't
 tried to run Apache + mod_jk + Tomcat recently.

Indeed I haven't, and when I last did it (about three years ago)
it was under protest:-)

No, it's not linking libgcj that's the issue.  It was loading a library
that had not only compiled to a 7Mb .so itself, but also had two huge
dependencies.  Your module sounds as if it should be able to offer a
more satisfactory alternative to that.  And it might also be very useful
for one of my wishlist-projects if that ever happens.

Thanks for clarifying.  I'm just off to your sourceforge page to
learn more:-)

-- 
Nick Kew


Re: to the non-committer folks in our communities...

2004-03-24 Thread Nick Kew
On Wed, 24 Mar 2004, Jeff Trawick wrote:

 Sometimes people report bugs and/or post patches on these lists and for
 whatever reason they are never properly addressed.  Discussion on the list is
 great, but it is all too easy for the e-mails to move out of sight.  The mail
 arrives all too quickly.  The best action you can take to avoid the bit bucket
 for your bug reports and patches is to open a problem report at
 http://nagoya.apache.org/bugzilla/.  If a patch is associated with it, once you
 create the bug report go back to the report to attach the patch and add
 PatchAvailable to the keywords field.

Jeff, thanks for that.  Having seen a couple of patches fall into a
black hole - and one recently get committed - it had been in the back
of my mind to ask about attaching patches to a bug report.  Now you've
answered for me, I'll do that in future.

Perhaps that should go into the developer docs?

-- 
Nick Kew


Re: RequestHeader directive cannot be made conditionnal of env vars

2004-03-25 Thread Nick Kew
On Thu, 25 Mar 2004, Vincent Deffontaines wrote:

 As this seems quite simple to implement, here is my question : would a
 patch implementing env vars in RequestHeader be accepted?

I would support that patch.

Since you're new to this list, you'll have missed Jeff Trawick's
recent post about third-party patches.  It's most likely to get
adopted if you raise it as a bug report, then attach the patch to that.

-- 
Nick Kew


Re: mod_deflate updates

2004-04-14 Thread Nick Kew
On Wed, 14 Apr 2004, Justin Erenkrantz wrote:

 Your changes sound fair enough in concept, but I won't really review until it
 becomes a patch.  ;-)  -- justin

OK, I'll turn it into a patch.  But maybe not just now after a second
glass of wine:-)

I'm thinking: my use of r->notes works well when another module is
setting it, but I should implement a configuration directive as an
alternative.  Make existing behaviour the default, but let
httpd.conf override it and force or suppress output compression.
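
A minimal sketch of what such a directive might look like (the name
DeflateForce, the config struct and the tri-state default are purely
illustrative; they are not part of the actual patch):

#include "httpd.h"
#include "http_config.h"

typedef struct {
    int force;    /* -1 = unset (current behaviour), 0 = suppress, 1 = force */
} deflate_force_dcfg;

static void *create_dcfg(apr_pool_t *p, char *dir)
{
    deflate_force_dcfg *cfg = apr_pcalloc(p, sizeof(*cfg));
    cfg->force = -1;              /* default: keep existing behaviour */
    return cfg;
}

static const char *set_force(cmd_parms *cmd, void *dconf, int flag)
{
    deflate_force_dcfg *cfg = dconf;
    cfg->force = flag ? 1 : 0;    /* On = force compression, Off = suppress */
    return NULL;
}

static const command_rec deflate_force_cmds[] = {
    AP_INIT_FLAG("DeflateForce", set_force, NULL, OR_FILEINFO,
                 "Force (On) or suppress (Off) output compression"),
    {NULL}
};

The filter would then consult that flag before falling back to the
existing r->notes / Accept-Encoding logic.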

-- 
Nick Kew


[PATCH] Re: mod_deflate updates

2004-04-15 Thread Nick Kew

As discussed previously, here are my updates as a patch against 2.0.49.
They serve to enable working with compressed data coming from a proxy
(or other backend) and processing content in the output filter chain.

-- 
Nick Kew

Nick's manifesto: http://www.htmlhelp.com/~nick/
--- mod_deflate.c.old   2004-04-15 23:35:38.0 +0100
+++ mod_deflate.c   2004-04-15 23:38:02.0 +0100
@@ -23,6 +23,12 @@
  *
  * Written by Ian Holsman
  *
+ * Modified by Nick Kew, April 2004
+ *
+ * FIX: deflate html content based on a NOTE from the html-parse filter.
+ * ADD: inflate_out_filter to decompress content for output filters
+ * in a proxy with a compressed backend.  I don't understand the
+ * zlib stuff, so it's modified blind from the input filter.
  */
 
 #include "httpd.h"
@@ -339,6 +345,11 @@
 
         /* if they don't have the line, then they can't play */
         accepts = apr_table_get(r->headers_in, "Accept-Encoding");
+
+        /* NRK: accept it if we removed Accept-Encoding earlier */
+        if (accepts == NULL) {
+            accepts = apr_table_get(r->notes, "Accept-Encoding");
+        }
         if (accepts == NULL) {
             ap_remove_output_filter(f);
             return ap_pass_brigade(f->next, bb);
@@ -834,10 +845,224 @@
     return APR_SUCCESS;
 }
 
+
+/* Filter to inflate for a content-transforming proxy.  */
+static apr_status_t inflate_out_filter(ap_filter_t *f,
+                                       apr_bucket_brigade *bb)
+{
+    int deflate_init = 1;
+    apr_bucket *bkt;
+    request_rec *r = f->r;
+    deflate_ctx *ctx = f->ctx;
+    int zRC;
+    apr_status_t rv;
+    deflate_filter_config *c;
+
+    c = ap_get_module_config(r->server->module_config, &deflate_module);
+
+    if (!ctx) {
+        int found = 0;
+        char *token, deflate_hdr[10];
+        const char *encoding;
+        apr_size_t len;
+
+        /* only work on main request/no subrequests */
+        if (r->main) {
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, bb);
+        }
+
+        /* Let's see what our current Content-Encoding is.
+         * If gzip is present, don't gzip again.  (We could, but let's not.)
+         */
+        encoding = apr_table_get(r->headers_out, "Content-Encoding");
+        if (encoding) {
+            const char *tmp = encoding;
+
+            token = ap_get_token(r->pool, &tmp, 0);
+            while (token && token[0]) {
+                if (!strcasecmp(token, "gzip")) {
+                    found = 1;
+                    break;
+                }
+                /* Otherwise, skip token */
+                tmp++;
+                token = ap_get_token(r->pool, &tmp, 0);
+            }
+        }
+
+        if (found == 0) {
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, bb);
+        }
+
+        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
+        ctx->proc_bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
+        ctx->buffer = apr_palloc(r->pool, c->bufferSize);
+
+
+        zRC = inflateInit2(&ctx->stream, c->windowSize);
+
+        if (zRC != Z_OK) {
+            f->ctx = NULL;
+            inflateEnd(&ctx->stream);
+            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+                          "unable to init Zlib: "
+                          "inflateInit2 returned %d: URL %s",
+                          zRC, r->uri);
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, bb);
+        }
+
+        /* initialize deflate output buffer */
+        ctx->stream.next_out = ctx->buffer;
+        ctx->stream.avail_out = c->bufferSize;
+
+        deflate_init = 0;
+    }
+
+
+    APR_BRIGADE_FOREACH(bkt, bb) {
+        const char *data;
+        apr_size_t len;
+
+        /* If we actually see the EOS, that means we screwed up! */
+        if (APR_BUCKET_IS_EOS(bkt)) {
+            inflateEnd(&ctx->stream);
+            return APR_EGENERAL;
+        }
+
+        if (APR_BUCKET_IS_FLUSH(bkt)) {
+            apr_bucket *tmp_heap;
+            zRC = inflate(&(ctx->stream), Z_SYNC_FLUSH);
+            if (zRC != Z_OK) {
+                inflateEnd(&ctx->stream);
+                return APR_EGENERAL;
+            }
+
+            ctx->stream.next_out = ctx->buffer;
+            len = c->bufferSize - ctx->stream.avail_out;
+
+            ctx->crc = crc32(ctx->crc, (const Bytef *)ctx->buffer, len);
+            tmp_heap = apr_bucket_heap_create((char *)ctx->buffer, len,
+                                              NULL, f->c->bucket_alloc);
+            APR_BRIGADE_INSERT_TAIL(ctx->proc_bb, tmp_heap);
+            ctx->stream.avail_out = c->bufferSize;
+
+            /* Move everything to the returning brigade. */
+            APR_BUCKET_REMOVE(bkt);
+            break;
+        }
+
+        /* read */
+        apr_bucket_read(bkt, &data, &len, APR_BLOCK_READ);
+
+        /* first bucket

mod_deflate update

2004-04-18 Thread Nick Kew

Attached: a one-line bugfix to my recent patch.  The inflate output
filter needs to unset the Content-Encoding header when it unsets
the content encoding.

Also a question: When I create a bucket brigade in a module, I always
explicitly apr_brigade_destroy() it.  None of the filters in mod_deflate
destroy their brigades.  A look at apr_brigade.c shows that it's not
in fact necessary, but maybe a note to that effect would be in order?

-- 
Nick Kew

Nick's manifesto: http://www.htmlhelp.com/~nick/

--- mod_deflate.c   2004-04-18 13:06:13.0 +0100
+++ mod_deflate.c.old   2004-04-18 13:07:44.0 +0100
@@ -895,6 +895,7 @@
             ap_remove_output_filter(f);
             return ap_pass_brigade(f->next, bb);
         }
+        apr_table_unset(r->headers_out, "Content-Encoding");
 
         f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
         ctx->proc_bb = apr_brigade_create(r->pool, f->c->bucket_alloc);


Proposal: AP_FTYPE_PREPROCESS

2004-04-25 Thread Nick Kew

Content-transforming filters are a major and increasingly-popular
application of Apache, and serve many purposes.  Implementing
major functionality in an output filter rather than a handler
has the great advantage of making it re-usable with different
handlers, including mod_proxy.

The place for such filters is of course AP_FTYPE_RESOURCE.
But they may often require a pre-processing step.  My recent update
to mod_deflate provides a filter to decompress gzipped content for
manipulation by a content-transforming filter.  In the context
of a proxy, this pre-processing can only happen in an output
filter.  I hacked this in mod_deflate by declaring the gunzip
filter as of AP_FTYPE_RESOURCE-1.
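
For reference, the hack amounts to registering the filter one notch below
RESOURCE; a sketch (the filter name and function are placeholders rather
than mod_deflate's actual ones):

#include "httpd.h"
#include "util_filter.h"

static apr_status_t gunzip_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    /* ... decompress and pass the result down the chain ... */
    return ap_pass_brigade(f->next, bb);
}

static void register_hooks(apr_pool_t *p)
{
    /* AP_FTYPE_RESOURCE - 1 squeezes the decompression step in just below
     * the content-transforming (RESOURCE) filters; this is the slot a
     * proper AP_FTYPE_PREPROCESS type would formalise. */
    ap_register_output_filter("GUNZIP", gunzip_out_filter, NULL,
                              AP_FTYPE_RESOURCE - 1);
}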

There are many similar situations.  My own work in progress includes
decoding image formats for an image processing filter, and is
likely to include an error-recovering iconv filter to ensure
graceful recovery when proxying content containing bogus characters
through a markup filter.

Rather than use hacks like AP_FTYPE_RESOURCE-1, would it not be
better to introduce a new output filter type AP_FTYPE_PREPROCESS
below RESOURCE for this kind of application?


-- 
Nick Kew

Nick's manifesto: http://www.htmlhelp.com/~nick/


Exception handling and MPMs

2004-04-25 Thread Nick Kew

I'm writing a jpeg module, using libjpeg to implement cjpeg/djpeg filters.

Looking at error handling, I find that libjpeg by default exits on
fatal error.  This can be overridden with a setjmp/longjmp construct.
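
For anyone unfamiliar with it, that override is the standard libjpeg
pattern, roughly as follows (a sketch only; the return value and cleanup
are whatever the module needs):

#include <setjmp.h>
#include <jpeglib.h>

struct my_error_mgr {
    struct jpeg_error_mgr pub;    /* libjpeg's public error fields */
    jmp_buf setjmp_buffer;        /* where error_exit longjmps back to */
};

static void my_error_exit(j_common_ptr cinfo)
{
    struct my_error_mgr *err = (struct my_error_mgr *)cinfo->err;
    longjmp(err->setjmp_buffer, 1);   /* jump back instead of exit() */
}

static int decode_jpeg(void)
{
    struct jpeg_decompress_struct cinfo;
    struct my_error_mgr jerr;

    cinfo.err = jpeg_std_error(&jerr.pub);
    jerr.pub.error_exit = my_error_exit;
    if (setjmp(jerr.setjmp_buffer)) {
        /* control lands here on any fatal libjpeg error */
        jpeg_destroy_decompress(&cinfo);
        return -1;
    }
    jpeg_create_decompress(&cinfo);
    /* ... feed data and decompress ... */
    jpeg_destroy_decompress(&cinfo);
    return 0;
}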

However, I seriously doubt setjmp/longjmp is safe with threaded MPMs,
and there's no apr_setjmp.  So that's not an attractive option.

It seems that other libraries inherit this behaviour.  For example,
gd does both the above, and is harder to override than libjpeg,
so that doesn't help.

An alternative might be to use C++ try/catch, with a throw()
in the fatal-error handler.  This seems to offer the compiler
more scope for generating thread-safe code than setjmp/longjmp,
but I really don't know if that's wishful thinking ...

Where do I stand using either setjmp/longjmp or try/throw/catch
with different MPMs?


-- 
Nick Kew

Nick's manifesto: http://www.htmlhelp.com/~nick/


Re: is it possible to mark buckets to be copied only when to be set-aside?

2004-05-19 Thread Nick Kew
On Tue, 18 May 2004, Stas Bekman wrote:

 extra allocation happens). But just now one user has reported that it breaks
 mod_xslt filter, which sets aside the buckets sent from the modperl handler,
 and then uses them after seeing EOS.

That seems to me an unnecessarily complex and inefficient XSLT
implementation.

What XSLT needs to do with its data is to parse to a DOM.  By using
libxml2/libxslt we can use a parseChunk API, and thus feed every bucket
to the parser as soon as it reaches the filter.  No need at all to buffer
or set it aside.
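
A rough sketch of that approach (not mod_transform's actual code; the
names and the surrounding plumbing are my assumptions):

#include <libxml/parser.h>
#include "httpd.h"
#include "util_filter.h"

/* Feed each bucket to libxml2 as it arrives, so nothing needs setting
 * aside.  The parser context lives in f->ctx between invocations;
 * running the transform on the finished DOM and passing the result
 * downstream is elided here. */
static apr_status_t xslt_filter_sketch(ap_filter_t *f, apr_bucket_brigade *bb)
{
    xmlParserCtxtPtr ctxt = f->ctx;
    apr_bucket *b;

    for (b = APR_BRIGADE_FIRST(bb); b != APR_BRIGADE_SENTINEL(bb);
         b = APR_BUCKET_NEXT(b)) {
        const char *data;
        apr_size_t len;

        if (APR_BUCKET_IS_EOS(b)) {
            if (ctxt != NULL) {
                xmlParseChunk(ctxt, NULL, 0, 1);   /* terminate the parse */
                /* ... transform ctxt->myDoc and pass the output ... */
            }
            break;
        }
        if (apr_bucket_read(b, &data, &len, APR_BLOCK_READ) != APR_SUCCESS
            || len == 0) {
            continue;
        }
        if (ctxt == NULL) {
            /* the first data bucket also creates the push-parser context */
            f->ctx = ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                                    data, (int)len, NULL);
        }
        else {
            xmlParseChunk(ctxt, data, (int)len, 0);
        }
    }
    apr_brigade_cleanup(bb);
    return APR_SUCCESS;
}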

We have at least one implementation that works like that (originally
mine, but now more actively developed by others as mod_transform).
Perhaps mod_perl users might benefit from switching?

-- 
Nick Kew


Re: is it possible to mark buckets to be copied only when to be set-aside?

2004-05-19 Thread Nick Kew
On Tue, 18 May 2004, Stas Bekman wrote:

 Frankly I even have no idea who is the author of mod_xslt, it's not part of
 the mod_perl project.

There are several modules with that name.  When you raised it as a
problem you had encountered with mod_perl, I thought maybe mod_perl
had specific hooks for it - although there would seem to be no reason
to do so other than particular system optimisations.

-- 
Nick Kew


Re: [PATCH] mod_deflate + mod_proxy bug

2004-06-10 Thread Nick Kew
On Wed, 9 Jun 2004, Allan Edwards wrote:

 Running ProxyPass with mod_deflate results in
 an extraneous 20 bytes being tacked onto 304
 responses from the backend.

 The problem is that mod_deflate doesn't handle
 the zero byte body, adds the gzip header and
 tries to compress 0 bytes.

 This patch detects the fact that there was no
 data to compress and removes the gzip header
 from the bucket brigade.

 Any comments before I commit to head?

This is part of a slightly broader problem with proxying and mod_deflate:
it'll also waste time gzipping already-compressed data from the backend
in those cases where the compression is not explicitly indicated in the
Content-Encoding header.  Obvious examples are all the main image formats.

I'm currently running a hack that works around this, and planning a
better review when time permits (i.e. when I've caught up with things
after http://www.theatreroyal.com/showpage.php?dd=1&theid=2578
which now has three nights left to run).

More interesting is the entire subject of filtering in a dynamic
context such as a proxy.  The directives available to control filtering
are simply not up to it.  Watch this space:-)

-- 
Nick Kew


Re: [PATCH] mod_unique_id: Keep running w/ Broken DNS

2004-06-16 Thread Nick Kew
On Tue, 15 Jun 2004, Paul Querna wrote:

 I see three ways to solve this issue:
 1) Make the error we spit out more verbose when DNS is broken.

+1 on that.  Error message should suggest that luser disables the module.

 2) Continue running, turning off mod_unique_id.

Violates KISS (slightly), and keeps the module loaded as deadweight.

 3) Complain to upstream vendors.(don't enable mod_unique_id by default!)

Good idea, but we don't really control that unless we adopt DJB-style
licensing.

 No matter what is done, It would be nice to have some sort of change
 into 2.0.50 before a release is made.

Indeedie.

-- 
Nick Kew


RE: Aborting a filter.

2004-06-24 Thread Nick Kew
On Tue, 22 Jun 2004, Peter J. Cranstone wrote:

 Thanks... we're currently testing a new version of mod_gzip called
 mod_gzip64i

For the record, I've fixed the problem.  It was a failure to support
some of the compression flags.  Now I'll have to (side?)port it into
a CVS version of mod_deflate ...

<grumble>Why isn't this documented in the manpages or in zlib.h?</grumble>

-- 
Nick Kew


Proxy Cookie Support (Bug #10722)

2004-06-25 Thread Nick Kew

I recently patched bug 10722.  My patch was against 2.0.49, for a Client
who needed it in a hurry.

I'm just porting it to 2.1-HEAD.  In doing so, I find there's an existing
patch that saves and restores cookies without rewriting the Domain and
Path components.  AFAICS my patch would/should supersede that one.
But I'm reluctant to do so without understanding the purpose of the
other patch.

The other patch is at
http://cvs.apache.org/viewcvs.cgi/httpd-2.0/modules/proxy/proxy_http.c?r1=1.184&r2=1.185

It appears just to merge Set-Cookie headers in r->err_headers_out with
those in r->headers_out.  The latter have just come from the backend
(the server proxied).  But how/why should there be (any) cookies in
r->err_headers_out at this point?  Presumably they'd be from the proxy
rather than the backend?  And why merge them into a normal 2xx response?


-- 
Nick Kew


Re: Proxy Cookie Support (Bug #10722)

2004-06-25 Thread Nick Kew
On 25 Jun 2004, Joe Schaefer wrote:

 Jim Jagielski [EMAIL PROTECTED] writes:

  Nick Kew wrote:

 [...]

   It appears just to merge Set-Cookie headers in r->err_headers_out with
   those in r->headers_out.  The latter have just come from the backend
   (the server proxied).  But how/why should there be (any) cookies in
   r->err_headers_out at this point?  Presumably they'd be from the proxy
   rather than the backend?  And why merge them into a normal 2xx response?
  
 
  This is so Cookies added by the local proxy server (Apache) via
  internal custom modules do not lose those cookies when used also
  as a proxy. If a module has added Cookie information we should
  honor that and maintain it as is. Roy and I talked about
  this and both agreed that it made sense hence the patch.

OK, on further consideration after posting, I've reached the view that
the 2.0.49 patch should be modified.  I'd propose to run the same
transformations on Set-Cookie headers from the backend, but move
the rewriting - along with the rewriting of Date and Location headers
(ap_proxy_date_canon and ap_proxy_location_reverse_map) - into
ap_proxy_read_headers.  That actually both cleans up the code (reduces
adhockery) and makes it more efficient (reduces table operations),
as well as making my patch orthogonal to yours.

 But is the err_headers_out logic in proxy_http.c HEAD really ok?

That I still find puzzling.  Perhaps there's a URL into the mail
archives from your previous discussion that would explain it?

-- 
Nick Kew



Re: Proxy Cookie Support (Bug #10722)

2004-06-25 Thread Nick Kew
On Fri, 25 Jun 2004, Jim Jagielski wrote:

So the

   apr_table_do(addit_dammit, save_table, r->err_headers_out,
                "Set-Cookie", NULL);

 line should be removed.

OK, that's what I needed to know.  I'll still have to modify my patch
slightly to work with yours, but at least it's now clear what's going on.

-- 
Nick Kew


Re: URI lossage with ProxyPass

2004-06-25 Thread Nick Kew
On Thu, 17 Jun 2004, Francois-Rene Rideau wrote:

[ message quoted in full and crossposted to [EMAIL PROTECTED] ]

 I have experienced quite some trouble due to design bugs in ProxyPass,
 and have proposed a patch for apache 1.3.
 The very same bugs are present in apache 2.0, and a similar fix could be used.
   http://nagoya.apache.org/bugzilla/show_bug.cgi?id=29554

I've reviewed this in the context of httpd-2.1, and it looks good to me
with essentially the same patch.  It works on your testcase, and I'm
99% satisfied that it doesn't break anything.  Ready to commit if we
can answer the remaining question: should proxy_fixup be removed
altogether?


 Can you tell me if you'll fix the official mod_proxy,
 either using my patch or otherwise?


 The bug symptoms are that
 (1) when a request to a ProxyPass host contains %3A, the %3A is expanded
  to a colon, which yields an incorrect HTTP URL that confuses the remote host.
 (2) when a request to a ProxyPass host contains %2F, apache rejects the
  request with a 404 without even contacting the remote host.

 The bug causes are that
 (1) function modules/proxy/mod_proxy.c:proxy_fixup() makes a misguided attempt
  at URI canonicalization. It should definitely not try to
  when using PROXY_PASS, and probably not in STD_PROXY mode either.
  Since I don't understand all the ins and outs, my patch only adds a bypass
  in the case of PROXY_PASS, but I believe the whole function should be
  scrapped altogether (whoever checks in the patch should ponder that).

Graham Leggett's reply seems to support that, and having figured out
what you are talking about, I agree.

Can anyone see why proxy_fixup should not be removed altogether?


 (2) r-proxyreq=PROXY_PASS is declared too late, only in
  modules/proxy/mod_proxy.c:proxy_trans(), so that
  main/http_request.c:process_request_internal() already messed up
  with the URL, not realizing there is a proxy request going on.
  Consequently, the ProxyPass alias detection MUST happen not in
  modules/proxy/mod_proxy.c:proxy_trans() but in
  modules/proxy/mod_proxy.c:proxy_detect().
  This may or may not interfere with funky rewrites that some people
  may want to do before or after a ProxyPass is used. Someone who understands
  such issues should step in and tell. Maybe my change introduces some
  subtle incompatibilities in *actually deployed* setups, but I would bet not,
  and some mechanism could be devised to restore proper behaviour
  for those who would need such a feature.

 I hope my patch doesn't break any expected behaviour, but I can't be sure.
 What I'm certain of is that ProxyPass is quite broken without my patch.
 Please consider merging this patch into apache, and tell me when it's done.

 Cheers,

 [ François-René ÐVB Rideau | ReflectionCybernethics | http://fare.tunes.org ]
 [  TUNES project for a Free Reflective Computing System  | http://tunes.org  ]
 The last good thing written in C was Franz Schubert's Symphony number 9.
 -- Erwin Dieterich [EMAIL PROTECTED]


-- 
Nick Kew


PATCH: various mod_proxy issues

2004-06-26 Thread Nick Kew

I've rolled a fairly extensive mod_proxy patch, which seems rather
big to commit without review.  Comments please:

(1) Bug #10722 - cookie paths and domains in reverse proxy

Following my patch to 2.0.49, I've adapted it for 2.1, taking into account
Jim's patch.  In doing so, I made some organisational changes:

* moved rewriting of headers that need it (ap_proxy_date_canon and
  ap_proxy_location_reverse_map) to a new function process_proxy_header
  called from ap_proxy_read_headers.
* Removed the same from ap_proxy_http_process_response
* moved ap_proxy_read_headers from proxy_utils to proxy_http
* Retained Jim's patch, but removed the line merging err_headers_out


(2) Bug #29554 - URL munging

I've ported Francois-Rene Rideau's patch to 2.1, subject to the
question over proxy_fixup discussed in my last post.


Any problems with committing this?

-- 
Nick Kew

diff -u proxy-old/mod_proxy.c proxy/mod_proxy.c
--- proxy-old/mod_proxy.c   2004-06-26 07:18:10.0 +0100
+++ proxy/mod_proxy.c   2004-06-26 07:13:46.0 +0100
@@ -94,9 +94,10 @@
 static int proxy_detect(request_rec *r)
 {
     void *sconf = r->server->module_config;
-    proxy_server_conf *conf;
-
-    conf = (proxy_server_conf *) ap_get_module_config(sconf, &proxy_module);
+    proxy_server_conf *conf =
+        (proxy_server_conf *) ap_get_module_config(sconf, &proxy_module);
+    int i, len;
+    struct proxy_alias *ent = (struct proxy_alias *)conf->aliases->elts;
 
     /* Ick... msvc (perhaps others) promotes ternary short results to int */
 
@@ -121,6 +122,19 @@
             r->uri = r->unparsed_uri;
             r->filename = apr_pstrcat(r->pool, "proxy:", r->uri, NULL);
             r->handler = "proxy-server";
+        } else {
+            /* test for a ProxyPass */
+            for (i = 0; i < conf->aliases->nelts; i++) {
+                len = alias_match(r->unparsed_uri, ent[i].fake);
+                if (len > 0) {
+                    r->filename = apr_pstrcat(r->pool, "proxy:", ent[i].real,
+                                              r->unparsed_uri + len, NULL);
+                    r->handler = "proxy-server";
+                    r->proxyreq = PROXYREQ_REVERSE;
+                    r->uri = r->unparsed_uri;
+                    break;
+                }
+            }
     }
     return DECLINED;
 }
@@ -140,26 +154,6 @@
         return OK;
     }
 
-    /* XXX: since r->uri has been manipulated already we're not really
-     * compliant with RFC1945 at this point.  But this probably isn't
-     * an issue because this is a hybrid proxy/origin server.
-     */
-
-    for (i = 0; i < conf->aliases->nelts; i++) {
-        len = alias_match(r->uri, ent[i].fake);
-
-        if (len > 0) {
-            if ((ent[i].real[0] == '!') && (ent[i].real[1] == 0)) {
-                return DECLINED;
-            }
-
-            r->filename = apr_pstrcat(r->pool, "proxy:", ent[i].real,
-                                      (r->uri + len), NULL);
-            r->handler = "proxy-server";
-            r->proxyreq = PROXYREQ_REVERSE;
-            return OK;
-        }
-    }
     return DECLINED;
 }
 
@@ -221,7 +215,7 @@
 
     return OK;
 }
-
+#if 0
 /* -- */
 /* Fixup the filename */
 
@@ -236,6 +230,13 @@
     if (!r->proxyreq || !r->filename || strncmp(r->filename, "proxy:", 6) != 0)
         return DECLINED;
 
+    /* We definitely shouldn't canonicalize a proxy_pass.
+     * But should we really canonicalize a STD_PROXY??? -- Fahree
+     */
+    if (r->proxyreq == PROXYREQ_REVERSE) {
+        return OK;
+    }
+
     /* XXX: Shouldn't we try this before we run the proxy_walk? */
     url = &r->filename[6];
 
@@ -250,7 +251,7 @@
 
     return OK; /* otherwise; we've done the best we can */
 }
-
+#endif
 /* Send a redirection if the request contains a hostname which is not */
 /* fully qualified, i.e. doesn't have a domain name appended. Some proxy */
 /* servers like Netscape's allow this and access hosts from the local */
@@ -439,6 +440,10 @@
     ps->proxies = apr_array_make(p, 10, sizeof(struct proxy_remote));
     ps->aliases = apr_array_make(p, 10, sizeof(struct proxy_alias));
     ps->raliases = apr_array_make(p, 10, sizeof(struct proxy_alias));
+    ps->cookie_paths = apr_array_make(p, 10, sizeof(struct proxy_alias));
+    ps->cookie_domains = apr_array_make(p, 10, sizeof(struct proxy_alias));
+    ps->cookie_path_str = apr_strmatch_precompile(p, "path=", 0);
+    ps->cookie_domain_str = apr_strmatch_precompile(p, "domain=", 0);
     ps->noproxies = apr_array_make(p, 10, sizeof(struct noproxy_entry));
     ps->dirconn = apr_array_make(p, 10, sizeof(struct dirconn_entry));
     ps->allowed_connect_ports = apr_array_make(p, 10, sizeof(int));
@@ -474,6 +479,12 @@
     ps->sec_proxy = apr_array_append(p, base->sec_proxy, overrides->sec_proxy);
     ps->aliases = apr_array_append(p, base->aliases, overrides->aliases);
     ps->raliases = apr_array_append(p, base->raliases, overrides->raliases);
+    ps->cookie_paths
+        = apr_array_append(p, base->cookie_paths, overrides->cookie_paths);
+    ps->cookie_domains
+ 

Re: URI lossage with ProxyPass

2004-06-26 Thread Nick Kew
On Sat, 26 Jun 2004, Graham Leggett wrote:

 Nick Kew wrote:

  Can anyone see why proxy_fixup should not be removed altogether?

 Proxy fixup seems to do the job of making sure the URL /%41%42%43
 matches "ProxyPass /ABC http://xxx/ABC", so I don't think it should be
 removed altogether.

I don't think that's right.  Both proxy_detect and proxy_trans happen
before proxy_fixup, and the comment in proxy_fixup refers to its
relationship with mod_rewrite.

The patched apache fails that test, but simply reinstating proxy_fixup
makes no difference to that.  Now I'm confused.

I think you're right in your other post: separate patches for separate
bugs.  And not necessarily at 4 a.m. 

But having come this far, I want to see both fixed:-)  And a trawl of
bugzilla tells me that the URI Lossage is bug #15207 and probably others,
while bug #16812 is a trivial corollary to the cookie patch.

-- 
Nick Kew


Re: URI lossage with ProxyPass

2004-06-26 Thread Nick Kew
On Sat, 26 Jun 2004, Graham Leggett wrote:

 Nick Kew wrote:

  Can anyone see why proxy_fixup should not be removed altogether?

 Proxy fixup seems to do the job of making sure the URL /%41%42%43
 matches "ProxyPass /ABC http://xxx/ABC", so I don't think it should be
 removed altogether.

OK, the reason for that is that the patch moved ProxyPass-ing from
proxy_trans to proxy_detect.  The latter happens before canonicalisation,
which is both why the patch works and why it breaks the above.

A fix is for alias_match() to recognise %xx sequences.  I've now
implemented it, but also separated out the URI-trouble stuff with
#ifdef FIX_15207
on the grounds that it's still subject to debate.

That still leaves us a proxy_fixup with no purpose I can see.
Perhaps someone who uses it with mod_rewrite can say if it
does anything for you?

-- 
Nick Kew


Re: 2.2 Roadmap?

2004-06-27 Thread Nick Kew
On Sun, 27 Jun 2004, Paul Querna wrote:

 The 2.0 branch was made over 18 months ago, it is time to make another
 stable branch. I believe many people want the AAA changes, and it brings
 even more features to encourage people to upgrade from 1.3.

There's another consideration that could be relevant here: people who
never touch an N.0 software release.  Bump it to 2.3? :-)

 This is only a list from my initial thoughts, please comment and make
 suggestions. I will take the resulting thread and rewrite the ROADMAP
 file.

Smart filtering.  We need much better dynamic configuration of the
filter chain, with processing depending on the headers.  Think an
AddOutputFilterByType that isn't a hackish afterthought, and extend
that to work with more than just Content-Type.  It also fixes the
awkwardness currently involved in ordering a nontrivial filter chain.

I've got this working with some minor hacks to 2.0.  Need time to
generalise/abstract it into a proposal.

-- 
Nick Kew


PROPOSAL: Enhance mod_headers as a debug/test tool

2004-07-04 Thread Nick Kew

(If this gets the thumbs up, I'll be happy to do the work:-)

In testing new code, it's often helpful to simulate different
browser requests, and responses.

For handlers and filters, mod_headers enables us to set up testcases
very easily, with the Header and (especially) RequestHeader directives.
But that's in a fixups hook, so it's no use for any hooks running in
earlier phases of a request.

My proposal is to introduce an additional DEBUG keyword to the
Header and RequestHeader directives.  Headers marked as DEBUG will
be set in post_read_request, so they are available to other modules.
Without DEBUG, it will default to current (fixups) behaviour.

Of course, DEBUG won't work with conditional (Request)Header directives.
In addition to documenting this, attempts to do so will log a warning.

-- 
Nick Kew


Re: PROPOSAL: Enhance mod_headers as a debug/test tool

2004-07-05 Thread Nick Kew
On Sun, 4 Jul 2004, Nick Kew wrote:

 (If this gets the thumbs up, I'll be happy to do the work:-)

Since reaction seemed broadly positive, I've checked it in to HEAD.

Following suggestions in the replies, it's invoked by the keyword
"early", which takes the place of the env= clause.

-- 
Nick Kew


Re: The Byterange filter -- a new design -- feel free to rip it to shreds

2004-07-11 Thread Nick Kew
On Mon, 12 Jul 2004, Ian Holsman wrote:

 ok, now before I start this let me say one thing, this is not for *ALL*
 requests, it will only work for ones which don't have content-length
 modifiable filters (like gzip) applied to the request, and it would be
 left to the webserver admin to figure out what they were, and if you
 could use this.

But that's not an issue if the byterange filter comes after any filters
that modify content (CONTENT_SET).

 ok..
 at the moment when a byterange request goes to a dynamic module, the
 dynamic module can not use any tricks to only serve the bytes requested,
 it *HAS* to serve the entire content up as buckets.

Indeed.  That only becomes a problem when a filter breaks pipelining.

 what I am proposing is something like:

 1. the filter keeps a ordered list of range requests that the person
 requests.

 2. it keeps state on how far it has processed in the file. thanks to
 knowing the length of the buckets processed so far.
Q: when do the actual headers get put in.. I think they are after no?

ITYM data, not the file.  The case of a single file is trivial, and
can more efficiently be handled in a separate optimised execution path.
And some bucket types have to be read to get their length.

 3. it then examines the bucket + bucket length to see which range
 requests match this range, if some do it grabs that range (possibly
 splitting/copying if it meets multiple ranges) and puts it on the right
 bits of each range request.

 4. if the top range request is finished, it passes those buckets through.

 5. repeat until EOS/Sentinel, flushing the ordered list at the end.

This doesn't completely address the issue that this might cause excessive
memory usage; particularly if we have to serve ranges in a perverse order.
I would propose two admin-configurable limits:

(1) Total data buffered in memory by the byterange filter.  This can be
computed in advance from the request headers.  If this is exceeded, the
filter should create a file bucket to store the data, and the ordered
list then references offsets into the file (see the sketch after point (2)).

(2) A limit above which byteranges won't be served at all: most of us
have neither the memory nor the /tmp space for a gigabyte.
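
To make point (1) concrete, here is a sketch of the spooling step (the
function name and error handling are illustrative only):

#include "apr_file_io.h"
#include "apr_buckets.h"

/* Append one in-memory bucket's data to the spool file and replace it in
 * the brigade with a file bucket referencing the same bytes, so the
 * ordered list can keep working with offsets rather than heap data. */
static apr_status_t spool_bucket(apr_bucket *e, apr_file_t *spool,
                                 apr_off_t *offset, apr_pool_t *pool,
                                 apr_bucket_alloc_t *list)
{
    const char *data;
    apr_size_t len;
    apr_status_t rv;

    rv = apr_bucket_read(e, &data, &len, APR_BLOCK_READ);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    rv = apr_file_write_full(spool, data, len, NULL);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    APR_BUCKET_INSERT_BEFORE(e, apr_bucket_file_create(spool, *offset, len,
                                                       pool, list));
    *offset += len;
    apr_bucket_delete(e);
    return APR_SUCCESS;
}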

 now.. this assumes that splitting a bucket (and copying) is a zero cost
 operation which doesn't actually *read* the bucket, is this true for
 most bucket types?

 would this kind of thing work?

As I said, the trivial cases should (transparently) be treated separately
and more simply.  Otherwise ... well, as discussed on IRC.

-- 
Nick Kew


Re: The Byterange filter -- a new design -- feel free to rip it to shreds

2004-07-12 Thread Nick Kew
On Mon, 12 Jul 2004, Graham Leggett wrote:

  at the moment when a byterange request goes to a dynamic module, the
  dynamic module can not use any tricks to only serve the bytes requested,
  it *HAS* to serve the entire content up as buckets.

 In theory, if mod_proxy (for example) gets a byte range request, it
 should only serve that byte range - ideally modules/filters should not
 prop up other modules/filters.

That will not always be practicable.  mod_proxy should be configurable
to pass byteranges headers straight through to the backend or strip them
and assume the proxy will handle the ranges.

 If a filter somewhere in the filter stack is going to break the byte
 range request in any way (for example something like mod_include) then
 that filter should be responsible for removing the Range header from the
 request before mod_proxy gets a chance to service the request.

Doesn't that break modularity rather badly?  mod_include is concerned with
simple content modifications, not HTTP.  It doesn't need more complexity.

 In theory, as the byte range filter should be one of the topmost filters
 run, it would have seen the Range header and noted what range it should
 have been returning, so a downstream filter removing the Range header
 should not cause a problem for the byte range filter.

But if you adopt that approach, then *every* filter has to faff about
with range headers (just in case), the first one strips it out, and
the others run in blissful ignorance.  Makes more sense if only the
byterange filter concerns itself with the header.

 In turn, if a downstream filter/content handler has returned a 206
 Partial Content response, the byte range filter should know what to do
 (has my job already been done by a downstream filter?

Yes, quietly remove itself from the chain.

 In fact thinking about this some more - mod_include might look at the
 byte range, and then intelligently decide to either include / not
 include certain included content based on the byte range. This could
 improve performance on some sites.

For mod_include to do that is an order of magnitude extra complexity
(even if you solve the problem of measuring the length of each include
without actually executing it).  For modules that generate entirely new
data - such as those based on a markup processor (accessibility, xmlns,
xinclude/xslt, proxy_html, annot - to name but a few) it becomes even
bigger: we'd have to count every byte we write!

 So to sum up:

 - Teach the byte range filter that it might receive content from a
 content handler that already has the byte range applied, and to react
 intelligently when this happens. A content handler will indicate this by
 returning a 206 Partial Content and/or a Content-Range header, which is
 easily parsed by the byte range filter - no need for special flags or
 buckets.

That has to be configurable, as some filters can only run on a
complete datastream.

 - Teach certain content handlers (such as mod_proxy or mod_cache) to
 handle byte range requests themselves, using the standard RFC2616
 headers and responses to indicate whether ranges have been applied.
 Which content handlers will be taught this will depend on whether there
 is a performance gain to be had by getting the content handler to know
 about byte ranges.

mod_proxy needs only two modes: transparent (leave it to the backend)
or opaque (get the entire document and leave it to the byteranges filter).
The latter would be appropriate when cacheing and/or content-filtering.

mod_cache in quick-handler mode is a special case.  But since that's
only serving from a complete document in-memory or on-disc, it's
straightforward.

 - Teach certain problem modules (mod_gzip if appropriate) to react

That'll be mod_byteranges.

-- 
Nick Kew


Re: The Byterange filter -- a new design -- feel free to rip it to shreds

2004-07-13 Thread Nick Kew
On Tue, 13 Jul 2004, Joe Orton wrote:

 On Mon, Jul 12, 2004 at 03:35:12AM +0100, Nick Kew wrote:
  This doesn't completely address the issue that this might cause
  excessive memory usage; particularly if we have to serve ranges in a
  perverse order. I would propose two admin-configurable limits:
 
  (1) Total data buffered in memory by the byterange filter.  This can be
  computed in advance from the request headers.  If this is exceeded, the
  filter should create a file bucket to store the data, and the ordered
  list then references offsets into the file.

 Buffering responses into temporary files so that the byterange filter
 can do its job sounds extremely messy.

Not as messy as buffering it in memory regardless of size.  But I agree
it's got to be configurable, and probably not a default.

 Being able to send byteranges of arbitrary dynamically generated content
 doesn't seem like an essential feature; the filter is just saying "I
 can't efficiently process a byterange request for this content".
 Clients must handle the 200 response fallback already.

Indeed.

The question is: can we offer admins alternatives, that might be
better in some situations, without messing memory.  I've put forward
suggestions, amplifying Ian's, on how I believe we can.

-- 
Nick Kew


Re: The Byterange filter -- a new design -- feel free to rip it to shreds

2004-07-13 Thread Nick Kew
On Tue, 13 Jul 2004, Graham Leggett wrote:

 But in the case of mod_proxy, mod_jk, etc it is quite valid and very
 desirable for a range request to be passed all the way to the backend,
 in the hope that the backend sends just that range back to mod_proxy,
 which in turn sends it up a filter stack that isn't going to fall over
 because it received a 206 Partial Content response.

Indeed.  In a straight-through proxy that's right.

But in the case of a cacheing proxy, it may be better for it to retrieve
the entire document and manage byteranges locally.  And in the case of
a content-transforming proxy, the filters may need the entire content to
function at all.

Bottom line: this has to be controlled by the server admin.  We offer
the options of passthrough, process locally, or ignore ranges.

 The above is still true - there is (and should be) very little for the
 content handler to worry about when it comes to HTTP compliance, and
 content handlers should have the option to just generate content, as
 they do now.

Agreed.  That applies both to content handlers and content filters.

 The problem though is not with the content handlers but with the filters
   - filters must not make the assumption that all content handlers only
 serve content and not HTTP headers. When a content handler decides that
 it wants to handle more of the HTTP spec so as to improve performance,
 it should be free to do so, and should not be stopped from doing so due
 to limitations in the output filters.

Indeed, historically (possibly still) content length has been a problem
for filters.  Simply removing the header may not be sufficient if the
content-length filter reinserts it erroneously.  Ranges are more
complex.

Basically a proxy or other content generator that takes care of
byteranges itself is going to be incompatible with certain output
filters.  That has to be documented, and there has to be an easy
way for filters to detect when they're not wanted, or for Apache
to mark them inapplicable and refuse to run them at all.
A situation where filters have to get their hands dirty with
partial responses would be a serious problem.

 In other words if mod_proxy is taught how to pass Range requests to the
 backend server, the output filter stack should not stop proxy from doing
 so by removing Range headers unless it is absolutely necessary.

Indeed.  So in httpd.conf we have options for the proxy to pass range
requests through or not.

-- 
Nick Kew


Re: The Byterange filter -- a new design -- feel free to rip it to shreds

2004-07-13 Thread Nick Kew
On Tue, 13 Jul 2004, Graham Leggett wrote:

 Nick Kew wrote:

  Indeed.  In a straight-through proxy that's right.
 
  But in the case of a cacheing proxy, it may be better for it to retrieve
  the entire document and manage byteranges locally.  And in the case of
  a content-transforming proxy, the filters may need the entire content to
  function at all.

 Remember that in our case there is no such thing as a caching proxy.

Of course there is!  It's apache with mod_proxy and mod_cache.  Just
as a content-transforming proxy is apache with mod_proxy and one or
more content filter module.

  Bottom line: this has to be controlled by the server admin.  We offer
  the options of passthrough, process locally, or ignore ranges.

 I think it's better to avoid adding extra directives, or giving the
 admin the power to override RFC2616. How to handle ranges is described
 fully in the HTTP/1.1 spec, the admin shouldn't really have the option
 to fiddle with it. Just adding more ways to get it wrong.

AFAICS RFC2616 sanctions any of the three behaviours I'm proposing.
Firstly, no one is required to support ranges at all.  Secondly, the
following passage from #14.35 seems to sum up the other options:

   If a proxy that supports ranges receives a Range request, forwards
   the request to an inbound server, and receives an entire entity in
   reply, it SHOULD only return the requested range to its client. It
   SHOULD store the entire received response in its cache if that is
   consistent with its cache allocation policies.

That's not explicit about whether the Range was forwarded, but neither
AFAICS is anything else in the RFC.

 Any filter that could get it's hands dirty (mod_include springs to mind)
 should just strip the Range header from the request, leaving the byte
 range filter to do the dirty work for it on the full response.

That is dirtying the filter API further.  If filter modules are to
be responsible for that, we should at least provide a higher-level
API for them, ideally in a declarative form.  Maybe something along the
lines of an AP_IS_TRANSFORM flag or flags, that will transparently deal
with Content-Length, Range and Warning headers on behalf of a filter
when it is inserted.

If we can abstract out the common processing required of every
content-transforming filter into a simple magic-API, I'll be happy.

-- 
Nick Kew


Re: The Byterange filter -- a new design -- feel free to rip it to shreds

2004-07-13 Thread Nick Kew
On Tue, 13 Jul 2004, William A. Rowe, Jr. wrote:

 It would be nice in apache 2.2 to finally clean up this contract, with two
 simple metadata element to pass through the filter chain:

 . this request is unfiltered
 . this request has a 1:1 filter (stateless)
 . this request has an arbitrary content transformation

 Each filter in the stack could promote the complexity but should never set
 it to a lower state.  This would allow http/proxy modules to negotiate less
 complex transformations in more efficient ways.

Nicely put.  Thank you!

+1

-- 
Nick Kew


Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c

2004-07-17 Thread Nick Kew
On Sat, 17 Jul 2004, André Malo wrote:

 * [EMAIL PROTECTED] wrote:

+        f->ctx = ctx = (void*)-1;

 I personally consider defining arbitrary pointer values as bad style, though
 I'm not sure what the general opinion here is (if any).

 I'd suggest to use a static pointer, like a global

 static char foo_sentinel; /* choose a speaking name ;-) */
 /* and later */
 f->ctx = ctx = &foo_sentinel;

 Additionally - afair - the use of arbitrary pointer values can even lead
 to bus errors on not-so-usual systems (loading undefined bits into an address
 register...).

Yes, you're right.

Actually this patch has a deeper problem, as does the patch it fixes.
Setting the headers at this point depends entirely on the behaviour
of the headers filter.  With current behaviour, the previous mod_deflate
was broken (because it could delay setting headers until after the
headers have been sent down the wire).  With my patch it might still
risk minor breakage (repeated gzip header) if the headers filter changes
sometime in future.

Any more issues with this?  If not I'll make nd's fix and leave it.

-- 
Nick Kew


Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c

2004-07-19 Thread Nick Kew
On Mon, 19 Jul 2004, Joe Orton wrote:

 On Sat, Jul 17, 2004 at 03:22:35PM -, [EMAIL PROTECTED] wrote:
  niq 2004/07/17 08:22:35
 
Modified:modules/filters mod_deflate.c
Log:
Fix previous patch to deal correctly with multiple empty brigades before
we know if there's any content, and not re-process the headers.

 Is there no simpler fix for this e.g. first thing the filter does is "if
 (APR_BRIGADE_EMPTY(bb)) return APR_SUCCESS;".  And to avoid the

Yes, that should work (but leaves us to reprocess the whole thing next
time round).  Is that better or worse?  Or come to think of it, we can
both return APR_SUCCESS and set a flag.

 re-process issue just ap_remove_output_filter(f) if finding an EOS-only
 brigade?

Do you recollect the discussion around when that patch went in?
I don't in full, but I had a nagging recollection of someone having
proposed a simpler solution but found it didn't work.  Just enough
to persuade me to preserve the loop-over-buckets test.

-- 
Nick Kew


setjmp/longjmp vs try/throw/catch

2004-07-19 Thread Nick Kew

I have a couple of modules using third-party libraries that require me
to supply an abort function (or they'll abort by exiting).
For example, libjpeg in my mod_jpeg.

My preferred approach to this situation is usually to resort to C++,
put my code in a try/catch loop, and provide an abort handler that
throws an exception.  However, this doesn't play well with Apache,
and when I run it in gdb, the throw appears to generate an Abort.

Switching to setjmp/longjmp does appear to work well with apache and gcc.
But that leaves me wondering if I need to worry about thread-safety.
Is using setjmp/longjmp with Worker or Windoze MPM asking for trouble?
And if so, is there an alternative approach I could try?

-- 
Nick Kew


Re: Invitation to HTTPD commiters in tomcat-dev

2004-07-20 Thread Nick Kew
On Tue, 20 Jul 2004, Henri Gomez wrote:

 We're discussing on tomcat-dev about a new Apache to Tomcat
 Apache 2.x module.

 We'd like to see some of the core HTTPD developers join
 the discussion about the post JK/JK2 module.

As a starting point, how about telling us what tomcat needs that
mod_proxy and friends don't provide?

-- 
Nick Kew


Re: Invitation to HTTPD commiters in tomcat-dev

2004-07-20 Thread Nick Kew
On Tue, 20 Jul 2004, Henri Gomez wrote:

[ chopped tomcat-dev because that bounces my mail ]

  As a startingpoint, how about telling us what tomcat needs that
  mod_proxy and friends don't provide?

 In mod_jk/jk2, there is support for load-balancing and fault-tolerance
 and it's a key feature.

Good start.

I'm guessing you're ahead of me here, and your reason for posting to
[EMAIL PROTECTED] is that you can see that implementing these capabilities
will be of general interest to more than just tomcat users?

My gut feeling would be to keep this properly modular.  Let mod_proxy
be the core of it, and implement load-balancing and fault-tolerance
in additional modules.  As a matter of fact, one of my wishlist-projects
is a connection-pooling module for backend HTTP connections in a proxy.
That might actually be the same as your project.

-- 
Nick Kew


Re: Invitation to HTTPD commiters in tomcat-dev

2004-07-20 Thread Nick Kew
On Tue, 20 Jul 2004, Henri Gomez wrote:

 We agree and I wonder if a mod_ajp could be used in conjunction with
 mod_proxy ? A sort of alternative way to route requests to tomcat.

We have proxy_http and proxy_ftp protocol modules.  That begs the
question: can't proxy_ajp live alongside them?

 Well let see my suggestion :

Makes sense.

With the caveat that proxying plain HTTP can do much more than some
posts in this thread seem to think.  So the motivation has to be
"people want AJP", not "HTTP can't do things".

-- 
Nick Kew


Re: setjmp/longjmp vs try/throw/catch

2004-07-20 Thread Nick Kew
On Tue, 20 Jul 2004, William A. Rowe, Jr. wrote:

 IIRC - all setjmp and other usually-thread-agnostic calls in a normal clib
 were redesigned to use TLS in the Win32 msvcrt lib, long before most
 Unixes considered implementing threads :)  I believe on win32 you will
 be fine, I'd be more worried about the thread implementations.

I have it on credible authority (in IRC from someone I believe, after
I asked) that POSIX requires it to be thread-safe.  That's good enough
for me: tells me I don't need to advise the Client to use prefork.

 This sure sounds like an abstraction we should assist with using apr.

Agreed.  But I don't have APR karma to introduce the idea there.

-- 
Nick Kew


Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c

2004-07-22 Thread Nick Kew
On Mon, 19 Jul 2004, Joe Orton wrote:

 Nothing like that was posted to the list, at least.  Patch below is
 still sufficient to fix the proxy+304 case; does it work for you too?

Yes, mostly (it fixes the important bug that was previously a
showstopper).  And it's an improvement on my hack by virtue of
simplicity.

But it should still set the Content-Encoding header on a HEAD request
that would normally be deflated (and unset content-length if present).
So your:

+    /* Deflating a zero-length response would make it longer; the
+     * proxy may pass through an empty response for a 304 too. */
+    if (APR_BUCKET_IS_EOS(APR_BRIGADE_FIRST(bb))) {
+        ap_remove_output_filter(f);
+        return ap_pass_brigade(f->next, bb);
+    }
+

should move after the
    if ( ! force-gzip )
block, and then, if we reach the EOS-only test, we should fix up
the headers.

That test also seems to lose the pathological case of a brigade
with no data but one or more FLUSH buckets followed by EOS.
Could that ever happen in a HEAD or a 204/304?


Investigating this has revealed a similar bug with HEAD requests
in inflate_out_filter, which I shall now have to fix:-(


-- 
Nick Kew


Re: cvs commit: httpd-2.0/modules/filters mod_deflate.c

2004-07-22 Thread Nick Kew
On Thu, 22 Jul 2004, Nick Kew wrote:

 On Mon, 19 Jul 2004, Joe Orton wrote:

  Nothing like that was posted to the list, at least.  Patch below is
  still sufficient to fix the proxy+304 case; does it work for you too?

 Yes, mostly (it fixes the important bug that was previously a
 showstopper).

I attach a new patch based on yours.  It fixes my test cases including
headers for HEAD requests.  Look OK to you?

-- 
Nick Kew

--- mod_deflate.c.bak   2004-07-22 11:12:53.0 +0100
+++ mod_deflate.c   2004-07-22 12:17:13.0 +0100
@@ -247,7 +247,6 @@
     apr_bucket_brigade *bb, *proc_bb;
 } deflate_ctx;
 
-static void* const deflate_yes = (void*)"YES";
 static apr_status_t deflate_out_filter(ap_filter_t *f,
                                        apr_bucket_brigade *bb)
 {
@@ -255,14 +254,14 @@
     request_rec *r = f->r;
     deflate_ctx *ctx = f->ctx;
     int zRC;
-    char* buf;
-    int eos_only = 1;
-    apr_bucket *bkt;
-    char *token;
-    const char *encoding = NULL;
     deflate_filter_config *c = ap_get_module_config(r->server->module_config,
                                                     &deflate_module);
 
+    /* Do nothing if asked to filter nothing. */
+    if (APR_BRIGADE_EMPTY(bb)) {
+        return APR_SUCCESS;
+    }
+
     /* If we don't have a context, we need to ensure that it is okay to send
      * the deflated content.  If we have a context, that means we've done
      * this before and we liked it.
@@ -270,6 +269,8 @@
      * we're in better shape.
      */
     if (!ctx) {
+        char *buf, *token;
+        const char *encoding;
 
         /* only work on main request/no subrequests */
         if (r->main) {
@@ -349,7 +350,6 @@
          */
         apr_table_setn(r->headers_out, "Vary", "Accept-Encoding");
 
-
         /* force-gzip will just force it out regardless if the browser
          * can actually do anything with it.
          */
@@ -384,39 +384,22 @@
         }
     }
 
-        /* don't deflate responses with zero length e.g. proxied 304's but
-         * we do set the header on eos_only at this point for headers_filter
-         *
-         * if we get eos_only and come round again, we want to avoid redoing
-         * what we've already done, so set f->ctx to a flag here
+        /* Deflating a zero-length response would make it longer; the
+         * proxy may pass through an empty response for a 304 too.
+         * So we just need to fix up the headers as if we had a body.
          */
-        f->ctx = ctx = deflate_yes;
-    }
-    if (ctx == deflate_yes) {
-        /* deal with the pathological case of lots of empty brigades and
-         * no knowledge of whether content will follow
-         */
-        for (bkt = APR_BRIGADE_FIRST(bb);
-             bkt != APR_BRIGADE_SENTINEL(bb);
-             bkt = APR_BUCKET_NEXT(bkt))
-        {
-            if (!APR_BUCKET_IS_EOS(bkt)) {
-                eos_only = 0;
-                break;
-            }
-        }
-        if (eos_only) {
-            if (!encoding || !strcasecmp(encoding, "identity")) {
+        if (APR_BUCKET_IS_EOS(APR_BRIGADE_FIRST(bb))) {
+            if (!encoding || !strcasecmp(encoding, "identity")) {
                 apr_table_set(r->headers_out, "Content-Encoding", "gzip");
             }
             else {
                 apr_table_merge(r->headers_out, "Content-Encoding", "gzip");
             }
             apr_table_unset(r->headers_out, "Content-Length");
+
+            ap_remove_output_filter(f);
             return ap_pass_brigade(f->next, bb);
         }
-    }
-    if (!ctx || (ctx==deflate_yes)) {
 
         /* We're cool with filtering this. */
         ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
@@ -912,6 +895,11 @@
     apr_status_t rv;
     deflate_filter_config *c;
 
+    /* Do nothing if asked to filter nothing. */
+    if (APR_BRIGADE_EMPTY(bb)) {
+        return APR_SUCCESS;
+    }
+
     c = ap_get_module_config(r->server->module_config, &deflate_module);
 
     if (!ctx) {
@@ -950,6 +938,13 @@
         }
         apr_table_unset(r->headers_out, "Content-Encoding");
 
+        /* No need to inflate HEAD or 204/304 */
+        if (APR_BUCKET_IS_EOS(APR_BRIGADE_FIRST(bb))) {
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, bb);
+        }
+
+
         f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
         ctx->proc_bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
         ctx->buffer = apr_palloc(r->pool, c->bufferSize);
@@ -983,9 +978,10 @@
         apr_size_t len;
 
         /* If we actually see the EOS, that means we screwed up! */
+        /* no it doesn't - not in a HEAD or 204/304 */
         if (APR_BUCKET_IS_EOS(bkt)) {
             inflateEnd(&ctx->stream);
-            return APR_EGENERAL;
+            return ap_pass_brigade(f->next, bb);
         }
 
         if (APR_BUCKET_IS_FLUSH(bkt)) {


Ideas for Smart Filtering

2004-07-29 Thread Nick Kew

The filter architecture periodically gets discussed here, and I've
been meaning to write up my thoughts for some time.  I'm using a
module that implements a slightly different filter API, primarily
for filtering in a proxy context.

I've now written a brief discussion document on the subject.  It's
mostly an abstraction of what I'm currently using, although it does
propose some additional improvements, primarily with regard to
protocol handling (reflecting the recent byteranges thread here).

It generated an interesting discussion, including some interesting
alternative ideas, last night on IRC.  Perhaps it can lead to a
general-purpose module for 2.0 and an architecture update for 2.2?

http://www.apachetutor.org/dev/smart-filter

-- 
Nick Kew


Re: Ideas for Smart Filtering

2004-07-30 Thread Nick Kew
On Fri, 30 Jul 2004, Joe Schaefer wrote:

 Um, could you please explain that bit about te need to remove
 filter-init from the API?  That hook plays a pivotal role in
 libapreq2's input filter mod_apreq.  mod_apreq needs to examine
 the entire input filter stack and modify it under certain
 conditions.  This cannot be done in-flight, during ap_get_brigade.

Ah.  Two answers to that one:

(1) I'm only really considering output filters.  Input filters
can't depend on the handler, so the dynamic configuration
discussed is not relevant to them.

(2) I propose getting rid of it because I cannot see any circumstance
in which it's necessary in an output filter.  But it's not a
    requirement of the proposed architecture, except insofar as
it saves the overhead of a filter initialising when it's not
going to be run.

-- 
Nick Kew


Re: Ideas for Smart Filtering

2004-07-30 Thread Nick Kew
On Fri, 30 Jul 2004, Joe Schaefer wrote:

 Nick Kew [EMAIL PROTECTED] writes:

 [...]

  (2) I propose getting rid of it because I cannot see any circumstance
  in which it's necessary in an output filter.

 But you still need a simple way for an output filter to run some code
 before the content handler gets invoked.

Fair enough.  Then you register it unconditionally using the old API.


 FWIW I've been advising output
 filter authors that want to get at libapreq2's post data to use
 filter_init for that:

   http://cvs.apache.org/~joes/libapreq2-2.04-dev/docs/html/apreq_faq.html

 If there's a better approach, I'd be glad to update those docs.

Wouldn't an ap_hook_insert_filter() handler be the ideal spot for that?

But anyway, if there is a valid need for filter_init in its present
form in some output filters, we can still use it, provided we document
why it might be inefficient to use it with dynamic configuration.
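
For the record, here's a minimal sketch of the insert_filter approach I
mean (hypothetical module and filter names, untested; the env variable
would be set earlier by SetEnvIf/mod_rewrite or by mod_apreq itself):

    #include "httpd.h"
    #include "http_config.h"
    #include "util_filter.h"

    /* Runs on every request, but the per-request cost is one table lookup */
    static void my_insert_filter(request_rec *r)
    {
        if (apr_table_get(r->subprocess_env, "want-my-filter")) {
            ap_add_output_filter("MY-FILTER", NULL, r, r->connection);
        }
    }

    static void my_register_hooks(apr_pool_t *pool)
    {
        ap_hook_insert_filter(my_insert_filter, NULL, NULL, APR_HOOK_MIDDLE);
    }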


-- 
Nick Kew


Re: Ideas for Smart Filtering

2004-07-30 Thread Nick Kew
On Fri, 30 Jul 2004, Joe Schaefer wrote:

 Nick Kew [EMAIL PROTECTED] writes:

 [...]

   If there's a better approach, I'd be glad to update those docs.
 
  Wouldn't an ap_hook_insert_filter() handler be the ideal spot for
  that?

 I thought such hooks were run on *every* request, not
 just the ones which require a particular output filter?
 If so, for performance reasons that's not a suitable
 solution for folks writing output filters with mod_perl.

Fair enough.  So we have a reason not to dispense with that
handler.  It just means we need to document when not to use it.

So the next question is, is it sufficient to maintain the two
APIs (the established 2.0 + my proposal)?  I'm thinking about that
one: it's not really a problem to provide a combined API if there's
anything to gain by it.  But in any case, you can rest assured I'm
not suggesting we abolish the old API :-)  A filter can register
itself as an output filter (old API) and as a smart filter provider
(new API).

Unless of course someone can radically improve on my proposals ...

-- 
Nick Kew


Re: Ideas for Smart Filtering

2004-08-01 Thread Nick Kew
On Sat, 31 Jul 2004, Justin Erenkrantz wrote:

 Yet, I'm not sure I understand the intent of your proposal.  Is it that you
 don't like the fact that each filter has to make a decision on whether it
 should stick around?

Essentially, yes.  We already have a double-digit number of content
filters in one application, and that's growing.

 So, what you are proposing to do is to abstract those
 two decisions into separate functions - i.e. decide whether to accept, and
 another to perform the filter?

I'm currently running something like this using the ap_provider API.
The reason I'm not proposing just to use that is that we want more
flexibility.  For example Content-Type: text/html;charset=latin1
is two different keys we might wish to dispatch on, while a Cookie
could enumerate an arbitrary number.

  ap_register_smart_filter(name, match, filter_func, ctx, protocol_flags)
 
  Now when the harness name is inserted in the filter chain, and there is a
 match with match, lookup_handler (referenced above) will return our
  filter_func for filter name.

 I'm not sure what 'match' is in this context.

In the above case, it could be text/html or latin1.
  ap_register_smart_filter(transcode, latin1, charset_filter, ctx, flags);
  ap_register_smart_filter(process, text/html, html_filter, ctx, flags);

But that really needs the flexibility of a regexp, so latin1 becomes
  latin[-_]?1|iso[-_]?8859_?1
or might expand to include other close relatives like iso-8859-15
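
To illustrate the kind of match that implies, a rough sketch using the 2.0
regex wrappers (cmd and charset here are just assumed context, not actual
mod_filter code):

    /* at config time, compile the pattern once */
    regex_t *latin1_match = ap_pregcomp(cmd->pool,
                                        "latin[-_]?1|iso[-_]?8859[-_]?1",
                                        REG_EXTENDED|REG_ICASE|REG_NOSUB);

    /* at request time, dispatch on the charset token */
    if (charset && ap_regexec(latin1_match, charset, 0, NULL, 0) == 0) {
        /* select the latin-1 transcoding provider */
    }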

 What is the point of protocol_flags?

C.f. the recent thread on handling byteranges.   Bill Rowe expressed
the problem rather well in that thread.  In view of your request not
to cite URLs for substantive discussion, I'll quote from his post:


The confusion results because mod_proxy isn't implemented as a content
handler, it's a protocol handler in its own right.  Rather than insist on
the mod_http  mod_proxy agreeing to streamline the response, we've put
it on every content module author to:

. remove output C-L header if the size is transformed
. remove input range headers if the content isn't 1:1 transformed

This is very kludgy and more an example of where mod_http & mod_proxy
didn't quite get it right, and made it a little more difficult for folks
who are just trying to transform content bodies.

It would be nice in apache 2.2 to finally clean up this contract, with two
simple metadata element to pass through the filter chain:

. this request is unfiltered
. this request has a 1:1 filter (stateless)
. this request has a arbitrary content transformation


 Why should the filter be forced to
 pre-declare these decisions?  Why can't I determine that dynamically?

Noone forces it.  A filter that wants to take charge of protocol decisions
is free to do so.  But requiring every filter to do so is a burden on
filter writers, and is bug-prone (c.f. the number of ways to generate
a bogus Content-Length on a HEAD request).

   I
 think it'd be a bad idea to key such HTTP/1.1 protocol issues in the filter
 API.  I think we should maintain protocol-agnosticism where possible.

I have to disagree there.  There are certain wheels I don't want to have
to redesign every time I implement a content filter.  A filter that wants
to take full responsibility itself should be able to do so, but bearing
in mind that whatever one filter does may be overridden by another.

 So, to sum up: splitting out the decision whether the filter should run from
 it's filter function sounds fine.  But, I think the Filter* directives
 abstract too much in this particular case.  Let the filter itself decide.  --
 justin

You're right that these are two separable tasks, and in fact the filter
dispatcher is the part I have implemented, whereas the protocol handling
is merely a proposal.  I'd be interested to hear other views on the
subject.  Are you disagreeing with my quote from Bill Rowe above,
or merely with my proposed solution to that problem?

-- 
Nick Kew


Re: Ideas for Smart Filtering

2004-08-01 Thread Nick Kew
On Sun, 1 Aug 2004, Justin Erenkrantz wrote:

 --On Sunday, August 1, 2004 8:24 AM +0100 Nick Kew [EMAIL PROTECTED] wrote:

  I'm not sure what 'match' is in this context.
 
  In the above case, it could be text/html or latin1.
ap_register_smart_filter(transcode, latin1, charset_filter, ctx,
  flags);   ap_register_smart_filter(process, text/html, html_filter, ctx,
  flags);
 
  But that really needs the flexibility of a regexp, so latin1 becomes
latin[-_]?1|iso[-_]?8859_?1
  or might expand to include other close relatives like iso-8859-15

 Having an overhead of regexp's by default in our filter code would seem to be
 a severe bottleneck.

Hmmm, how many configurations don't use any LocationMatch/family
containers nor AliasMatch or Rewrite rules?

But anyway, fair point.  Regex vs simple strcasecmp should be a flag.

 I'd rather avoid that or push it on those few specific
 modules that want the power of regexp and willing to pay the ridiculous cost
 penalties.  The other significant thing you are missing in your API is what to
 match against.  (I think you are assuming Content-Type, but there's a lot of
 cases where you want to match against something other than Content-Type.)

That's part of the proposed configuration, when we declare the name for
the filter harness.

  FilterDeclare transcode AP_FTYPE_RESOURCE
  FilterDispatcher transcode Content-Type [charset=([^;]+)]
  FilterProvider transcode latin[-_]?1|iso[-_]?8859[-_]1 latin_1_filter
  FilterProvider transcode [other providers for other matches]

(that's maybe a bit contrived - I don't have a real-life case where we
want multiple filters other than on/off for different charsets)

(btw, if you think AP_FTYPE_RESOURCE should be AP_FTYPE_CONTENT_SET,
that's another weakness of the architecture.  If we need to transcode
*before* a content filter, then we can't use CONTENT_SET.
Solution: this needs to be configurable).

 Remember that the content-length doesn't even need to be set *before* we go
 into the filter.  (The fact that default_handler does it is more of an
 accident than anything else.)  The content-length header is *not* normative
 and should almost always be ignored.  (Of course, this is internally to httpd

Yes of course.

The point is that content-length *is* set by many handlers, and has to be
unset by filters.  The second point is that there *are* a bunch of bugs
arising from that (e.g. mod_deflate in 2.0.x vs recent fixes in 2.1-HEAD).
The KISS principle tells us that simplifying the task of filtering
content will reduce the bug count.

 and brigades.  It is not efficient to constantly compute the length as we push
 data through the filters.

No, but it is efficient simply to *unset* the length if we have one or
more filter that's going to change it.  Likewise, we need to handle
byteranges and Warning headers.  And unset a Last-Modified header when
a filter invalidates it (or make it configurable - c.f. XBitHack).

Instead of requiring every filter to worry about that, we let filters
simply declare their behaviour.
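
To make that concrete, this is roughly the boilerplate every body-changing
filter has to carry for itself today (a sketch, not lifted from any
particular module):

    /* on the first invocation of a filter that will change the body */
    apr_table_unset(r->headers_out, "Content-Length");  /* now unknown */
    apr_table_unset(r->headers_in, "Range");            /* byteranges no longer valid */
    apr_table_unset(r->headers_out, "Last-Modified");   /* or make it configurable */
    /* and strictly we should be merging in a Warning: 214 as well */

Under the proposal, a filter would instead just declare its behaviour
(changes length, not 1:1, and so on) and the harness would do the above
once, consistently.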

 So, if a filter is relying upon the content-length HTTP metadata header and
 not the brigades it sees, then it's severely broken.  Trying to restrict
 filters to pre-declare what they will do is, IMHO, silly and pointless.  I
 don't see how a solution for pre-declaring the intention of a filter is going
 to provide any real benefits.  Nothing can make use of that knowledge anyway
 because they have to account for all cases!  So, any benefit for corner-case
 optimization is lost by the increase in complexity just added.

No, the whole point is to *reduce* complexity!

-- 
Nick Kew


Re: Ideas for Smart Filtering

2004-08-01 Thread Nick Kew
On Sun, 1 Aug 2004, Eli Marmor wrote:

 Great idea, Nick.

 By the way: Is it possible to integrate it with mod_rewrite, of course
 after extending mod_rewrite a little?  This may save us the need to
 invent new directives (e.g. FilterProvider, FilterDispatche, etc.).
 After all, mod_rewrite has a very sophisticated system to define
 conditions.

That's something that's been floating around the back of my mind.

I wouldn't want it to be dependent on mod_rewrite.  But if dispatch
were managed internally from the env table, then mod_rewrite/setenvif
could be used to configure it, by those who need that level of
flexibility.

I haven't figured out how I'd implement that ...

-- 
Nick Kew


Protocol handling review (proxy/etc)

2004-08-02 Thread Nick Kew

As part of my smart filtering proposal, I'm looking to tidy up protocol
handling.  We have several loose ends to deal with:

* Zero-length responses setting bogus Content-length or unsetting it
  (e.g. bug 18757, mod_deflate compressing empty bodies).

* Failing to respect no-transform in a proxy.  No-transform should
  preclude not only content-transforming filters, but also those
  like content_length filter that affect headers.

* Different paths between cached responses and origin responses.
  Fixes to the above should not break mod_cache.  Is this a real risk?

Regarding proxying, several bug reports speak of Windows update not
working through an Apache proxy, but offer vague or conflicting
diagnoses of the exact cause.  Can anyone who uses windows update -
or knows how to find it - check whether it sets no-transform?  If it
does then the breakage is our fault, and the fix should fall out of
a general review of this.

-- 
Nick Kew


Re: cvs commit: httpd-2.0/docs/conf httpd-std.conf.in

2004-08-02 Thread Nick Kew
On Mon, 2 Aug 2004, [ISO-8859-15] André Malo wrote:

 Now we have your additional charsets twice...

Erk! So we do.  I guess the best fix is just another update to chop the
duplicates?

-- 
Nick Kew


AddDefaultCharset and Bug 23421

2004-08-02 Thread Nick Kew

Our shipping with AddDefaultCharset preconfigured is causing lots of
pages to be served with a bogus charset, typically where authors
rely on <meta http-equiv ...> and either don't know how to fix it
or lack permission.

Bundling AddDefaultCharsets help users fix this.  But really we need
to do two more things.  One is to update the documentation - perhaps
a tutorial on the subject.  The other is to turn multiviews on by
default, so authors whose sysops stick with defaults and offer no
privileges can deal with it without having to hardwire
<a href="foo.html.gb2312">my chinese page</a> into their HTML.

Does this make sense?  Or could we simply drop the AddDefaultCharset
from the installation default as suggested by Duerst and others?

-- 
Nick Kew


Re: cvs commit: httpd-2.0/docs/conf httpd-std.conf.in

2004-08-02 Thread Nick Kew
On Mon, 2 Aug 2004, [ISO-8859-15] André Malo wrote:

-# The set below does not map to a specific (iso) standard
-# but works on a fairly wide range of browsers. Note that
-# capitalization actually matters (it should not, but it
^^^
-# does for some browsers).

Aaargh!   Charsets are case-insensitive, so why should anyone worry
about case in an AddCharset?

If some of those were in fact browser bug workarounds, shouldn't
they be accompanied by *at least* a URL referencing a
report/discussion of the bug concerned?

BTW, this is supposed to be working on Bug 23421 - see my other post.

BTW2, there seems to have already been a duplicate entry for Big5 :-)

-- 
Nick Kew


Re: AddDefaultCharset and Bug 23421

2004-08-02 Thread Nick Kew
On Mon, 2 Aug 2004, [ISO-8859-15] André Malo wrote:

 * Nick Kew [EMAIL PROTECTED] wrote:

  Our shipping with AddDefaultCharset preconfigured is causing lots of
  pages to be served with a bogus charset, typically where authors
  rely on <meta http-equiv ...> and either don't know how to fix it
  or lack permission.

 *shrug*, removing AddDefaultCharset creates the same kind of problem, just
 other way 'round.

No, because it enables the de-facto <meta http-equiv ...> hack.

  Bundling AddDefaultCharsets help users fix this.

 What does that mean?

It means I mistyped AddCharset.

  But really we need
  to do two more things.  One is to update the documentation - perhaps
  a tutorial on the subject.

 This doesn't solve the problem, you've described above. If it's a lack of
 permission, the tutorial won't help. If it's missing knowledge, the brand new
 tutorial won't even be read (same as documentation before).

If there's a way to deal with this without .htaccess, then lack of
permissions is less likely to be an issue.  And the more clueful users will find a
tutorial.

  The other is to turn multiviews on by
  default, so authors whose sysops stick with defaults and offer no
  privileges can deal with it without having to hardwire
  <a href="foo.html.gb2312">my chinese page</a> into their HTML.

 The purpose of the shipped default config is not to administer all the
 boxes out there. It's just a goodie that you can see, that your apache is
 running.

In theory ...

 I'm very -1 on turning on MultiViews, since it's very annoying (and
 expensive) if you don't want it (and you have a lazy sysadmin, lack of
 permissions like described above).

OK, fair point.

 Actually I think, it would be way better to shorten the default config to
 something very small, which just shows the indexpage and let the people
 configure their server themselves. And hey, suddenly the bug reports go to the
 admins (where they belong) and not to us.

Sounds to me like wishful thinking there.  We still get expected to
sort out bugs arising in rpm, deb, emerge, etc packages that bear little
or no resemblance to httpd-std.conf.

  Does this make sense?  Or could we simply drop the AddDefaultCharset
  from the installation default as suggested by Duerst and others?

 Pragmatically, I think, let's just drop it and we're fine :)

Sounds good to me:-)

-- 
Nick Kew


Re: POST without Content-Length

2004-08-07 Thread Nick Kew
On Sat, 7 Aug 2004, Justin Erenkrantz wrote:

 That's a slightly different story.  2.1 has the fix for this (proxy_http.c
 r1.166), but it never got back ported to 2.0.

We have a lot of proxy updates in 2.1, which are presumably getting
test-driven over time.  How would one go about proposing a wholesale
backport?

 2.0's STATUS says:
  * Rewrite how proxy sends its request to allow input bodies to
morph the request bodies.  Previously, if an input filter
changed the request body, the original C-L would be sent which
would be incorrect.

This is basically the same as an output filter changing the
content-length.  In the 2.0 architecture, the filter must take
responsibility for not sending a bogus length.  The only difference
is that Connection: close is an option in output.


Due to HTTP compliance, we must either send the body T-E: chunked
or include a C-L for the request body.  Connection: Close is not
an option. [jerenkrantz2002/12/08 21:37:27]
+1: stoddard, striker, jim
-1: brianp (we need a more robust solution than what's in 2.1 right now)
jerenkrantz (should be fixed, but I don't have time to do this)

 At this date (about 20 months later), I have no earthly idea what was
 wrong.  But, I'd suggest trying httpd-2.0 HEAD (aka httpd-2.1) and see if
 that fixes it.  Perhaps someone can remember why I agreed with Brian and
 what I never fixed...  ;-)  -- justin

Hmmm, did your fix merely chunk content, or compute C-L, or was it smart
enough to do the Right Thing according to whether the backend is
HTTP/1.1[1],  whether the content is short enough to fit in one heap
bucket, or whatever other criteria might be applied?

[1] Presumably we can only assume HTTP/1.1 backend in a controlled -
reverse proxy - case, and where the admin has configured it?

-- 
Nick Kew


Re: POST without Content-Length

2004-08-07 Thread Nick Kew
On Sat, 7 Aug 2004, Jan Kratochvil wrote:

 Hi,

 Thanks for the great support - httpd-2.0 HEAD 2004-08-07 really fixes it.
 It even provides env variable proxy-sendchunks to select between compatible
 Content-Length (default) and performance-wise chunked.

Sounds pretty complete to me.  Of course you'd need to stick to C-L unless
you *know* the backend accepts chunks.

It occurs to me that a similar situation arises with CGI and chunked
input.  The CGI spec guarantees a content-length header, so presumably(?)
the code for dealing with that is already there somewhere, and will figure
in the AP_CHUNKED_DECHUNK option to the old handler-read functions.
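
For reference, the handler-side read loop I mean looks roughly like this
(the constant is spelled REQUEST_CHUNKED_DECHUNK in http_protocol.h;
a sketch only, error handling trimmed):

    char buf[8192];
    long nread;
    int rv;

    if ((rv = ap_setup_client_block(r, REQUEST_CHUNKED_DECHUNK)) != OK) {
        return rv;
    }
    if (ap_should_client_block(r)) {
        while ((nread = ap_get_client_block(r, buf, sizeof(buf))) > 0) {
            /* feed buf to the CGI's stdin, having computed (or faked)
             * a Content-Length for it beforehand */
        }
    }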

  We have a lot of proxy updates in 2.1, which are presumably getting
  test-driven over time.  How would one go about proposing a wholesale
  backport?

 FYI Fedora Core 2 httpd already backports httpd-2.1 version of proxy_http.c
 although it was not so new snapshot to include resolving of my issues.
 Current CVS snapshot I Bugzilled them as
   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=129391

 FYI backport of current mod_proxy is technically trivia - just copying raw
   mod_proxy.c
   mod_proxy.h
   proxy_http.c

It would also need proxy_util.c.  Not sure about ftp or connect.

 although it brings new domain-remapping functionality there.

Indeed.

 Although the proxy is OK now there still remains one problem:

 I think HTTP server MUST accept the request:
   POST ... HTTP/1.0 or HTTP/1.1
   [ no Content-Length ]
   [ no Transfer-Encoding ]
   Connection: close [ or even no Connection header at all]
   \r\n
   DATA

 according to RFC2616 section 4.4. Even httpd-2.1/CVS just assumes empty body.
 squid up to squid/2.5.STABLE5 at least responds by 411 Length Required.

Surely that only applies if the server can infer there's a request body.
How does it do that with neither C-L nor T-E to indicate a body?

If we could infer a body in such a case, then AIUI the following applies:
If a
   request contains a message-body and a Content-Length is not given,
   the server SHOULD respond with 400 (bad request) if it cannot
   determine the length of the message, or with 411 (length required) if
   it wishes to insist on receiving a valid Content-Length.

Maybe we should infer a body (and hence apply the above logic) in any
POST or PUT request?  If we do that, it begs the question of how to treat
unknown HTTP/extension methods (cf DAV), and suggests perhaps
RequireRequestBody should be made a configuration directive.

-- 
Nick Kew


Re: POST without Content-Length

2004-08-07 Thread Nick Kew
On Sat, 7 Aug 2004, Roy T.Fielding wrote:

  Thanks for the great support - httpd-2.0 HEAD 2004-08-07 really fixes
  it.
  It even provides env variable proxy-sendchunks to select between
  compatible
  Content-Length (default) and performance-wise chunked.
 
  Sounds pretty complete to me.  Of course you'd need to stick to C-L
  unless
  you *know* the backend accepts chunks.

 If the client sent chunks, then it is safe to assume that the proxy
 can send chunks as well.  Generally speaking, user agents only send
 chunks to applications that they know will accept chunks.

The client could be sending chunks precisely because it's designed to
work with a proxy that is known to accept them.  That doesn't imply
any knowledge of the backend(s) proxied, which might be anything up to
and including the 'net in general.

Also bear in mind that we were discussing (also) the case where the
request came with C-L but an input filter invalidated it.

-- 
Nick Kew


Re: POST without Content-Length

2004-08-07 Thread Nick Kew
On Sat, 7 Aug 2004, Jan Kratochvil wrote:

 What would happen in this case httpd would infer a body while no body would be
 found there?

Just consider a bog-standard GET.

 Therefore it should be safe to assume if no Content-Length and no chunked
 headers are present there MUST follow an optional body with the
 connection-close afterwards as 'persistent connection' MUST NOT be present.

Nope.  GET requests routinely have keep-alive, but don't have bodies.

-- 
Nick Kew


Re: POST without Content-Length

2004-08-07 Thread Nick Kew
On Sun, 8 Aug 2004, [ISO-8859-15] André Malo wrote:

 A CGI script therefore should never trust Content-Length, but just read
 stdin until it meets an EOF.

That is well-known to fail in CGI.  A CGI must use Content-Length.

-- 
Nick Kew


Re: POST without Content-Length

2004-08-07 Thread Nick Kew
On Sat, 7 Aug 2004, Roy T.Fielding wrote:

  If the client sent chunks, then it is safe to assume that the proxy
  can send chunks as well.  Generally speaking, user agents only send
  chunks to applications that they know will accept chunks.
 
  The client could be sending chunks precisely because it's designed to
  work with a proxy that is known to accept them.  That doesn't imply
  any knowledge of the backend(s) proxied, which might be anything up to
  and including the 'net in general.

 Theoretically, yes.  However, in practice, that is never the case.

On the contrary!  I myself have done a great deal of work on a proxy
for mobile devices, for a household-name Client.  The client software
makes certain assumptions of the proxy that would not be valid on the
Web at large.  But the backend *is* the web at large.

  Also bear in mind that we were discussing (also) the case where the
  request came with C-L but an input filter invalidated it.

 I was not discussing that case.  The answer to that case is don't do
 that.
 Fix the input filter if it is doing something stupid.

That was one of the cases that started this thread.  I don't have an
example of this, but someone did.

-- 
Nick Kew


Proxy Load Balancer

2004-08-17 Thread Nick Kew

I've just looked at the new code - thanks folks.  My own interest in
proxying is with HTTP backends, both in forward and reverse contexts,
and doing so efficiently.

Couple of questions:

(1) The proxy balancer directives are implemented in mod_proxy.c,
not proxy_balancer.c.  Was this necessary?

(2) ISTR some discussion of generic connection pooling, but I don't
see it in the code.  Am I missing something, or is this still TBD?

-- 
Nick Kew


Re: Proxy Load Balancer

2004-08-17 Thread Nick Kew
On Tue, 17 Aug 2004, Graham Leggett wrote:

  (1) The proxy balancer directives are implemented in mod_proxy.c,
  not proxy_balancer.c.  Was this necessary?

 proxy_balancer should in theory provide the algorithm to do the
 balancing, while the generic directives to specify the members of the
 cluster could be generically specified.

Indeed.  But not all of us have a cluster or want clustering code.

  (2) ISTR some discussion of generic connection pooling, but I don't
  see it in the code.  Am I missing something, or is this still TBD?

 The connection pool is there, it's implemented using apr_reslist.

/me kicks himself for not looking inside proxy_util.c :-(

Thanks.
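
For anyone else who missed it, the apr_reslist interface is roughly as
below (a sketch with made-up constructor/destructor names, not the
proxy_util code):

    #include "apr_reslist.h"

    static apr_status_t backend_make(void **conn, void *params, apr_pool_t *p);
    static apr_status_t backend_kill(void *conn, void *params, apr_pool_t *p);

    apr_reslist_t *backends;
    apr_reslist_create(&backends,
                       1,                      /* min spare connections */
                       4,                      /* soft maximum          */
                       16,                     /* hard maximum          */
                       apr_time_from_sec(60),  /* idle time-to-live     */
                       backend_make, backend_kill, NULL, p);

    /* then, per request */
    void *conn;
    if (apr_reslist_acquire(backends, &conn) == APR_SUCCESS) {
        /* ... talk to the backend ... */
        apr_reslist_release(backends, conn);
    }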

-- 
Nick Kew


Re: mod_deflate and no-gzip

2004-08-18 Thread Nick Kew
On Wed, 18 Aug 2004, Brian Akins wrote:

 Shouldn't we still set Vary: Accept-Encoding if no-gzip is set?

Hmmm, makes sense.  +1
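
i.e. something along these lines in the early-return path (a sketch, not
the actual mod_deflate code):

    if (apr_table_get(r->subprocess_env, "no-gzip")) {
        apr_table_merge(r->headers_out, "Vary", "Accept-Encoding");
        ap_remove_output_filter(f);
        return ap_pass_brigade(f->next, bb);
    }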

Should we be looking at a wholesale backport from 2.1-head?  There are
a number of minor bugfixes, as well as the inflate output filter.

-- 
Nick Kew


Re: [PROPOSAL] HTTPD Website Suggestion

2004-08-22 Thread Nick Kew
On Sun, 22 Aug 2004, Shaun Evans wrote:

 This isn't about the source code but I thought this could be useful to
 somebody on here.

Have you read the section at the site about contributing?

 Please find attached a number of files (in tar.gz) that I have made to
 help improve the Apache HTTP Server website.

Please don't do that.  A URL for a tar.gz file is much friendlier on
people's inboxes.

-- 
Nick Kew


Re: AddOutputFilterByType oddness

2004-08-24 Thread Nick Kew
On Tue, 24 Aug 2004, Graham Leggett wrote:

 I have just set up the most recent httpd v2.0.51-dev tree, and have
 configured a filter that strips leading whitespace from HTML:

 AddOutputFilterByType STRIP text/html

 The content is served by mod_proxy.

As it stands, that can't work.

It's a manifestation of the problem I'm addressing by reviewing
the filter architecture: see http://www.apachetutor.org/dev/smart-filter
and the Ideas for smart filtering thread here.

I actually have an implementation based on the discussion document and
addressing the concerns people raised in the thread.  I hope to find
time to finish the accompanying documentation and post it here round
about this coming weekend.

 http://httpd.apache.org/docs-2.0/mod/core.html#addoutputfilterbytype

 it says that filters are not applied by proxied requests (It does not
 give a reason why not).

The URL above makes it clear what's happening there.

-- 
Nick Kew


Re: AddOutputFilterByType oddness

2004-08-24 Thread Nick Kew
On Tue, 24 Aug 2004, Nick Kew wrote:

 I actually have an implementation based on the discussion document and
 addressing the concerns people raised in the thread.  I hope to find
 time to finish the accompanying documentation and post it here round
 about this coming weekend.

OK, since you seem to have a real-life use for it, here goes.  As I
said before, I wasn't planning to post without a little more testing
and accompanying documents and discussion, but what the ?
I'm sure I'll regret this premature posting 

Mini-Synopsis:


# 1. Declare a smart filter that dispatches on Content-Type
FilterDeclare   myfilter    Content-Type


# 2. Declare your filter as a Provider, to run whenever Content-Type
#includes the string text/html
FilterProvider  myfilter    STRIP   $text/html


# 3. Set the smart filter chain to this filter where you want to apply it
Location scope-of-your-proxy
FilterChain =myfilter
/Location

-- 
Nick Kew

/*  Copyright (C) 2004 Nick Kew

This is experimental code.  It may be copied and used only for
evaluation and testing purposes.

The copyright holder offers to the Apache Software Foundation
permission to re-license this code under the ASF license.
This offer applies if and when the ASF accepts this code or
any derived work for inclusion in a future release of HTTPD.

Regardless of the above, the author undertakes to release the
work under a recognised open-source license in due course.
Information will be available at http://apache.webthing.com/
and/or http://dev.apache.org/~niq/
*/
#include <ctype.h>
#include <string.h>

/* apache */
#include "httpd.h"
#include "http_config.h"
#include "http_log.h"
#include "apr_strings.h"
#include "util_filter.h"
#include "apr_hash.h"

module AP_MODULE_DECLARE_DATA filter_module ;


#ifndef NO_PROTOCOL
#define PROTO_CHANGE 0x1
#define PROTO_CHANGE_LENGTH 0x2
#define PROTO_NO_BYTERANGE 0x4
#define PROTO_NO_PROXY 0x8
#define PROTO_NO_CACHE 0x10
#define PROTO_TRANSFORM 0x20
#endif

typedef apr_status_t (*filter_func_t)(ap_filter_t*, apr_bucket_brigade*) ;

typedef struct {
  const char* name ;
  filter_func_t func ;
  void* fctx ;
} harness_ctx ;

typedef struct mod_filter_provider {
  enum {
    STRING_MATCH,
    STRING_CONTAINS,
    REGEX_MATCH,
    INT_EQ,
    INT_LE,
    INT_GE,
    DEFINED
  } match_type ;
  union {
    const char* c ;
    regex_t* r ;
    int i ;
  } match ;
  ap_filter_rec_t* frec ;
  struct mod_filter_provider* next ;
#ifndef NO_PROTOCOL
  unsigned int proto_flags ;
#endif
} mod_filter_provider ;

typedef struct {
  ap_filter_rec_t frec ;
  enum {
    REQUEST_HEADERS,
    RESPONSE_HEADERS,
    SUBPROCESS_ENV,
    CONTENT_TYPE
  } dispatch ;
  const char* value ;
  mod_filter_provider* providers ;
#ifndef NO_PROTOCOL
  unsigned int proto_flags ;
  const char* range ;
#endif
} mod_filter_rec ;

typedef struct mod_filter_chain {
  const char* fname ;
  struct mod_filter_chain* next ;
} mod_filter_chain ;

typedef struct {
  apr_hash_t* live_filters ;
  mod_filter_chain* chain ;
} mod_filter_cfg ;

static int filter_init(ap_filter_t* f) {
  mod_filter_provider* p ;
  int err ;
  harness_ctx* ctx = f->ctx ;
  mod_filter_cfg* cfg
        = ap_get_module_config(f->r->per_dir_config, &filter_module);
  mod_filter_rec* filter
        = apr_hash_get(cfg->live_filters, ctx->name, APR_HASH_KEY_STRING) ;
  for ( p = filter->providers ; p ; p = p->next ) {
    if ( p->frec->filter_init_func ) {
      if ( err = p->frec->filter_init_func(f), err != OK ) {
        break ; /* if anyone errors out here, so do we */
      }
    }
  }
  return err ;
}
static filter_func_t filter_lookup(request_rec* r, mod_filter_rec* filter) {
  mod_filter_provider* provider ;
  const char* str ;
  const char* cachecontrol ;
  int match ;
  unsigned int proto_flags ;

  /* Check registered providers in order */
  for ( provider = filter->providers; provider; provider = provider->next) {
    match = 1 ;
    switch ( filter->dispatch ) {
      case REQUEST_HEADERS:
        str = apr_table_get(r->headers_in, filter->value) ;
        break ;
      case RESPONSE_HEADERS:
        str = apr_table_get(r->headers_out, filter->value) ;
        break ;
      case SUBPROCESS_ENV:
        str = apr_table_get(r->subprocess_env, filter->value) ;
        break ;
      case CONTENT_TYPE:
        str = r->content_type ;
        break ;
    }
    /* treat nulls so we don't have to check every strcmp individually
       Not sure if there's anything better to do with them
    */
    if ( str == NULL ) {
      if ( provider->match_type == DEFINED ) {
        if ( provider->match.c != NULL ) {
          match = 0 ;
        }
      }
    } else if ( provider->match.c == NULL ) {
      match = 0 ;
    } else {
      /* Now we have no nulls, so we can do string and regexp matching */
      switch ( provider->match_type ) {
        case STRING_MATCH:
          if ( strcasecmp(str, provider

Smart filtering Module

2004-08-28 Thread Nick Kew

I posted my proposed smart filter module a few days ago, in response to
a post here identifying a situation where it is relevant.

I have now completed a first version of the accompanying manual page.
I attach:
mod_filter.xml
mod_filter.xml.meta
mod_filter.html
Two images used to illustrate the module
I've also uploaded the HTML to http://www.apache.org/~niq/ .

I believe I have addressed the concerns raised when I mooted the idea
of this some weeks ago:
  * Existing filters are binary-compatible with the new module
  * I've restored the filter_init handlers to the architecture
  * I've retained my proposal to enable dealing with aspects of
    the HTTP protocol on behalf of filters.  However, the default
is always for the filter harness to do nothing, and leave the
filter provider (a content filter module) to deal with that
as before.

Working code and documentation (modulo bugs and TODOs) should help
demonstrate the purpose and utility of the proposal, and move the
discussion forward.  I'd like to offer this as a contribution to the
core httpd distribution, to be included as standard in 2.2.

What is currently implemented is the basic architecture as described
before.  Configuration is fully dynamic, with my proposed set of
configuration directives now implemented.

Note that the module only applies to output filters and will only
work with AP_FTYPE_RESOURCE or CONTENT_SET filters.  I don't see a
need for this functionality elsewhere (but I'm open to persuasion:-)

The main TBD is an ap_filter... API interface for other modules to
work actively with it.  To implement that, I will need to merge the
ap_filter_rec_t structure into the mod_filter_rec.  This will be
binary back-compatible (the new fields go on to the end of the
ap_filter_rec_t), but will of course require commits to code outside
the module, specifically util_filter.

A second TODO is to enable mod_filter to run as a provider for itself.
The purpose of this is to enable chaining of configuration rules beyond
what we can already do by setting an environment variable with
mod_rewrite and dispatching on an env= variable (example: insert
DEFLATE depending on both Accept-Encoding request header and
Content-Type response header.  mod_rewrite can't do that because
it runs too early to be sure to have the response headers).


-- 
Nick Kew

<?xml version="1.0"?>
<!DOCTYPE modulesynopsis SYSTEM "../style/modulesynopsis.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
<!-- $Revision: 1.18 $ -->

<!--
 Copyright 2004 The Apache Software Foundation

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<modulesynopsis metafile="mod_filter.xml.meta">

<name>mod_filter</name>
<description>Context-sensitive smart filter configuration module</description>
<status>Extension</status>
<sourcefile>mod_filter.c</sourcefile>
<identifier>filter_module</identifier>
<compatibility>Apache 2.0 and higher</compatibility>

<summary>
<p>This module enables smart, context-sensitive configuration of
output content filters.  For example, apache can be configured to
process different content-types through different filters, even
when the content-type is not known in advance (e.g. in a proxy).
</p>
</summary>

<section id="smart"><title>Smart Filtering</title>
<p>In the traditional filtering model, filters are inserted unconditionally
using <directive module="core">AddOutputFilter</directive> and family.
Each filter then needs to determine whether to run, and there is little
flexibility available for server admins to allow the chain to be
configured dynamically.</p>
<p>mod_filter by contrast gives server administrators a great deal of
flexibility in configuring the filter chain.  In fact, filters can be
inserted based on any Request Header, Response Header or Environment
Variable.  This generalises the limited flexibility offered by
<directive module="core">AddOutputFilterByType</directive>, and fixes
it to work correctly with dynamic content, regardless of the
content generator.  The ability to dispatch based on Environment
Variables offers the full flexibility of configuration with
<module>mod_rewrite</module> to anyone who needs it.</p>

</section>
<section id="terms"><title>Filter Declarations, Providers and Chains</title>
<img src="oldfilter.gif" alt=""/>
<p>In the traditional model, output filters are a simple chain
from the content generator (handler) to the client.  This works well
provided the filter chain can be correctly configured, but presents
problems when the filters need to be configured dynamically based

Re: Smart filtering Module

2004-08-28 Thread Nick Kew
On Sat, 28 Aug 2004, Graham Leggett wrote:

 Nick Kew wrote:

  I posted my proposed smart filter module a few days ago, in response to
  a post here identifying a situation where it is relevant.

 Ideally there should be one way of loading modules, not two - if it is
 practical to fix AddOutputFilter, then that should be done, otherwise if
 mod_filter works then I would suggest scrapping AddOutputFilter in
 favour of the mod_filter module.

Thanks for the comment.  I agree in principle.
In practice, there are a couple of issues to deal with.

As it stands, mod_filter is only applicable to content filters, so
protocol, connection and network  filters have to be dealt with
separately.  That may be irrelevant: are there real-life examples
of non-content filters being configured in httpd.conf?

Secondly, removing existing directives will of course break configs
for users.  Do we want to do that without deprecating them first?

Finally, what I've implemented is a 2.0 module.  There's a fair bit
of integration work to do in util_filter to eliminate duplication of
functionality between the old and the new.  And I expect breakage
while that's work-in-progress.

-- 
Nick Kew


Bug 18388: cookies

2004-08-29 Thread Nick Kew

Trawling through a few bugs, this one looks valid to me: namely,
Set-Cookie headers should be enabled on 304 responses.

The current behaviour has a rationale, but I believe it's incorrectly
applied.  Set-Cookie is a response header and does not affect a
cached entity body, so there's no reason to suppress it.

The patch is a one-liner.  Unless anyone can come up with a reason
why it might open a security hole, I'll apply it.
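
For the curious, the change is of this general shape - adding Set-Cookie
to the headers we preserve when stripping a response down to a 304.  The
snippet below is only a from-memory sketch of that whitelist idiom
(keep_header and kept_headers are hypothetical), not the actual diff:

    static int keep_header(void *out, const char *key, const char *val)
    {
        apr_table_addn((apr_table_t *)out, key, val);
        return 1;
    }

    apr_table_do(keep_header, kept_headers, r->headers_out,
                 "Connection", "Keep-Alive", "ETag", "Content-Location",
                 "Expires", "Cache-Control", "Vary", "Warning",
                 "WWW-Authenticate", "Proxy-Authenticate",
                 "Set-Cookie",             /* the proposed addition */
                 NULL);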

-- 
Nick Kew


Re: Bug 18388: cookies

2004-08-29 Thread Nick Kew
On Sun, 29 Aug 2004, Jim Jagielski wrote:

 I myself would define the cookie header as an entity header,
 since it *is* meta data about the body, but I can also see
 it as a more traditional response header as well.
 But wouldn't adding new info about the response (either
 as a response header or entity header) invalidate it
 actually *being* 304 (Not Modified)?

Would it?  A cookie is not data about the body.  The nearest analogy
amongst headers explicitly discussed in rfc2616 is authentication,
and the relevant authentication headers *are* returned with a 304.
So are Content-Location, ETag and Vary: surely headers that would
invalidate the 304-ness if they were to change between requests?

Perhaps a better approach to 304 headers would be to explicitly
exclude entity headers as enumerated in rfc2616, rather than
explicitly include non-entity headers?  That means the default
for proprietary extensions (which HTTP explicitly permits) becomes
to allow them in a 304.

-- 
Nick Kew


Re: FYI: bug statistics httpd-1.3/httpd-2.0

2004-08-29 Thread Nick Kew
On Mon, 30 Aug 2004, Erik Abele wrote:

 I'm building some simple (but nice) statistics based on the
 weekly bug reports mailed out to the dev- and bug-list:

 http://www.apache.org/~erikabele/httpd/bugstats/

 The stats are updated every sunday just after the reports
 are mailed out. The script which produces the PNGs is also
 available at the above URL.

We've just been discussing this on IRC.

We have a gradual but inexorable accumulation of bugs.  Some of them
have become kind-of permanent fixtures, perhaps because no one can
reproduce them or no one feels willing and able to deal with them.
Rather a lot of them are simply too vague to deal with.

Perhaps what we need is more clearly defined responsibilities in
dealing with them.  Each component has a bugmaster tasked with
dealing with bugs attached to the component.  That could mean
fixing it, discussing it on this list (including asking for
someone else to fix it), and crucially also taking responsibility
for closing bugs with INVALID, WONTFIX or WORKSFORME where no
satisfactory resolution is/seems feasible.


-- 
Nick Kew


Smart filtering and mod_filter

2004-08-31 Thread Nick Kew

I've made some further updates both to the code and documentation
since posting.  One change is to support inserting a harness anywhere
in the filter chain.  This addresses the point Graham raised about
having two separate mechanisms: it means the old mechanism can be
entirely replaced (provided of course I provide easy-reading
howto upgrade documentation).

I would find this easier if it were under CVS, and I'd like to put
it under httpd CURRENT, at modules/experimental/mod_filter.c (plus
the corresponding documentation pages, of course).  That should help
with integration ahead of 2.2, including test-driving with existing
filter modules, and will make it easier to coordinate the API
updates in util_filter.

Is this something that wants a vote?  Anyone else have strong
feelings for or against putting mod_filter under CVS?

-- 
Nick Kew


Re: Bug 18388: cookies

2004-08-31 Thread Nick Kew
On Mon, 30 Aug 2004, Geoffrey Young wrote:

 [replying to my words - largely chopped]
  Perhaps a better approach to 304 headers would be to explicitly
  exclude entity headers as enumerated in rfc2616, rather than
  explicitly include non-entity headers?  That means the default
  for proprietary extensions (which HTTP explicitly permits) becomes
  to allow them in a 304.

 fwiw, this was discussed a few times in the archives.  the one that comes to
 mind for me is this from doug:

   http://marc.theaimsgroup.com/?l=apache-httpd-devm=99298523417784w=2

That thread seems to be the same basic issue, but with reference to
RFC1945.  2616 includes additional explanation, and seems more clearly
to support the view that not only cookies but arbitrary unknown headers
(if any) should be allowed.

In the bug report 18388,  Ryan J Eberhard wrote:
It is also important to note that all other major web servers
(IIS, iPlanet, and Domino) will return Set-Cookie headers on a
304 status.
I'm in no position to confirm or deny that, but it tends to support the
proposition, and suggest that if it caused trouble in the Real World
then we could expect to know about it.

 personally, I tend to see it more from doug and nick's perspective and would
 be inclined to fix a long-standing issue that never made sense to me, but
 roy wrote the book and has unique insight here, so...

Hmm.  Would proposing it in STATUS for a vote be appropriate here?
I think if anyone wants to veto it, we should have a reason that
addresses Doug's and Ryan's arguments on the record.

-- 
Nick Kew


Re: cvs commit: httpd-2.0/server util.c

2004-09-01 Thread Nick Kew
On Wed, 1 Sep 2004, Jeff Trawick wrote:


 I can't see how this ever worked before :(  Any comments from the crowd?

FWIW, I fixed that one in the proxy context about two months ago.
But I haven't looked at it in the general case.

-- 
Nick Kew


Re: cvs commit: httpd-2.0/server util.c

2004-09-01 Thread Nick Kew
On Wed, 1 Sep 2004, Jeff Trawick wrote:

 On Wed, 1 Sep 2004 20:36:07 +0100 (BST), Nick Kew [EMAIL PROTECTED] wrote:
 
  FWIW, I fixed that one in the proxy context about two months ago.
  But I haven't looked at it in the general case.

 was that this change entry?

   *) mod_proxy: multiple bugfixes, principally support cookies in
   ProxyPassReverse, and don't canonicalise URL passed to backend.
   Documentation correspondingly updated. [Nick Kew nick webthing.com]

Yes, that sounds right.  Though I think the CHANGES entry may have
lagged the actual update.  A quick look at CVS shows a datestamp of
Tue Jun 29 06:37:21 2004 UTC

-- 
Nick Kew


Re: a simple question

2004-09-02 Thread Nick Kew
On Thu, 2 Sep 2004, Manos Moschous wrote:

 (a dumb subject line)

 I have a file opened

 FILE *fcp;
 fcp = fopen(file_to_save, "wb");

using the apr_ file API is preferred.

 //I want to save the data to the file
 //How can i do that

The tmpfile_filter in mod_upload does that.  Feel free to look at the
source.
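
If you do roll your own instead, the APR equivalent of that fopen/fwrite
pair is roughly (a sketch; buf/len are whatever data you've collected,
and error checking is elided):

    #include "apr_file_io.h"

    apr_file_t *out;

    apr_file_open(&out, file_to_save,
                  APR_WRITE|APR_CREATE|APR_TRUNCATE|APR_BINARY,
                  APR_OS_DEFAULT, r->pool);
    apr_file_write_full(out, buf, len, NULL);
    apr_file_close(out);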

-- 
Nick Kew


Re: Time for 2.0.51 and 2.1.0

2004-09-02 Thread Nick Kew
On Thu, 2 Sep 2004, Henri Gomez wrote:

 Bad news for me and many others since without AJP support included in
 2.0.x, users will still require to have mod_jk to link there HTTPD to
 Tomcats.

 Could we hope the dev team to relax the situation for mod_proxy/ajp in
 future 2.0.x release, since Graham, Mladen and Jean-Frederic works
 hard to make mod_proxy as stable as possible even now with AJP support
 ?

Have you tried running the new proxy code with 2.0.x?  It worked fine
last time I tested it seriously (following my updates to mod_proxy at
the end of June).

That way you're testing the new code in a stable harness.

-- 
Nick Kew


Re: Removing the Experimental MPMs in 2.2?

2004-09-03 Thread Nick Kew
On Thu, 2 Sep 2004, Paul Querna wrote:

 Any other opinions about not including these MPMs?

Basically agree.

But modules are on a sliding scale between fully-working and broken.
We have modules/experimental that includes pre-stable stuff that may
or may not get fixed within a reasonable timescale: what should their
status be?  I wouldn't suggest removing them, but perhaps we could
flash up a prominent WARNING when you configure/build them?

-- 
Nick Kew



Re: HTTP proxy working for folks on 2.1-dev?

2004-09-09 Thread Nick Kew
On Thu, 9 Sep 2004, Mladen Turk wrote:

 Q:
 Is it possible to have forward and reverse proxies mixed together
 on the same box?

Of course!  I have that defined in different virtual hosts,
but AFAICS it should also work fine simply using <Location> for
the reverse proxies and <Proxy> for the forward.

-- 
Nick Kew


ap_log_perror behaviour and LogLevel?

2004-09-10 Thread Nick Kew

This has been nagging me for a while, first with reference to
mod_diagnostics, and now with mod_filter.

log_error_core takes a server_rec argument.  If that argument is
NULL, it will return without logging anything unless loglevel is
APLOG_NOTICE or greater than ap_default_loglevel.  ap_log_perror
calls log_error_core with server==NULL, so verbose LogLevels fail.

Later in log_error_core is another test:
   if ((level & APLOG_STARTUP) != APLOG_STARTUP) { ... }
which looks more appropriate.

Is there a reason for this behaviour?  I'd like to be able to use
ap_log_perror with LogLevel debug or info.
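
i.e. I want a call of this form to show up under "LogLevel debug"
(pool and the message here being whatever is to hand, just by way of
illustration):

    ap_log_perror(APLOG_MARK, APLOG_DEBUG, 0, pool,
                  "mod_filter: dispatching provider %s", provider_name);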


-- 
Nick Kew


Re: Smart filtering Module

2004-09-10 Thread Nick Kew

OK, following on from a couple of weeks ago, I've committed mod_filter
to cvs.  That includes mod_filter.c and relevant documentation, which
are more-or-less in sync.  Please review.

Assuming the work gets and survives wider review, the next stage in
this work is closer integration with util_filter (protocol should be
configured in ap_ calls rather than httpd.conf), and to investigate
whether it can be harnessed to fix architectural bugs like PR#17629.

-- 
Nick Kew


Re: Smart filtering Module

2004-09-11 Thread Nick Kew
On Sat, 11 Sep 2004, NormW wrote:

 ### mwccnlm Compiler:
 #File: mod_filter.c
 # -
 # 118:  { apr_bucket_type_mmap, MMAP } ,
 #   Error: 
 #   undefined identifier 'apr_bucket_type_mmap'

 Can this be bracketed with #if APR_HAS_MMAP or is MMAP 'manatory'?

Thanks for the feedback.  Yes of course, I've just patched that
(and grepped apr_buckets.h for any other APR_HAS_* that might
bite on some other platform).

The  whole function it's in exists purely to support reporting bucket
types for the FilterDebug option.

-- 
Nick Kew


Re: Smart filtering Module

2004-09-11 Thread Nick Kew
On Sat, 11 Sep 2004, NormW wrote:

 Good evening still

:-)

 Got your update and it all now glues together nicely.
 I assume you will wait for a 'vote' to commit a hooked-in build file for
 mod_filter?

Well, I'm more waiting for more feedback.  So far I got quite a lot of
comments when I first floated the concept, less when I posted a first-
pass implementation, and only yours on introducing it to CVS.  Maybe
what it needs now is an updated roadmap to stimulate discussion?

As for a hooked-in build file, I have yet to RTFM for what that involves.

-- 
Nick Kew


Re: Bundling APR in 2.2

2004-09-16 Thread Nick Kew
On Thu, 16 Sep 2004, Paul Querna wrote:

 In most of the Apache 2.0.XX releases, we have been using a CVS snapshot
 of APR and APR-Util.

 I would like to make it an official policy that for the 2.2 cycle, we
 will never use a CVS snapshot of APR.

That makes httpd releases (relatively frequent) hostage to APR releases
(extremely infrequent) when we need a bugfix in CVS.  Is that acceptable?

 I believe we should still bundle APR and APR-Util with HTTPd, but we
 should only use the released versions of each.

Release version ABI yes.  Release version - only if that dependency
can be fixed (i.e. APR folks can be hurried along where necessary).


 It will also make life much easier for System Packagers.  If we only use
 released versions, APR and APR-Util can be easily placed into separate
 packages.  This will become more important as more standalone
 applications use APR.

Keeping binary-compatibility (ABI) is sufficient for that, innit?

-- 
Nick Kew


Re: Shorten the default config and the distribution (was: IfModule in the Default Config)

2004-09-19 Thread Nick Kew
On Tue, 14 Sep 2004, [ISO-8859-15] André Malo wrote:

 * Paul Querna [EMAIL PROTECTED] wrote:
 (chop)
  Using the Source File name seems completely non-intuitive to me.

Agreed.

 I'm rather for removing the whole crap from the default config and simplifiy
 as much as possible.

I'd be cautious about that.  The default httpd.conf contains a fair chunk
of documentation that isn't available elsewhere.  We need to work
carefully on making sure this isn't lost.

 A 30 KB default config, which nobody outside this circle here
 really understands, isn't helpful - especially for beginners.

I disagree.

Think about a situation where you're on the learning curve for working with
a big package.  A big and well-commented config file is the most useful
thing available.  I'm thinking of compiling kernels, and contrasting Linux
(where make menuconfig is very nice but hides what's really happening)
with FreeBSD, where keeping the LINT config open in another window
while editing my config is the absolute best documentation I could wish.

If the default is shortened, we should package a long and highly-commented
file in the manner of LINT.  It would be nice also to integrate the
documentation in httpd.conf into the main docs as and when round tuits
can be sourced.

 In the same cycle we could remove the docs from the default distribution and
 start distributing them officially as separate packages. (But we could
 distribute a separate config snippet for the multilingual docs, which can
 be included in the httpd.conf). The more translations we add, the less
 applicable is it to include the whole doc tree.

Hmmm, does that risk generating a higher volume of dumb-newbie questions
in all the public fora?  And perhaps also apache-is-hard articles in
the press?

-- 
Nick Kew


Re: AddOutputFilterByType oddness

2004-09-22 Thread Nick Kew
On Sat, 18 Sep 2004, Justin Erenkrantz wrote:

  But ap_add_output_filters_by_type() explicitly does nothing for a
  proxied request.  Anyone know why?  AddOutputFilterByType DEFLATE
  text/plain text/html seems to work as expected here for a forward proxy
  with this applied: maybe I'm missing something fundamental...

 My recollection is initially it didn't have the proxy check, then FirstBill
 had a reason why proxied requests shouldn't work with AddOutputFilterByType.

I've said it before and I'll say it again: AddOutputFilterByType is
fundamentally unsatisfactory.  This confusion is an effect, not cause.

* Configuration is inconsistent with other filter directives.  The
  relationship with [Set|Add|Remove]OutputFilter is utterly unintuitive
  and, from a user POV, broken.
* Tying it to ap_set_content_type is, to say the least, hairy.
  IMO we shouldn't *require* modules to call this, and it's utterly
  unreasonable to expect that it will never be called more than once
  for a request, given the number of modules that might take an interest.
  Especially when subrequests and internal redirects may be involved.
* It's a complexity just waiting for modules to break on it.

I've made some more updates to mod_filter since I last posted on the
subject, and I'm getting some very positive feedback from real users.
For 2.2 I'd like to remove AddOutputFilterByType entirely, replacing
it with mod_filter.

mod_filter can also obsolete [Set|Add|Remove]OutputFilter, though I'm
in no hurry to do that.  What I can also do is re-implement all the
outputfilter directives within mod_filter and its updated framework.

-- 
Nick Kew


Reviewing the Filtering API

2004-09-22 Thread Nick Kew

The 2.0 filter chain is a great tool: for me it's _the_ major innovation
that turns httpd-2.0 from a (mere) webserver to a powerful applications
platform.  But extensive working with it highlights weaknesses.
The introduction of AddOutputFilterByType sought to address one of the
weaknesses, but it's a bolt-on that doesn't really fit, and is
problematic.  And even if fully successful, it's limited.

As you know, I'm proposing a new filtering framework, and have
implemented (modulo bugs) the main functionality in mod_filter.

Until yesterday, this was implemented purely as a module, suitable for
use with httpd-2.0 and its filters.  That meant some inevitable
duplication of data structures and inefficiency.  It now has several
users running the module with 2.0, and I propose to maintain a version
that can be used with 2.0 without patching or recompiling anything.
But the main thrust is towards tighter integration for 2.2.

Yesterday I made the first move towards integration, by merging the
most important data structs with util_filter and adding a couple of
new API calls (on which more below).

I got some useful feedback on this list when I first mooted the idea
that is now mod_filter, and more recently from users of mod_filter
(my filter_init is badly broken - fix to come).

But I'd like to broaden that into a wider review of filtering.


*** A few issues with util_filter in 2.0:

ap_filter_type
==

Making this an enum and then using values like AP_FTYPE_[anything] + 5
(as is done in, for example, mod_ssl) makes no sense.  An int with
a set of #defined values makes more sense.
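
Something like this instead (a sketch; the numeric values are the ones
util_filter.h already uses, from memory):

    /* leave room between the levels, so modules that need to slot in
     * between can do so without arithmetic on enum constants */
    typedef int ap_filter_type;

    #define AP_FTYPE_RESOURCE     10
    #define AP_FTYPE_CONTENT_SET  20
    #define AP_FTYPE_PROTOCOL     30
    #define AP_FTYPE_TRANSCODE    40
    #define AP_FTYPE_CONNECTION   50
    #define AP_FTYPE_NETWORK      60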

ap_filter_t
===

This includes both request_rec and conn_rec fields, but the request_rec
is invalid in connection-level filters, while the conn_rec is of course
available from the request_rec where valid.  So, shouldn't that be a
union?
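
Something along these lines, purely as a sketch (the field name "owner"
is illustrative, not a proposal in itself):

    /* instead of carrying both pointers, make the either/or explicit */
    union {
        request_rec *r;   /* valid in content-level filters */
        conn_rec    *c;   /* valid in connection-level filters */
    } owner;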


Documentation
=

I recently fixed PR:19688, but there are other less critical issues
outstanding, such as
 * @param ftype The type of filter function, either ::AP_FTYPE_CONTENT or
 *  ::AP_FTYPE_CONNECTION




* Simplifying Filtering

Yesterday I introduced two new API functions in util_filter:

ap_register_output_filter_protocol
ap_filter_protocol

together with a set of associated #defines

The first function is ap_register_output_filter with an additional
argument proto_flags.  The second sets proto_flags during a request.
The purpose of these is to enable filters to 'opt out' of concerning
themselves with the lower-level details of supporting HTTP.
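
For a filter that only knows at request time whether it will transform
anything, the second call can be made from the filter itself.  A sketch
(my_ctx and will_transform are illustrative, not real code):

    static apr_status_t my_filter(ap_filter_t *f, apr_bucket_brigade *bb)
    {
        my_ctx *ctx = f->ctx;

        if (ctx->will_transform) {
            /* tell the framework we're changing the data, so it can
             * deal with Content-Length, ETag, Content-MD5 and friends */
            ap_filter_protocol(f, AP_FILTER_PROTO_CHANGE
                                  | AP_FILTER_PROTO_CHANGE_LENGTH);
        }

        /* ... the actual filtering ... */

        return ap_pass_brigade(f->next, bb);
    }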

Example: mod_include

mod_include is a typical output content filter, in that it changes the
data passing through, including changing the byte count.  It's almost
certainly the most widely known and used such filter.

As it stands, it correctly unsets content length, and it deals with
cacheing/Last-Modified in its own way based on configuration (XBitHack).
But it also has some bugs: for example:
* if a Content-MD5 is set, it doesn't unset it.  Likewise an ETag.
* it won't work correctly if served partial contents, but it
  does nothing to prevent that happening (vide discussion on
  handling ranges a couple of months ago).
For mod_include to deal fully with these is a significant burden on the
module's authors.

The new API calls offer mod_include the opportunity to be simplified
at the same time as fixing edge-case bugs such as those I've discussed.
A simple way is to replace the existing

ap_register_output_filter(INCLUDES, includes_filter, includes_setup,
  AP_FTYPE_RESOURCE);

with the new variant

ap_register_output_filter_protocol(INCLUDES,
includes_filter, includes_setup, AP_FTYPE_RESOURCE,
AP_FILTER_PROTO_CHANGE | AP_FILTER_PROTO_CHANGE_LENGTH
| AP_FILTER_PROTO_NO_BYTERANGE );

This causes mod_filter to unset all headers that are invalidated by
the module's content transformation, and prevent it getting byteranges
from the backend.  With this, mod_include still has to process XBitHack
and cacheing headers itself - these are very specific to SSI and don't
generalise to other filters - but mod_filter does everything else.

As with any other filter, mod_include will run unchanged within the
new framework by simply ignoring the additional API calls.

I need review on this, and I need to fix my existing code.  But looking
ahead, any problems with a wider-ranging review of util_filter, including
but not limited to fixing the problems identified above?

-- 
Nick Kew


Re: AddOutputFilterByType oddness

2004-09-22 Thread Nick Kew
On Wed, 22 Sep 2004, Justin Erenkrantz wrote:

 --On Wednesday, September 22, 2004 5:01 PM +0100 Nick Kew [EMAIL PROTECTED]
 wrote:

  I've said it before and I'll say it again: AddOutputFilterByType is
  fundamentally unsatisfactory.  This confusion is an effect, not cause.

 Suffice to say, I disagree.

  * Configuration is inconsistent with other filter directives.  The
relationship with [Set|Add|Remove]OutputFilter is utterly unintuitive
and, from a user POV, broken.

 I think it's really clear from the user's perspective.  I think the problem
 comes in on the developer's side.

It seems to me heavily counterintuitive that mixing ByType directives
with anything else means that the ByType filters *always* come last.
And that Remove won't affect them, but will affect others.

  * Tying it to ap_set_content_type is, to say the least, hairy.
IMO we shouldn't *require* modules to call this, and it's utterly
unreasonable to expect that it will never be called more than once
for a request, given the number of modules that might take an interest.
Especially when subrequests and internal redirects may be involved.

 We have *always* mandated that ap_set_content_type() should be called rather
 than setting r->content_type.  (I wish we could remove content_type from
 request_rec instead.)

Indeed.  But that doesn't prevent it being called multiple times, perhaps
from different modules.  So using it to insert filters leaves lots of
potential for trouble.

  * It's a complexity just waiting for modules to break on it.

 Anything that depends upon content-type like this is going to be hairy because
 there may be several 'right' answers during the course of the request.

Indeed.  mod_filter addresses this by configuring at the last moment,
so any earlier set_content_type()s are irrelevant.  I don't suppose it's
a panacea for everything, but I do think it's a significant improvement
on what we have.

  I've made some more updates to mod_filter since I last posted on the
  subject, and I'm getting some very positive feedback from real users.
  For 2.2 I'd like to remove AddOutputFilterByType entirely, replacing
  it with mod_filter.

 I've yet to see a clear and concise statement as to how mod_filter will solve
 this problem in a better and more efficient way.  (Especially from a user's
 perspective, but also from a developer's perspective.)

From the user's perspective, it's simply more powerful and flexible.
Works with any request or response headers (not just content-type) or
environment variables.  Gets rid of constraints on ordering, like
AddOutputFilterByType filters always coming after other filters
regardless of ordering in httpd.conf.

Example: I have a user who wants to insert mod_deflate in a reverse
proxy, but only for selected content-types AND not if the content
length is below a threshold.  How would he do that with the old filter
framework?

From a developer's perspective, I wrote it for myself, and have at least
two other developers using it operationally in their product.  Time will
tell what others may use it for.

 I will also comment that I looked in the mod_filter code the other day and was
 disappointed that it doesn't follow our coding style at all or even have
 comments that help people understand what it is trying to do inside the .c
 file.

When was that?  I made quite a lot of updates to the style towards
conforming (like eliminating tabs and realigning some braces) before
committing to CVS, but I'm willing to believe I need to look more
carefully.

-- 
Nick Kew


Re: AddOutputFilterByType oddness

2004-09-23 Thread Nick Kew
On Wed, 22 Sep 2004, Justin Erenkrantz wrote:

 --On Wednesday, September 22, 2004 6:17 PM +0100 Nick Kew
 [EMAIL PROTECTED] wrote:

  It seems to me heavily counterintuitive that mixing ByType directives
  with anything else means that the ByType filters *always* come last.
  And that Remove won't affect them, but will affect others.

 I think we could get Remove*Filter to also delete the content-type filters.

  Indeed.  mod_filter addresses this by configuring at the last moment,
  so any earlier set_content_type()s are irrelevant.  I don't suppose it's
  a panacea for everything, but I do think it's a significant improvement
  on what we have.

 I'm concerned about the overhead of mod_filter having to check all of its
 rules each time a filter is invoked.  This is why I started to look through
 the code last night to see how it worked and how invasive it is.

It's improving with time (except when I introduce bugs...).  Merging in
the structs with util_filter saves on having to do superfluous lookups.

Basically it does the lookup/dispatch once per filter in the filterchain
per request.  It checks that filter's providers until it finds a match.
So for anything you could do with an [Add|Set]OutputFilter[ByType]
that's one lookup per request.
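
In pseudocode, roughly (illustrative only - lookup_dispatch and match
stand in for the real mod_filter internals, and declarations are omitted):

    /* per harnessed filter, per request */
    for (p = filter->providers; p != NULL; p = p->next) {
        const char *str = lookup_dispatch(r, p);   /* header, env var, ... */
        if (str && match(str, p)) {
            /* hand the brigade to this provider's filter function */
            return p->frec->filter_func.out_func(f, bb);
        }
    }
    /* no provider matched: pass the data straight through */
    return ap_pass_brigade(f->next, bb);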

 How would you handle the situation when filter #1 sets C-T to be
 text/plain and then filter #2 sets C-T to be text/html?

mod_filter takes the content-type as it is at that point in the chain.

Isn't the real nightmare where a filter calls ap_set_content_type and
some AddOutputFilterByTypes are in effect?  I guess what *really* bothers
me is the idea of adding filters *as a side-effect*.

 And, then
 mod_deflate needs to be conditionally added (sub-case #1: it needs to be
 added for 'text/plain'; sub-case #2: it needs to be added for 'text/html').
 How and where is it added?  Are you inserting dummy filters?

I'm not sure I follow.  It will dispatch to deflate based on the
content-type (or other dispatch criterion) as it is at that point
in the chain.

So if the handler sets application/xml but that goes through an XSLT
filter which sets it to text/html, then mod_filter sees application/xml
if it's before the XSLT filter in the chain, or text/html after it.

How can AddOutputFilterByType expect to cope with that?


  From the user's perspective, it's simply more powerful and flexible.
  Works with any request or response headers (not just content-type) or
  environment variables.  Gets rid of constraints on ordering, like
  AddOutputFilterByType filters always coming after other filters
  regardless of ordering in httpd.conf.
 
  Example: I have a user who wants to insert mod_deflate in a reverse
  proxy, but only for selected content-types AND not if the content
  length is below a threshold.  How would he do that with the old filter
  framework?

 I guess I'm not clear what the syntax is (I guess I should go read the
 docs).

That particular scenario is complex, and requires mod_filter to be
used as its own provider.  The point is, we *can* now support complex
setups (or will be - that chaining is still broken in CVS).

But FWIW I have that working locally with

FilterDeclare   filter1 Content-Type    CONTENT_SET
FilterDeclare   filter2 Content-Length  CONTENT_SET

FilterProvider  filter1 filter2 $text
FilterProvider  filter2 DEFLATE 4000

FilterChain filter1

to deflate all text/* documents of 4k or greater.


 I definitely don't want to see the filters be configured like
 mod_rewrite.  It needs to be fairly straightforward, but still fairly
 simplistic.  I don't want to have users have to read a complicated manual
 or docs to set up filters.  KISS.

Indeed.  Do you think the examples in the manual page are too complex?

Bear in mind that the third example is no more complex than the first two,
yet suddenly enables a frequently-requested capability that simply isn't
possible with the old filtering.

 Well, the point by you committing it into our tree is that the rest of us
 are now responsible for it.  That's why I brought up the code style issue:

OK, OKOK!   I promise to look harder at the code style guidelines!
And I _did_ ask on the list a couple of weeks before introducing to CVS.

 I looked yesterday afternoon (and haven't seen any commits since then).  I

That'll be the latest version.  Which FWIW was introduced prematurely
because it introduced a new feature demanded by a user.  Only that turned
out to be broken, which is why I'm re-hacking that now.

-- 
Nick Kew


Re: Bug 17629: SSI, CGI, and mod_deflate

2004-10-11 Thread Nick Kew
On Mon, 11 Oct 2004, André Malo wrote:

  It seems that calling an internal redirect from anywhere in an output
  filter is completely wrong.

 Nope. The real problem is that it's a *redirect within a subrequest*.

Erm - it seems to me you're both right.  Surely the underlying problem -
of which both the above are instances - is an internal redirect too late
in the request processing cycle.  An internal redirect after anything
could possibly have been sent down the [output filter chain|wire] is
broken.

 The filterchain suddenly gets disconnected. What we need is kind of a glue
 filter which connects a subrequest (at connection level (a subrequest doesn't
 own a connection)) with the main one.

I'm struggling with how that should work, within the constraints of the
architecture we have.  I actually raised the question with Paul on IRC
in the hope that a solution would fall straight out of his Capturing a
Subrequest.  But it seems we're all stuck on partial insights.

I'm provisionally +1 on Paul's proposed fix, but I wonder if it should be
conditional on ap_is_initial_req, to leave untouched the 'normal' CGI
case.

-- 
Nick Kew


Re: [RFC] Patch for mod_log_config to allow conditioning on status code

2004-10-15 Thread Nick Kew
On Fri, 15 Oct 2004, Luc Pardon wrote:

   I patched mod_log_config.c (from the 2.0.51 distro) to allow
 conditional logging on HTTP status code, like so:

   CustomLog king-size.log common status=414

   The patch also supports not and lists (like the %.. syntax) and
 wildcards, e.g.:

   CustomLog ungood.log common status=!20x,3xx

   The changes are non-intrusive and the patch is of course backward
 compatible.

Sounds somewhat interesting, and (as you note) there's quite a lot of
demand from people who don't like 414 crap in their logs.  So that's
a good start.

But how does it work with piped log programs?  If I were implementing
this functionality, I'd probably hack rotatelogs rather than httpd.
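
Or, for that matter, a one-off piped-log filter would do the job without
touching anything we ship.  An untested sketch (droplog and its arguments
are illustrative; it assumes common/combined format, where the status
follows the closing quote of the request field):

    /* droplog.c - copy stdin to a logfile, dropping lines whose status
     * field equals the given code */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char line[8192];
        FILE *out;

        if (argc != 3) {
            fprintf(stderr, "usage: %s logfile status\n", argv[0]);
            return 1;
        }
        if ((out = fopen(argv[1], "a")) == NULL) {
            perror(argv[1]);
            return 1;
        }
        while (fgets(line, sizeof(line), stdin)) {
            const char *p = strstr(line, "\" ");
            if (p && strncmp(p + 2, argv[2], strlen(argv[2])) == 0) {
                continue;        /* drop the unwanted status */
            }
            fputs(line, out);
            fflush(out);
        }
        return 0;
    }

invoked as something like

    CustomLog "|/usr/local/bin/droplog /var/log/httpd/ungood.log 414" common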

   I already patched the docs and am willing to go the extra mile(s) to
 make it all nice, but the guidelines for contributing a patch say
 you're a conservative lot when it comes to new functionality.

Indeed, that's true.  But that's very minor functionality and clearly
tied in to an established core module, so unlikely to fall down on that.

The usual fate of patches in bugzilla is that, even if they are
appropriate for inclusion, they need a committer to take sufficient
interest to review and incorporate them.  A chronic shortage of round
tuits means this is rather hit-and-miss.

One more thing: I became aware that the flexible interface for
 mod_log_config patch (# 25014) also allows conditioning on status
 code(s), and there are three other contributed patches against
 mod_log_config waiting for a decision (# 28037, 29449 and 31311). I am
 willing to ensure compatibility with any or all of them if desired.

If you can fix a whole bunch of related bugs on bugzilla without your
patch becoming big and complex, that adds value but still doesn't
guarantee anything.

Ask yourself: is your code sufficiently different to anything we already
have to merit releasing separately as a third-party module?  If yes, then
do that.  If no, then it's probably appropriate to offer a patch.
My guess would be no.


-- 
Nick Kew


Re: [RFC] Patch for mod_log_config to allow conditioning on status code

2004-10-16 Thread Nick Kew
On Sat, 16 Oct 2004, Glenn Strauss wrote:

 I don't want to discourage Luc, but there's a steep uphill battle
 to getting anything into Apache 1.3.

Of course.  Apache 1.3 is an old, legacy application, and vastly less
capable than current versions.  It's still maintained, but no one is in
the business of adding new *features*.

2.1 is where interesting things happen, while 2.0 is intermediate: new
features may be added, but stability and binary-compatibility are more
important.  I might review and incorporate a third-party patch for 2.x,
but certainly wouldn't for 1.x unless someone was paying.

 diff -ruN apache_1.3.31/src/main/http_log.c apache_1.3.31-new/src/main/http_log.c
 --- apache_1.3.31/src/main/http_log.c   2004-02-16 17:29:33.0 -0500
 +++ apache_1.3.31-new/src/main/http_log.c   2004-05-24 12:26:06.0 -0400

Bugzilla is a good place for patches like that.  People who want it can
help themselves, without compromising stability.

-- 
Nick Kew


Re:[Bug 31759] - default handler returns output filter apr_status_t value

2006-09-12 Thread Nick Kew


On 12 Sep 2006, at 22:27, [EMAIL PROTECTED] wrote:

 --- Additional Comments From [EMAIL PROTECTED] 2006-09-12 21:27 ---
 The PUT handler is a small 10 line script.  It absolutely doesn't return
 a code 70007 or anything other than 0 no matter how it finishes.

 This is not resolved nor fixed.

The bug is fixed, because it refers explicitly to the default handler.

However, mod_cgi at line 840 and mod_cgid at line 1390 have the same issue
when the input filters return an error.  I think the easy fix is to return
500 there, unless we can blame the client and return 400.
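
Roughly like this, at the point where the request body is read (a sketch
of what I have in mind, not the exact diff):

    rv = ap_get_brigade(r->input_filters, bb, AP_MODE_READBYTES,
                        APR_BLOCK_READ, HUGE_STRING_LEN);
    if (rv != APR_SUCCESS) {
        /* don't return the raw apr_status_t as if it were an HTTP status */
        ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
                      "Error reading request entity data");
        return HTTP_INTERNAL_SERVER_ERROR;
    }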

--
Nick Kew


Re: svn commit: r442758 - in /httpd/httpd/trunk/modules/generators: mod_cgi.c mod_cgid.c

2006-09-13 Thread Nick Kew
On Wednesday 13 September 2006 20:33, Ruediger Pluem wrote:

 Wouldn't it make sense to return OK even if rv != APR_SUCCESS in the case
 that c->aborted is set, just like in the default handler?

I'm not sure.  Presumably if c->aborted is set, then we have no client
to respond to, so this is just about housekeeping and what ends up
in the logs.  Do we want to log a successful POST or PUT when it wasn't?

-- 
Nick Kew


Re: svn commit: r442758 - in /httpd/httpd/trunk/modules/generators: mod_cgi.c mod_cgid.c

2006-09-13 Thread Nick Kew
On Wednesday 13 September 2006 22:31, Jeff Trawick wrote:
 On 9/13/06, Nick Kew [EMAIL PROTECTED] wrote:
  On Wednesday 13 September 2006 20:33, Ruediger Pluem wrote:
   Wouldn't it make sense to return OK even if rv != APR_SUCCESS in the
   case that c->aborted is set, just like in the default handler?
 
  I'm not sure.  Presumably if c->aborted is set, then we have no client
  to respond to, so this is just about housekeeping and what ends up
  in the logs.  Do we want to log a successful POST or PUT when it wasn't?

 Here is my understanding:

 The connection status (%c) is what the admin should check to confirm
 that there were no network I/O issues (at least none that caused TCP
 to give us an error up through the point when the request was
 complete).

 In many cases, an HTTP status code has already been written to the
 client before the I/O problem occurs anyway so changing the status
 code doesn't make sense.  A failure to read a request body would be 
 prior to the point where we could write a status code, but I don't see
 why the log analysis heuristic should be different.

So we should log an error, not a success.  500 won't always be the ideal
error, but I don't really see how we can do better within the current API.

 500 implies that there could be an action to take to resolve a problem
 (e.g., screwy filters bungled the return codes; screwy configuration;
 out of memory; ???).  It doesn't apply when somebody bored with an
 upload hit the Stop button.

So are you supporting Rüdiger's proposition?  I can accept that if it's
the popular view.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.prenhallprofessional.com/title/0132409674


Widespread confusion of apr_status_t and int values

2006-09-15 Thread Nick Kew
PR#31759 identified a bug with the default handler returning apr_status_t
values.  That was fixed.

But we have people reporting that the bug is not fixed.  What they're seeing
is the same bug elsewhere.  I just hacked up a fix in mod_cgi and mod_cgid,
which we've been discussing here over the last couple of days.  It's also
in mod_proxy (specifically, proxy_http - I didn't look elsewhere),
in both 2.0.x and trunk.  I wouldn't be at all surprised to find it in other
content generators.

I'm wondering if this would be worth working around one level up in the core.
If a handler returns a value that's not OK/DECLINED and is out of range
for HTTP, then return 500 to the client, and log a buggy content
generator message to error_log.
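
i.e. something like this, wherever we pick up the handler's return value
(a sketch of the idea only - where exactly it belongs is part of the
question):

    access_status = ap_invoke_handler(r);
    if (access_status != OK && access_status != DONE
        && access_status != DECLINED
        && !ap_is_HTTP_VALID_RESPONSE(access_status)) {
        ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
                      "buggy content generator returned %d; "
                      "sending 500 instead", access_status);
        access_status = HTTP_INTERNAL_SERVER_ERROR;
    }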

Thoughts?

-- 
Nick Kew

