Re: mod_cache: store_body() bites off more than it can chew

2010-09-13 Thread Paul Fee
Graham Leggett wrote:

 On 06 Sep 2010, at 11:00 PM, Paul Querna wrote:
 
 Isn't this problem an artifact of how all bucket brigades work, and is
 present in all output filter chains?

 An output filter might be called multiple times, but a single bucket
 can still contain a 4gb chunk easily.

 It seems to me it would be better to think about this holistically
 down the entire output filter chain, rather than building in special
 case support for this inside mod_cache's internal methods?
 
 In the cache case, thinking about it a bit the in and out brigades are
 probably unavoidable, as the cache is a special case in that it wants
 to write the data twice, once to the cache, a second time to the rest
 of the filter stack. Right now, the cache is forced to read the
 complete brigade to cache it, no option to give up early. And the
 cache has no choice but to keep the brigade buckets in the brigade so
 that they can be passed a second time up the filter stack, no deleting
 buckets as you go like you normally would. Read one 4GB file bucket in
 the cache, and in the process the file bucket gets morphed into 1/2
 million heap buckets, oops. With two brigades, one in, one out, the in
 brigade can have the buckets removed as they are consumed, as normal,
 and moved to the out brigade. The cache can quit at any time, and the
 code following knows what data to write to the network (out), and what
 data to loop round and resend to the cache (in). The cache provider
 could choose to quit and ask to be called again either because writing
 took too long, or too much data was read (and in the process became
 heap buckets), either reason is fine.
 
 That said, following on your suggestion of thinking about this in the
 general sense, it would be really nice if the filter stack had the
 option to say "I have bitten off as much of the brigade as I am
 prepared to chew on right now, and the leftovers are still in the
 brigade; can you call me back with this data, maybe with more data
 added, and I'll try to swallow some more?".
 
 In theory, that would mean all handlers (or entities that sent data)
 would no longer be allowed to make the blind assumption that the
 filter stack was willing to consume every possible set of buckets the
 handler wanted to send, and that the stack had the right to go "I'm
 full, give me a second to chew on this".
 
 This wouldn't need separate brigades, probably just a return code that
 meant EAGAIN, and that was expected to be honoured by handlers.
 
 Regards,
 Graham
 --

Retrieving bodies from the cache has a similar scalability issue.  The 
CACHE_OUT filter makes a single call to the provider's recall_body().  The 
entire body must be placed in a single brigade which is sent along the 
filter chain with a single ap_pass_brigade() call.

If a custom provider is using heap buckets and the body is large, this can 
consume too much memory.  It would be better to loop, asking the provider 
repeatedly for portions of the body until it supplies an EOS bucket.  Is 
there interest in a patch implementing this approach?

Thanks,
Paul


RE: mod_cache: store_body() bites off more than it can chew

2010-09-13 Thread Paul Fee
Plüm, Rüdiger, VF-Group wrote:

  
 
 -Original Message-
 From: Graham Leggett
 Sent: Montag, 13. September 2010 16:35
 To: dev@httpd.apache.org
 Subject: Re: mod_cache: store_body() bites off more than it can chew
 
 On 13 Sep 2010, at 4:18 PM, Plüm, Rüdiger, VF-Group wrote:
 
  It is not a problem for mod_disk_cache as you say, but
  I guess he meant for 3rd party providers that could only deliver
  the cached responses via heap buckets.
 
 The cache provider itself puts the bucket in the brigade, and has the
 power to put any bucket into the brigade it likes, including its own
 custom developed buckets. The fact that brigades become heap buckets
 when read is a property of our bucket brigades, they aren't a
 restriction applied by the cache.
 
 For example, in the large disk cache patch, a special bucket was
 invented that represented a file that was not yet completely present,
 and that blocked waiting for more data if the in-flight cache file was
 not yet all there. There was no need to change the API to support this
 scenario, the cache just dropped the special bucket into the brigade
 and it was done.
 
 Yeah, but in a tricky way, which is absolutely fine and cool if you cannot
 change the API, but the question is: is this the way providers
 should go, and does the API look the way it should?
 
 Regards
 
 Rüdiger

Hi,

I'm familiar with the FILE bucket and have considered implementing a new 
bucket type that would have similar morphing properties for our custom 3rd 
party cache provider.

Currently a handler has the ability to call ap_pass_brigade() multiple times 
and hence can produce large bodies in small chunks.  The CACHE_OUT filter as 
currently implemented does not offer that, forcing a 3rd party provider to 
implement its own bucket type if HEAP buckets would occupy too much 
memory.

Changing the CACHE_OUT filter to call recall_body() repeatedly until an EOS 
is obtained is a small change.  More importantly, it won't affect existing 
providers, as they'll produce a brigade with an EOS bucket on their first 
invocation.

Custom bucket types may be a better approach, but shouldn't the CACHE_OUT 
filter be able to send the content in multiple brigades in the same way a 
handler would?

Thanks,
Paul


Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Paul Fee
Graham Leggett wrote:
 
 Given that the make-cache-writes-atomic problem requires a change to
 the data format, it may be useful to look at this now, before v2.4 is
 baked, which will happen soon.
 
 How much of a performance boost is the use-null-terminated-strings?
 
 Regards,
 Graham
 --

If mod_disk_cache's on disk format is changing, now may be an opportunity to 
investigate some options to improve performance of httpd as a caching proxy.

Currently headers and data are in separate files.  If they were in a single 
file, the operating system would have a stronger hint that these two items 
are tightly coupled.  For example, when the headers are read in, the O/S 
can read ahead and buffer part of the body.

A difficulty with this could be refreshing the headers after a response to a 
conditional GET.  If the headers are at the start of the file and they 
change size, then they may overwrite the start of the existing body.  You 
could leave room for expansion (risks wasted space and may not be enough) or 
you could put the headers at the end of the file (may not benefit from 
readahead).

On a similar theme, would filesystem extended attributes be suitable for 
storing the headers?  The cache file's contents would be the entity body.  A 
problem with this approach could be portability.  However, APR could 
abstract this, reverting to separate files on platforms/filesystems that 
don't offer extended attributes.

http://en.wikipedia.org/wiki/Extended_file_attributes

I haven't tested extended attributes to see if they offer performance gains 
over separate header and body files.  However it seems cleaner to have both 
parts in one file.  It should also eliminate race conditions where 
headers/body could get out of sync.

Thanks,
Paul


Re: [PATCH] tproxy2 patch to the apache 2.2.15

2010-08-13 Thread Paul Fee
JeHo Park wrote:

snip
 
 yes, i see,
 so i also made a tproxy4 apache patch for httpd 2.2.9 and
 tested it successfully in a debian linux box. the software versions i
 tested are below --
 kernel:  vanilla 2.6.31 [tproxy4 included by default]
 apache: 2.2.9 [tproxy4 patch applied]
 iptables: 1.4.3
 ebtables: 2.0.8
 --
 i tested the tproxy4 apache successfully on debian lenny. but i met
 something strange: the same tproxy4 software did not operate
 correctly on CentOS. the main environment me and our team develop on
 is not debian but CentOS, so i had to give up tproxy4.
 this is why i made the tproxy2 apache patch for the CentOS 2.6.18
 kernel :-(

Can you share your tproxy4-based patches?  I think they're more interesting, 
as they'll work across more distributions in the future.

RHEL6 beta has tproxy4 support, as will CentOS6 in time.  Your tproxy4 work 
will become usable when your main environment upgrades.

 
 
 Here's a post showing tproxy history, it recommends against tproxy2:
 https://lists.balabit.hu/pipermail/tproxy/2008-November/000994.html
 
 Bazsi suggests starting with tproxy4 for 2.6.17 and propagating that
 forward to a 2.6.18 kernel.  The tproxy4 API looks easier to use than
 tproxy2.  Unfortunately I didn't find the tproxy4 for 2.6.17 kernel patch.
 
 really ?  great! i didn't know that !

Hopefully you can locate the tproxy4 for 2.6.17 patch as that would allow 
Apache to work consistently in both your environment and with 2.6.28+ 
kernels.

 
 but i wonder whether Bazsi will backport the tproxy4 kernel patch
 to kernel 2.6.17 or 2.6.18. anyway, recently i applied my
 tproxy2 patch - exactly speaking, i modified or inserted some small
 changes into the existing patch - to a commercial site, and then i found
 that .. maybe .. tproxy2 is not real transparency, because i had to insert
 some route information into the box for packet routing problems.
 
 
 However most important is to have future-proof Apache changes that will
 be compatible with distros other than just CentOS5/RHEL5, for example
 RHEL6.

Although you're tied to CentOS5 now, I think Apache trunk would benefit more 
from tproxy4 patches.  The tproxy2 work has a limited future.

 
 Incidentally, how are you managing the iptables rules?  Is it assumed
 that these will be set up before Apache httpd is started?  Or do you think
 Apache should own the rules, creating them at startup and removing them
 on shutdown?
 yes, i see. both tproxy2 and tproxy4 need some L2 bridge, L3 or route
 rules via iptables etc., so i always insert the rules before or after
 starting apache httpd. and i hope Apache doesn't own the rules. i call the
 deletion of the rules from the box a software bypass :-) i think
 Apache httpd doesn't need to own the rules .. for easier debugging and
 other usages ..

Handling the iptables rules within Apache would present difficulties.  For 
example, if Apache died/crashed, the rules could be left lingering.  Perhaps 
it's best not to pollute Apache with operating system networking setup, 
especially non-portable settings that are unique to Linux.

Thanks,
Paul


Re: [PATCH] tproxy2 patch to the apache 2.2.15

2010-08-12 Thread Paul Fee
JeHo Park wrote:

 hello Daniel
 thanks your interest.
 
 - Original Message -
 From: Daniel Ruggeri drugg...@primary.net
 To: dev@httpd.apache.org
 Sent: Wednesday, August 04, 2010 9:11 AM
 Subject: Re: [PATCH] tproxy2 patch to the apache 2.2.15
 
 
 On 8/3/2010 9:57 AM, JeHo Park wrote:
 hello ~
 it's my first mail to apache dev .. and i am a beginner with apache. :-)
 anyway ... recently, i wrote a transparent proxy [tproxy2] patch for
 httpd-2.2.15
 because i needed a web proxy and needed to know the source address of
 any client who tries to connect to my web server.
 after all, i tested the performance of my patched tproxy with
 AVALANCHE 2900. if anyone asks me for the performance results, i will
 send them [the test result pdf is large]
 *- here is the platform infomation this patch applied ---*
 1. OS
 CentOS release 5.2 (Final)
 2. KERNEL
 Linux version 2.6.18-194.el5-tproxy2 (r...@localhost.localdomain)
 (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46))
 #10 SMP Wed May 26 17:35:19 KST 2010
 3. iptables
 iptables-1.3.8 + tproxy2 supporting patch
 *-- here is the usage of tproxy2 patched httpd configuration ---*
 httpd.conf
 <VirtualHost 192.168.200.1:80>
 ProxyTproxy On # On/Off flag
 ProxyTPifaddr 192.168.200.1 # IP address of bridge interface br0.
 example) br0 = eth0 + eth1
 </VirtualHost>
 i attach the tproxy2 patch for the kernel above
 [2.6.18-194.el5-tproxy2], the httpd-2.2.15 tproxy2 patch and the
 kernel configuration for tproxy2.
 above all, i want to know whether my patch is acceptable or not .. and
 want feedback from anyone :-)
 
 JeHo;
 Hi, can you help me understand what the usage case is for this patch?
 
 as far as i know, there are other modules for IP transparency, for example
 tproxy4 and X-Forwarded-For ... etc. but tproxy4 is only available from
 kernel version 2.6.24 and above. X-Forwarded-For makes an L3, L4 security
 box unavailable: the main function of X-Forwarded-For is to
 let the web server know the client IP address, but we can't be sure
 whether there is another security box [L3, L4 .. firewall] between the
 proxy and the web server; in this respect, X-Forwarded-For makes the
 security box unavailable.
 
 What service or capability does it provide that is not currently
 available?
 i just tested the patch in my local network. it worked correctly and i did
 a performance test with the avalanche. but i didn't test it in the field
 .. or in various network environments. so i hope many people use and test
 this patch
 
 
 
 --
 Daniel Ruggeri


Hi JeHo,

Thank you for sharing your patches.

I was unable to use your Apache patches on Fedora 13 (kernel 2.6.33).  I 
didn't use your kernel patch since tproxy4.1 was merged into the Linux 
kernel at 2.6.28.

You've patched tproxy2 into the CentOS/RHEL 2.6.18 kernel.  tproxy2 behaves 
differently from tproxy4.1, hence it's to be expected that your userspace 
patches don't work with 2.6.28+ kernels.

Here's a post showing tproxy history, it recommends against tproxy2:
https://lists.balabit.hu/pipermail/tproxy/2008-November/000994.html

Bazsi suggests starting with tproxy4 for 2.6.17 and propagating that forward 
to a 2.6.18 kernel.  The tproxy4 API looks easier to use than tproxy2.  
Unfortunately I didn't find the tproxy4 for 2.6.17 kernel patch.

However, most important is to have future-proof Apache changes that will be 
compatible with distros other than just CentOS5/RHEL5, for example RHEL6.

Incidentally, how are you managing the iptables rules?  Is it assumed that 
these will be set up before Apache httpd is started?  Or do you think Apache 
should own the rules, creating them at startup and removing them on 
shutdown?

Thanks,
Paul


Re: Talking about proxy workers

2010-08-10 Thread Paul Fee
Paul Fee wrote:

 Rainer Jung wrote:
 
 Minor additions inside.
 
 On 06.08.2010 14:49, Plüm, Rüdiger, VF-Group wrote:


 -Original Message-
 From: Paul Fee
 Sent: Freitag, 6. August 2010 14:44
 To: dev@httpd.apache.org
 Subject: Re: Talking about proxy workers


 Also, is it possible to setup these three reuse styles for a
 forward proxy?

 1: No reuse, close the connection after this request.

 Yes, this is the default.

 2: Reuse connection, but only for the client that caused its creation.

 No.
 
 Even if you configure pooled connections like in the example given in 3,
 the connections are returned to the pool after each request/response
 cycle. They are not directly associated with the client connection.
 
 But: if the MPM is prefork, the client connection is handled by a single
 process which doesn't handle any other requests during the life of the
 client connection. Since pools are process local, in this case the pool
 will always return the same connection (the only connection in the
 pool). Note that this pooled connection will not be closed when the
 client connection is closed. It can live longer or shorter than the
 client connection and you can't tie their lifetime together.
 
 Whether the proxy operates in forward or reverse mode doesn't matter, it
 only matters how the pool aka worker is configured. See 3.
 
 3: Pool connection for reuse by any client.

 Yes, but this is needed separately for every origin server you forward
 to:

 <Proxy http://www.frequentlyused.com/>
 # Set an arbitrary parameter to trigger the creation of a worker
 ProxySet keepalive=on
 </Proxy>
 
 Pools are associated with workers, and workers are identified by origin
 URL. In case of a reverse proxy you often only have a few origin
 servers, so pooling works fine. In case of a forward proxy you often
 have an enormous amount of origin servers, each only visited every now
 and then. So using persistent connections is less effective. It would
 only make some sense, if we could optionally tie together client
 connections and origin server connections.
 
 Regards,
 
 Rainer
 
 I'm using the worker MPM, so connection sharing between clients can
 happen.
 
 As you've pointed out, pooling works well for reverse proxies as there are
 few backends and the hit rate is high.  For forward proxies, there are
 numerous destinations and the pool hit rate will be low.  The pool has a
 cost due to multi-threaded access to a single data structure, I presume
 locks protect the connection pool.  Locks can limit scalability.
 
 I'm wondering if pools should be restricted to the reverse proxy case.
 Forward proxies would couple the proxy-origin server connection to the
 client side connection.  Since connections cannot be shared, there's no
 need for locking.  We'd lose the opportunity to share, but since the
 probability of a pool hit by another client is low, that loss should be
 acceptable.
 
 Essentially, I'm asking if it would make sense to implement 2: Reuse
 connection, but only for the client that caused its creation.  This could
 be a configurable proxy worker setting.
 
 Thanks,
 Paul

Here's a suggestion to refine connection pooling for forward proxies.

Can a connection to an origin server be tightly coupled to the client 
connection for the lifetime of the client connection?  Then, when the client 
connection closes, the origin server connection can be placed in the pool 
for possible reuse by another incoming client connection.

This would allow a client to reuse its own origin server connection without 
having to do a lookup in the pool.  We'd save on lookup costs and pool 
locking.  However connections would still go into the pool after the client 
has finished with it, for the potential benefit of other clients.

When a client makes a request, mod_proxy looks to see if there's an existing 
origin server connection coupled to the request_rec.  If there's no 
connection or the connection is not to the correct origin server, then 
perform a pool lookup.  If that fails, then create a fresh connection.

Does this sound feasible?  Can we do this with mod_proxy already?  Is it 
worth implementing?

Thanks,
Paul


Re: Talking about proxy workers

2010-08-09 Thread Paul Fee
Rainer Jung wrote:

 Minor additions inside.
 
 On 06.08.2010 14:49, Plüm, Rüdiger, VF-Group wrote:


 -Original Message-
 From: Paul Fee
 Sent: Freitag, 6. August 2010 14:44
 To: dev@httpd.apache.org
 Subject: Re: Talking about proxy workers


 Also, is it possible to setup these three reuse styles for a
 forward proxy?

 1: No reuse, close the connection after this request.

 Yes, this is the default.

 2: Reuse connection, but only for the client that caused its creation.

 No.
 
 Even if you configure pooled connections like in the example given in 3,
 the connections are returned to the pool after each request/response
 cycle. They are not directly associated with the client connection.
 
 But: if the MPM is prefork, the client connection is handled by a single
 process which doesn't handle any other requests during the life of the
 client connection. Since pools are process local, in this case the pool
 will always return the same connection (the only connection in the
 pool). Note that this pooled connection will not be closed when the
 client connection is closed. It can live longer or shorter than the
 client connection and you can't tie their lifetime together.
 
 Whether the proxy operates in forward or reverse mode doesn't matter, it
 only matters how the pool aka worker is configured. See 3.
 
 3: Pool connection for reuse by any client.

 Yes, but this is needed separately for every origin server you forward
 to:

 <Proxy http://www.frequentlyused.com/>
 # Set an arbitrary parameter to trigger the creation of a worker
 ProxySet keepalive=on
 </Proxy>
 
 Pools are associated with workers, and workers are identified by origin
 URL. In case of a reverse proxy you often only have a few origin
 servers, so pooling works fine. In case of a forward proxy you often
 have an enormous amount of origin servers, each only visited every now
 and then. So using persistent connections is less effective. It would
 only make some sense, if we could optionally tie together client
 connections and origin server connections.
 
 Regards,
 
 Rainer

I'm using the worker MPM, so connection sharing between clients can happen.

As you've pointed out, pooling works well for reverse proxies as there are 
few backends and the hit rate is high.  For forward proxies, there are 
numerous destinations and the pool hit rate will be low.  The pool has a 
cost due to multi-threaded access to a single data structure, I presume 
locks protect the connection pool.  Locks can limit scalability.

I'm wondering if pools should be restricted to the reverse proxy case.  
Forward proxies would couple the proxy-origin server connection to the 
client side connection.  Since connections cannot be shared, there's no 
need for locking.  We'd lose the opportunity to share, but since the 
probability of a pool hit by another client is low, that loss should be 
acceptable.

Essentially, I'm asking if it would make sense to implement 2: Reuse 
connection, but only for the client that caused its creation.  This could 
be a configurable proxy worker setting.

Thanks,
Paul


Re: OS Keep-alive on forward proxy

2010-08-06 Thread Paul Fee
Rainer Jung wrote:

snip
 
 The default worker for forward proxying does not use connection pooling
 in the naive sense. It closes each connection after each request.

Regardless of pooling, since that's httpd's internal implementation, is there 
a reason for defaulting to non-persistent TCP connections on the wire?

I've read that the HTTP/1.0 protocol's specification for persistence was 
weak and that Netscape Navigator's Proxy-Connection: keep-alive header 
didn't fix the issue.  Therefore for HTTP/1.0 mod_proxy would not create a 
persistent connection to the next hop (e.g. the origin server).

However my understanding was that for HTTP/1.1 the protocol was good enough 
to work correctly over proxy chains and that the hop-by-hop Connection 
header was adequate for negotiating each step en route from the client to 
the origin server.

I would like mod_proxy to use persistent connections for HTTP/1.1, are there 
reasons for sacrificing this performance improvement?

Thanks,
Paul


Re: Talking about proxy workers

2010-08-06 Thread Paul Fee
Mark Watts wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 06/08/10 12:13, Jeff Trawick wrote:
 On Fri, Aug 6, 2010 at 3:54 AM, Rainer Jung rainer.j...@kippdata.de
 wrote:
 On 05.08.2010 21:30, Eric Covener wrote:


 
http://people.apache.org/~rjung/httpd/trunk/manual/mod/mod_proxy.html.en#workers

 A direct worker is usually configured using any of ProxyPass,
 ProxyPassMatch or ProxySet.

 I don't know much about Proxy, but can this hammer home a bit more
 that these directives create a new worker implicitly based on [some
 parts of?] the destination URL?

 Good point. I updated the patch and HTML page to stress the
 identification of workers by their URL.

 And what happens when there's
 overlap?

 There's a warning box at the end of the Workers section talking about
 that. I slightly rephrased it to also contain the trm overlap.

 New patch:

 http://people.apache.org/~rjung/patches/mod_proxy_docs_workers-v2.patch
 
 nits:
 
 +  There are two builtin workers, the default forward proxy worker
 and the
 
 built-in
 
 +  optionally included in a
 +  <directive module="mod_proxy">Proxy</directive> directive.
 
 How about using container at the end instead of directive? (shrug)
 
 +  . Direct workers can use connection pooling,
 +  HTTP Keep-Alive and individual configurations for example
 +  for timeouts.
 
 (dumb question: what's the diff between keepalive and connection
 pooling?  reuse for one client vs. reuse for any client?)
 
 That last part sounds a little awkward.  Maybe something like this:
 
 A number of processing options can be specified for direct workers,
 including connection pooling, HTTP Keep-Alive, and I/O timeout values.
 
 +   Which options are available is depending on the
 +  protocol used by the worker (and given in the origin server URL).
 +  Available protocols include <code>ajp</code>, <code>fcgi</code>,
 +  <code>ftp</code>, <code>http</code> and <code>scgi</code>.</p>
 +
 
 The set of options available for the worker depends on the protocol,
 which is specified in
 the origin server URL.  Available protocols include 
 
 
 +  <p>A balancer worker is created, if its worker URL uses
 
 no comma
 
 +  <p>Worker sharing happens, if the worker URLs overlap. More
 precisely
 +  if the URL of some worker is a leading substring of the URL of
 another
 +  worker defined later in the configuration file.
 
 <p>Worker sharing happens if the worker URLs overlap, which occurs when
 the URL of some worker is a leading substring of the URL of another
 worker defined later in the configuration file.
 
 +  In this case the later worker isn't actually created. Instead
 the previous
 +  worker is used. The benefit is, that there is only one connection
 pool,
 +  so connections are more often reused. Unfortunately all the
 configuration attributes
 +  given explicitly for the later worker overwrite the respective
 configuration
 of the previous worker!</p>
 
 
 This sounds like a discussion of pros and cons.  There's no pro, since
 the user didn't intend to configure it this way, right?
 
 
 Can we have some examples put in that section - its a little wordy and I
 found it a little hard to understand, and I use mod_proxy quite a lot!
 
 Mark

Can I request an example of how to set up a worker for a forward proxy?  
Directives such as ProxyPass are for reverse proxies.  Can parameters such 
as disablereuse be used with a Proxy block?

Also, is it possible to setup these three reuse styles for a forward proxy?

1: No reuse, close the connection after this request.
2: Reuse connection, but only for the client that caused its creation.
3: Pool connection for reuse by any client.

Could you provide example configurations for these please?

Thanks,
Paul


RE: Talking about proxy workers

2010-08-06 Thread Paul Fee
Plüm, Rüdiger, VF-Group wrote:
 3: Pool connection for reuse by any client.
 
 Yes, but this is needed separately for every origin server you forward to:
 
 <Proxy http://www.frequentlyused.com/>
    # Set an arbitrary parameter to trigger the creation of a worker
    ProxySet keepalive=on
 </Proxy>

Can I use wildcards to enable connection pooling for all forward proxy 
destinations?

<Proxy *>
    ProxySet keepalive=on
</Proxy>

Thanks,
Paul


Re: OS Keep-alive on forward proxy

2010-08-05 Thread Paul Fee
Rainer Jung wrote:

snip

 And yes: the forward proxy does *not* do HTTP Keepalive. Technical
 reason: the connections to the origin server are pooled and retrieved
 from and returned to the pool for each request. A forward proxy usually
 talks to many different origin servers. Keeping those connections open in
 a naive way would lead to a lot of not well used pools. Assuming that
 during one client connection the origin server often is used for
 multiple requests this could be improved, but would bloat the already
 complicated proxy code even more.

Has mod_proxy operated in that way for a while now?  I gained most of my 
experience with mod_proxy using Apache 2.0.X.  My understanding was that 
proxy to OS connections were tightly coupled to the client to proxy 
connection.  There was a deliberate decision not to reuse proxy-OS 
connections for requests coming from other client-proxy connections, as 
this may be a security risk.

The OS may attribute authorization to a connection and a subsequent request 
on this persistent connection could inherit these attributes.  Each HTTP 
request *should* be stateless and hence the next request on the same socket 
should be independent, but there was the risk that a remote (non-Apache) 
origin server may not work that way.  If the proxy-OS connection is pooled 
and reused by a different client-proxy request, does that risk confusing an 
origin server that expects all requests on the same connection to come from 
the same client?

... or have I misunderstood your description?

Thanks,
Paul


RE: OS Keep-alive on forward proxy

2010-08-05 Thread Paul Fee
Plüm, Rüdiger, VF-Group wrote:

  
 
 -Original Message-
 From: Paul Fee
 Sent: Donnerstag, 5. August 2010 11:18
 To: dev@httpd.apache.org
 Subject: Re: OS Keep-alive on forward proxy
 
 Rainer Jung wrote:
 
 snip
 
  And yes: the forward proxy does *not* do HTTP Keepalive. Technical
  reason: the connections to the origin server are pooled and
 retrieved
  from and returned to the pool for each request. A forward
 proxy usually
  talks to many diferent origin servers. Keeping those
 connections open in
  a naive way would lead to a lot of not well used pools.
 Assuming that
  during one client connection the origin server often is used for
  multiple requests this could be improved, but would bloat
 the already
  complicated proxy code even more.
 
 Has mod_proxy operated in that way for a while now?  I gained
 
 Since 2.2.
 
 most of my
 experience with mod_proxy using Apache 2.0.X.  My
 understanding was that
 proxy to OS connections were tightly coupled to the client to proxy
 
 That was true in 2.0.x yes.
 
 connection.  There was a deliberate decision not to reuse proxy-OS
 connections for requests coming from other client-proxy
 connections as this
 may be a security risk.
 
 The OS may attribute authorization to a connection and a
 subsequent request
 on this persistent connection could inherit these attributes.
  Each HTTP
 request *should* be stateless and hence the next request on
 the same socket
 should be independent, but there was the risk that a remote
 (non-Apache)
 origin server may not work that way.  If the proxy-OS
 connection is pooled
 and reused by a different client-proxy request, does that
 risk confusing an
 origin server that expects all requests on the same
 connection to come from
 the same client?
 
 It would be a bug in this server to expect them to originate from the
 same client, as you correctly state that HTTP is a stateless protocol.
 Nevertheless you can turn off connection pooling in the case you
 are dealing with a faulty origin server.
 
 Regards
 
 Rüdiger

That's useful information, it's not mentioned in the overview of new 
features in 2.2 and I missed it in the detailed changelog.  Thanks for 
correcting my misunderstanding.

Regarding disabling connection pooling, I looked at the source and see two 
ways to achieve this:

1) The disablereuse parameter of the ProxyPass directive.
2) The proxy-initial-not-pooled Apache environment variable set on a 
per-request basis.

Both these relate to reverse proxy requests.

Does connection pooling apply to forward proxy requests?  If so, are there 
configuration options to control it?

Would disabling connection pooling fix the defect that Ryujiro Shibuya 
reported (with the penalty of losing the performance gains from pooling)?

Thanks,
Paul


Re: mod_deflate handling of empty initial brigade

2010-06-03 Thread Paul Fee
Bryan McQuade wrote:
 Are there any cases where it's important for ap_pass_brigade to pass
 on an empty brigade? Doesn't sound like it, but since this is a core
 library change I want to double check.

When handling a CONNECT request, the response will have no body.  In 
mod_proxy, the CONNECT handler currently skips most filters and writes via 
the connection filters.  However there is a block of #if 0 code which 
intends to send only a FLUSH bucket down the filter chain.

That's not quite the case of an entirely empty brigade, but it seems close 
enough to warrant highlighting.

Thanks,
Paul



RE: Age calculation in mod_cache.

2010-04-15 Thread Paul Fee
Plüm, Rüdiger, VF-Group wrote:

  
 
 -Original Message-
 From: Ryujiro Shibuya
 Sent: Mittwoch, 14. April 2010 03:35
 To: dev@httpd.apache.org
 Subject: Age calculation in mod_cache.
 
 Hello,
 
 A minor issue has been found in the age calculation in mod_cache
 [ap_cache_current_age() in
 cache_util.c].
 
 Under some unusual conditions, the age of cached content can be
 calculated as a
 negative value.
 The negative age will later be cast to a huge unsigned
 integer,
 and then an inappropriate Age header, e.g. Age: 4294963617
 (= more than
 135 years), may be returned to the client.
 
 In my opinion, the negative age should be adjusted to zero, at least.
 What are your thoughts?
 
 Makes sense. Fixed in trunk as r933886.
 
 Regards
 
 Rüdiger

Hi Rüdiger,

Can you educate me on how this can be merged onto the 2.2.x branch?  Does a 
specific request need to be made or do most trunk changes get merged 
automatically?

Thanks,
Paul


Eliminating absolute paths on installation

2006-12-13 Thread Paul Fee
Hello all,

After building Apache httpd, I find that the httpd executable has explicit 
knowledge of its ultimate install location as specified with:

./configure --prefix=<install location>

Items with this absolute knowledge include:
ServerRoot (e.g. httpd implicitly knows where to find its config file.)
RPATH (used by the dynamic linker to locate APR libraries.)

This is a problem for me as the install location is not always known at build 
time.  Also, if I give someone a built version of httpd, they cannot install 
it multiple times on one host due to the absolute paths.

I expect the ServerRoot item could easily cope with relative paths.  Whether 
the starting point is the current working directory or the directory in which 
the httpd application resides can be up for debate.

The RPATH is slightly different.

Before installation, libtool creates a wrapper script named httpd which can be 
used to run the real httpd in the .libs directory.  It uses LD_LIBRARY_PATH to 
temporarily override the RUNPATH stored within the ELF object.  However, 
LD_LIBRARY_PATH should be avoided in general use.

The RPATH is populated by the -R (or -rpath) linker option.  $ORIGIN is a token 
which the runtime linker interprets as the directory in which the ELF object 
resides.

The current RPATH can be seen with:
(Linux) objdump -p httpd | grep PATH
(Solaris) dump -Lv httpd | grep PATH
  RPATH   <install root>/lib

Replacing this with $ORIGIN/../lib would cause the httpd executable to search 
for the APR libraries in ../lib relative to itself.

Hence we could now build Apache httpd without advance knowledge of where it is 
to be installed.  This would be very useful for me.

Any thoughts?

Thanks,
Paul

-- 
___
Surf the Web in a faster, safer and easier way:
Download Opera 9 at http://www.opera.com

Powered by Outblaze


Re: Eliminating absolute paths on installation

2006-12-13 Thread Paul Fee
 - Original Message -
 From: Guy Hulbert [EMAIL PROTECTED]
 To: dev@httpd.apache.org
 Subject: Re: Eliminating absolute paths on installation
 Date: Wed, 13 Dec 2006 08:16:08 -0500
 
 
 On Wed, 2006-13-12 at 13:16 +0100, Paul Fee wrote:
  This is a problem for me as the install location is not always known
  at build time.  Also, if I give someone a built version of httpd, they
  can not install it multiple times on one host due to the absolute
  paths.
 
 Why do they need more than one ?
 
 --
 --gh

Hi Guy,

The main motivation is that I don't want to dictate install location to people 
that are using my builds of httpd.

Secondly, I have multiple people testing httpd and my module.  I want to 
increase machine utilisation and allow multiple installations on one box.  It 
may be possible to arrange that they share a common httpd but ideally each 
installation would be self contained.  For example different httpd versions may 
be built with different options.

The only conflicting resource that different instances must avoid contention 
over should be the TCP port that httpd listens on.

Another scenario would be a httpd server in active service and the need to 
install a new version (in a different directory) for testing without removing 
the active version.  It would be good if httpd had the option to be built 
without advance knowledge of its install location.

Without eliminating absolute paths, I find myself heading down the path of OS 
virtualisation, which to me seems very heavyweight just to install multiple 
instances of one application.

Thanks,
Paul



Re: Eliminating absolute paths on installation

2006-12-13 Thread Paul Fee
 - Original Message -
 From: Joe Orton [EMAIL PROTECTED]
 To: dev@httpd.apache.org
 Subject: Re: Eliminating absolute paths on installation
 Date: Wed, 13 Dec 2006 14:33:03 +
 
 
 On Wed, Dec 13, 2006 at 01:16:35PM +0100, Paul Fee wrote:
  The RPATH is slightly different.
 
 The only way to avoid the RPATH (in general) is to link APR/APR-util
 statically; which can only be achieved by not building the shared
 libraries.  So passing --disable-shared to configure may work, though
 this is not a configuration that gets any testing at all AFAIK, so it
 may not work but bug reports are welcome.
 
 Having libtool use $ORIGIN-relative RPATHs would certainly be a neat
 hack for platforms which support that; patches would have to go to the
 libtool team ;)
 
 Alternatively, you can get tools which munge ELF binaries post-build -
 chrpath is the commonly used one IIRC.
 
 Regards,
 
 joe

Hi Joe,

A problem avoided is a problem solved!

My build of httpd has a small number of (non OS supplied) dependencies:
libaprutil
libexpat
libapr

Hence rolling these statically into the one httpd executable sounds feasible.  
I don't have other apps linking the same APR shared objects, hence I won't be 
losing opportunities to share .so files in memory.

Editing an existing RPATH with chrpath sounds interesting, but dangerous.  I 
wouldn't be surprised if it had a limitation such as the replacement RPATH not 
being allowed to exceed the length of the original, but I could cope with that.

Also, delving into the subtleties of libtool sounds a bit intimidating.

Anyway, you and Jeff have provided useful pointers.

Thanks,
Paul



Re: Re: De-Chunking

2006-11-15 Thread Paul Fee
 - Original Message -
 From: Christian V. [EMAIL PROTECTED]
 To: dev@httpd.apache.org
 Subject:  Re: De-Chunking
 Date:  Wed, 08 Nov 2006 09:59:08 +0100
 
 
 Christian V. wrote:
  Nick Kew wrote:
  On Tue, 07 Nov 2006 11:24:05 +0100 Christian V.
  [EMAIL PROTECTED] wrote:
 
  Hi ,
 
  i 'm running a third-party web service authentication module that
   hangs when the request coming from the client is splitted out in
   different chunks. I don't have access to the module and to the
  client neither, so I'm thinking to write an input filter that
  collects all the chunks and pass'em to the downstream filter or
  handler . Is that possible?
 
  It's possible, yes.
 
  Whether it'll fix the problem for you is not certain.  I'd suggest
  starting with a quick hack (or a dechunking proxy in front of your
  server) to test it first, if you really can't get the source.
 
 
 
  Maybe the proxy will fix it, but it will not be the solution, so I
  think I'm going to write the module/filter. But I need to know how
  Apache handles multi-chunk requests, as I'm not able to find this
  information. Does the request arrive at my filter entirely, in the
  form of bucket brigades, before being passed to downstream modules,
  or are brigades passed down as soon as they come? (I hope I
  explained it well)
 
  Tnx a lot, Chris.
 
 
 
 
 
 Let me explain, as I'm not 100% sure the problem is the 3rd-party
 module, and other people may have met the same issue:
 
 
 
 [ CLIENT ] -- [ APACHE R.PROXY (SSL) + 3rd MODULE ] -- [WEB SERVICE]
 
 The Web Service receives requests from both Java and .net clients.
 
 Our problem is the following.
 
 The .Net clients (we have one in C# and one in VB, both programmed with
 Visual Studio 2005) will split the client's XML request into multiple
 1024 byte packets. This happens only over HTTPS, and causes problems
 with the 3rd party module.
 
 To debug this we have programmed an apache module on the reverse proxy
 that dumps the stream of data as it receives it from the clients, and
 our .Net client, over HTTPS, splits it up in multiple chunks, as seen here:
 
 [Tue Oct 31 15:09:56 2006] [notice] (IN)  bucketdumper: mode READBYTES;
 blocking; 8192 bytes
 [Tue Oct 31 15:09:57 2006] [notice] (IN) - (AFTER bucket_read)
 -\tbucketdumper:\tbytes: 1024  -  lenght read: 1024  - data: <?xml
 version="1.0" encoding="utf-8"?><soap:Envelope
 xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
 xmlns:tns="http://www.acme.com.com/wsdl/HelloMoto.wsdl"
 xmlns:types="http://www.acme.com.com/wsdl/HelloMoto.wsdl/encodedTypes"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body
 soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><q1:sayHello
 xmlns:q1="urn:examples:HelloMoto"><TAG1
 xsi:type="xsd:string">TES</TAG1><TAG2
 xsi:type="xsd:string">TES</TAG2><TAG3
 xsi:type="xsd:string">TES</TAG3><TAG4
 xsi:type="xsd:string">TES</TAG4><TAG5
 xsi:type="xsd:string">TES</TAG5><TAG6
 xsi:type="xsd:string">TES</TAG6><TAG7
 xsi:type="xsd:string">TES</TAG7><TAG8
 xsi:type="xsd:string">TES</TAG8><TAG9
 xsi:type="xsd:string">TES</TAG9><TAG10
 xsi:type="xsd:string">TES</TAG10><TAG11
 xsi:type="xsd:string">TES</TAG11><TAG12
 xsi:type="xsd:string">TEST</TAG12></q1:sayHello></soap:Body></soap:Envelope>
 -
 [Tue Oct 31 15:09:57 2006] [notice] (IN) Complete Bucket :
 [Tue Oct 31 15:09:57 2006] [notice] (IN)  bucketdumper: mode READBYTES;
 blocking; 8192 bytes
 [Tue Oct 31 15:09:58 2006] [notice] (IN) - (AFTER bucket_read)
 -\tbucketdumper:\tbytes: 1  -  lenght read: 1  - data:  -
 [Tue Oct 31 15:09:58 2006] [notice] (IN) - (AFTER bucket_read)
 -\tbucketdumper:\tbytes: 0  -  lenght read: 0  - data:  -
 
 Note how the XML is 1025 bytes long, and gets sent in one 1024 byte
 packet first, followed by a second 1 byte packet (that contains just the
 final >).
 
 This does not happen over HTTP, where the entire XML arrives in one 1025
 byte long data chunk.
 
 Also, our Java clients do not split up the XML when posting over HTTPS,
 regardless of how long it is. Here is a request made by one of our
 Java clients:
 
 
 [Tue Oct 31 15:12:57 2006] [notice] (IN)  bucketdumper: mode READBYTES;
 blocking; 8192 bytes
 [Tue Oct 31 15:12:57 2006] [notice] (IN) - (AFTER bucket_read)
 -\tbucketdumper:\tbytes: 4333  -  lenght read: 4333  - data: <?xml
 version="1.0" encoding="UTF-8"?><soapenv:Envelope
 xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><beregnLivorno
 xmlns="http://prognosiRiservata.acme.com"><startinput><ns1:channel
 xmlns:ns1="http://">LIV</ns1:channel><ns2:clientId
 xmlns:ns2="http://">TEST</ns2:clientId><ns3:clientPassword
 xsi:nil="true" xmlns:ns3="http://"/><ns4:test
 xmlns:ns4="http://">false</ns4:test><ns5:useCache
 
 
 mlns:ns36="http://data.prognosiRiservata.acme.com">true</ns36:returnerBeskrivelser><ns37:returnerThreadSide
 

Header compression (or lack of) in mod_proxy

2006-11-07 Thread Paul Fee
Hello all,

I'm using Apache as a HTTP proxy.  Regarding the request and
response headers, I've done some tests and noticed different
behaviour in the request and response direction.

The request headers are compressed (i.e. headers with the same name are
merged into one header, comma separated). e.g.

hdr: value1
hdr: value2

becomes,
hdr: value1, value2

This is due to ap_get_mime_headers_core() calling
apr_table_compress().  It occurs in protocol.c before Apache even
detects that the incoming request is a proxy request.

The response headers on the other hand are read by mod_proxy in
ap_proxy_read_headers() which calls apr_table_add() but not
apr_table_compress().

RFC 2616 states that header compression MUST be allowed, i.e. it's optional, 
therefore Apache's behaviour is compliant.

However if a proxy is between a non-compliant client and/or server
then it may be best to leave the headers in their original form.
If a direct connection works and a proxied connection fails then
the proxy will be perceived as the problem.

Could someone point out a reason for the different behaviour in the
request and response path?

How about making the behaviour configurable so that it's consistent
in both directions and if necessary the headers can be left in
their original uncompressed form?

By the way, my tests were on httpd 2.0.59; however, reading the
source for 2.2.3 suggests it has the same behaviour.

Thanks for your time,
Paul



Re: Header compression (or lack of) in mod_proxy

2006-11-07 Thread Paul Fee
Sorry for the double post, I thought my first post got dropped.  But it was my 
fault because I hadn't subscribed.  Anyway more below...

 - Original Message -
 From:  Graham Leggett [EMAIL PROTECTED]
 To: dev@httpd.apache.org
 Subject: confirm subscribe to dev@httpd.apache.org

  Paul Fee wrote:
 
  Could someone point out a reason for the different behaviour in the 
  request and response path?

 Cookies.

 Cookie headers cannot be compressed as the RFC says they should be, so 
 proxy works around this by not compressing headers.

Thanks for that, now I see the problem.  The response could contain a 
set-cookie header, such as:
Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 
23:12:40 GMT

The problem being that the date may contain a ',' character.  Use of this 
reserved character for a purpose other than separating multiple header values 
means that the Set-Cookie header cannot be compressed.

Fortunately in the case of a request, the cookie header will not contain a 
date, hence the same problem is not present.

 
  How about making the behaviour configurable so that it's consistent 
  in both directions and if necessary the headers can be left in 
  their original uncompressed form?
 
 In theory, the idea that it be consistent in both directions is not 
 unreasonable - it follows the principle of be lenient in what we accept.

Therefore, we've eliminated the idea of calling apr_table_compress() for the 
response due to set-cookie.  Would you foresee any issues with disabling the 
apr_table_compress() call for the request?

I'd like to add the option to leave request headers in the form in which they 
were received from the client.

Today, multiple headers with the same name are compressed when read from the 
client.  Therefore Apache modules reading the headers will see a single string 
with comma separated values.  However modules, in theory, could also add new 
headers so we could have something like:
hdr: value1, value2
hdr: value3

If apr_table_compress() was not called, would that break anything?  Would 
modules expect comma separated values or would they be designed to cope with 
both representations as RFC2616 says they MUST?

If we assume that the rest of Apache will cope with both representations, then 
disabling the call to apr_table_compress() in ap_get_mime_headers_core() will 
not cause problems.

Of course we should keep it configurable and perhaps have the default set to 
enabled so as not to force new behaviour on users.

Thanks,
Paul



Re: De-Chunking

2006-11-07 Thread Paul Fee
 - Original Message -
 From: Christian V. [EMAIL PROTECTED]
 To: dev@httpd.apache.org
 Subject:  De-Chunking
 Date:  Tue, 07 Nov 2006 11:24:05 +0100
 
 
 Hi,
 
 I'm running a third-party web service authentication module that hangs
 when the request coming from the client is splitted out in 
 different chunks. I don't have access to the module and to the 
 client neither, so
 I'm thinking to write an input filter that collects all the chunks 
 and pass'em to the downstream filter or handler .
 Is that possible?
 

I would almost expect that if a module's filter is of the appropriate type then 
it will not see the underlying representation (e.g. chunked or not).  However, 
that impression may come from me usually working with output filters; the same 
may not hold for input from the client.

Also, Apache 2.2 mod_proxy has a feature to dechunk request bodies, see:
http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#request-bodies

It sounds like you're running a web server rather than a proxy, but the mod_proxy 
implementation may provide you with some clues.

Hope that helps,
Paul




Header compression (or lack of) in mod_proxy

2006-11-06 Thread Paul Fee
Hello all,

I'm using Apache as a HTTP proxy.  Regarding the request and 
response headers, I've done some tests and noticed different 
behaviour in the request and response direction.

The request headers are compressed (i.e. headers with the same name are 
merged into one header and comma separated). e.g.

hdr: value1
hdr: value2

becomes,
hdr: value1, value2

This is due to ap_get_mime_headers_core() calling 
apr_table_compress().  It occurs in protocol.c before Apache even 
detects that the incoming request is a proxy request.

The response headers on the other hand are read by mod_proxy in 
ap_proxy_read_headers() which calls apr_table_add() but not 
apr_table_compress().

RFC 2616 states that header compression MUST be allowed.

However if a proxy is between a non-compliant client and/or server 
then it would be best to leave the headers in their original form.  
If a direct connection works and a proxied connection fails then 
the proxy will be perceived as the problem.

Could someone point out a reason for the different behaviour in the 
request and response path?

How about making the behaviour configurable so that it's consistent 
in both directions and if necessary the headers can be left in 
their original uncompressed form?

By the way, my tests were on httpd 2.0.59, however reading the 
source for 2.2.3 suggests the same behaviour.

Thanks for your time,
Paul

