Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Niklas Edmundsson

On Tue, 22 May 2007, Henrik Nordstrom wrote:


On Tue, 2007-05-22 at 11:40 +0200, Niklas Edmundsson wrote:


---8<---
Does anybody see a problem with changing mod_cache to not update the
stored headers when the request has max-age=0, the body turns out not
to be stale and the on-disk header hasn't expired?
---8<---


My understanding:

From an RFC point of view, it's fine for the cache to completely ignore a
304 and not update the stored entity at all. But the response to this
request should be the merge of the two responses assuming the
conditional was added by the cache.


This is in line with my understanding, and since the response-merging 
is being done today the only change that would be done is to skip 
storing the header to disk. I think it would be wise to only skip the 
storing for the max-age=0 case though.


Should I try to whip up a patch for it then?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Radioactive halibut will make fission chips.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Sander Striker

On 5/24/07, Niklas Edmundsson [EMAIL PROTECTED] wrote:

On Tue, 22 May 2007, Henrik Nordstrom wrote:

 On Tue, 2007-05-22 at 11:40 +0200, Niklas Edmundsson wrote:

 ---8<---
 Does anybody see a problem with changing mod_cache to not update the
 stored headers when the request has max-age=0, the body turns out not
 to be stale and the on-disk header hasn't expired?
 ---8<---

 My understanding:

 From an RFC point of view, it's fine for the cache to completely ignore a
 304 and not update the stored entity at all. But the response to this
 request should be the merge of the two responses assuming the
 conditional was added by the cache.

This is in line with my understanding, and since the response-merging
is being done today the only change that would be done is to skip
storing the header to disk. I think it would be wise to only skip the
storing for the max-age=0 case though.


Why limit it to the max-age=0 case?  Isn't it a general improvement?

Sander


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Graham Leggett
On Thu, May 24, 2007 10:23 am, Sander Striker wrote:

  From an RFC point of view, it's fine for the cache to completely ignore a
  304 and not update the stored entity at all. But the response to this
  request should be the merge of the two responses assuming the
  conditional was added by the cache.

 This is in line with my understanding, and since the response-merging
 is being done today the only change that would be done is to skip
 storing the header to disk. I think it would be wise to only skip the
 storing for the max-age=0 case though.

 Why limit it to the max-age=0 case?  Isn't it a general improvement?

It isn't. The net effect of not storing the headers to disk is that
once a fresh object goes stale, it will remain stale until the end of
days, because the mechanism to make that object fresh again has been
removed.

If the object remains stale, it means that a conditional request will be
generated and sent to the backend on every single hit, which is
unnecessary load on both the backend network and the backend webserver.

As a directive controlled special case, this feature makes sense - but
this isn't the kind of default behaviour you want to see on a cache.

A better approach might be to determine whether the headers have actually
changed before writing them to disk. You needed to read the header in the
first place; if the previously read header and the newly received header
from the backend are the same, then don't write to disk, as the write is
unnecessary.

This remains RFC compliant and solves the underlying problem.
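
A minimal sketch of the compare-before-write idea, in Python rather than
mod_cache's C. The function names and the `ignore` parameter are
hypothetical and do not exist in mod_cache. Note that a 304 normally
carries a new Date, so a literal comparison would rarely report
"unchanged"; which headers (if any) to leave out of the comparison is a
policy question this thread does not settle.

```python
def headers_changed(stored, received, ignore=()):
    """Return True if the received headers differ from the stored ones.

    Header names are compared case-insensitively. Names listed in
    `ignore` (a hypothetical knob) are left out of the comparison.
    """
    skip = {name.lower() for name in ignore}

    def normalize(headers):
        return {k.lower(): v.strip() for k, v in headers.items()
                if k.lower() not in skip}

    return normalize(stored) != normalize(received)


def maybe_store(stored, received, write, ignore=()):
    """Call write(received) only when something actually changed."""
    if headers_changed(stored, received, ignore):
        write(received)
        return True
    return False
```

Here `stored` stands for the header set read from disk and `received` for
the merged header set after revalidation; `write` is whatever callable
persists the headers.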

Regards,
Graham
--




Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Niklas Edmundsson

On Thu, 24 May 2007, Sander Striker wrote:


 ---8<---
 Does anybody see a problem with changing mod_cache to not update the
 stored headers when the request has max-age=0, the body turns out not
 to be stale and the on-disk header hasn't expired?
 ---8<---

 My understanding:

 From an RFC point of view, it's fine for the cache to completely ignore a
 304 and not update the stored entity at all. But the response to this
 request should be the merge of the two responses assuming the
 conditional was added by the cache.

This is in line with my understanding, and since the response-merging
is being done today the only change that would be done is to skip
storing the header to disk. I think it would be wise to only skip the
storing for the max-age=0 case though.


Why limit it to the max-age=0 case?  Isn't it a general improvement?


Consider a default cache lifetime of 86400 seconds, and requests 
coming in with max-age=4 (we see a lot of mozilla downloads with 
this, for example). If you don't rewrite the on-disk headers you'll 
end up always hitting your backend when you pass an age of 4.
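
The effect can be illustrated with a small simulation. This is only a
sketch in Python, not mod_cache code: the function names are hypothetical,
ages are in whole seconds, and treating an entry as stale once its age
reaches the request's max-age is an assumption of the sketch.

```python
def needs_revalidation(header_age, cache_lifetime, request_max_age=None):
    """True if the stored entry cannot be served without asking the backend."""
    if header_age >= cache_lifetime:
        return True
    # Assumption: the entry is too old for the client once its age
    # reaches the max-age the client asked for.
    if request_max_age is not None and header_age >= request_max_age:
        return True
    return False


def count_backend_hits(request_times, cache_lifetime, request_max_age,
                       rewrite_on_revalidate):
    """Count revalidations for a series of request timestamps (seconds)."""
    stored_at = request_times[0]  # the entry is cached by the first request
    hits = 0
    for now in request_times[1:]:
        if needs_revalidation(now - stored_at, cache_lifetime, request_max_age):
            hits += 1
            if rewrite_on_revalidate:
                stored_at = now  # on-disk header is refreshed
    return hits
```

With one request per second for 20 seconds, a lifetime of 86400 and
max-age=4, rewriting the header on each revalidation gives 5 backend hits;
never rewriting it gives 17, because every request from second 4 onwards
sees a header that is too old.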


In the max-age=0 case you only force an unnecessary header write, 
because:

a) The written header won't be useful for other requests with
   max-age=0. A ground rule of caching is to not save stuff that's
   never used.
b) Requests with max-age!=0 aren't helped much by it; the only penalty
   would be when a max-age!=0 request causes a header rewrite that
   a max-age=0 access would have performed. Doing this single rewrite
   instead of potentially thousands if rewriting due to max-age=0
   is a rather big win.
c) RFC-wise it seems to me that a not-modified object is a
   not-modified object. There is no guarantee that next request will
   hit the same cache, so nothing can expect a max-age=0 request to
   force a cache to rewrite its headers and then access it with
   max-age!=0 and get headers of that age.
d) Also, an object tends to be accessed with more-or-less the same
   max-age. So to store headers in the max-age=0 case just because it
   might be accessed by max-age!=0 makes no sense, since it's more
   likely that the next request to this object will have the same
   max-age.
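
The proposed rule can be written down as a small predicate. This is a
sketch in Python with a hypothetical function name, not a patch; it only
restates the condition described above (skip the write when the request
has max-age=0 and the on-disk header hasn't expired).

```python
def should_write_headers(request_max_age, stored_header_fresh):
    """Decide whether to store refreshed headers after a successful
    revalidation (the body turned out not to be stale).

    request_max_age: max-age from the request's Cache-Control, or None.
    stored_header_fresh: True if the on-disk header has not expired.
    """
    if request_max_age == 0 and stored_header_fresh:
        # The rewritten header would not help other max-age=0 requests,
        # and the stored one is still good for everyone else.
        return False
    return True
```

Requests with any other max-age, and any request that finds an expired
on-disk header, still cause the header to be rewritten.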


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Did I just step on someones toes again??
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Henrik Nordstrom
On Thu, 2007-05-24 at 13:22 +0200, Niklas Edmundsson wrote:

 c) RFC-wise it seems to me that a not-modified object is a
 not-modified object. There is no guarantee that next request will
 hit the same cache, so nothing can expect a max-age=0 request to
 force a cache to rewrite its headers and then access it with
 max-age!=0 and get headers of that age.

Yes. RFC wise it's fine to not update the cache with the 304. Updating
of cached entries is optional (RFC2616 10.3.5 last paragraph).

The only MUST regarding 304 and caches is that you MUST ignore the 304
and retry the request without the conditional if the 304 indicates
another object than what is currently cached (i.e. ETag or Last-Modified
differs).  (same section, the paragraph above)
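
A sketch of that check, in Python with hypothetical names. Headers are
assumed to be dicts with lowercase names, and validators are compared as
plain strings, which is a simplification.

```python
def handle_304(stored, not_modified):
    """Return 'use_cached' or 'retry_unconditional' for a 304 response.

    If the 304 carries an ETag or Last-Modified that differs from the
    cached entry, it describes another object: the 304 must be ignored
    and the request retried without the conditional.
    """
    for name in ("etag", "last-modified"):
        if name in not_modified and not_modified[name] != stored.get(name):
            return "retry_unconditional"
    return "use_cached"
```

Whether the cache then also updates its stored entry is a separate,
optional step.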

Regards
Henrik




Re: mod_cache: Don't update when req max-age=0?

2007-05-22 Thread Henrik Nordstrom
On Tue, 2007-05-22 at 11:40 +0200, Niklas Edmundsson wrote:

 ---8<---
 Does anybody see a problem with changing mod_cache to not update the
 stored headers when the request has max-age=0, the body turns out not
 to be stale and the on-disk header hasn't expired?
 ---8<---

My understanding:

From an RFC point of view, it's fine for the cache to completely ignore a
304 and not update the stored entity at all. But the response to this
request should be the merge of the two responses assuming the
conditional was added by the cache.
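
A rough sketch of such a merge, in Python with hypothetical names. It
assumes header dicts with lowercase names in the stored set and leaves out
the hop-by-hop headers listed in RFC 2616 section 13.5.1; everything else
in the 304 replaces the stored value. The stored entry itself is not
touched.

```python
# Hop-by-hop headers from RFC 2616 section 13.5.1; these are not merged.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
})


def merge_304(stored_headers, not_modified_headers):
    """Build the response headers from a cached entry plus a 304.

    Returns a new dict; stored_headers is left unmodified, so the caller
    is free to skip writing the result back to the cache.
    """
    merged = dict(stored_headers)
    for name, value in not_modified_headers.items():
        if name.lower() not in HOP_BY_HOP:
            merged[name.lower()] = value
    return merged
```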

Regards
Henrik




Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Graham Leggett
On Mon, May 21, 2007 4:49 pm, Niklas Edmundsson wrote:

 Does anybody see a problem with changing mod_cache to not update the
 stored headers when the request has max-age=0, the body turns out
 not to be stale and the on-disk header hasn't expired?

 The rationale behind this is that there are hordes of stupid download
 managers that always issue this kind of request, and multiple in
 parallel to the same file at that. This hammers the entire
 cache-layer by causing headers to be rewritten for each request.

 Since max-age=0 requests can't be fulfilled without revalidating the
 object they don't benefit from this header rewrite, and requests with
 max-age!=0 that can benefit from the header rewrite won't be affected
 by this change.

 Am I making sense? Have I missed something fundamental?

At first glance, doing this I think will break RFC2616 compliance, and if
it does break RFC compliance then I think it should not be default
behaviour. However if it does solve a real problem for admins, then having
a directive allowing the admin to enable this behaviour does make sense.

Zooming out a little bit, this seems to fall into the category of RFC
violations that allow the cache to either hit the backend less, or hit the
backend not at all, for the benefit of an admin who knows what they are
doing.

A simple set of directives that allow an admin to break RFC compliance
under certain circumstances in order to achieve certain goals does make
sense.

Regards,
Graham
--




Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Niklas Edmundsson

On Mon, 21 May 2007, Graham Leggett wrote:


Since max-age=0 requests can't be fulfilled without revalidating the
object they don't benefit from this header rewrite, and requests with
max-age!=0 that can benefit from the header rewrite won't be affected
by this change.

Am I making sense? Have I missed something fundamental?


At first glance, doing this I think will break RFC2616 compliance, and if
it does break RFC compliance then I think it should not be default
behaviour. However if it does solve a real problem for admins, then having
a directive allowing the admin to enable this behaviour does make sense.


Why would it break RFC compliance? This request will never benefit from
the headers being saved to disk, and the headers returned to the
client should of course be those that resulted from the revalidation of
the object. The only difference is that they aren't saved to disk too.


The only difference I can see is that you can't probe whether the
previous request was a max-age=0 request by doing a max-age!=0 request
afterwards...



Zooming out a little bit, this seems to fall into the category of RFC
violations that allow the cache to either hit the backend less, or hit the
backend not at all, for the benefit of an admin who knows what they are
doing.

A simple set of directives that allow an admin to break RFC compliance
under certain circumstances in order to achieve certain goals does make
sense.


Yup. CacheIgnoreCacheControl is one of those; we use it on the
offloaders that only serve large files that we know don't need the
RFC behaviour.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Sir, We are receiving 285,000 Hails. - Crusher
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Roy T. Fielding

On May 21, 2007, at 7:49 AM, Niklas Edmundsson wrote:
Does anybody see a problem with changing mod_cache to not update  
the stored headers when the request has max-age=0, the body turns  
out not to be stale and the on-disk header hasn't expired?


Yes, the problem is that it will break content management systems that
need to refresh a cache front-end after the content has changed.

The rationale behind this is that there are hordes of stupid  
download managers that always issue this kind of request, and  
multiple in parallel to the same file at that. This hammers the  
entire cache-layer by causing headers to be rewritten for each  
request.


Why don't you just add an ignore of cache-control on requests from
those stupid download managers?  A simple BrowserMatch should do.

Roy



Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Graham Leggett

Niklas Edmundsson wrote:


At first glance, doing this I think will break RFC2616 compliance, and if
it does break RFC compliance then I think it should not be default
behaviour. However if it does solve a real problem for admins, then having
a directive allowing the admin to enable this behaviour does make sense.


Why would it break RFC compliance?


Because when clients say max-age=0 it means "please consider all URLs
as stale and revalidate them", and the server is obliged to honor this.


This request will never benefit from
the headers being saved to disk, and the headers returned to the client
should of course be those that resulted from the revalidation of the
object. The only difference is that they aren't saved to disk too.


If this happens you introduce a subtle bug - when the URL becomes stale 
on the frontend, it will remain stale to the end of days, because the 
entry on disk is never refreshed with new headers to show the content is 
fresh.


Yup. CacheIgnoreCacheControl is one of those; we use it on the
offloaders that only serve large files that we know don't need the
RFC behaviour.


I was thinking of a directive like CacheOrigin [on|off], meaning that 
*this* cache isn't a cache at all, but rather an origin server that just 
happens to fetch data via HTTP from some backend if the data isn't fresh 
in the cache.


Regards,
Graham
--




Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Roy T. Fielding

On May 21, 2007, at 2:22 PM, Ruediger Pluem wrote:

Why don't you just add an ignore of cache-control on requests from
those stupid download managers?  A simple BrowserMatch should do.


I am not quite sure what you mean by this. AFAIK you cannot set
CacheIgnoreCacheControl based on env variables.


Which is why we would have to add it to the code.  Note that this
would be to ignore client-provided cache control, which is a good
feature to have on a cache for various DoS reasons.
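
A sketch of what such a per-client switch might look like, in Python.
Everything here is hypothetical: the function name, the pattern list and
the "ExampleDownloader" agent string are made up for illustration, and
this is not an existing mod_cache feature. Request headers are assumed to
be a dict with lowercase names.

```python
import re


def effective_cache_control(request_headers, ignore_for_agents):
    """Return the request Cache-Control the cache should honour, or None.

    ignore_for_agents: regular expressions matched against User-Agent;
    for matching clients the client-provided Cache-Control is ignored.
    """
    user_agent = request_headers.get("user-agent", "")
    if any(re.search(pattern, user_agent) for pattern in ignore_for_agents):
        return None
    return request_headers.get("cache-control")
```

For example, with the pattern list `[r"ExampleDownloader"]`, a max-age=0
request from "ExampleDownloader/1.0" would be treated as if it carried no
Cache-Control at all, while other clients keep theirs.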

Roy