Following some IRC chat, I thought I'd start a discussion on a possible improvement of refresh_pattern in Squid3.

The starting point for this discussion is the fact that refresh_pattern is a source of confusion for many users, even expert admins. It's not obvious what it does, how to achieve certain things, or under what circumstances different bits of it apply or don't apply.

Currently refresh_pattern means different things depending on how the response freshness was calculated: by an explicit header set by the origin server (Cache-Control, Expires), by invoking the Last-Modified algorithm (if the response had a Last-Modified header), or by neither of these methods because no freshness could be calculated.

It's quite complicated. I don't know what the right answer is.

Here is an idea though:

We could separate the configuration out into "standard" and "HTTP violating" parts. Let us define "standard" as the two mechanisms that are most semantically transparent:

1. Explicit expiration set by server (Cache-Control, Expires)
2. Heuristic expiration based on Last-Modified

And let's define "HTTP violating" as anything that either overrides these, or anything that enforces cacheability in the absence of any of these headers.

What configuration options do we need for each of these two categories?

For the "standard" configuration:
We don't need any options for the explicit expiry mechanism, as it's... explicit :) However, we do need a couple of global options for the Last-Modified factor algorithm:

     TAG: refresh_lastmod_factor (percent)
     Default: 20

     TAG: refresh_lastmod_max (minutes)
     Default: 10080

These, then, are the only refresh options I propose for a non-HTTP-violating setup.
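
To make the two proposed globals concrete, here is a minimal sketch of the Last-Modified factor heuristic they would control. It is an illustration only, not Squid's refresh.cc: the function name and its parameters are hypothetical, and it assumes the lifetime is the factor applied to the gap between the response's Date and Last-Modified, capped at the maximum.

```python
from datetime import datetime, timedelta

def lastmod_freshness_lifetime(date, last_modified,
                               factor_percent=20, max_minutes=10080):
    """Heuristic freshness lifetime from Last-Modified (hypothetical helper).

    factor_percent stands in for the proposed refresh_lastmod_factor and
    max_minutes for the proposed refresh_lastmod_max.
    """
    age_at_fetch = date - last_modified
    # A Last-Modified in the future is treated as "just modified" (assumption).
    if age_at_fetch < timedelta(0):
        age_at_fetch = timedelta(0)
    lifetime = age_at_fetch * factor_percent / 100
    return min(lifetime, timedelta(minutes=max_minutes))

# Modified 10 days before it was fetched: fresh for 20% of that gap.
print(lastmod_freshness_lifetime(datetime(2006, 6, 11), datetime(2006, 6, 1)))
```

With the defaults, an object last modified 100 days before it was fetched would be capped at 10080 minutes (7 days) rather than kept for 20 days.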


Now for the "HTTP violating" overrides, which are more complicated.

Defaults are set first:
        
     TAG: refresh_override_default options
     Default: none

These can be refined by regex:

     TAG: refresh_override_match [-i] pattern options
     Default: none

where options can be any of:
     min=xxx
          minimum amount of time this object will be considered fresh
     max=xxx
          maximum amount of time this object will be considered fresh
     ignore-reload=on|off
          ignore all client headers that prevent serving a cached response
     reload-into-ims=on|off
          client reload is downgraded from unconditional to conditional GET
     ignore-no-cache=on|off
          ignore all server headers that prevent caching a response
     ignore-no-store=on|off
          ignore "Cache-Control: no-store" server header
     ignore-private=on|off
          ignore "Cache-Control: private" server header
     ignore-auth=on|off
          cache authorized responses, even if server didn't specify "Cache-Control: public"
     refresh-ims=on|off
          always pass client IMS requests through to the origin, even if we think our copy is fresh

For example:
     refresh_override_default     max=4320 reload-into-ims=on

     refresh_override_match     http://host/     ignore-reload=on ignore-no-cache=on ignore-no-store=on
     refresh_override_match     /path/     reload-into-ims=off
     refresh_override_match     \.jpe?g$     min=1440
     refresh_override_match     \.css$     max=60
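
One way to read the example above is as layered option sets: start from the default, then apply each matching line. The sketch below assumes that every matching refresh_override_match line applies in configuration order, with later lines overriding earlier ones. That ordering rule is my assumption for illustration (a first-match-wins rule would be equally possible), and the names DEFAULT, MATCHES and effective_options are hypothetical.

```python
import re

# The example configuration, transcribed as plain data.
DEFAULT = {"max": 4320, "reload-into-ims": "on"}
MATCHES = [
    (r"http://host/", {"ignore-reload": "on", "ignore-no-cache": "on",
                       "ignore-no-store": "on"}),
    (r"/path/", {"reload-into-ims": "off"}),
    (r"\.jpe?g$", {"min": 1440}),
    (r"\.css$", {"max": 60}),
]

def effective_options(url, default=DEFAULT, matches=MATCHES):
    """Return the options in force for url (assumed: all matches, in order)."""
    options = dict(default)          # the default always applies first
    for pattern, overrides in matches:
        if re.search(pattern, url):  # each matching line refines the result
            options.update(overrides)
    return options

print(effective_options("http://host/path/pic.jpg"))
print(effective_options("http://other/style.css"))
```

Under this reading, a stylesheet on another host keeps reload-into-ims=on from the default and only has its max lowered to 60.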


Main differences in usage:

1. The overrides would always apply, regardless of how the expiration time was arrived at, whether by explicit headers or by the Last-Modified heuristic. Currently the Min, Max and Percent settings each apply only in specific circumstances: Max and Percent only apply to responses with a Last-Modified header, and Min only applies in the absence of L-M, Expires and CC max-age.

2. The refresh_override_default would always apply (although its options may be overridden by those of a refresh_override_match). Currently the default refresh_pattern only applies if no patterns match the request, meaning you can't ever override default behaviour, you can only fall back to it.

3. There is no way of setting the Last-Modified factor percentage by regex! This is perhaps a big problem, and it could be added as an option. But then it would be the only non-HTTP-violating directive possible in the option... and so would spoil it slightly.

4. No need for global counterparts of refresh_pattern directives, e.g. refresh_all_ims and reload_into_ims.

5. Frequently used override options could be stated in the default instead of every subsequent line
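
Point 1 above amounts to clamping whatever lifetime was computed, whichever mechanism produced it. A minimal sketch, assuming lifetimes in minutes, that a response with no freshness information has a lifetime of zero, and that max wins if min is configured larger than max (none of which the proposal specifies); apply_overrides is a hypothetical name.

```python
def apply_overrides(lifetime_minutes, options):
    """Clamp a computed freshness lifetime to the min/max overrides.

    lifetime_minutes may come from explicit headers, from the Last-Modified
    heuristic, or be None when neither was available (assumed to mean 0).
    """
    lifetime = 0 if lifetime_minutes is None else lifetime_minutes
    if options.get("min") is not None:
        lifetime = max(lifetime, options["min"])
    if options.get("max") is not None:
        lifetime = min(lifetime, options["max"])  # max wins on conflict
    return lifetime

print(apply_overrides(30, {"min": 1440}))       # short explicit expiry raised
print(apply_overrides(100000, {"max": 4320}))   # long lifetime capped
print(apply_overrides(None, {"min": 60}))       # no headers at all
```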


This may be completely the wrong way of looking at it, or it may be just going too far. A smaller, but still helpful, step might be to introduce a refresh_pattern_default whose values would be inherited by any subsequent refresh_pattern match.


Any help or input on this would be very welcome indeed.

Doug


On 1 Jun 2006, at 20:06, Doug Dixon wrote:

Hi

I'm fixing bug 1202 (it's a simple fix) and am cleaning up refresh.cc at the same time.

I'd like to review the various refresh_pattern options, as some of them are mutually exclusive in practice (although you can configure all of them) and it's not clear from the documentation what they all mean. They're quite hard to understand and use correctly.


1. reload-into-ims

The following is legal:

refresh_pattern html$ 5 20% 60 ignore-reload reload-into-ims

but reload-into-ims will not have any effect. You could argue that this is obvious, but I think it should be caught at parse time.
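
A parse-time check of this kind could look roughly like the sketch below. It is not the parser in Squid; parse_refresh_pattern and its return shape are hypothetical, and it assumes the line layout "refresh_pattern [-i] regex min percent max [options]".

```python
def parse_refresh_pattern(line):
    """Parse one refresh_pattern line and reject conflicting options."""
    fields = line.split()
    if not fields or fields[0] != "refresh_pattern":
        raise ValueError("not a refresh_pattern line")
    fields = fields[1:]
    case_insensitive = bool(fields) and fields[0] == "-i"
    if case_insensitive:
        fields = fields[1:]
    if len(fields) < 4:
        raise ValueError("expected: regex min percent max [options]")
    regex, min_age, percent, max_age = fields[:4]
    options = set(fields[4:])
    # ignore-reload already discards the client's reload, so there is
    # nothing left for reload-into-ims to downgrade.
    if {"ignore-reload", "reload-into-ims"} <= options:
        raise ValueError("ignore-reload makes reload-into-ims ineffective")
    return {
        "regex": regex,
        "case_insensitive": case_insensitive,
        "min": int(min_age),
        "percent": int(percent.rstrip("%")),
        "max": int(max_age),
        "options": options,
    }

print(parse_refresh_pattern("refresh_pattern html$ 5 20% 60 reload-into-ims"))
```

Feeding it the line from the example above raises ValueError instead of silently accepting an option that can never take effect.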

2. As an aside, but I want to mention it here: we need to make it clearer that if an object does specify an expiry time, the Min, Percent and Max values in refresh_pattern will be completely ignored, but the options won't be. I'll change cf.data.pre accordingly.

3. override-expire

                override-expire enforces min age even if the server
                sent a Expires: header. Doing this VIOLATES the HTTP
                standard.  Enabling this feature could make you liable
                for problems which it causes.

If you do want to modify the behaviour of blindly obeying the server's explicit expiry time, you can - to an extent.

The override-expire option enforces the Min time in cache, even if the origin stated it should expire before then. But it ignores the Max time (surprising!), and the L-M factor (more expected - not obvious what this would do anyway)

It's not very intuitive. I think we should probably make this option enforce the Max time as well. Possibly even ignore the explicit expiry of the object altogether and fall back to the last-modified factor?

It could be a naming thing... override-expire doesn't really say what it does. enforce-min might be better. But then you've already stated a min and might expect it to be already enforced.

4. override-lastmod

                override-lastmod enforces min age even on objects
                that were modified recently.

The Min time isn't enforced even when the last-modified factor algorithm does kick in. If the object was only just modified and the L-M factor algorithm results in a figure lower than the Min, it will be considered fresh for less than the configured Min.

This isn't what I would expect. I know that the override-lastmod exists to let you do this, but it's really non-intuitive. I think the Min should always be enforced if we're using L-M factor algorithm, and that we should therefore lose the override-lastmod option. Can't see the point in the default (null) behaviour of Min otherwise.
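
To summarise points 2 to 4, here is a small model of the current behaviour as described in this message. It follows the description above only, not a reading of refresh.cc, so treat it as a sketch: the function and parameter names are hypothetical and all times are in minutes.

```python
def freshness_lifetime(min_age, percent, max_age,
                       explicit_expiry=None, lm_age=None,
                       override_expire=False, override_lastmod=False):
    """Freshness lifetime under the behaviour described in this message.

    explicit_expiry: lifetime stated by the origin (Expires / max-age), if any.
    lm_age: gap between Date and Last-Modified, if the header was present.
    """
    if explicit_expiry is not None:
        # Min, Percent and Max are ignored; override-expire enforces Min only.
        if override_expire:
            return max(explicit_expiry, min_age)
        return explicit_expiry
    if lm_age is not None:
        # L-M factor, capped by Max; Min is enforced only with override-lastmod.
        lifetime = min(lm_age * percent / 100, max_age)
        if override_lastmod:
            lifetime = max(lifetime, min_age)
        return lifetime
    # Neither mechanism available: Min is all we have.
    return min_age

# refresh_pattern html$ 5 20% 60, object modified 10 minutes before fetch:
print(freshness_lifetime(5, 20, 60, lm_age=10))
print(freshness_lifetime(5, 20, 60, lm_age=10, override_lastmod=True))
```

The first call yields a lifetime below the configured Min, which is the surprising case in point 4; the second shows what override-lastmod changes.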


Thoughts?

Doug

