Henrik Nordstrom wrote:
On Wed, 31 Mar 2004, Greg Swallow wrote:


Assuming the documentation at http://www.bowiesnyder.com/writings/caching_shtml.htm is accurate, my Squid-based reverse proxies shouldn't be caching server-parsed HTML documents. However, it appears that the four reverse proxies sitting in front of my web servers are doing exactly that.


How the page was generated on the server does not matter much, other
than that server-generated pages usually do not carry caching
information. What matters is the HTTP headers of the response and, to
some extent, your refresh_pattern settings in squid.conf.

Some good references to understand these concepts better:

  Caching Tutorial for Web Authors and Webmasters
  <url:http://www.mnot.net/cache_docs/>

  Cacheability Engine
  <url:http://www.mnot.net/cacheability/>

Wow, those are kick-ass docs. I'll have to bookmark them.


Let me give you a little more background. The closest related configuration directives I can find are:

cache_peer 157.91.12.68 sibling 80 3130 proxy-only
cache_peer 157.91.12.70 sibling 80 3130 proxy-only
cache_peer 157.91.12.71 sibling 80 3130 proxy-only
refresh_pattern \.cfm$          0       0%      0
refresh_pattern \.asp$          0       0%      0
refresh_pattern \.aspx$         0       0%      0
refresh_pattern .               59      20%     240

These (along with the 59-minute minimum above) are new as of upgrading to Squid 2.5.STABLE4:

digest_generation on
digest_rebuild_period 600 seconds
digest_rewrite_period 3600 seconds
refresh_pattern -i \.pdf$ 59 20% 240 reload-into-ims override-lastmod override-expire
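
To make the question concrete, here is my reading of how the three numbers on a refresh_pattern line (min, percent, max, with min and max in minutes) get applied, as a rough sketch. The function name is_fresh and its arguments are made up for illustration. The rule order is simplified from the comments in squid.conf, and it ignores Expires, Cache-Control, and the override-* and reload-into-ims options, so treat it as an assumption rather than what Squid actually does internally.

```shell
# Simplified sketch of the refresh_pattern freshness check (assumed order).
# Usage: is_fresh AGE LM_AGE MIN PERCENT MAX
#   AGE      minutes since the object was fetched
#   LM_AGE   minutes between Last-Modified and the fetch, or "none"
#   MIN/MAX  refresh_pattern min and max, in minutes
#   PERCENT  refresh_pattern percent, without the % sign
is_fresh() {
    age=$1 lm_age=$2 min=$3 percent=$4 max=$5

    # Older than max: stale regardless of anything else.
    if [ "$age" -gt "$max" ]; then echo STALE; return; fi

    # With a Last-Modified header: fresh while the age is below
    # PERCENT percent of the object's age at fetch time.
    if [ "$lm_age" != none ] && [ $((age * 100)) -lt $((lm_age * percent)) ]; then
        echo FRESH; return
    fi

    # Otherwise fall back to the configured minimum.
    if [ "$age" -le "$min" ]; then echo FRESH; return; fi

    echo STALE
}
```

With the catch-all line above, `is_fresh 30 none 59 20 240` prints FRESH and `is_fresh 60 none 59 20 240` prints STALE. If this reading is right, a response with no Last-Modified and no Expires would still count as fresh for up to 59 minutes.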


squid.conf is functionally identical on all four systems -- I've checked them with diff.

Our main index page (http://www.IN.gov/) is the page in question. According to the cache-docs page I read, a page is not cacheable when:

There are no validators (ETag or Last-Modified headers) and, if I read it right, no explicit freshness information (Expires or Cache-Control) either.

The server-parsed content on that page is:

<div id="bannerimage"> <div id="amberalert"><!--#include virtual="/amber/include.html"--></div>

This include is either a zero-length file or a graphic, depending on conditions.

So I used the Cacheability Engine to test our page, but I'm afraid the results are skewed, since it goes through our cache rather than hitting the origin server. It reports that my page isn't cacheable, so I decided to do some testing with curl as well.

curling my own site (no SSI):

curl -z 040113342004.00 -D headers http://www.netgawds.com/

HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 14:53:08 GMT
Server: Apache/1.3.29 (Unix) PHP/4.3.4 mod_ssl/2.8.16 OpenSSL/0.9.7c
Last-Modified: Wed, 31 Mar 2004 03:58:20 GMT
ETag: "18a480-289e-406a41dc"
Accept-Ranges: bytes
Content-Length: 10398
Connection: close
Content-Type: text/html

Curling http://www.IN.gov/ (direct to the origin server, from inside the firewall):

curl -z 040113342004.00 -D headers http://www.IN.gov/

HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 15:08:51 GMT
Server: Apache
Connection: close
Content-Type: text/html
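
To compare the two responses without eyeballing them, something like the following could be run against the file that curl -D writes. It is only a sketch: check_cache_headers is a made-up helper name, and the list of headers it looks for is my own pick of the ones the caching tutorial talks about.

```shell
# Report which caching-related headers appear in a header dump,
# for example the "headers" file written by: curl -D headers URL
# check_cache_headers is a hypothetical helper, not part of curl or Squid.
check_cache_headers() {
    for name in Last-Modified ETag Expires Cache-Control; do
        # Header names are case-insensitive, hence grep -i.
        if grep -qi "^$name:" "$1"; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
}
```

Run as `check_cache_headers headers` after each curl call. For the two dumps above it would report Last-Modified and ETag as present for my own site and all four headers as missing for the IN.gov origin response.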

You're right about the 304s. However, I'm still a bit confused about why this is happening:

cache1# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
1 TCP_HIT:NONE
153 TCP_IMS_HIT:NONE
169 TCP_MISS:DIRECT
480 TCP_MISS:NONE
4456 TCP_MEM_HIT:NONE


cache2# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
176 TCP_MISS:DIRECT
1186 TCP_MISS:CD_SIBLING_HIT


cache3# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
162 TCP_MISS:DIRECT
1118 TCP_MISS:CD_SIBLING_HIT


cache4# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
161 TCP_MISS:DIRECT
1205 TCP_MISS:CD_SIBLING_HIT
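
For comparing the four boxes, a variant of the pipeline above that also prints each result code's share of the total might be easier to read. This is just a sketch; tally_results is a made-up function name, and like the commands above it assumes the result code is the last field of each log line.

```shell
# Tally the last field of each input line and print count, percentage
# of the total, and the value, sorted by count.
# tally_results is a hypothetical helper; it reads log lines on stdin.
tally_results() {
    awk '{ count[$NF]++; total++ }
         END {
             for (code in count)
                 printf "%d %.1f%% %s\n", count[code], 100 * count[code] / total, code
         }' | sort -n
}
```

It would be used the same way as before, for example: grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | tally_results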


On the bright side, my hit ratios are better than they ever have been before :)

Hey, BTW, how "stable" is Squid 3? I'd like to start using ESI within the year.

--
+--------------+------+----------------------+---------------+
| Greg Swallow | CCNA | System Administrator | accessIndiana |
+--(http://www.IN.gov/)----------------------(888.4IN.EGOV)--+



