On Wed, 31 Mar 2004, Greg Swallow wrote:
Assuming the documentation at http://www.bowiesnyder.com/writings/caching_shtml.htm is accurate, my squid-based reverse proxies shouldn't be caching HTML documents that are server parsed. However, it appears that the four reverse proxies I have sitting in front of my webservers are doing exactly that.
How the page was generated on the server does not matter much, other than that server-generated pages usually do not carry caching information. What matters is the HTTP headers of the response and, to some extent, your refresh_pattern settings in squid.conf.
Some good references to understand these concepts better:
Caching Tutorial for Web Authors and Webmasters <url:http://www.mnot.net/cache_docs/>
Cacheability Engine <url:http://www.mnot.net/cacheability/>
Wow, those are kick-ass docs. I'll have to bookmark them.
Let me give you a little more background. The closest related configuration directives I can find are:
cache_peer 157.91.12.68 sibling 80 3130 proxy-only
cache_peer 157.91.12.70 sibling 80 3130 proxy-only
cache_peer 157.91.12.71 sibling 80 3130 proxy-only
refresh_pattern \.cfm$ 0 0% 0
refresh_pattern \.asp$ 0 0% 0
refresh_pattern \.aspx$ 0 0% 0
refresh_pattern . 59 20% 240
These (along with the 59 minutes minimum, above) are new, as of upgrading to 2.5-stable4:
digest_generation on
digest_rebuild_period 600 seconds
digest_rewrite_period 3600 seconds
refresh_pattern -i \.pdf$ 59 20% 240 reload-into-ims override-lastmod override-expire
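For anyone reading along, here is a rough Python sketch of how I understand the min / percent / max fields of refresh_pattern to be evaluated. It is a simplified model, not Squid's actual code: the order of the checks is an assumption on my part and differs between Squid versions, the names pick_rule and is_fresh are made up for illustration, and the -i \.pdf$ rule with its override options is left out. All ages are in minutes.

```python
import re

# Rules copied from the squid.conf excerpt: (regex, min minutes, percent, max minutes).
# First match wins, so the catch-all "." rule goes last.
RULES = [
    (r"\.cfm$", 0, 0, 0),
    (r"\.asp$", 0, 0, 0),
    (r"\.aspx$", 0, 0, 0),
    (r".", 59, 20, 240),
]


def pick_rule(url):
    """Return (min, percent, max) of the first rule whose regex matches the URL."""
    for pattern, min_age, percent, max_age in RULES:
        if re.search(pattern, url):
            return min_age, percent, max_age
    return None


def is_fresh(age, rule, lm_age=None, expires_in=None):
    """Simplified freshness check (assumed order of tests, see lead-in).

    age        -- minutes the object has been in the cache
    lm_age     -- minutes between Last-Modified and the fetch, or None if
                  the response had no Last-Modified header
    expires_in -- minutes until the Expires time, or None if no Expires header
    """
    min_age, percent, max_age = rule
    if expires_in is not None:
        # An explicit expiry time from the origin server decides on its own.
        return expires_in > 0
    if age > max_age:
        return False
    if lm_age is not None and lm_age > 0 and age / lm_age < percent / 100:
        # LM-factor heuristic: only usable when Last-Modified is present.
        return True
    # No usable headers: the object counts as fresh until it reaches "min".
    return age < min_age


rule = pick_rule("http://www.IN.gov/")
print(rule, is_fresh(30, rule), is_fresh(60, rule))
```

Under these assumptions, a response with neither Expires nor Last-Modified falls through to the last test, so the 59-minute minimum on the catch-all rule is the field that matters for it.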
squid.conf is functionally identical on all four systems -- I've checked them with diff.
The page in question is our main index page: http://www.IN.gov/. According to the cache-docs page I read, a page is not cacheable when:
There are no validators (ETags or Last-Modified headers. Maybe Cache-control?)
The content in question, in our page, is:
<div id="bannerimage">
<div id="amberalert"><!--#include virtual="/amber/include.html"--></div>
This include is either a zero-length file or a graphic, depending on conditions.
So I used the caching engine to test our page, but I'm afraid the results are skewed, since that's visiting our cache. The caching engine reports that my page isn't cacheable, but I decided to do some testing with curl instead.
curling my own site (no SSI):
curl -z 040113342004.00 -D headers http://www.netgawds.com/
HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 14:53:08 GMT
Server: Apache/1.3.29 (Unix) PHP/4.3.4 mod_ssl/2.8.16 OpenSSL/0.9.7c
Last-Modified: Wed, 31 Mar 2004 03:58:20 GMT
ETag: "18a480-289e-406a41dc"
Accept-Ranges: bytes
Content-Length: 10398
Connection: close
Content-Type: text/html
curl -z 040113342004.00 -D headers http://www.IN.gov/ (direct to the origin server -- inside a firewall)
HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 15:08:51 GMT
Server: Apache
Connection: close
Content-Type: text/html
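To make the difference between the two responses explicit, here is a small Python sketch that parses the two header dumps and lists which validators (ETag, Last-Modified) and which explicit freshness headers (Expires, Cache-Control) each one carries. The helper names parse_headers and summarize are my own, and this only inspects headers; it does not try to reproduce what the Cacheability Engine or Squid actually decide.

```python
# Header dumps as captured by the two curl runs.
NETGAWDS = """HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 14:53:08 GMT
Server: Apache/1.3.29 (Unix) PHP/4.3.4 mod_ssl/2.8.16 OpenSSL/0.9.7c
Last-Modified: Wed, 31 Mar 2004 03:58:20 GMT
ETag: "18a480-289e-406a41dc"
Accept-Ranges: bytes
Content-Length: 10398
Connection: close
Content-Type: text/html
"""

IN_GOV = """HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 15:08:51 GMT
Server: Apache
Connection: close
Content-Type: text/html
"""

VALIDATORS = ("etag", "last-modified")
FRESHNESS = ("expires", "cache-control")


def parse_headers(raw):
    """Split a raw response head into (status line, {lowercased name: value})."""
    lines = raw.strip().splitlines()
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def summarize(raw):
    """Report which validator and freshness headers are present."""
    _, headers = parse_headers(raw)
    return {
        "validators": [h for h in VALIDATORS if h in headers],
        "freshness": [h for h in FRESHNESS if h in headers],
    }


for label, raw in (("www.netgawds.com", NETGAWDS), ("www.IN.gov", IN_GOV)):
    print(label, summarize(raw))
```

The static page comes back with both validators, while the server-parsed page has neither validators nor any explicit freshness information, so a cache is left with only its own heuristics for that response.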
You're right about the 304's. However, I'm still a bit confused about why this is happening:
cache1# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
1 TCP_HIT:NONE
153 TCP_IMS_HIT:NONE
169 TCP_MISS:DIRECT
480 TCP_MISS:NONE
4456 TCP_MEM_HIT:NONE
cache2# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
176 TCP_MISS:DIRECT
1186 TCP_MISS:CD_SIBLING_HIT
cache3# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
162 TCP_MISS:DIRECT
1118 TCP_MISS:CD_SIBLING_HIT
cache4# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
161 TCP_MISS:DIRECT
1205 TCP_MISS:CD_SIBLING_HIT
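To put those tallies side by side, here is a short Python sketch that takes the counts exactly as printed above and works out, per cache, what share of requests for the index page was logged as going DIRECT and what share was served via a sibling. The function name share is made up, and grouping the codes by their suffix is just my reading of the log fields, not an official classification.

```python
# Counts copied from the grep | awk | sort | uniq -c output on each cache.
COUNTS = {
    "cache1": {
        "TCP_HIT:NONE": 1,
        "TCP_IMS_HIT:NONE": 153,
        "TCP_MISS:DIRECT": 169,
        "TCP_MISS:NONE": 480,
        "TCP_MEM_HIT:NONE": 4456,
    },
    "cache2": {"TCP_MISS:DIRECT": 176, "TCP_MISS:CD_SIBLING_HIT": 1186},
    "cache3": {"TCP_MISS:DIRECT": 162, "TCP_MISS:CD_SIBLING_HIT": 1118},
    "cache4": {"TCP_MISS:DIRECT": 161, "TCP_MISS:CD_SIBLING_HIT": 1205},
}


def share(counts, suffix):
    """Fraction of requests whose logged code ends with the given suffix."""
    total = sum(counts.values())
    matched = sum(n for code, n in counts.items() if code.endswith(suffix))
    return matched / total


for name, counts in COUNTS.items():
    total = sum(counts.values())
    direct = share(counts, ":DIRECT")
    sibling = share(counts, ":CD_SIBLING_HIT")
    print(f"{name}: {total} requests, {direct:.1%} direct, {sibling:.1%} via sibling")
```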
On the bright side, my hit ratios are better than they ever have been before :)
Hey, BTW, how "stable" is Squid 3? I'd like to start using ESI within the year.
--
+--------------+------+----------------------+---------------+
| Greg Swallow | CCNA | System Administrator | accessIndiana |
+--(http://www.IN.gov/)----------------------(888.4IN.EGOV)--+
