There are a lot of stats around the cache. The one that means Traffic
Server didn't have to contact the origin server at all is the first one:
proxy.process.http.cache_hit_fresh
proxy.process.http.cache_hit_revalidated
proxy.process.http.cache_hit_ims
proxy.process.http.cache_hit_stale_served
proxy.process.http.cache_miss_cold
proxy.process.http.cache_miss_changed
proxy.process.http.cache_miss_not_cacheable
proxy.process.http.cache_miss_client_no_cache
proxy.process.http.cache_miss_ims
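For comparing a first crawl against a repeat crawl, a minimal Python sketch of how these counters might be turned into a fresh-hit ratio follows. It assumes the counter values were already read (for example with traffic_line) into plain dictionaries before and after a run. The function name, and the choice to treat the sum of these nine counters as the total number of lookups, are assumptions of this sketch and not something Traffic Server defines.

```python
# Counter names copied from the list above.
HIT_STATS = [
    "proxy.process.http.cache_hit_fresh",
    "proxy.process.http.cache_hit_revalidated",
    "proxy.process.http.cache_hit_ims",
    "proxy.process.http.cache_hit_stale_served",
]
MISS_STATS = [
    "proxy.process.http.cache_miss_cold",
    "proxy.process.http.cache_miss_changed",
    "proxy.process.http.cache_miss_not_cacheable",
    "proxy.process.http.cache_miss_client_no_cache",
    "proxy.process.http.cache_miss_ims",
]
FRESH = "proxy.process.http.cache_hit_fresh"


def fresh_hit_ratio(before, after):
    """Hypothetical helper: share of lookups between two snapshots that
    were fresh hits, i.e. served without contacting the origin.

    `before` and `after` map counter names to integer values.
    Assumption: the nine counters above together cover every lookup.
    """
    delta = {
        name: after.get(name, 0) - before.get(name, 0)
        for name in HIT_STATS + MISS_STATS
    }
    total = sum(delta.values())
    if total == 0:
        # No lookups were recorded between the two snapshots.
        return 0.0
    return delta[FRESH] / total


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    before = {name: 0 for name in HIT_STATS + MISS_STATS}
    after = dict(before)
    after[FRESH] = 30
    after["proxy.process.http.cache_miss_cold"] = 70
    print(fresh_hit_ratio(before, after))
```

Taking the difference of two snapshots matters because the counters are cumulative, so a ratio computed from a single reading would mix the first crawl with the repeat crawl.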
If you are looking at the stat below, note that it only counts hits
served from RAM. A request can still be a cache hit that was served
from disk and not RAM:
proxy.process.cache.ram_cache.hits
-Bryan
On 03/26/2010 01:12 PM, Zachary Miller wrote:
Yeah, I see cache writes on the initial run only, which leads me
to believe that the content is being cached, but later runs don't
produce hits (based on the hit rate from traffic_line).
On Fri, Mar 26, 2010 at 12:09 PM, Bryan Call wrote:
On 03/26/2010 11:33 AM, Zachary Miller wrote:
Hello,
I am currently conducting a proof-of-concept to explore the
value of content caching for a system that automatically
fetches a large number of external web pages. I am mainly
interested in using TS as a forward proxy and then serving
content locally for subsequent duplicate queries to the sites.
Currently I have the forward proxy enabled, as well as the
reverse proxy setting. Is it necessary or advisable to have
both of these enabled if my interest is mainly external content?
I run traffic server as both a reverse and forward proxy without a
problem.
Also, I have been getting metrics using traffic_line and
running tests using the web UI and see some odd behavior. For
instance, whenever I run tests against BestBuy.com, on the
initial run (through about 3k pages) there are nearly the same
number of writes to the cache. On following runs (using the
same pages) no new writes are made to the cache, leading me to
believe that the pages already exist, but according to
traffic_line, there are no cache hits during the execution
period. Other sites appear to perform as expected, so I
suspected that this was due to dynamic content.
However, when I access pages through a browser with a clear
cache, I see certain pages failing to be added to the cache,
while what I would consider collateral content is added. For
instance, if I access http://www.msn.com and inspect the cache
through the web UI, I find many cached items from the MSN
domain, but nothing containing the actual page content. Is
this expected behavior and something configurable, or am I
missing some fundamental aspect of the cache?
If the page has dynamic content and the origin server (Best Buy /
MSN) sends headers saying not to cache the content, then it won't be
cached by default. You can override this and cache the content;
we do this for some of our crawling.
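To see whether an origin is asking caches not to keep a page, one option is to inspect its response headers. The Python sketch below is a simplified reading of the standard HTTP Cache-Control and Pragma headers, assuming the headers are already available as a dictionary. The function name is hypothetical, and this is not Traffic Server's actual decision logic, which considers more than these directives.

```python
def origin_forbids_caching(headers):
    """Hypothetical helper: return True if the response headers ask
    shared caches not to store or reuse the response.

    `headers` maps header names to values. Only the Cache-Control
    directives no-store, no-cache and private, plus Pragma: no-cache,
    are checked; this is a simplification.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Split "private, max-age=0" into bare directive names.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in normalized.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"no-store", "no-cache", "private"}:
        return True

    # Older HTTP/1.0-style hint.
    return "no-cache" in normalized.get("pragma", "").lower()


if __name__ == "__main__":
    # Illustrative header sets, not captured from any real site.
    print(origin_forbids_caching({"Cache-Control": "private, max-age=0"}))
    print(origin_forbids_caching({"Cache-Control": "public, max-age=3600"}))
```

Running a check like this against the main page and against its images or scripts can help explain why the collateral content ends up in the cache while the page itself does not.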
I am not very familiar with the Web UI; I use the command-line
tools. Are you seeing cache writes for Best Buy on the following
runs, or just on the first crawl?
Thanks a lot,
Zachary Miller