Try setting the Cache-Control header in the request so that it will
be served only from the cache, never fetched from the origin.
My curl syntax is rusty, but I think you can specify:
-H "Cache-Control: only-if-cached"
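For scripted crawls, the same request can be sent from Python. This is a minimal sketch, not a tested recipe. It assumes the proxy listens on localhost:8080 (a hypothetical address, so substitute your own) and that the proxy honors only-if-cached. Under HTTP/1.1, a cache that cannot satisfy an only-if-cached request should answer 504 (Gateway Timeout) instead of contacting the origin. With curl, the proxy is given with -x, e.g. curl -x http://localhost:8080 -H "Cache-Control: only-if-cached" <url>.

```python
import urllib.error
import urllib.request

# Hypothetical proxy address: replace with your Traffic Server host and port.
PROXY = "http://localhost:8080"


def build_cache_only_request(url: str) -> urllib.request.Request:
    """Build a GET request asking caches to answer only from stored copies."""
    return urllib.request.Request(url, headers={"Cache-Control": "only-if-cached"})


def classify(status: int) -> str:
    """Map the status of an only-if-cached request to hit/miss.

    Assumes the proxy honors only-if-cached; a proxy that ignores it
    could return 200 straight from the origin.
    """
    if status == 504:  # cache could not satisfy the request without the origin
        return "miss"
    if status == 200:
        return "hit"
    return "other"


def fetch_from_cache(url: str) -> str:
    """Send the request through the proxy and report whether it was a cache hit."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": PROXY}))
    try:
        with opener.open(build_cache_only_request(url), timeout=10) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses, including the 504 miss case.
        return classify(err.code)
```

Calling fetch_from_cache("http://www.msn.com/") after a crawl would then tell you whether that page can be served from cache as-is. A "miss" for a page you crawled earlier suggests it was never stored, or was stored but could not be served without going back to the origin.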
Preferably, testing should be done with a cleared cache so that no 304s occur.
Hope that helps?
-Jason
On Mar 26, 2010, at 4:12 PM, Zachary Miller wrote:
Yes, I see cache writes only on the initial run, which leads me to
believe that the content is being cached, but later runs don't
produce any hits (based on the hit rate reported by traffic_line).
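If the counters you read are cumulative totals, hits from a single run are easiest to see by snapshotting them before and after each crawl and comparing the differences. This is a minimal sketch that assumes you have already read the relevant statistics (for example with traffic_line) into dictionaries. The keys "hits", "misses" and "writes" are hypothetical stand-ins for whatever counter names your build actually reports.

```python
from typing import Dict, Mapping

# Hypothetical stand-in names for the hit, miss and write statistics.
COUNTERS = ("hits", "misses", "writes")


def run_stats(before: Mapping[str, int], after: Mapping[str, int]) -> Dict[str, float]:
    """Summarize one crawl from counter snapshots taken before and after it."""
    stats: Dict[str, float] = {name: after[name] - before[name] for name in COUNTERS}
    lookups = stats["hits"] + stats["misses"]
    # Avoid dividing by zero when the run made no cache lookups at all.
    stats["hit_ratio"] = stats["hits"] / lookups if lookups else 0.0
    return stats
```

On a first crawl of about 3k pages you would expect many writes and a ratio near 0.0. On a repeat crawl of the same pages you would normally expect few writes and a ratio near 1.0. Zero writes together with zero hits is the symptom described here.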
On Fri, Mar 26, 2010 at 12:09 PM, Bryan Call wrote:
On 03/26/2010 11:33 AM, Zachary Miller wrote:
Hello,
I am currently conducting a proof-of-concept to explore the value of
content caching for a system that automatically fetches a large
number of external web pages. I am mainly interested in using TS as
a forward proxy and then serving content locally for subsequent
duplicate queries to the sites. Currently I have the forward proxy
enabled, as well as the reverse proxy setting. Is it necessary or
advisable to have both of these enabled if my interest is mainly
external content?
I run Traffic Server as both a reverse and a forward proxy without a
problem.
Also, I have been gathering metrics with traffic_line and running
tests through the web UI, and I see some odd behavior. For instance,
whenever I run tests against BestBuy.com, the initial run (about 3k
pages) produces nearly as many writes to the cache. On subsequent
runs (using the same pages) no new writes are made to the cache,
which leads me to believe that the pages already exist, yet according
to traffic_line there are no cache hits during the execution period.
Other sites appear to perform as expected, so I suspected that this
was due to dynamic content.
However, when I access pages through a browser with a clear cache, I
see certain pages failing to be added to the cache, while what I
would consider collateral content is added. For instance, if I
access http://www.msn.com and inspect the cache through the web UI,
I find many cached items from the MSN domain, but nothing containing
the actual page content. Is this expected behavior and something
configurable, or am I missing some fundamental aspect of the cache?
If the page is dynamic and the origin server (Best Buy / MSN) sends
headers telling caches not to store the content, then it won't be
cached by default. You can override this and cache the content
anyway; we do this for some of our crawling.
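A quick way to see why a page isn't cached by default is to look at the response headers the origin sends. The function below is a simplified sketch of the standard HTTP/1.1 Cache-Control rules as they apply to a shared cache. It ignores Expires, heuristic freshness, and Traffic Server's own configuration (including the overrides mentioned above), so treat its output as a hint rather than a verdict.

```python
from typing import Dict, List


def cache_blockers(headers: Dict[str, str]) -> List[str]:
    """List Cache-Control directives that stop a shared cache from serving a plain hit."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    directives: Dict[str, str] = {}
    for part in h.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip().strip('"')

    blockers = []
    if "no-store" in directives:
        blockers.append("no-store: response must not be stored")
    # private="Field" or no-cache="Field" only restricts the named fields.
    if "private" in directives and not directives["private"]:
        blockers.append("private: only a browser cache may store it")
    if "no-cache" in directives and not directives["no-cache"]:
        blockers.append("no-cache: must be revalidated with the origin before reuse")
    # For a shared cache, s-maxage takes precedence over max-age.
    lifetime = directives.get("s-maxage", directives.get("max-age"))
    if lifetime == "0":
        blockers.append("zero freshness lifetime: stale as soon as it is stored")
    return blockers
```

You could feed it the headers of a page you expected to be cached, for example as printed by curl -I through the proxy. An empty list means these particular directives are not the reason.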
I am not very familiar with the web UI; I use the command-line
tools. Are you seeing cache writes on Best Buy on subsequent runs,
or only on the first crawl?
Thanks a lot,
Zachary Miller