Funny that you are looking at that stat. Our crawler team was looking at it this week to determine whether they had a 0% cache hit ratio, and it was giving them false positives.

There might be a bug associated with that stat. If you can dig into it a little more and show what values you are getting for that stat versus cache_hit_fresh, that would be helpful. I am going to run a quick test myself to see what I get.
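
One way to make that comparison is to take two snapshots of the proxy.process.http cache counters (for example by reading each one with traffic_line -r before and after a crawl) and compute a fresh-hit ratio from the deltas. The sketch below is only an illustration: the function name is hypothetical, and using the sum of the nine hit/miss counters as the denominator is an assumption, not necessarily how proxy.node.cache_hit_ratio_avg_10s is derived.

```python
# Counters listed later in this thread under proxy.process.http.
HIT_FRESH = "proxy.process.http.cache_hit_fresh"
LOOKUP_COUNTERS = [
    "proxy.process.http.cache_hit_fresh",
    "proxy.process.http.cache_hit_revalidated",
    "proxy.process.http.cache_hit_ims",
    "proxy.process.http.cache_hit_stale_served",
    "proxy.process.http.cache_miss_cold",
    "proxy.process.http.cache_miss_changed",
    "proxy.process.http.cache_miss_not_cacheable",
    "proxy.process.http.cache_miss_client_no_cache",
    "proxy.process.http.cache_miss_ims",
]


def fresh_hit_ratio(before, after):
    """Fraction of lookups between two snapshots that were fresh hits.

    `before` and `after` map stat names to integer counter values.
    Assumption: the denominator is the sum of all nine counters above.
    Returns None when no lookups were recorded in the interval.
    """
    deltas = {
        name: after.get(name, 0) - before.get(name, 0)
        for name in LOOKUP_COUNTERS
    }
    total = sum(deltas.values())
    if total <= 0:
        return None
    return deltas[HIT_FRESH] / total
```

If this ratio is clearly above zero for a run where the averaged ratio stat reports 0%, that would point at the stat rather than at the cache.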

-Bryan

On 03/26/2010 01:39 PM, Zachary Miller wrote:
This is great, thanks. I must have missed these commands somehow. Does that mean that the cache hit rate (proxy.node.cache_hit_ratio_avg_10s) refers to the RAM cache?

On Fri, Mar 26, 2010 at 1:33 PM, Bryan Call <[email protected]> wrote:

    There are a lot of stats around the cache.  The one that means
    the proxy didn't have to contact the origin server at all is the
    first one:
    proxy.process.http.cache_hit_fresh
    proxy.process.http.cache_hit_revalidated
    proxy.process.http.cache_hit_ims
    proxy.process.http.cache_hit_stale_served
    proxy.process.http.cache_miss_cold
    proxy.process.http.cache_miss_changed
    proxy.process.http.cache_miss_not_cacheable
    proxy.process.http.cache_miss_client_no_cache
    proxy.process.http.cache_miss_ims

    If you are looking at the stat below, note that a request can
    still be a cache hit even when this stat does not increase; it
    was just served from disk and not RAM:
    proxy.process.cache.ram_cache.hits

    -Bryan


    On 03/26/2010 01:12 PM, Zachary Miller wrote:
    Yeah, I see cache writes on the initial run only, which leads me
    to believe that the content is being cached, but later runs
    don't produce hits (based on the hit rate reported by
    traffic_line).

    On Fri, Mar 26, 2010 at 12:09 PM, Bryan Call <[email protected]> wrote:

        On 03/26/2010 11:33 AM, Zachary Miller wrote:

            Hello,

            I am currently conducting a proof-of-concept to explore
            the value of content caching for a system that
            automatically fetches a large number of external web
            pages.  I am mainly interested in using TS as a forward
            proxy and then serving content locally for subsequent
            duplicate queries to the sites.  Currently I have the
            forward proxy enabled, as well as the reverse proxy
            setting.  Is it necessary or advisable to have both of
            these enabled if my interest is mainly external content?


        I run traffic server as both a reverse and forward proxy
        without a problem.



            Also, I have been getting metrics using traffic_line and
            running tests using the web UI and see some odd behavior.
             For instance, whenever I run tests against BestBuy.com,
            on the initial run (through about 3k pages) there are
            nearly the same number of writes to the cache.  On
            following runs (using the same pages) no new writes are
            made to the cache, leading me to believe that the pages
            already exist, but according to traffic_line, there are
            no cache hits during the execution period.  Other sites
            appear to perform as expected, so I suspected that this
            was due to dynamic content.

            However, when I access pages through a browser with a
            clear cache, I see certain pages failing to be added to
            the cache, while what I would consider collateral content
            is added.  For instance, if I access http://www.msn.com
            and inspect the cache through the web UI, I find many
            cached items from the MSN domain, but nothing containing
            the actual page content.  Is this expected behavior and
            something configurable, or am I missing some fundamental
            aspect of the cache?


        If the page has dynamic content and the origin server (Best
        Buy / MSN) sends headers that say not to cache the content,
        then it won't be cached by default.  You can override this and
        cache the content anyway; we do this for some of our crawling.
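
To check whether a particular page falls into this category, the origin's response headers can be inspected directly. The sketch below is a simplified heuristic, not Traffic Server's actual decision logic: the function name is hypothetical, and treating no-store, no-cache, private, and Pragma: no-cache as the relevant signals is an assumption about what the origin might send.

```python
def origin_forbids_caching(headers):
    """Heuristic: True if the response headers discourage a shared cache
    from storing the response or reusing it without revalidation.

    `headers` maps header names to values; names are matched
    case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # Split Cache-Control into directive names, dropping any "=value" part.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in lowered.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"no-store", "no-cache", "private"}:
        return True

    # HTTP/1.0-style opt-out.
    return "no-cache" in lowered.get("pragma", "").lower()
```

For example, a response carrying `Cache-Control: private, max-age=0` would be flagged, while one carrying only `Cache-Control: max-age=3600` would not.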

        I am not very familiar with the Web UI; I use the command
        line tools.  Are you seeing cache writes for Best Buy on the
        following runs, or just on the first crawl?



            Thanks a lot,

            Zachary Miller






