Re: Unexplained Cache MISSes

Nigel Peck Fri, 23 Jun 2017 12:04:02 -0700

Sure, that's something I can understand! Will gather some data over thenext couple of days for different time periods and configurations.


On 23/06/2017 04:09, Guillaume Quintard wrote:

Hum, could you toy with ttl/grace/keep periods? Like having only a oneweek TTL but no grace/keep, then a one week grace but no TTL/keep?

The period when the purge occurs may be important...

--
Guillaume Quintard

On Fri, Jun 16, 2017 at 9:09 PM, Nigel Peck <[email protected]<mailto:[email protected]>> wrote:



    Here's an interesting thing about this. When I refreshed the cache
    just now (PURGE) for 204 URLs, 78 of them were a HIT instead of a
    MISS. All had been in the cache for 9 hours at least. (a re-issued
    GET request received a MISS for all 78)

    When I immediately issued a PURGE again a few seconds later for all
    204 URLs, every one of them was a MISS and purged successfully. I
    did it again a few seconds after that, and again all good. Same
    again a few minutes after that. No HITs.

    So this seems to be in some way related to how long the objects have
    been in the cache.

    Nigel


    On 16/06/2017 13:27, Nigel Peck wrote:


        Sorry for the delay on working on this. I've read your email a
        few times now and am still confused! I need to read the man
        pages suggested but haven't got to it yet. Will let you know
        when I make some progress on it.

        I'm fixing the issue in the interim here by issuing another GET
        request in my cache refresh scripts for any PURGE requests that
        come back with a HIT.

        Nigel

        On 02/06/2017 18:08, Dridi Boukelmoune wrote:

                    Amazingly enough I never looked at the logs of a
                    purge, maybe ExpKill
                    could give us a VXID to then check against the hit.
                    If only SomeoneElse(tm)
                    could spare me the time and look at it themselves
                    and tell us (wink wink=).



                I'm very happy to help in any way I can. Please let me
                know anything I can
                do or information I can provide. I'm no C programmer
                (web developer/server
                admin), so can't help out with
                coding/patching/debugging[3], but anything
                else I can do, please let me know what you need.


            Well, luckily I didn't write any C code to find out what
            purge logs
            look like. I'm certainly not going to debug code I'm not
            familiar with ;)

            I wrote a dummy test case instead:

                  varnishtest "purge logs"

                  server s1 {
                      rxreq
                      expect req.url == "/to-be-purged"
                      txresp
                  } -start

                  varnish v1 -vcl+backend {
                      sub vcl_recv {
                          if (req.method == "PURGE") {
                              return (purge);
                          }
                      }
                  } -start

                  client c1 {
                      txreq -url "/to-be-purged"
                      rxresp

                      txreq -req PURGE -url "/to-be-purged"
                      rxresp

                      txreq -req PURGE -url "/unknown"
                      rxresp
                  } -run

            And then looked at the logs manually:

                  varnishtest test.vtc -v | grep vsl | less

            Here's a sample:

                  [...]
                  **** v1    0.4 vsl|       1002 VCL_return      b deliver
                  **** v1    0.4 vsl|       1002 Storage         b malloc s0
                  [...]
                  **** v1    0.4 vsl|          0 ExpKill         - EXP_When
            p=0x7f420b027000 e=1496443420.703764200 f=0xe
                  **** v1    0.4 vsl|          0 ExpKill         -
            EXP_expire
            p=0x7f420b027000 e=-0.000092268 f=0x0
                  **** v1    0.4 vsl|          0 ExpKill         -
            EXP_Expired x=1002 t=-0
                  [...]
                  **** v1    0.4 vsl|       1003 ReqMethod       c PURGE
                  **** v1    0.4 vsl|       1003 ReqURL          c
            /to-be-purged
                  [...]
                  **** v1    0.4 vsl|       1003 VCL_return      c purge
                  **** v1    0.4 vsl|       1003 VCL_call        c HASH
                  **** v1    0.4 vsl|       1003 VCL_return      c lookup
                  **** v1    0.4 vsl|       1003 VCL_call        c PURGE
                  **** v1    0.4 vsl|       1003 VCL_return      c synth
                  [...]
                  **** v1    0.4 vsl|       1004 ReqMethod       c PURGE
                  **** v1    0.4 vsl|       1004 ReqURL          c /unknown
                  [...]
                  **** v1    0.4 vsl|       1004 VCL_return      c purge
                  **** v1    0.4 vsl|       1004 VCL_call        c HASH
                  **** v1    0.4 vsl|       1004 VCL_return      c lookup
                  **** v1    0.4 vsl|       1004 VCL_call        c PURGE
                  **** v1    0.4 vsl|       1004 VCL_return      c synth
                  [...]

            The interesting transaction id (VXID) is 1002.

            So 1) purge-related logs will only show up with raw grouping in
            varnishlog (which I find unfortunate but I should have
            remembered the
            expiry thread would have been involved) and 2) we don't see in a
            transaction log how many objects were actually purged (moved
            to the
            expiry inbox).

            The ExpKill records appear before because transactions
            commit their
            logs when they finish by default.

                Would a cleanly installed server and absolute minimum
                VCL to reproduce this
                be useful? You would be welcome to have access to that
                server, if useful,
                once I've got it set up and producing the same problem.


            Not yet, at this point we know that we were looking at an
            incomplete
            picture so what you need to do is capture raw logs and we
            will be able
            to get both a VXID and a timestamp from the ExpKill records
            (although
            the timestamp for EXP_expire puzzles me).

            See man varnishlog to see how to write (-w) and then read
            (-r) logs
            to/from a file. When you notice the alleged bug, note the
            transaction
            id and write the current logs (with the -d option) so that
            you can
            pick up all the interesting bits at rest (instead of doing
            it on live
            traffic).

                I can say that in my case there is definitely no Age
                header coming from the
                back-end. Also as shown in the example I sent it is the
                7th HIT on that
                object.


            Yes, smells like a bug. But before capturing logs, make sure
            to remove
            Hash records from the vsl_mask (man varnishd) so we can
            confirm what's
            being purged too.

            I have a theory, a long shot that will only prove how
            unfamiliar I am
            with this part of Varnish. Since the purge moves the object
            to the
            expiry inbox, it could be that under load the restart may happen
            before the expiry thread marks it as expired, thus creating
            a race
            with the next lookup.

            Cheers,
            Dridi


        _______________________________________________
        varnish-misc mailing list
        [email protected]
        <mailto:[email protected]>
        https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc 
<https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc>


    _______________________________________________
    varnish-misc mailing list
    [email protected] <mailto:[email protected]>
    https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
    <https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc>


_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

Re: Unexplained Cache MISSes

Reply via email to