Sure, that's something I can understand! Will gather some data over the
next couple of days for different time periods and configurations.
On 23/06/2017 04:09, Guillaume Quintard wrote:
Hum, could you toy with ttl/grace/keep periods? Like having only a one
week TTL but no grace/keep, then a one week grace but no TTL/keep?
The period when the purge occurs may be important...
--
Guillaume Quintard
On Fri, Jun 16, 2017 at 9:09 PM, Nigel Peck <[email protected]
<mailto:[email protected]>> wrote:
Here's an interesting thing about this. When I refreshed the cache
just now (PURGE) for 204 URLs, 78 of them were a HIT instead of a
MISS. All had been in the cache for 9 hours at least. (a re-issued
GET request received a MISS for all 78)
When I immediately issued a PURGE again a few seconds later for all
204 URLs, every one of them was a MISS and purged successfully. I
did it again a few seconds after that, and again all good. Same
again a few minutes after that. No HITs.
So this seems to be in some way related to how long the objects have
been in the cache.
Nigel
On 16/06/2017 13:27, Nigel Peck wrote:
Sorry for the delay on working on this. I've read your email a
few times now and am still confused! I need to read the man
pages suggested but haven't got to it yet. Will let you know
when I make some progress on it.
I'm fixing the issue in the interim here by issuing another GET
request in my cache refresh scripts for any PURGE requests that
come back with a HIT.
Nigel
On 02/06/2017 18:08, Dridi Boukelmoune wrote:
Amazingly enough I never looked at the logs of a
purge, maybe ExpKill
could give us a VXID to then check against the hit.
If only SomeoneElse(tm)
could spare me the time and look at it themselves
and tell us (wink wink=).
I'm very happy to help in any way I can. Please let me
know anything I can
do or information I can provide. I'm no C programmer
(web developer/server
admin), so can't help out with
coding/patching/debugging[3], but anything
else I can do, please let me know what you need.
Well, luckily I didn't write any C code to find out what
purge logs
look like. I'm certainly not going to debug code I'm not
familiar with ;)
I wrote a dummy test case instead:
varnishtest "purge logs"
server s1 {
rxreq
expect req.url == "/to-be-purged"
txresp
} -start
varnish v1 -vcl+backend {
sub vcl_recv {
if (req.method == "PURGE") {
return (purge);
}
}
} -start
client c1 {
txreq -url "/to-be-purged"
rxresp
txreq -req PURGE -url "/to-be-purged"
rxresp
txreq -req PURGE -url "/unknown"
rxresp
} -run
And then looked at the logs manually:
varnishtest test.vtc -v | grep vsl | less
Here's a sample:
[...]
**** v1 0.4 vsl| 1002 VCL_return b deliver
**** v1 0.4 vsl| 1002 Storage b malloc s0
[...]
**** v1 0.4 vsl| 0 ExpKill - EXP_When
p=0x7f420b027000 e=1496443420.703764200 f=0xe
**** v1 0.4 vsl| 0 ExpKill -
EXP_expire
p=0x7f420b027000 e=-0.000092268 f=0x0
**** v1 0.4 vsl| 0 ExpKill -
EXP_Expired x=1002 t=-0
[...]
**** v1 0.4 vsl| 1003 ReqMethod c PURGE
**** v1 0.4 vsl| 1003 ReqURL c
/to-be-purged
[...]
**** v1 0.4 vsl| 1003 VCL_return c purge
**** v1 0.4 vsl| 1003 VCL_call c HASH
**** v1 0.4 vsl| 1003 VCL_return c lookup
**** v1 0.4 vsl| 1003 VCL_call c PURGE
**** v1 0.4 vsl| 1003 VCL_return c synth
[...]
**** v1 0.4 vsl| 1004 ReqMethod c PURGE
**** v1 0.4 vsl| 1004 ReqURL c /unknown
[...]
**** v1 0.4 vsl| 1004 VCL_return c purge
**** v1 0.4 vsl| 1004 VCL_call c HASH
**** v1 0.4 vsl| 1004 VCL_return c lookup
**** v1 0.4 vsl| 1004 VCL_call c PURGE
**** v1 0.4 vsl| 1004 VCL_return c synth
[...]
The interesting transaction id (VXID) is 1002.
So 1) purge-related logs will only show up with raw grouping in
varnishlog (which I find unfortunate but I should have
remembered the
expiry thread would have been involved) and 2) we don't see in a
transaction log how many objects were actually purged (moved
to the
expiry inbox).
The ExpKill records appear before because transactions
commit their
logs when they finish by default.
Would a cleanly installed server and absolute minimum
VCL to reproduce this
be useful? You would be welcome to have access to that
server, if useful,
once I've got it set up and producing the same problem.
Not yet, at this point we know that we were looking at an
incomplete
picture so what you need to do is capture raw logs and we
will be able
to get both a VXID and a timestamp from the ExpKill records
(although
the timestamp for EXP_expire puzzles me).
See man varnishlog to see how to write (-w) and then read
(-r) logs
to/from a file. When you notice the alleged bug, note the
transaction
id and write the current logs (with the -d option) so that
you can
pick up all the interesting bits at rest (instead of doing
it on live
traffic).
I can say that in my case there is definitely no Age
header coming from the
back-end. Also as shown in the example I sent it is the
7th HIT on that
object.
Yes, smells like a bug. But before capturing logs, make sure
to remove
Hash records from the vsl_mask (man varnishd) so we can
confirm what's
being purged too.
I have a theory, a long shot that will only prove how
unfamiliar I am
with this part of Varnish. Since the purge moves the object
to the
expiry inbox, it could be that under load the restart may happen
before the expiry thread marks it as expired, thus creating
a race
with the next lookup.
Cheers,
Dridi
_______________________________________________
varnish-misc mailing list
[email protected]
<mailto:[email protected]>
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
<https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc>
_______________________________________________
varnish-misc mailing list
[email protected] <mailto:[email protected]>
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
<https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc>
_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc