I've been playing with some numbers here. First, note that
total.num.queries does vary from execution to execution: new messages
are being fed in and old messages are potentially being expired out.
That churn is a small enough portion of the corpus that I'm ignoring
it, and I've simply cut the percentages to 3 significant digits.

I did four tests, with cache-min-ttl at 0, 300, 900 and 14400 (4 hours).
Note that this is only the minimum cache length; I am honouring the
specified TTLs in all cases (and, of course, clearing the cache between
runs). 14400 was selected because it ensures no positive query will be
repeated (negative queries may still be repeated, though, as I don't
see a way to control minimum negative caching, only maximum).
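
To illustrate what "minimum only" means here, a toy model in Python of how a record's lifetime in the cache gets clamped. This is a sketch of the idea, not unbound's actual code; the function name effective_ttl is made up for illustration, and the 86400 default for cache-max-ttl is the documented unbound default rather than something I changed in these tests.

```python
def effective_ttl(record_ttl, cache_min_ttl=0, cache_max_ttl=86400):
    """Toy model: raise short TTLs to the minimum, cap long ones at the maximum."""
    return min(max(record_ttl, cache_min_ttl), cache_max_ttl)


# A DNSBL answer with a 60 second TTL under each of the tested settings.
for min_ttl in (0, 300, 900, 14400):
    print(min_ttl, effective_ttl(60, cache_min_ttl=min_ttl))

# A record that already has a long TTL is left alone by the minimum.
print(effective_ttl(3600, cache_min_ttl=900))
```

With the minimum at 14400, every positive answer outlives a run of two to three hours, which is why no positive query should be repeated.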

cache-min-ttl: 0
total.num.queries=1688222
total.num.cachehits=1186963 (70.3%)
total.num.cachemiss=501259 (29.6%)
total.num.prefetch=60403 (3.57%)

cache-min-ttl: 300
total.num.queries=1631371
total.num.cachehits=1428988 (87.5%)
total.num.cachemiss=202383 (12.4%)
total.num.prefetch=19686 (1.20%)

cache-min-ttl: 900
total.num.queries=1688358
total.num.cachehits=1501797 (88.9%)
total.num.cachemiss=186561 (11.0%)
total.num.prefetch=25430 (1.50%)

cache-min-ttl: 14400
total.num.queries=1683693
total.num.cachehits=1533700 (91.0%)
total.num.cachemiss=149993 (8.90%)
total.num.prefetch=8000 (0.475%)
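
For anyone wanting to reproduce the percentages, a minimal Python sketch that truncates (rather than rounds) to 3 significant digits. The helper names truncate_sig and percent_of are my own for illustration; the counters are from the cache-min-ttl: 0 run.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_sig(value, digits=3):
    """Truncate a non-negative Decimal to `digits` significant digits."""
    if value == 0:
        return Decimal(0)
    # adjusted() gives the exponent of the most significant digit.
    exponent = value.adjusted() - (digits - 1)
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_DOWN)


def percent_of(part, total, digits=3):
    """Share of `total` as a percentage, truncated to `digits` significant digits."""
    return truncate_sig(Decimal(part) * 100 / Decimal(total), digits)


queries = 1688222
print(percent_of(1186963, queries))  # cachehits -> 70.3
print(percent_of(501259, queries))   # cachemiss -> 29.6
print(percent_of(60403, queries))    # prefetch  -> 3.57
```

Because the values are truncated, the hit and miss percentages for a run can add up to 99.9 rather than 100.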

To start off with, a worst-case (no minimum) cache hit rate of 70.3% is
actually quite good, but there is room for improvement: up to 91% by
simply caching everything for the duration of one masscheck run.

Also, while I didn't note the overall processing time for each run,
with cache-min-ttl set to 0 the task took over 3 hours (3:20 or so, from
memory); with cache-min-ttl set to 14400, it took a little under 2
hours. This could be significant as well in some cases. In terms of
performance, I believe only cachemiss will cause a performance hit,
since prefetch steps in to refresh the cache for frequently used items
before they expire. When counting the number of queries sent up to
remote servers, though, one should combine cachemiss and prefetch,
which means going from 561662 to 157993 queries (roughly a 72%
reduction), a significant gain.
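
The upstream arithmetic, as a small Python sketch. The function name upstream_queries is hypothetical; the counters are the ones from the cache-min-ttl 0 and 14400 runs, and treating cachemiss plus prefetch as the upstream total is my reading of the stats, as described in the paragraph before this.

```python
def upstream_queries(cachemiss, prefetch):
    """Queries that had to go to remote servers: cache misses plus prefetches."""
    return cachemiss + prefetch


baseline = upstream_queries(cachemiss=501259, prefetch=60403)  # cache-min-ttl: 0
cached = upstream_queries(cachemiss=149993, prefetch=8000)     # cache-min-ttl: 14400

saved = baseline - cached
reduction_pct = 100 * saved / baseline

print(baseline, cached, saved)   # 561662 157993 403669
print(f"{reduction_pct:.1f}%")   # 71.9%
```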

My corpus is currently fairly small for a variety of reasons, mainly
because my primary collection method had some errors in one of the
synthesized headers and those messages were therefore removed. Larger
environments will probably see more benefit, at least where masscheck
uses a dedicated resolver.



On Fri, Nov 4, 2016, at 12:35, Dave Warren wrote:
> Howdy!
> 
> I'm building a new box to run my SA Masschecks and one of the things I'm 
> looking at is DNS resolution. I run a local instance of unbound 
> dedicated to this machine, and I'm thinking it makes sense to increase 
> the cache-min-ttl to an hour.
> 
> While I wouldn't normally suggest running this in production for a live 
> mail server, this resolver is dedicated to SA Masschecking, which by 
> its nature is working with email that is anywhere from hours to months 
> old, so I don't feel like there will be any harm in over-caching DNSBLs.
> 
> At cache-min-ttl 900, I'm seeing much higher cache hits than I was 
> seeing without defining a minimum, currently:
> 
> total.num.queries=674702
> total.num.cachehits=588015
> total.num.cachemiss=86687
> total.num.prefetch=7703
> 
> The .prefetch count tells me that I would increase my cache 
> effectiveness by increasing the amount of data I can cache, although the 
> existing cache rate isn't exactly terrible.
> 
> I also need to study the various cache-sizes, can anyone provide any 
> recommendations? My current thinking is to start around:
> 
>          msg-cache-size: 64m
>          rrset-cache-size: 128m
>          key-cache-size: 128m
>          neg-cache-size: 128m
> 
> I've done a daily and a weekly run on the new box, currently I have 2GB 
> of RAM available to it and I'm floating around 400MB free, and another 
> 800MB buff/cache, so I don't believe I'm RAM constrained and therefore 
> I'm content to simply throw memory at the caches.
> 
> Can anyone think of any potential harm, given that the cache entries 
> will always expire between weekly masscheck runs?
> 
> The only other thing that came to mind was whether the daily or weekly 
> rulesets use DNS to verify their validity like SpamAssassin does, but as 
> far as I can tell, masscheck just uses rsync and doesn't care about DNS 
> for versioning (just for finding the rsync server, obviously)
> 
> Thoughts?
> 
> 
