MySQL Student wrote:
> Over the past few days I have been investigating more closely email
> that wasn't tagged that I thought should have been, and
> vice-versa, using various factors, such as URIBL_BLACK and JMF_W.
Very interesting.
Here's a quick testing script (ymmv on log file syntax):
#########
#!/bin/sh
# helper function, see below
_sacount() {
    # a non-empty $3 restricts the count to lines flagged as spam ("result: Y")
    zgrep -h "spamd: result: ${3:+Y}" /var/log/mail.lo* \
    | grep -E -c "$1${2:+.*$2|$2.*$1}"
}
# Usage: sa_count RULE1 [RULE2]
# Counts messages marked as RULE1 (and RULE2 if given)
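sa_count() {
    # quote the arguments so an empty RULE2 still occupies position 2;
    # unquoted, "spam" would slide into $2 when only one rule is given
    c=$(_sacount "$1" "$2")
    sc=$(_sacount "$1" "$2" spam)
    echo "Found $c ($sc spam) matching ${2:+both }$1${2:+ and $2}."
}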
sa_count RCVD_IN_HOSTKARMA_W URIBL_BLACK
sa_count RCVD_IN_DNSWL URIBL_BLACK
sa_count URIBL_BLACK
sa_count . # show total numbers
#########
My output (note, I greylist):
Found 54 (11 spam) matching both RCVD_IN_HOSTKARMA_W and URIBL_BLACK.
Found 25 (16 spam) matching both RCVD_IN_DNSWL and URIBL_BLACK.
Found 1981 (1919 spam) matching URIBL_BLACK.
Found 123273 (3791 spam) matching ..
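To put those counts in proportion, here is a rough sketch that turns each pair into a spam percentage. The helper name pct is made up for illustration and is not part of the script above; the numbers fed to it are simply the counts from my output.

```shell
#!/bin/sh
# pct SPAM TOTAL: print SPAM as a percentage of TOTAL
# (hypothetical helper, not part of the counting script)
pct() {
    awk -v s="$1" -v t="$2" \
        'BEGIN { if (t > 0) printf "%.1f%%\n", 100 * s / t; else print "n/a" }'
}

pct 11 54        # RCVD_IN_HOSTKARMA_W and URIBL_BLACK
pct 16 25        # RCVD_IN_DNSWL and URIBL_BLACK
pct 1919 1981    # URIBL_BLACK alone
pct 3791 123273  # all scanned mail
```

The zero check is only there so that a rule with no hits prints n/a instead of tripping a division by zero in awk.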
I don't have data on whether there were FPs or FNs involved.
(And yes, zgrep is perfectly content to deal with uncompressed files.)