MySQL Student wrote:
> Over the past few days I have been investigating more closely email
> that wasn't tagged that I thought should have been, and
> vice-versa, using various factors, such as URIBL_BLACK and JMF_W.

Very interesting.

Here's a quick test script (YMMV, depending on your log file syntax):

#########
#!/bin/sh

# helper function, see below
# $1, $2: rule names; a non-empty $3 restricts the count to spam ("result: Y")
_sacount() {
  zgrep -h "spamd: result: ${3:+Y}" /var/log/mail.lo* \
         | grep -E -c "$1${2:+.*$2|$2.*$1}"
}

# Usage: sa_count RULE1 [RULE2]
# Counts messages marked as RULE1 (and RULE2 if given)
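sa_count() {
  # Quote the arguments: an empty $2 must stay empty, otherwise
  # "spam" shifts into its place and gets treated as a rule name.
  c=$(_sacount "$1" "$2")
  sc=$(_sacount "$1" "$2" spam)
  echo "Found $c ($sc spam) matching ${2:+both} $1${2:+ and $2}."
}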

sa_count RCVD_IN_HOSTKARMA_W URIBL_BLACK
sa_count RCVD_IN_DNSWL URIBL_BLACK
sa_count URIBL_BLACK
sa_count .   # show total numbers

#########
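
In case the ${2:+...} expansion in the helper looks cryptic, here is a minimal sketch that only prints the pattern handed to grep. show_pattern is a hypothetical name used for illustration; it is not part of the script above.

```shell
#!/bin/sh

# Hypothetical helper: print the extended regex that _sacount builds
# from one or two rule names, without touching any log file.
show_pattern() {
  # With only $1 set, the ${2:+...} part expands to nothing.
  # With $2 set, it matches the two rules in either order on one line.
  printf '%s\n' "$1${2:+.*$2|$2.*$1}"
}

show_pattern URIBL_BLACK
show_pattern RCVD_IN_DNSWL URIBL_BLACK
```

The first call prints just the rule name; the second prints an alternation covering both orders, since the rule names can appear in either order within a log line.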

My output (note, I greylist):

Found 54 (11 spam) matching both RCVD_IN_HOSTKARMA_W and URIBL_BLACK.
Found 25 (16 spam) matching both RCVD_IN_DNSWL and URIBL_BLACK.
Found 1981 (1919 spam) matching  URIBL_BLACK.
Found 123273 (3791 spam) matching  ..
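
To compare these counts more easily, the spam share of each line can be worked out with a small helper. This is only a sketch: pct is a hypothetical name, and it assumes awk is available and that the total is non-zero.

```shell
#!/bin/sh

# Hypothetical helper: percentage of spam among matching messages.
# $1 = spam count, $2 = total count (assumed to be non-zero).
pct() {
  awk -v s="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", 100 * s / t }'
}

pct 11 54        # RCVD_IN_HOSTKARMA_W and URIBL_BLACK
pct 16 25        # RCVD_IN_DNSWL and URIBL_BLACK
pct 1919 1981    # URIBL_BLACK alone
pct 3791 123273  # all messages
```

These ratios only describe how the messages were scored; they say nothing about whether the scoring was right.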


I don't have data on whether there were FPs or FNs involved.

(And yes, zgrep is perfectly content to deal with uncompressed files.)
