Hi,
On Sat, 7 Feb 2004 15:40:20 +0100 Jonas Eckerman <[EMAIL PROTECTED]> wrote:
> On Sat, 7 Feb 2004 07:01:57 -0600, Bob Apthorpe wrote:
>
> > I'm working on a project to combine mail log analysis and
> > SpamAssassin (spamd) scoring to rank the spamminess of a
> > connecting IP address. I haven't found any standard metrics so I'm
> > guessing at what might be useful, such as %spam per unit time
> > {15 minutes, hour, day, week} per unit network {/32, /28, /24}.
>
> Two comments:
>
> 1: I'm using relaydb for something similar (but not identical) to this.
>
> This technique simply stores the number of spams and hams per IP in a
> small database. I'm then checking the ratio of spam to ham for
> connecting IPs. If the ratio is above a certain threshold, I reject the
> connection.
>
> I'm also expiring records after a certain time.
Sounds like what I'm looking for: a threshold and an expiry time.
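
A minimal sketch of that idea in Python, to make the threshold and
expiry concrete. This is not relaydb's actual implementation; the class
name, the default values, and the in-memory dict are all hypothetical,
and it uses the spam fraction (spam / total) rather than a raw
spam-to-ham ratio so that an IP with zero ham doesn't divide by zero.

```python
import time


class ReputationStore:
    """Hypothetical per-IP spam/ham counter with a threshold and expiry."""

    def __init__(self, threshold=0.8, min_messages=5, ttl=7 * 24 * 3600):
        self.threshold = threshold        # assumed: spam fraction that triggers a tempfail
        self.min_messages = min_messages  # assumed: don't judge an IP on too few messages
        self.ttl = ttl                    # assumed: seconds of inactivity before a record expires
        self.records = {}                 # ip -> [spam_count, ham_count, last_seen]

    def record(self, ip, is_spam, now=None):
        """Count one scored message from this IP."""
        now = time.time() if now is None else now
        rec = self.records.setdefault(ip, [0, 0, now])
        rec[0 if is_spam else 1] += 1
        rec[2] = now

    def expire(self, now=None):
        """Drop records that have not been updated within the TTL."""
        now = time.time() if now is None else now
        stale = [ip for ip, rec in self.records.items() if now - rec[2] > self.ttl]
        for ip in stale:
            del self.records[ip]

    def should_tempfail(self, ip):
        """True if the IP has enough history and its spam fraction is too high."""
        rec = self.records.get(ip)
        if rec is None:
            return False
        spam, ham, _ = rec
        total = spam + ham
        return total >= self.min_messages and spam / total >= self.threshold
```

Calling expire() from a periodic job keeps the table small; the
min_messages floor is one way to avoid penalizing an address on the
strength of a single message.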
> 2: This method might seem effective in theory, but in reality it doesn't do
> as much as I'd hoped for.
>
> Nowadays spam more often comes from a multitude of addresses rather than
> a few dedicated spam sending hosts. This means that few sender IPs
> actually ever reach the threshold I've set up (a more aggressive
> threshold could change this though).
Right, and since I plan to tempfail (reject with a 450), a more
aggressive threshold should only delay mail rather than reject ham
(except for broken old GroupWise servers that don't comply with the
RFCs and treat 450s as permanent failures).

This differs from traditional graylisting in that some spam leaks into
your system, but you don't need to retain
sender-envelope/sender-IP/recipient triplets. The nice thing about this
method is that you can process your logs in real time to generate a
tempfail access list. With multiple MTAs writing to a central log host,
you can generate one access list for all inbound MTAs and push the load
of log processing off the MTAs. Then, with something like CFEngine, you
could periodically push out the updated access lists.

I don't run a system nearly that large, but I'd rather build something
that has a chance of scaling beyond a single host.
> I haven't checked what difference it'd make if subnets were used instead
> of IP-addresses.
My system probably doesn't receive enough mail to generate useful
statistics. :/
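
For anyone with more traffic who wants to try the subnet comparison,
here is a small sketch using Python's standard ipaddress module to roll
per-IP counts up to a given prefix length (/32, /28, /24). The function
name and the {ip: (spam, ham)} input shape are hypothetical.

```python
import ipaddress
from collections import defaultdict


def aggregate_by_prefix(counts, prefix_len):
    """Sum {ip: (spam, ham)} counts over the enclosing /prefix_len networks."""
    totals = defaultdict(lambda: [0, 0])
    for ip, (spam, ham) in counts.items():
        # strict=False lets a host address stand in for its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        totals[str(net)][0] += spam
        totals[str(net)][1] += ham
    return {net: (spam, ham) for net, (spam, ham) in totals.items()}


if __name__ == "__main__":
    sample = {
        "192.0.2.1": (3, 0),
        "192.0.2.17": (2, 1),
        "198.51.100.5": (0, 4),
    }
    for prefix_len in (32, 28, 24):
        print(prefix_len, aggregate_by_prefix(sample, prefix_len))
```

Running the same threshold check against the /28 and /24 totals would
show whether spread-out senders reach the threshold sooner when grouped
by network.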
-- Bob