http://bugzilla.spamassassin.org/show_bug.cgi?id=3055
Summary: Bayes: use hash instead of Message-Id?
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: minor
Priority: P5
Component: Learner
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]
Folks --
this has come up before, but I think we might as well raise it again ;)
Basically, Robert Menschel noted the following in this mail:
Subject: Re[2]: Some real anti-bayes stuffing followup
Date: Fri, 13 Feb 2004 20:59:56 -0800
Cc: spamassassin-users.incubator.apache.org
'I've received multiple spams all using the same message id.
a) If a ham is sent to my domain with four recipients here, then because
of the way I run SA, I could process that email four times, once for each
mailbox. That's expected. And it's expected that each of those emails
will have identical bodies, and identical subjects.
b) I receive spam where in a given day I can receive similar spam,
identical message ids, but with different subject headers (usually random
words or letters added to a subject), and/or with different bodies
(sometimes minor random differences, sometimes very different messages).
c) I receive spam where on Jan 2 I can receive spam with a given message
ID, and I can receive spam (similar or not) with identical message ids on
Jan 14, Jan 30, Feb 12, etc.'
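I think this is probably a bayes-evasion technique, since we key
our bayes_seen db on Message-ID if present: once one message with a given
Message-ID has been learned, later spam reusing that ID would look
already seen and would not be learned again.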
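
To make the concern concrete, here is a toy model of a seen-db keyed on
Message-ID. It is only a sketch, not SpamAssassin's actual code: the names
`seen`, `learned` and `learn` are made up for illustration, and the
"skip if already learned as the same class" rule is an assumption about
how such a db would be used.

```python
# Toy model only: not SpamAssassin's actual code.
seen = {}      # stand-in for bayes_seen: Message-ID -> "s" (spam) or "h" (ham)
learned = []   # bodies that actually reached the learner


def learn(message_id: str, body: str, is_spam: bool) -> bool:
    """Learn a message unless its Message-ID was already learned as the same class."""
    label = "s" if is_spam else "h"
    if seen.get(message_id) == label:
        return False  # treated as a duplicate, so nothing is learned
    seen[message_id] = label
    learned.append(body)
    return True


# Two different spams that reuse one Message-ID, as in cases b) and c).
first = learn("<fixed@example.invalid>", "buy pills", is_spam=True)
second = learn("<fixed@example.invalid>", "cheap mortgage", is_spam=True)
```

In this model the second spam is never learned, even though its body is
completely different from the first.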
What were the objections to using a hash of some selected headers (From, To,
Subject) and the message body, again? It strikes me as a more resilient
approach: spammers could no longer reuse one Message-ID for all their spam
and evade bayes learning that way.
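
A minimal sketch of the hashing idea, in Python rather than Perl for
brevity. Everything here is an assumption for illustration: the function
name `seen_key`, the choice of SHA-1, and the way headers and body are
normalised are not taken from the SpamAssassin code.

```python
import hashlib
from email import message_from_string

# The headers suggested in this report; any other choice is equally hypothetical.
HASHED_HEADERS = ("From", "To", "Subject")


def seen_key(raw_message: str) -> str:
    """Return a hex SHA-1 over the selected headers plus the raw body."""
    text = raw_message.replace("\r\n", "\n")   # normalise line endings
    msg = message_from_string(text)            # used only to look up headers
    _, _, body = text.partition("\n\n")        # body = everything after the first blank line
    digest = hashlib.sha1()
    for name in HASHED_HEADERS:
        value = msg.get(name) or ""            # a missing header hashes as empty
        digest.update(f"{name.lower()}:{value.strip()}\n".encode("utf-8", "replace"))
    digest.update(b"\n")
    digest.update(body.encode("utf-8", "replace"))
    return digest.hexdigest()
```

With a key like this, the Message-ID plays no part: two spams that share a
Message-ID but differ in Subject or body get different keys, while copies
of one message with identical From, To, Subject and body still map to a
single key.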
--j.