This is a cross-post from AmaVis-users. Since the issue spans both
tools, I'll pose the same question here.
Thanks,
Richard.
Greetings:
I know this may well be OT for the AmaVis list, but I was wondering if
there was any expertise here that I could draw on...
I'm using Amavis and SpamAssassin (obviously), and I would like to get
something set up to feed false negatives to sa-learn. Unfortunately,
there's not much I can do with my mail client setup. The only thing I
can really do is "forward as attachment" when there's a false negative (FN).
The "forward as attachment" option produces messages that look like
this:
>From [EMAIL PROTECTED] Tue May 4 13:09:21 2004
Return-Path: <[EMAIL PROTECTED]>
Received: from localhost (localhost [127.0.0.1])
by whfirewall.nwtel.ca (8.12.11/8.12.9) with ESMTP id
i44K9LRp008313
for <[EMAIL PROTECTED]>; Tue, 4 May 2004 13:09:21
-0700
Received: from whfirewall.nwtel.ca ([127.0.0.1])
by localhost (whfirewall [127.0.0.1]) (amavisd-new, port 10024) with
LMTP
id 07235-06 for <[EMAIL PROTECTED]>;
Tue, 4 May 2004 13:09:19 -0700 (PDT)
Received: from hobbes.nwtel.ca (hobbes.nwtel.ca [172.16.96.89])
by whfirewall.nwtel.ca (8.12.11/8.12.11) with ESMTP id
i44K99ea008300
for <[EMAIL PROTECTED]>; Tue, 4 May 2004 13:09:09
-0700
Received: from WHTHYT-MTA by hobbes.nwtel.ca
with Novell_GroupWise; Tue, 04 May 2004 13:09:09 -0700
Message-Id: <[EMAIL PROTECTED]>
X-Mailer: Novell GroupWise Internet Agent 6.5.1
Date: Tue, 04 May 2004 13:08:41 -0700
From: "Richard Whittaker" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Subject: Fwd: anemone
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=__Part1E3FC459.0__="
X-Virus-Scanned: by amavisd-new 20030616-p9 and SA 2.63 at nwtel.ca
This is a MIME message. If you are reading this text, you may want to
consider changing to a mail reader or gateway that understands how to
properly handle MIME multipart messages.
--=__Part1E3FC459.0__=
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Richard Whittaker, CISSP
Whitehorse Systems Manager,
IS Security Officer
NorthwesTel Inc.
--=__Part1E3FC459.0__=
Content-Type: message/rfc822
Return-path: <[EMAIL PROTECTED]>
Received: from whfirewall.nwtel.ca [192.168.90.253]
by hobbes.nwtel.ca; Tue, 04 May 2004 10:52:06 -0700
Received: from localhost (localhost [127.0.0.1])
by whfirewall.nwtel.ca (8.12.11/8.12.9) with ESMTP id
i44Hq6tT029961
for <[EMAIL PROTECTED]>; Tue, 4 May 2004 10:52:06 -0700
Received: from whfirewall.nwtel.ca ([127.0.0.1])
by localhost (whfirewall [127.0.0.1]) (amavisd-new, port 10024) with
LMTP
id 29573-03-2; Tue, 4 May 2004 10:51:59 -0700 (PDT)
Received: from 199.85.228.1 ([219.234.169.251])
by whfirewall.nwtel.ca (8.12.11/8.12.9) with SMTP id
i44HpBxh029825;
Tue, 4 May 2004 10:51:17 -0700
X-Message-Info: 509FMR52245RND_UC_CHAR[1-3]io9/HHbewCikRD14cOCit919uVJ
Received: from plight ([142.200.25.110])
by 730ns.hotbed.kinglet.dour.168.com
(InterMail vM.4.00.91.80 724-8-1-033-1-12724) with ESMTP
id
<[EMAIL PROTECTED]>
for <[EMAIL PROTECTED]>; Tue, 04 May 2004 16:45:31 -0200
Message-ID: <[EMAIL PROTECTED]>
Reply-To: "Sanders" <[EMAIL PROTECTED]>
From: "Sanders" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Subject: anemone
Date: Tue, 04 May 2004 12:47:31 -0600
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="--71261322241384563126"
X-Virus-Scanned: by amavisd-new 20030616-p9 and SA 2.63 at nwtel.ca
X-Spam-Status: No, hits=-1.4 tagged_above=-999.0 required=2.0
tests=BAYES_01,
BIZ_TLD
X-Spam-Level:
----71261322241384563126
Content-Type: text/plain;
Content-Transfer-Encoding: 7Bit
....junk removed...
----71261322241384563126--
--=__Part1E3FC459.0__=--
When I run "sa-learn", I believe what's being identified is wrong, and
will taint my Bayesian DB...
[EMAIL PROTECTED]:/var/adm# su - amavis -c "sa-learn --spam -D -L --mbox /var/spool/mail/amavis"
...blah, blah, blah...
debug: Learning Spam
debug: uri tests: Done uriRE
debug: tokenize: header tokens for *p = "U*RWHITTAKER D*nwtel.ca D*ca"
debug: tokenize: header tokens for *m = " s09795f5 024 hobbes nwtel ca
"
debug: tokenize: header tokens for *x = "Novell GroupWise Internet
Agent 6.5.1 "
debug: tokenize: header tokens for *F = "U*RWHITTAKER D*nwtel.ca D*ca"
debug: tokenize: header tokens for To = "U*amavis D*whfirewall.nwtel.ca
D*nwtel.ca D*ca"
debug: tokenize: header tokens for Mime-Version = "1.0"
debug: tokenize: header tokens for *c = "multipart/mixed; =__
PHrtHHHHHHHH . H __= "
debug: tokenize: header tokens for *r = " WHTHYT-MTA by
hobbes.nwtel.ca Novell_GroupWise; "
debug: tokenize: header tokens for *r = " WHTHYT-MTA by
hobbes.nwtel.ca Novell_GroupWise; hobbes.nwtel.ca (hobbes.nwtel.ca
[172.16.96]) by whfirewall.nwtel.ca (8.12.11/8.12.11)
<[EMAIL PROTECTED]>; "
debug: bayes: Learned '[EMAIL PROTECTED]'
Learned from 3 message(s) (3 message(s) examined).
debug: bayes: 8433 untie-ing
debug: bayes: 8433 untie-ing db_toks
debug: bayes: 8433 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 8433 unlink /usr/share/bayes/.lock
[EMAIL PROTECTED]:/var/adm# ls -l | more
The learner is misidentifying the message as coming from me, since I
forwarded it, even though I did so as an attachment.
Is there something I can do to pre-process these messages and strip
out the forwarding headers before sa-learn gets its hooks into
them? Has anyone dealt with this before? I've looked at the
folder/IMAP option, and it's not likely going to help me much.
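To make the question concrete, here is a rough sketch of the kind of pre-processing I have in mind. It is only an illustration, not something I have running: it assumes the forwards arrive as well-formed MIME with the original spam in a message/rfc822 part (as in the sample above), and the function names and file paths are made up for the example. It uses nothing but Python's standard email and mailbox modules.

```python
import mailbox
from email.message import Message
from typing import List


def extract_forwarded(msg: Message) -> List[Message]:
    """Return the messages carried as message/rfc822 attachments of msg.

    Does not descend into an extracted message, so anything nested
    inside the original spam stays part of that spam.
    """
    if msg.get_content_type() == "message/rfc822":
        payload = msg.get_payload()
        # A parsed message/rfc822 part holds a list with one Message in it.
        if isinstance(payload, list):
            return [p for p in payload if isinstance(p, Message)]
        return []
    found: List[Message] = []
    if msg.is_multipart():
        for part in msg.get_payload():
            found.extend(extract_forwarded(part))
    return found


def rewrite_mbox(src_path: str, dst_path: str) -> int:
    """Copy every forwarded original from src_path into a new mbox.

    Returns the number of messages written. Both paths are
    hypothetical; wrapper messages themselves are not copied.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    count = 0
    try:
        for wrapper in src:
            for original in extract_forwarded(wrapper):
                dst.add(original)
                count += 1
    finally:
        dst.close()  # close() flushes pending writes
        src.close()
    return count
```

The idea would be to run rewrite_mbox() on the amavis spool and then point "sa-learn --spam --mbox" at the output file instead of the spool, so that only the original spam headers and body get tokenized. I have not checked how this behaves with forwards whose MIME structure is damaged.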
Regards,
Richard.
Richard Whittaker, CISSP
Whitehorse Systems Manager,
IS Security Officer
NorthwesTel Inc.