On Tue, 14 Oct 2014 13:58:27 +0200
Axb wrote:

> On 10/14/2014 01:51 PM, RW wrote:
> > On Tue, 14 Oct 2014 10:44:51 +0200
> > Axb wrote:
> >
> >>
> >> have you verified that some of these are not included?
> >>
> >> X-Originating-IP will not be included as it can be used to help
> >> detect ham or spam
> >
> > It's really no different to other headers you are ignoring.
>
> for example, if you get a flood of 419s from the same source, you may
> want it to be tokenized...

As I do with, for example:

X-AntiAbuse: Originator/Caller UID/GID - [514 32007] / [47 12]

In this spam, Bayes found:

0.999-4--HX-AntiAbuse:32007

These numbers seem to be very good indicators for me.

Most of the headers in the file have never appeared in my ham, so
they'll be pure spam indicators if they are ever faked. In general it's
difficult for a spammer to gain an overall advantage against an average
per-user database using faked headers.

Whatever the merits of this on system-wide Bayes (if any beyond reducing
token count), I think it would have a negative effect on per-user Bayes.
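
To illustrate why a header token that has never appeared in ham ends up
as a strong spam indicator, here is a rough sketch in Python. It assumes
a generic Robinson-style smoothed token probability; the function name,
the constants s and x, and the message counts are made up for
illustration and are not taken from SpamAssassin's code or from my
database.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for one token (Robinson-style sketch).

    spam_hits / ham_hits: trained messages of each class containing the token.
    n_spam / n_ham: total trained messages of each class (must be > 0).
    s, x: assumed prior strength and neutral prior, illustrative values only.
    """
    n = spam_hits + ham_hits
    if n == 0:
        return x  # token never seen: fall back to the neutral prior
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    p = spam_rate / (spam_rate + ham_rate)  # raw per-token estimate
    return (s * x + n * p) / (s + n)        # pull towards x when n is small


# Hypothetical counts: a header token seen in 40 spams and no ham.
print(token_spam_prob(40, 0, 1000, 1000))

# A faked header seen only a few times, still never in ham.
print(token_spam_prob(3, 0, 1000, 1000))
```

Under these assumptions, any token with zero ham hits scores well above
the neutral 0.5 after only a handful of spams, which is why faking such
headers tends to hurt the spammer rather than help.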