On Sat, 21 Jan 2017, Kevin Golding wrote:

On Sat, 21 Jan 2017 19:08:39 -0000, Jari Fredriksson <ja...@iki.fi> wrote:

John Hardin kirjoitti 20.1.2017 22:38:

> Collecting spam after RBL filtering is much less helpful to masscheck.
> Ideally your spam corpus is from a totally unfiltered feed.
>
> However, even if it is filtered and small, it helps, *especially* if
> the ham is not in English - masscheck is perennially starved for
> non-English ham and rule scoring is thus biased against non-English
> languages to a degree.

This is NOT what I have learned from SA lists. I used to do this, but
learned in SA discussions that it is *harmful* to pass such spam to
masscheck, and that it harms the SA users doing proper pre-SA filtering.

We do *need* an official policy! What are we going to do with mixed
messages like this??

It was written down once. I saw the unfiltered thing again when I looked earlier today, but I can't spot it just now. I believe I was also told by someone who knows this stuff that it wasn't a requirement, more an ideal.

I apologize if there's empirical evidence that including spam that would be blocked by RBLs causes poorer masscheck results. That seems strongly counterintuitive to me, especially for sites where such filtering is *not* done at the MTA level - there are such.

However looking for that comment again just now I registered another discrepancy on the wiki:

https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older than 2 months

https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam older than 6 months

I don't think either is actually a strict rule.

There is age filtering in the masscheck code, but I don't remember off the top of my head what the cutoff actually is. I agree that the discrepancies in the wiki should be corrected...


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 An operating system design that requires a system reboot in order to
 install a document viewing utility does not earn my respect.
-----------------------------------------------------------------------
 2 days until John Moses Browning's 162nd Birthday
