http://bugzilla.spamassassin.org/show_bug.cgi?id=3096





------- Additional Comments From [EMAIL PROTECTED]  2004-03-28 20:49 -------
+1 on Duncan's latest comments.

'> I propose the following format:
> <manual class> <result class> <score> <id> <rules> <value pairs>
> where, for our current code, <manual class> is:
>   "spam" | "ham" | "none"
> and <result class> is:
>   "spam" | "ham"
I'd prefer that we stick with single characters, since that is what
ArchiveIterator does. (It passes "s" or "h" around instead of "spam"
or "ham") Furthermore, having it fixed width is a good thing imho.'

What about:

<manual class><result class> <score> <id> <rules> <value pairs>

with one-letter classes.  That gives us:

hh: manually ham, classed as ham
hs: false positive
sh: false negative
ss: manually spam, classed as spam

That's handy because (a) it's closer to what the academic literature uses (the
TCR calculation in particular uses just those classes, with pretty much that
nomenclature), (b) it's very logical and obvious, (c) it fits in 2 bytes, so
it's fixed width, and (d) it fits in one non-whitespace "token", so very little
script modification will be required in rule-qa et al., where
/\S+\s+\S+etc./ is used.
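
To illustrate points (a) and (d), here is a rough Python sketch (not code from
the SpamAssassin tree) that takes the first whitespace-separated token of each
log line as the class and feeds the counts into a TCR figure. The function
names and the "lam" parameter are made up for this example; the formula assumed
is the usual TCR = N_spam / (lam * FP + FN), where lam is the relative cost of
a false positive.

```python
from collections import Counter


def class_counts(lines):
    """Count the two-letter class token at the start of each log line."""
    counts = Counter()
    for line in lines:
        fields = line.split()  # class is the first non-whitespace token
        if fields:
            counts[fields[0]] += 1
    return counts


def total_cost_ratio(counts, lam=1.0):
    """Assumed formula: TCR = N_spam / (lam * FP + FN).

    hs = false positive, sh = false negative, ss + sh = all manually-spam mail.
    """
    n_spam = counts["ss"] + counts["sh"]
    cost = lam * counts["hs"] + counts["sh"]
    return float("inf") if cost == 0 else n_spam / cost
```

Because the class is a single token, existing scripts that split on whitespace
would only need to look at the first field differently.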

The "no manual classification" type would then be

us: unknown, marked as spam
uh: unknown, marked as ham

like this:

hh 0 ...path... RULES bayes=0.001
hh 0 ...path... RULES bayes=0.001
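
A line in this format could be pulled apart roughly as follows. This is only a
sketch in Python under a few assumptions that are not settled in this bug: the
<id> and <rules> fields contain no whitespace, the value pairs are
whitespace-separated key=value items, and values are kept as strings. The
LogEntry type and parse_line name are hypothetical.

```python
from typing import NamedTuple

MANUAL = {"h": "ham", "s": "spam", "u": "unknown"}  # first letter
RESULT = {"h": "ham", "s": "spam"}                  # second letter


class LogEntry(NamedTuple):
    manual: str
    result: str
    score: float
    msg_id: str
    rules: str   # kept as one opaque token; its inner format is not assumed
    pairs: dict


def parse_line(line):
    """Parse '<manual><result> <score> <id> <rules> <value pairs>'."""
    cls, score, msg_id, rules, *rest = line.split()
    if len(cls) != 2 or cls[0] not in MANUAL or cls[1] not in RESULT:
        raise ValueError(f"bad class token: {cls!r}")
    # Assumption: each remaining token is a key=value pair.
    pairs = dict(item.split("=", 1) for item in rest)
    return LogEntry(MANUAL[cls[0]], RESULT[cls[1]], float(score),
                    msg_id, rules, pairs)
```

For the sample line above, parse_line returns manual "ham", result "ham",
score 0.0 and pairs {"bayes": "0.001"}.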



