http://bugzilla.spamassassin.org/show_bug.cgi?id=2853





------- Additional Comments From [EMAIL PROTECTED]  2004-03-16 11:13 -------
Sounds generally like a good idea -- in particular, I'd suggest making
mass-check easier to use.  I'd like to see mass-check generate a single
output file, e.g. by adding a "ham"/"spam" indicator to the start of each line
instead of keeping ham and spam results in separate output files.
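
To illustrate the single-file idea, here is a minimal sketch in C.  It is
not the actual mass-check output format: the helper names tag_line and
parse_tagged are hypothetical, the rest of each log line is treated as
opaque text, and the only assumption is that the label is the literal word
"ham" or "spam" followed by one space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<label> <line>" into out.
 * Returns 0 on success, -1 if the result would not fit in outsz bytes. */
static int tag_line(char *out, size_t outsz, const char *label, const char *line)
{
    assert(out != NULL && label != NULL && line != NULL);
    int n = snprintf(out, outsz, "%s %s", label, line);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Hypothetical helper: split a tagged line back into class and payload.
 * Sets *is_spam to 1 for "spam" and 0 for "ham"; *rest points just past
 * the label.  Returns 0 on success, -1 if the line carries neither label. */
static int parse_tagged(const char *tagged, int *is_spam, const char **rest)
{
    assert(tagged != NULL && is_spam != NULL && rest != NULL);
    if (strncmp(tagged, "spam ", 5) == 0) {
        *is_spam = 1;
        *rest = tagged + 5;
        return 0;
    }
    if (strncmp(tagged, "ham ", 4) == 0) {
        *is_spam = 0;
        *rest = tagged + 4;
        return 0;
    }
    return -1;
}
```

With something like this, a consumer reads one file and branches on the
label, rather than having to be told which of two files is which.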

Also, the ancillary scripts (fp-fn-statistics, hit-frequencies, etc.) are a
little complicated, and all of them make too many assumptions about their
location, e.g. assuming that ../rules is the rules dir.

However, rewriting the perceptron in perl gets -1 from me.

IMO the C nature of the perceptron is not a big problem.  Pretty much every
Debian machine will have a C compiler available.  Also, the amount of data (logs
and scores) held in RAM needs a good, compact and fast representation, and C
works very well for this; probably a lot better than perl can do without quite a
bit of work.

What is a problem, however, is that it currently requires a rebuild to include
the hits and scores from the C files generated by logs-to-c.  That should
probably be fixed, so that the perceptron can be distributed as a binary and
read that data at runtime (as you said).
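
As a rough sketch of what reading that data at runtime could look like,
here is a small C loader.  Everything about it is an assumption made for
illustration: the "<name> <score>" text format, the rule_score struct and
the load_scores function are hypothetical and are not what logs-to-c emits
today.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record: one rule name and its score. */
struct rule_score {
    char name[64];
    double score;
};

/* Read whitespace-separated "<name> <score>" pairs from fp into a freshly
 * allocated array and store the number of records in *count.  Reading stops
 * at end of file or at the first malformed record.  Returns NULL if nothing
 * was read or if allocation fails; the caller frees the result. */
static struct rule_score *load_scores(FILE *fp, size_t *count)
{
    assert(fp != NULL && count != NULL);
    struct rule_score *rules = NULL;
    struct rule_score r;
    size_t n = 0, cap = 0;

    *count = 0;
    while (fscanf(fp, "%63s %lf", r.name, &r.score) == 2) {
        if (n == cap) {
            /* Grow geometrically so loading stays linear overall. */
            size_t newcap = cap ? cap * 2 : 16;
            struct rule_score *tmp = realloc(rules, newcap * sizeof *tmp);
            if (tmp == NULL) {
                free(rules);
                return NULL;
            }
            rules = tmp;
            cap = newcap;
        }
        rules[n++] = r;
    }
    *count = n;
    return rules;
}
```

The in-memory representation stays a flat, compact C array either way; only
the point at which the data is bound to the binary changes, from compile time
to startup.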

IMO, the biggest problem for users of a system like this will be corpus
management and mass-checking.  The perceptron et al. aren't too hard compared to
that.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
