http://bugzilla.spamassassin.org/show_bug.cgi?id=2853
------- Additional Comments From [EMAIL PROTECTED] 2004-03-16 11:13 -------

Sounds generally like a good idea -- in particular, I'd suggest making mass-check easier to use. I'd like to see mass-check generate a single output file, e.g. by adding a "ham"/"spam" indicator to the start of each line instead of keeping each class in a separate output file.

Also, the ancillary scripts (fp-fn-statistics, hit-frequencies, etc.) are a little complicated, and all of them make too many assumptions about their location, e.g. assuming that ../rules is the rules dir.

However, rewriting the perceptron in perl gets a -1 from me. IMO the C nature of the perceptron is not a big problem; pretty much every Debian machine will have a C compiler available. Also, the amount of data (logs and scores) held in RAM needs a good, compact and fast representation, and C works very, very well for this; probably a lot better than perl can do without quite a bit of work.

What is a problem, however, is that it currently requires a rebuild to include the hits and scores from the C files generated by logs-to-c. That should probably be fixed, so that the perceptron can be distributed as a binary and read that data at runtime (as you said).

IMO, the biggest problem for users of a system like this will be corpus management and mass-checking. The perceptron et al. aren't too hard compared to that.
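To make the single-output-file idea concrete, here is a rough sketch of how a downstream script might read a combined log. The line layout (label, score, path, comma-separated rule names, whitespace-delimited) is purely hypothetical and is not the real mass-check format; the names Result, parse_line and hit_counts are made up for illustration, and paths containing spaces are not handled.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    label: str    # "ham" or "spam", taken from the leading indicator
    score: float
    path: str
    rules: tuple  # names of the rules that hit, possibly empty


def parse_line(line: str) -> Result:
    """Parse one line of the hypothetical combined log format."""
    label, score, path, *rest = line.split()
    if label not in ("ham", "spam"):
        raise ValueError(f"unknown label: {label!r}")
    rules = tuple(rest[0].split(",")) if rest else ()
    return Result(label, float(score), path, rules)


def hit_counts(lines: Iterable[str]) -> dict:
    """Count rule hits per class from a single mixed log."""
    counts = {"ham": Counter(), "spam": Counter()}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        result = parse_line(line)
        counts[result.label].update(result.rules)
    return counts


# Made-up sample data, only to show the shape of the input.
sample = [
    "# hypothetical combined log",
    "spam 12.3 corpus/spam/0001 RULE_A,RULE_B",
    "ham 0.5 corpus/ham/0001 RULE_B",
    "ham 0.0 corpus/ham/0002",
]
counts = hit_counts(sample)
print(counts["spam"]["RULE_A"], counts["ham"]["RULE_B"])
```

With the class carried on each line, a script like this only needs one input path, which would also remove one of the location assumptions mentioned above.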
