> Sorry, Skip - I don't.  And I was surprised just now to see that we 
> apparently never checked test data files into the Sourceforge source tree 
> either!
>
> But it shouldn't matter.  SB learns pretty quickly, and it would be better to 
> use _current_ examples of spam and ham anyway (their characteristics change 
> over time).

Sure, but constructing a suitable ham/spam corpus from scratch is a
non-trivial task, as you no doubt remember. I could start with the
collection on mail.python.org, but I suspect I would probably let a
personal email or three leak through into what's ostensibly a public
database. (SpamBayes has been doing a pretty good job over the years
at its original assigned task.) I am looking to ensure that a Py3 port
of SpamBayes works the same as the Py2 code.

Skip
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
https://mail.python.org/mailman/listinfo/spambayes-dev
