At 11:43 AM 6/25/2004, Harald Arnold wrote:
I have all my mail going back to the year 2000.
I tried to feed the database (ham <-> spam)
via sa-learn --[ham|spam].
After learning about 2,000 mails I tried
to look at the "magic" numbers (nham and nspam),
but the database had been KILLED and destroyed:

The output of "sa-learn --dump magic" shows only
zeros all over the table, like a rebuilt DB.
What could this be? Is it the limit of the
DB?

There's no limit I know of, and if there is, it is certainly not as low as 2,000 messages.


sa-learn --dump magic
0.000          0          2          0  non-token data: bayes db version
0.000          0     206544          0  non-token data: nspam
0.000          0       9356          0  non-token data: nham

Although I'll admit that I've never dumped more than 1,000 messages in as a single mbox file.
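
If you want to check the nspam and nham counters from a script after each training batch, rather than eyeballing the dump, something like the following rough sketch may help. The function name parse_magic is made up for illustration, and it assumes the counter sits in the third column of each "non-token data" line, as it does in the output above; other SpamAssassin versions may lay the dump out differently.

```python
def parse_magic(dump_text):
    """Collect the 'non-token data' counters from `sa-learn --dump magic` output.

    Assumption: the value of interest is the third whitespace-separated
    column, as in the dump quoted in this thread.
    """
    counts = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue  # skip anything that is not a magic counter line
        fields, label = line.split("non-token data:", 1)
        columns = fields.split()
        if len(columns) < 3:
            continue  # malformed line, ignore it
        counts[label.strip()] = int(columns[2])
    return counts


sample = """\
0.000          0          2          0  non-token data: bayes db version
0.000          0     206544          0  non-token data: nspam
0.000          0       9356          0  non-token data: nham
"""
counts = parse_magic(sample)
```

A database that has been wiped would show up here as nspam and nham both being 0, so a script could stop feeding mail as soon as it sees that.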


I'd also seriously question the value of feeding anything to sa-learn which is over a year old. Spam and ham both change dramatically over time, and the tokens from ancient email aren't going to be very helpful.


Personally, I don't like to train things that are over 3 months old when I set up a new SA system, but sometimes I'll go back 6 months if I have to.
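
One way to apply that cutoff to a large archive is to copy only the recent messages into a separate mbox and train on that. The sketch below is only an illustration, not something I have run against a real archive: copy_recent, the file names, and the 90-day default are all made up, and it trusts each message's Date: header, which spam often forges.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def copy_recent(src_path, dst_path, max_age_days=90, now=None):
    """Copy messages newer than max_age_days from one mbox to another.

    Hypothetical helper. Returns the number of messages copied.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)  # created if it does not exist
    kept = 0
    try:
        for msg in src:
            try:
                sent = parsedate_to_datetime(msg["Date"])
            except (TypeError, ValueError):
                continue  # missing or unparsable Date header: skip it
            if sent.tzinfo is None:
                # Assumption: treat dates without a zone as UTC
                sent = sent.replace(tzinfo=timezone.utc)
            if sent >= cutoff:
                dst.add(msg)
                kept += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return kept
```

The resulting file could then be fed in with "sa-learn --ham --mbox recent.mbox" (or --spam for a spam archive).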


