I have all my mail going back to the year 2000. I tried to feed the database (ham <-> spam) via sa-learn --[ham|spam]. After learning about 2,000 messages I tried to look at the "magic" numbers (nham and nspam), but the database had been KILLED and destroyed:
Output of "sa-learn --dump magic" shows only zeros all over the table, like a freshly rebuilt DB. What could this be? Is it a limit of the DB?
There's no limit that I know of, and if there is one, it is certainly not as low as 2,000 messages.
sa-learn --dump magic
0.000          0          2          0  non-token data: bayes db version
0.000          0     206544          0  non-token data: nspam
0.000          0       9356          0  non-token data: nham
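If you want to check those counters from a script rather than by eye, here is a minimal Python sketch. It assumes the column layout seen in the output quoted above (the value in the third whitespace-separated column, the label after "non-token data:"); the function name parse_magic is made up for illustration and is not part of SpamAssassin.

```python
def parse_magic(dump_text):
    """Map each 'non-token data' label to its integer value.

    Assumes the layout quoted in this thread: the value sits in the
    third whitespace-separated column, the label follows the colon.
    """
    counts = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue  # not a magic line
        fields, name = line.split("non-token data:", 1)
        parts = fields.split()
        if len(parts) < 3:
            continue  # unexpected layout, skip rather than guess
        try:
            counts[name.strip()] = int(parts[2])
        except ValueError:
            continue  # value column was not an integer
    return counts


sample = """\
0.000          0          2          0  non-token data: bayes db version
0.000          0     206544          0  non-token data: nspam
0.000          0       9356          0  non-token data: nham
"""
magic = parse_magic(sample)
if magic.get("nspam", 0) == 0 and magic.get("nham", 0) == 0:
    print("database looks empty or freshly rebuilt")
```

To use it on a live system you could pass it the text captured from running "sa-learn --dump magic", for example via subprocess.run with capture_output=True and text=True.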
Although I'll admit that I've never fed in more than 1,000 messages as a single mbox file.
I'd also seriously question the value of feeding sa-learn anything that is over a year old. Spam and ham both change dramatically over time, and the tokens from ancient email aren't going to be very helpful.
Personally, I don't like to train things that are over 3 months old when I set up a new SA system, but sometimes I'll go back 6 months if I have to.
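One way to apply that cutoff to a big archive is to copy only the recent messages into a separate mbox and train from that. The following is a rough Python sketch using the standard mailbox module. The function names, the 90-day default, and the choice to skip messages with a missing or unparseable Date header are all assumptions made for illustration, and the Date header is trusted as-is, which may not be accurate for every message.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_recent(date_header, now, max_age_days=90):
    """True if the Date header is no older than max_age_days before now."""
    if not date_header:
        return False  # no Date header: skip the message
    try:
        sent = parsedate_to_datetime(str(date_header))
    except (TypeError, ValueError):
        return False  # unparseable date: skip the message
    if sent.tzinfo is None:
        # Header carried no usable zone; treat it as UTC (an assumption).
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= timedelta(days=max_age_days)


def copy_recent(src_path, dst_path, max_age_days=90):
    """Copy recent messages from one mbox into another; return the count."""
    now = datetime.now(timezone.utc)
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)
    kept = 0
    try:
        for msg in src:
            if is_recent(msg["Date"], now, max_age_days):
                dst.add(msg)
                kept += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return kept
```

The resulting file can then be given to sa-learn with --ham or --spam as appropriate, instead of the full archive.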
