Justin Mason wrote:
as a matter of interest, how much disk space does this database of as-yet-untrained tokens take up? It's something we've considered implementing in SpamAssassin, but the disk space issue is an important datum before considering it.
This is not for the faint of heart. dspam is a major app which makes very heavy use of the backend database (with MySQL being the most common and fastest backend). On a freshly installed server using a dump of my production database, my entire dspam database is 1.7GB, of which the token table is 266M/336M (data/index); I don't have a breakdown of how much of that is untrained vs. trained. The signature table is 1.1G/9.0M for comparison.
My production server, on the other hand, is at 7.2GB, which reflects the high-water mark (since MySQL never shrinks the tables unless they are optimized manually).
John
--
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Boulevard
Suite H
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5748
