    Maarten> and this has worked very well for a long time. But recently we
    Maarten> had a lot of spam coming through and saw that the spamdb.db
    Maarten> filesize stays at 82.040 KB ! Normally after training the
    Maarten> filesize always grows.  We've started with a new database,
    Maarten> however after 2 weeks the filesize stays at 20.680 KB.

    Maarten> Spambayes still functions ok but we feel we can't update the
    Maarten> database anymore ...

    Maarten> I guess this is a software problem because of the filesize (just
    Maarten> about 20GB and 80GB)

20 & 80 kilobytes or 20 & 80 gigabytes?  Big difference!

In any case, note that many of the database file formats implement on-disk
hashes.  Thus they greatly overallocate space to support efficient key
lookup and insertion.  Also, after a while you probably see very few actual
new keys.  Most of the time when you train you're probably just adjusting
the counts on already seen keys.  It's thus not surprising that the absolute
file size doesn't change (or actually changes very slowly).  To see if the
file contents change after you train on a message, try comparing the checksum
of the file before and after training.  I don't know what's available on
Windows.  On Unix/Linux/Mac systems the cksum command does the trick:

    % ls -l hammie.db
    -rw-rw-r--   1 skip  staff  1326002 May 21 07:27 hammie.db
    % cksum hammie.db
    3591908501 1326002 hammie.db
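
If no checksum tool is at hand, the same before/after comparison can be
sketched in a few lines of Python, assuming an interpreter is available on
the machine.  This is only an illustration, not part of SpamBayes: the
helper names file_digest and snapshot are made up here, and SHA-256 stands
in for cksum's CRC.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    Reads the file in chunks so a large database need not fit in memory.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(path):
    """Return (size_in_bytes, digest) for `path` (hypothetical helper)."""
    return os.path.getsize(path), file_digest(path)
```

Call snapshot("spamdb.db") (substitute the real path to your database)
once before training and once after.  If the sizes match but the digests
differ, the training was written to the file even though it did not grow.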

Skip
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
