(CC'ing spambayes@python.org to make sure the conversion procedure outlined
below is saved for posterity.  I'll add it to the SpamBayes wiki as well.)

    Kevin> Wouldn't a corrupt database produce consistent results
    Kevin> (i.e. always crash on a particular message)? The behavior I'm
    Kevin> seeing is that the crash tends to occur about 7 messages (and
    Kevin> steadily decreasing) into an email download session. Following
    Kevin> the crash, if I restart SpamBayes and begin downloading messages
    Kevin> again, the first message (the one the previous session crashed
    Kevin> on) is downloaded without an issue.

Okay, a little more investigation.  When Sam Thorne built the installer you
used he added a new storage format type, "dbm" (that's the patch referred to
on the wiki).  I suspect he added that because at the time SpamBayes didn't
have a db185hash database type name.  I'm not sure why he didn't simply use
the pickle format.  On Macs, importing the "dbm" package doesn't actually
use the old Unix dbm file format.  It uses the not as old, but just as
bereft of merit, Berkeley DB 1.85 file format.  This file format is known to
have serious errors in its design.  It's likely that you've encountered one
of those errors.  Here's a note I wrote to the SpamBayes list back in 2002:

    http://mail.python.org/pipermail/spambayes/2002-December/002394.html

I reverted the change I made yesterday which added a "dbm" file type to
SpamBayes.  It's simply not needed and would only lead to more grief such as
you've encountered.

Now the fun begins...

The simplest way to solve your problem is to switch to the pickle file
format for your database storage.

Everything happens in a Terminal window, so fire up your Mac's Terminal app.
You will have to sudo to root because the permissions on the files and
directories seem incorrect (at least to me).  So, execute

    sudo bash

at the shell prompt and enter your password when prompted.  (I assume for
this exercise that your user account is an administrative account.)

Stop the POP3 proxy:

    /Library/StartupItems/SpamBayes stop

Change directory to the SpamBayes installation:

    cd /Library/SpamBayes

Edit your bayescustomize.ini file (vi, ex, Emacs, the TextEdit app, whatever
your favorite editor is - just make sure to save it as plain text if you use
an editor which likes "rich" formats).  Change the persistent_use_database
option to pickle and set the persistent_storage_file option to hammie.pck.
If you don't have a bayescustomize.ini file for some reason or it's missing
either of these options, just make sure you edit or create it so that it
contains these lines:

[Storage]
persistent_use_database: pickle
persistent_storage_file: hammie.pck
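
If you want to double-check the edit before going further, something like
the following sketch should do.  It isn't part of SpamBayes: the function
name check_storage_options is made up for illustration, it only uses the
standard configparser module, and it is written for Python 3 rather than
the /usr/bin/python used elsewhere in this note.

```python
import configparser


def check_storage_options(path="bayescustomize.ini"):
    """Return (persistent_use_database, persistent_storage_file).

    Hypothetical helper, not a SpamBayes API.  Raises FileNotFoundError
    if the file is missing and KeyError if the section or either option
    is absent.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files it cannot open, so check its result.
    if not parser.read(path):
        raise FileNotFoundError(path)
    storage = parser["Storage"]
    return (storage["persistent_use_database"],
            storage["persistent_storage_file"])
```

Calling check_storage_options() from /Library/SpamBayes should give back
('pickle', 'hammie.pck') if the file contains the three lines above.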

In that same directory you should have two files, _pop3proxyham.mbox and
_pop3proxyspam.mbox.  We need to train on them to create the hammie.pck
file.  Alas, it appears the POP3 proxy created those files incorrectly,
adding a spurious From_ line before every message.  Execute these two
commands to clean up that mess:

    sed -e '/^From [EMAIL PROTECTED]/d' < _pop3proxyham.mbox > _pop3proxyham.mbox.new
    sed -e '/^From [EMAIL PROTECTED]/d' < _pop3proxyspam.mbox > _pop3proxyspam.mbox.new
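
If sed isn't your thing, here is a rough Python sketch of the same
cleanup.  Treat it as an illustration under a few assumptions: the
function name strip_lines_with_prefix is invented, it matches a literal
prefix rather than a regular expression, and you have to supply the
prefix yourself, since the address in the sed commands above was obscured
by the list archive.  It is written for Python 3.

```python
def strip_lines_with_prefix(src_path, dst_path, prefix):
    """Copy src_path to dst_path, dropping lines that start with prefix.

    Hypothetical helper, not part of SpamBayes.  Returns the number of
    lines dropped.
    """
    # Work in bytes: mail in an mbox file can be in any encoding.
    if isinstance(prefix, str):
        prefix = prefix.encode("ascii")
    dropped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if line.startswith(prefix):
                dropped += 1  # a spurious From_ line: skip it
            else:
                dst.write(line)
    return dropped
```

You would call it once for each mbox file, writing to the matching
.mbox.new name, and the return value tells you how many lines were
thrown away.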

Now execute

    /usr/bin/python sb_mboxtrain.py -p hammie.pck -g _pop3proxyham.mbox.new -s _pop3proxyspam.mbox.new -f

to generate hammie.pck.  Remove the output files from the sed commands:

    rm _pop3proxy*.mbox.new

Restart the POP3 proxy with this command:

    /Library/StartupItems/SpamBayes start

Hopefully that will get you going again.  Let me know if it doesn't.

Now to uninstall this package from my Mac...

Skip
_______________________________________________
SpamBayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
