Bugs item #3094687, was opened at 2010-10-25 11:12
Message generated for change (Tracker Item Submitted) made by cacolijn
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=3094687&group_id=61702

Category: pop3proxy
Group: 1.1.x
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Carl Colijn (cacolijn)
Assigned to: Nobody/Anonymous (nobody)
Summary: Training via web interface (sometimes) doesn't work?

Initial Comment:
Hi all,

This is a summary of my post to the SpamBayes user discussion list, where it 
went unanswered (October 14, 
http://blog.gmane.org/gmane.mail.spam.spambayes.general ).

I have 2 Thunderbird spam training folders (one ham, one spam).  With SpamBayes 
1.0.4 I used these to quickly re-train the filter after a re-install and such.  
These folder files have no extension, but they worked perfectly when uploading 
them for training via the web interface.

I now set up a SpamBayes 1.1a6 installation, and let it train on the training 
folders, but it didn't work.  No errors in the web interface, training seemed 
to go OK (uploaded ok, Training... Saving... Done!) but the statistics on the 
main page ("Total emails trained") didn't reflect the newly trained mails 
(neither ham nor spam).

After that I uninstalled SpamBayes and tried the ThunderBayes++ Thunderbird 
plugin (which also includes version 1.1a6), but it too wouldn't train via the 
web interface - training seems to go OK but the trained-on mails don't arrive 
in the database.

Maybe it's some silly configuration issue on my part, but I've already tweaked 
it for quite a few hours now and can't get it right.  Even if the cause turns 
out to be my configuration history, that would still mean that certain 
configuration changes can break a SpamBayes installation.

The attached zip file contains the configuration file set after test training 
on 1 ham message, and it also includes 2 sample Thunderbird mail files (spam & 
ham) containing one email each - I used these for the test training.
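
Since Thunderbird stores a local folder as an mbox-format file, the number of 
messages a training upload should add can be checked independently of 
SpamBayes.  The following is only a minimal sketch using Python's standard 
`mailbox` module; the function name `count_messages` is made up for 
illustration and is not part of SpamBayes.

```python
import mailbox


def count_messages(folder_path):
    """Return the number of messages in an mbox-format folder file.

    Assumption: the Thunderbird folder file (no extension) is plain mbox.
    """
    # create=False makes a wrong path fail loudly instead of creating a file.
    box = mailbox.mbox(folder_path, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

Running this on each training folder gives the number that "Total emails 
trained" would be expected to increase by after a successful upload.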

Some observations:
- I run Windows XP SP3 en-us with the SpamBayes 1.1a6 version shipped with 
ThunderBayes++ - the databases use the pickle format
- My training databases (2 Thunderbird files - spam and ham) contain roughly 
250 ham and roughly 6000 spam messages
- When I start clean (close Thunderbird/SpamBayes, delete the cache & training 
databases) it re-creates them OK when restarted
- After a restart it claims there are 0 trained messages (of course)
- When I upload the Thunderbird ham training folder file it seems to process it 
correctly but after it's done the counter still remains at "0 trained messages"
- hammie.db doesn't grow either (56 bytes after a clean database recreation, 
still 56 bytes after training)
- There are no errors in the log
- I've enabled caching messages (ThunderBayes by default has it off I think), 
and the uploaded messages do get extracted as separate messages in the cache - 
messageinfo.db indeed also grows
- "Review messages" sometimes shows the uploaded messages, but not consistently 
- they did appear a few times after I tweaked and restarted and such
- Copy/pasting a separate mail with headers and training on that has the same 
effect
- When I let it train on my Spam folder (with 6000+ mails in it) it is 
seriously busy - CPU at 100% for more than 10 minutes - so it must be doing 
something (apart from extracting to the cache)?
- Subsequently letting it train on the small Ham folder (250 messages) now 
takes far more time - the 6000+ spam messages it processed earlier must have 
influenced something
- When I look at the "More statistics" page the uploaded messages _do_ get 
reflected in the "Unsures trained as good" and "Unsures trained as spam" 
statistics
- Training via the ThunderBayes plugin buttons in Thunderbird _does_ raise the 
"trained on" counters - what does it do that I cannot do?
- There are no SMTP proxy details specified in the settings - I assume 
ThunderBayes++ passes the ham/spam training via the web interface as well?
- Starting from scratch again (delete db's, clear email cache) and selecting 
"bsddb" as db type didn't change a thing
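
The file-size observations above (the token database staying at 56 bytes while 
messageinfo.db grows) can be reproduced with a small before/after snapshot of 
the databases directory.  This is a rough sketch using only the Python 
standard library; the helper names `snapshot_sizes` and `changed_files` are 
hypothetical, and the `databases` directory name is taken from the 
spambayes.ini shown in this report.

```python
import os


def snapshot_sizes(directory):
    """Map each regular file name in `directory` to its size in bytes."""
    sizes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def changed_files(before, after):
    """Return {name: (old_size, new_size)} for files whose size differs.

    A size of None means the file was absent in that snapshot.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes
```

Taking one snapshot of `databases` before the upload and one after it, then 
calling `changed_files(before, after)`, shows which files the training run 
actually wrote to.  A file of unchanged size is only a hint, not proof, that 
nothing was saved.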

Here's the spambayes.ini file I use:

[Headers]
include_score:True
notate_subject:
[Storage]
persistent_use_database:pickle
persistent_storage_file:databases/hammie.db
cache_expiry_days:2
cache_messages:True
no_cache_bulk_ham:False
messageinfo_storage_file:databases/messageinfo.db
ham_cache:cache/ham
spam_cache:cache/spam
unknown_cache:cache/unsure
[html_ui]
default_spam_action:defer
display_score:True
[pop3proxy]
use_ssl:automatic
listen_ports:53100,53101,53102
remote_servers:xxx.xxx.com:110,xxx.xxx.com:995,xxx.xxx.nl:110
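
As a quick sanity check of the file above, the settings can be read with 
Python's standard `configparser`, which accepts the `key:value` style used 
here.  This is a sketch only: SpamBayes reads the file with its own option 
parser, the helper names `load_ini` and `proxy_pairs` are made up for 
illustration, and the positional pairing of `listen_ports` with 
`remote_servers` is an assumption.

```python
import configparser


def load_ini(text):
    """Parse spambayes.ini-style text (sections with key:value lines)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def proxy_pairs(parser):
    """Pair each listen port with a remote server, by position.

    Assumption: the n-th listen port proxies the n-th remote server.
    """
    ports = [p.strip() for p in parser.get("pop3proxy", "listen_ports").split(",")]
    servers = [s.strip() for s in parser.get("pop3proxy", "remote_servers").split(",")]
    if len(ports) != len(servers):
        raise ValueError(
            "listen_ports has %d entries but remote_servers has %d"
            % (len(ports), len(servers))
        )
    return list(zip(ports, servers))
```

For the file above, `proxy_pairs` would return three port/server pairs, and 
`parser.get("Storage", "persistent_use_database")` would return `pickle`.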



If anyone wants a copy of my installation or wants me to test some things, 
please feel free to ask - I'm willing to have my installation dissected for the 
good cause.

----------------------------------------------------------------------

_______________________________________________
Spambayes-bugs mailing list
http://mail.python.org/mailman/listinfo/spambayes-bugs
