I have installed the CVS version as suggested. A couple of points which may help others trying it (especially PIL): I am using FC5 on AMD64, and had to install tk-devel and tcl-devel as well as tk and tcl (and tkinter etc). To get PIL to build with support for everything, I had to add /usr/lib64 to the standard paths in setup.py. The freetype2 files required by PIL are in freetype-devel.

I will report how it performs in a few days. Is there any easy way to test it against my current spam collection without creating a new .hammiedb and starting again? My email is stored in one file per folder (mbox). I tried feeding it a few messages which had previously been misclassified, and they are now classified as spam, but I think that is because I had trained on them as spam after I received them (with version 1.1a2). I am using kmail with sb_bnfilter.py. Can I tell from the X-Spambayes-Evidence header whether the new code is detecting any spam?

Regards,
Peter Barker

> >>>>> "skip" == skip <[EMAIL PROTECTED]> writes:
>
> I should have given a bit more complete answer based on your message's more
> general point. I recently added a fair amount of code to SpamBayes to
> "crack" the content of images. The new code works very well for me. If
> you'd like to try it, here's what you'll need to do:
>
>     1. Check out the latest source from the CVS repository. (There's been
>        no new release since my recent checkins.) Install it.
>
>     2. Install the Python Imaging Library:
>            http://www.pythonware.com/products/pil/
>
>     3a. (Windows) Grab the ocrad-cygwin package from the
>         SpamBayes Files page:
>            http://sourceforge.net/project/showfiles.php?group_id=61702
>         Unpack the zip file and copy ocrad.exe somewhere on your PATH.
>
>     3b. (Unix/Linux/Mac) Grab the ocrad source distribution from its web
>         site:
>            http://www.gnu.org/software/ocrad/ocrad.html
>         Unpack and install it.
>
> I realize this may not be all that straightforward for people who are
> unused to installing open source software. Once you've done it a couple
> times though, it gets easier. Hopefully, we can get another SpamBayes
> alpha release out in the next little while. (Tony, if there's anything I
> can do to help make this happen, let me know.)
>
> Once you're ready to go, add the following to your SpamBayes options:
>
>     x-lookup_ip: True
>     lookup_ip_cache: ~/.dnscache
>
>     x-image_size: True
>
>     x-crack_images: True
>     crack_image_cache: ~/.image_cache.pickle
>
> The first group is unrelated to the image spam, but I find it helps me a
> lot. It maps hostnames to their IP addresses using DNS and generates
> tokens based on those addresses. The second records tokens about the size
> of images. The third enables text extraction from images (OCR, or optical
> character recognition). This is where PIL and Ocrad come in.
>
> I still get the occasional false negative on image spam, but it's
> definitely manageable and should improve as Ocrad (itself still a very
> alpha piece of software) improves. Even though Ocrad does a poor job of
> text extraction from a human comprehension standpoint, it generates tokens
> that SpamBayes just loves and seems to generate enough unique tokens to tip
> the scales on most image spam.
>
> Skip
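
On the X-Spambayes-Evidence question, here is a rough sketch of one way to look for image-related clues in a classified message. It assumes the header holds clues of the form 'token': 0.97 separated by semicolons. The function names and the default "image" token prefix are made up for illustration and are not part of SpamBayes; check the tokens your own build actually writes before relying on the prefix.

```python
import email
import re

# Assumed shape of one clue: a quoted token, a colon, then a probability,
# e.g. 'subject:free': 0.97
CLUE_RE = re.compile(
    r"""(?P<q>['"])(?P<token>.*?)(?P=q):\s*(?P<prob>[01]\.\d+)"""
)


def evidence_clues(raw_message):
    """Return (token, probability) pairs from X-Spambayes-Evidence."""
    msg = email.message_from_string(raw_message)
    header = msg.get("X-Spambayes-Evidence", "")
    # Undo header folding by collapsing runs of whitespace.
    header = " ".join(str(header).split())
    return [
        (m.group("token"), float(m.group("prob")))
        for m in CLUE_RE.finditer(header)
    ]


def clues_with_prefix(raw_message, prefix="image"):
    """Keep only clues whose token starts with `prefix`.

    The default prefix is a guess at how image-derived tokens are
    named, not something confirmed in this thread.
    """
    return [(t, p) for t, p in evidence_clues(raw_message)
            if t.startswith(prefix)]
```

Feeding it the full text of a message that went through the filter after the options were enabled, an empty result from clues_with_prefix() would suggest that no image-derived token made it into the evidence for that message. Messages classified before the upgrade keep their old headers, so they would need to be filtered again first.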
