Just thought this might help someone out. Thanks to M. Blapp for an
excellent SpamAssassin (SA) plugin. Optical Character Recognition (OCR)
can be used to nab those pesky spam messages whose text is hidden in
GIF, JPEG, or PNG images. Here is what I did to get the plugin running.
First, test the components that the plugin uses.
(See the documentation at
http://antispam.imp.ch/patches/patch-ocrtext for the requirements.)
1. Copy an example spam image to your SA machine.
2. Use giftopnm, jpegtopnm, or pngtopnm (whichever matches your image
type) to convert the image to a PNM image like so:
giftopnm Xj105jQX.gif > Xj105jQX.pnm
3. Run gocr on the pnm file like so:
gocr Xj105jQX.pnm
This should output some text mixed with lots of garbage. If you got this
far, you are ready to set up the plugin.
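If you want to repeat the test above on several images, steps 2 and 3 can be wrapped in a small shell helper. This is only a sketch: the function names pick_converter and ocr_image are hypothetical and not part of the ocrtext plugin, and it assumes giftopnm, jpegtopnm, pngtopnm, and gocr are on your PATH.

```shell
# Hypothetical helpers, NOT part of the ocrtext plugin.

# Print the netpbm converter that matches an image's file extension;
# return non-zero for any other file type.
pick_converter() {
    case "$1" in
        *.gif|*.GIF) echo giftopnm ;;
        *.jpg|*.jpeg|*.JPG|*.JPEG) echo jpegtopnm ;;
        *.png|*.PNG) echo pngtopnm ;;
        *) return 1 ;;
    esac
}

# Convert one image to a .pnm file next to the original, then run gocr on it.
ocr_image() {
    img=$1
    conv=$(pick_converter "$img") || {
        echo "unsupported image type: $img" >&2
        return 1
    }
    pnm="${img%.*}.pnm"
    "$conv" "$img" > "$pnm" && gocr "$pnm"
}
```

For example, 'ocr_image Xj105jQX.gif' should print the same garbage-laden text as running giftopnm and gocr by hand.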
1. cd to /etc/mail/spamassassin
2. Download the patch file from:
http://antispam.imp.ch/patches/patch-ocrtext
3. Run 'patch < patch-ocrtext'.
This will create two files in your current directory:
ocrtext.cf and ocrtext.pm
4. Edit v310.pre and add the following lines:
# OCR - performs Optical Character Recognition on spam images
#
loadplugin ocrtext /etc/mail/spamassassin/ocrtext.pm
loadplugin Mail::SpamAssassin::Timeout
5. Edit the ocrtext.cf file and change the following settings:
## This must point to the gocr binary itself, not just its directory. Try 'which gocr'.
gocr_path /usr/local/bin/gocr
## This is JUST the directory containing your PNM tools (i.e. pngtopnm, giftopnm,
## jpegtopnm)
pnmtools_path /usr/bin
6. Run 'spamassassin -D --lint' and check for errors.
If all went well, restart spamassassin or force it to reread its
configuration, however you would do that on your system.
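A common mistake in step 5 is giving a directory for gocr_path or a binary for pnmtools_path. The sketch below is a hypothetical sanity check (check_ocrtext_cf is not part of the plugin) that reads the two settings from ocrtext.cf, assuming each appears as "name value" on its own line, and confirms the binaries exist and are executable.

```shell
# Hypothetical sanity check, NOT part of the ocrtext plugin.
# Reads gocr_path and pnmtools_path from an ocrtext.cf file and checks
# that gocr and the three PNM converters are executable.
check_ocrtext_cf() {
    cf=$1
    # Take the first "gocr_path ..." / "pnmtools_path ..." line; "##" comments are skipped.
    gocr=$(awk '$1 == "gocr_path" { print $2; exit }' "$cf")
    tools=$(awk '$1 == "pnmtools_path" { print $2; exit }' "$cf")
    status=0
    if [ ! -x "$gocr" ]; then
        echo "gocr_path is not an executable file: $gocr" >&2
        status=1
    fi
    for t in giftopnm jpegtopnm pngtopnm; do
        if [ ! -x "$tools/$t" ]; then
            echo "missing or not executable: $tools/$t" >&2
            status=1
        fi
    done
    return $status
}
```

Running 'check_ocrtext_cf /etc/mail/spamassassin/ocrtext.cf' prints nothing and returns 0 when both settings look right.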
Then try something like 'tail -f /var/log/mail.log | grep SPAMPIC_ALPHA'.
On a high-volume server you should see some rules matching after a few
minutes. If so, you are OCR'ing the images!
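If you would rather get a one-shot count than watch the log scroll, grep -c counts matching lines instead. This is a sketch that assumes, as the tail command above does, that your setup logs rule names to /var/log/mail.log; the function name count_rule_hits is hypothetical, and your mail log may live elsewhere.

```shell
# Hypothetical helper: count log lines that mention a given rule name.
# Usage: count_rule_hits RULE [LOGFILE]   (LOGFILE defaults to /var/log/mail.log)
count_rule_hits() {
    rule=$1
    log=${2:-/var/log/mail.log}
    # grep -c prints the number of matching lines (0 if none).
    grep -c -- "$rule" "$log"
}
```

Running 'count_rule_hits SPAMPIC_ALPHA' a few minutes apart should show the number growing once the plugin is active.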
Hope this helps!
Sincerely,
Davin Flatten
--
Davin Flatten
Unix Systems Administrator
University of Massachusetts
Amherst, MA 01003
Phone: 413-545-1580
Email: [EMAIL PROTECTED]