I am not getting any funny tokens with underscores when I run spamcounts. I tried converting my image with giftopnm, which gives a warning, but I can view the resulting image as a portable bitmap image OK. I then ran ocrad on it, which said "bad magic number - not a pbm file". I looked in ImageStripper.py to see what options to use with ocrad, and saw that ocrad is being called with the -s option. My ocrad (version 0.9) says -s is an invalid option.
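One possible explanation for the "bad magic number" message, which I have not confirmed: giftopnm may write a PGM or PPM file rather than a PBM when the GIF is not pure black and white, and this ocrad version may only accept PBM. A minimal sketch for checking which Netpbm format the converted file actually is, based on the first two bytes of the file. The name netpbm_type is a hypothetical helper of my own, not part of SpamBayes, netpbm, or ocrad.

```python
# Hypothetical helper (not part of SpamBayes or ocrad): identify a Netpbm
# file by its two-byte magic number.
NETPBM_MAGIC = {
    b"P1": "plain PBM (bitmap)",
    b"P2": "plain PGM (graymap)",
    b"P3": "plain PPM (pixmap)",
    b"P4": "raw PBM (bitmap)",
    b"P5": "raw PGM (graymap)",
    b"P6": "raw PPM (pixmap)",
}


def netpbm_type(path):
    """Return a description of the Netpbm format of `path`, or None."""
    with open(path, "rb") as f:
        magic = f.read(2)  # the magic number is always the first two bytes
    return NETPBM_MAGIC.get(magic)
```

Calling netpbm_type() on the giftopnm output would show whether it is a PBM (P1 or P4) or one of the other formats. If it is not a PBM, converting it to a bitmap before running ocrad might be worth trying.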
When I set the globals/verbose option in .spambayesrc, spamcounts reported:

saving 23 items to /home/peterb/.image_cache.pickle
100.00% hit rate

I have attached the gif image I used for testing.

Thanks,
Peter Barker

> Peter> I have been running the code from CVS for a couple of days, and I
> Peter> am not sure if analyzing text in images is making a
> Peter> difference. Can I tell from the Evidence header (or by other
> Peter> means) if the image analyzer is actually being used, and what
> Peter> evidence it is finding?
>
> Sure, you'll probably see lots of tokens with runs of underscore
> characters, such as (from spamcounts output):
>
> token,nspam,nham,spam prob
> yn__,1,0,0.844827586207
> _ol__,2,0,0.908163265306
> __leht,1,0,0.844827586207
> _omo____,1,0,0.844827586207
> rpo_la_o__,1,0,0.844827586207
> _lo__,4,0,0.949438202247
> __a_,1,0,0.844827586207
>
> Those correspond to characters it could tell were there, but didn't
> recognize.
>
> Did you start training from scratch?
image001.gif
Description: GIF image
