Re: [Spambayes] Analyzing text in image spam

Peter Barker Tue, 22 Aug 2006 17:38:08 -0700

I am not getting any funny tokens with underscores when I run spamcounts.

I tried converting my image with giftopnm, which gives a warning, but I can 
view the resulting image as a portable bitmap without problems. I then ran 
ocrad on it, which reported "bad magic number - not a pbm file".
I looked in ImageStripper.py to see which options are passed to ocrad, and 
saw that ocrad is called with the -s option. My ocrad (version 0.9) says -s 
is an invalid option.
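
One way to narrow down the "bad magic number" message is to look at the first two bytes of the file giftopnm produced. giftopnm can write a bitmap (pbm), graymap (pgm), or pixmap (ppm) depending on the image, so a file that views fine may still not be a pbm. Whether that is what happened here is only a guess. The sketch below uses the standard Netpbm magic numbers; the function names and the file name in the usage note are hypothetical.

```python
# Netpbm magic numbers: the first two bytes of a file identify its format.
NETPBM_FORMATS = {
    b"P1": "pbm (plain bitmap)",
    b"P4": "pbm (raw bitmap)",
    b"P2": "pgm (plain graymap)",
    b"P5": "pgm (raw graymap)",
    b"P3": "ppm (plain pixmap)",
    b"P6": "ppm (raw pixmap)",
}


def netpbm_kind(data):
    """Classify a bytes object by its leading Netpbm magic number."""
    return NETPBM_FORMATS.get(data[:2], "unknown")


def netpbm_kind_of_file(path):
    """Read only the first two bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return netpbm_kind(f.read(2))
```

For example, netpbm_kind_of_file("image001.pnm") (a hypothetical output file name) returning a pgm or ppm kind would suggest converting the image down to a bitmap before handing it to ocrad.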


When I set the globals/verbose option in .spambayesrc, spamcounts reported:
saving 23 items to /home/peterb/.image_cache.pickle 100.00% hit rate. 

I have attached the gif image I used for testing.

Thanks,
Peter Barker

>     Peter> I have been running the code from CVS for a couple of days, and I
>     Peter> am not sure if analyzing text in images is making a
>     Peter> difference. Can I tell from the Evidence header (or by other
>     Peter> means) if the image analyzer is actually being used, and what
>     Peter> evidence it is finding?
>
> Sure, you'll probably see lots of tokens with runs of underscore
> characters, such as (from spamcounts output):
>
>     token,nspam,nham,spam prob
>     yn__,1,0,0.844827586207
>     _ol__,2,0,0.908163265306
>     __leht,1,0,0.844827586207
>     _omo____,1,0,0.844827586207
>     rpo_la_o__,1,0,0.844827586207
>     _lo__,4,0,0.949438202247
>     __a_,1,0,0.844827586207
>
> Those correspond to characters it could tell were there, but didn't
> recognize.
>
> Did you start training from scratch?
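
The underscore tokens in the quoted spamcounts output can be picked out mechanically rather than by eye. The sketch below is a heuristic only: it assumes the output is comma-separated with the header shown above, that tokens themselves contain no commas, and that a run of two or more underscores marks an unrecognized-character token. The function name and the min_run parameter are hypothetical; the sample rows are copied from the quoted output.

```python
import csv
import io

# Sample rows copied from the quoted spamcounts output.
SAMPLE = """\
token,nspam,nham,spam prob
yn__,1,0,0.844827586207
_ol__,2,0,0.908163265306
__leht,1,0,0.844827586207
"""


def underscore_tokens(text, min_run=2):
    """Return tokens containing a run of at least `min_run` underscores.

    Assumes `text` is CSV with a 'token' column, as in the quoted output.
    """
    reader = csv.DictReader(io.StringIO(text))
    run = "_" * min_run
    return [row["token"] for row in reader if run in row["token"]]


if __name__ == "__main__":
    for token in underscore_tokens(SAMPLE):
        print(token)
```

An empty result on real spamcounts output would be consistent with the image analyzer not producing any tokens, though it would not say why.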

image001.gif
Description: GIF image

_______________________________________________
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
