[EMAIL PROTECTED] wrote:
> I updated the OCR capabilities a bit more today. I added more intelligent
> assembly of split images into a single image after noticing that the
> spammers don't simply chop up multi-part GIF images horizontally. I also
> added a couple extra options (ocrad_scale and ocrad_charset) which control
> the image scaling factor (default is 2) and character set (default is
> "ascii") Ocrad uses. Scaling the image by a factor of 2 was a pretty
> obvious win:
>
>     false positive percentages
>         0.000  0.000  tied
>         0.000  0.000  tied
>         0.000  0.000  tied
>         0.000  0.000  tied
>         0.000  0.000  tied
>
>     won   0 times
>     tied  5 times
>     lost  0 times
>
>     total unique fp went from 0 to 0 tied
>     mean fp % went from 0.0 to 0.0 tied
>
>     false negative percentages
>         4.213  4.213  tied
>         1.404  0.843  won  -39.96%
>         3.371  2.809  won  -16.67%
>         2.528  2.247  won  -11.12%
>         4.213  3.652  won  -13.32%
>
>     won   4 times
>     tied  1 times
>     lost  0 times
>
>     total unique fn went from 56 to 49 won -12.50%
>     mean fn % went from 3.14606741573 to 2.75280898876 won -12.50%
>
> Scaling by a factor of three was even better in the false negative
> department but regressed a bit in the false positive category so I checked
> Options.py in with a default scaling factor of 2. A couple things could
> stand to be further tested:
>
>     * I have no idea how good Ocrad's scaling algorithm is. It's possible
>       that PIL or NetPBM's scaling code is better. If so, it would make
>       sense to scale the images before feeding to Ocrad.
>
>     * The images I've seen so far were all plain English, so I blindly made
>       ascii the default charset. The other choices were iso-8859-9 and
>       iso-8859-15. I simply assumed ascii would be the most appropriate
>       default, but didn't test it.
>
> Finally, I put together a really simpleminded Ocrad-for-Windows release
> based upon the ocrad.exe binary that Tony built.
> Check the Files section of the SpamBayes project site:
>
>     http://sourceforge.net/project/showfiles.php?group_id=61702
>
> and grab ocrad-cygwin.
>
> There are a few caveats:
>
>     1. I don't do Windows. (No, really, I don't, strange as that may seem.)
>        This is no fancy-schmancy point-and-shoot Windows installer. It's
>        just a simple zip file with the Ocrad 0.15 distribution, Tony's .exe
>        file and the patch he applied to the source.
>
>     2. I don't do Windows. The code I've written so far has been done
>        entirely on my Mac. I've made no obvious concessions to portability.
>        That said, I hope portability issues won't be daunting for any early
>        adopters.
>
>     3. I don't do Windows. If you have problems it won't do you any good to
>        mail me directly. Post about problems on the SpamBayes bug tracker:
>
>            http://sourceforge.net/tracker/?group_id=61702&atid=498103
>
>     4. If you do Windows you will need PIL to take advantage of the recent
>        changes:
>
>            http://www.pythonware.com/products/pil/
>
>        (unless you want to put hair on your chest and build NetPBM on
>        Windows). Fredrik Lundh provides prebuilt Windows versions of PIL.
>        Grab the one appropriate for the version of Python you have
>        installed.
>
>     5. If you do Windows (or any other platform for that matter), feedback
>        to the lists about successes and failures would be helpful.
>
> Cheers,
>
> Skip
>
>
> _______________________________________________
> spambayes-dev mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/spambayes-dev
>
>

Hi,
I'm very interested in this OCR work and in the way SpamBayes analyzes
image spam. There is now a new kind of image spam that uses animated
images. I've received a lot of this "animated spam" lately, so it may
become very common before long. A brief description is here:

http://www.viruslist.com/en/weblog?weblogid=196822613

How does your OCR handle this kind of image?

Thank you very much for your time.

Regards

--
Michele Belloli
Research & Development Dept.
Symbolic - Network Security Distributor
http://www.symbolic.it

eXtensiveControl
The new Content Filtering solution for SMEs
http://www.extensivecontrol.it/
