This is just one simple little test... I took two pump & dump messages for HLVK that I received overnight. The GIF image is actually sliced into pieces horizontally, so I wrote a little shell script to convert the slices to netpbm and concatenate them, then sent the result through ocrad, sorted, uniq'd and downcased the whole mess, then checked for words the two had in common. I came up with:

    _ __ and co company hlv hlvc lnc. low new news nlv now! now!!! on the tnis wl_ |_
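For anyone who wants to try the comparison step, here is a minimal Python sketch of the "downcase, uniq, find words in common" part. It assumes the OCR text for each message has already been produced (for example by ocrad); the function names and the sample strings are made up for illustration and are not the real HLVK messages or the original shell script.

```python
def ocr_tokens(ocr_text):
    """Return the set of lowercased, whitespace-separated tokens in OCR output."""
    return {word.lower() for word in ocr_text.split()}


def common_tokens(ocr_text_a, ocr_text_b):
    """Return the sorted tokens that appear in both OCR outputs."""
    return sorted(ocr_tokens(ocr_text_a) & ocr_tokens(ocr_text_b))


# Hypothetical OCR output, invented for this example.
first = "HLVC Company News\nBuy NOW!!! on the low"
second = "hlvc company news\nget in now! on the new low"

print(common_tokens(first, second))
```

Splitting only on whitespace keeps punctuation attached, so "now!" and "now!!!" stay distinct tokens, much as they do in the list above.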
While that is not a huge increase in the number of tokens and some aren't going to help, it's still better than what we have today. Time will tell if the cost is worth it. Perhaps if we generate some further interest in ocrad it will improve as well.

Skip

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev