> If you run ocrad over some spam text images you can see what it generates.
> If it finds nothing, nothing comes out the back end. If it sees something,
> it's almost certain to be some garbage text peculiar to it, unlikely to turn
> up in normal text. For example, here's a pretty clean image:
>
> http://www.webfast.com/~skip/bogus-5-3.png
>
> Here's what ocrad produces by default:
>
> COULD THl_ BE THE NEXT IBM_
> ALL _|___ _wow IWAl LllL |_ ABO_| lo EXPLODEl
> WAIIW LllL p_ Ll_E A WAW_ _IARll__ WO_DA_ _EPIEWBER lll
>
> IomO_n_ __m_ L |_IL IOWP_IER_ |_I (o_h__ OII LllL p_)
> __o__ __mbol LllL
> F_ld__ Ilo__ O Tl (_o s_/_ On F_ld__ Alon_|)
> _ d__ |__o__ __
> I____n_ R__lnO ___onO B__
> \
> ln _h_ Io____ ot _ W___. LllL W____ ______| ___nnlnO Wo___'
>
> L ln___n__lon_| Anno_n___
>
> On_lo__h(IW) _P_o_P__ TP_hnoloO_ b_
> B_llP_ p_oo_ Da_a _P___|__ Ba_k_O_ and _P__o_P_
> |__ ____ __n____lon p__Aqco_TM_/P__AID CO_TM_
> _|__a Po__ablP wloh _OPPd _olld __a_P D_|_P TP_hnoloO_
> _h_ W___oOoll_. _hP Wo_ld _ _|___ _g laO_oO ComOrfP_
> _Pa___lnO W_ldla _ Q_a_ll TP_hnoloO_
> \
> L ln___n__lon_| _IOn_ _4 _W E__oO__n Dl___lb__lon AO___m_n_
>
> Th_ b_Pmo__ __PO b_wa_d _a__|_al _Pn___P |_ amonO o_hP_ p__|__|_P
> dl___lb__lon aO_PPmPn__ ____Pn_|_ _ndP_ nPOo_la_lon ยช_
> _P_P_al addl_lonal
> hlOh O_ofi_ _POlon_ and _PO_P_Pn__ a kP_ ___a_POl_
> Oa__nP__hlO _ha_ _P___P_
> l ln_P_na_lonal ComO__P__ wl_h ___|_ Olobal ma_kP_ _Pa_h
> and O_a_an_PPd
> O_P _alP_ and lo_k_ _hP _omOan_ ln hlOhl_ dP_|_ablP
> p__|__|_P dl___lb__lon
> ma_kP__
>
> READ MORE ONLINE NOWl
>
> OPPORl__||_ DOE_ _ol __OI_ o_ IWE DOOR E_ER_ DA_|
> _o _A_E A Wl__IE IOODD LllL lo _O_R RADAR _ow A_D
> WAIIW II _OARl
FWIW, I am getting *much* better results with gocr than ocrad. gocr running
over that same image results in:

--- 8< ---
_ _ _ _
COULD THIS BE THE NEXT IBM?
ALL SIGNS SHOW THAT LITL IS ABOUT TO EXPLODE!
Company Name:
Stock Symbol:
Friday Close: O.71 (Up 6O_a On Friday Alone!)
S-dayTarget: $3
Current Rating: Strong Buy
\
In the Course of a Week, LITL Makes Several Stunning Moves!
L International Announces:
- OneTouch(TM) Recovery Technology hr
Bullet-Proof Data Security Backups and Restores ,
- Its Next-Generation PuRA_GO(TM)/PuRAID-GO(TM)
UItra-Portable High-Speed Solid State Drive Technology .
- the metropolis, the worldt First l9'' Laptop compWer
Featuring Nvidiat Quad-SLI Technology _
\
L International Signs $4SM European Distribution Agreement
- T_s hremost step hrward tactical venture is, among other exclusive
distribution agreements, currently under negotiation gr several additional
high-pro_t regions and represents a key strategic partnership that secures
L International Computers with truly global market reach and guaranteed
pre-sales, and locks the company in highly desirable exclusive distribution
marke.ts.
--- >8 ----

Indeed, I have never seen an image that ocrad does better on than gocr.

FWIW, I'm currently halfway through modifying spambayes to support either
ocrad or gocr, in the hope that using gocr will actually cause a noticeable
reduction in image spam. Unfortunately, using gocr I see no reduction at all
(which isn't to say there is not a small reduction; it just doesn't "seem"
to me like it has reduced).

Mark
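
For anyone following along, here is a rough sketch of how switching between the two engines might be wired up. It is not the actual spambayes change. The names OCR_COMMANDS, build_command, run_ocr and word_like_fraction are made up for illustration, and it assumes each tool can be invoked as "ocrad IMAGE" or "gocr IMAGE" on an image already in a format it reads (e.g. PNM), writing the recognized text to stdout. The last function is only one crude way to put a number on how clean an engine's output looks.

```python
import re
import subprocess

# Hypothetical engine table. Assumption: both tools accept an image path as
# their only argument and print the recognized text on stdout.
OCR_COMMANDS = {
    "ocrad": ["ocrad"],
    "gocr": ["gocr"],
}


def build_command(engine, image_path):
    """Return the argv list for the chosen OCR engine."""
    try:
        base = OCR_COMMANDS[engine]
    except KeyError:
        raise ValueError("unknown OCR engine: %r" % (engine,))
    return base + [image_path]


def run_ocr(engine, image_path, timeout=30):
    """Run the engine and return its text, or '' if it fails or is missing."""
    cmd = build_command(engine, image_path)
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Engine not installed, not executable, or took too long.
        return ""
    # OCR output may contain stray bytes; latin-1 never fails to decode.
    return proc.stdout.decode("latin-1")


def word_like_fraction(text):
    """Fraction of whitespace-separated tokens that look like plain words.

    A token counts if it is two or more ASCII letters, optionally followed
    by one punctuation mark. Tokens full of '_' or '|' do not count.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    wordy = [t for t in tokens if re.fullmatch(r"[A-Za-z]{2,}[.,!?:]?", t)]
    return len(wordy) / len(tokens)
```

Calling word_like_fraction(run_ocr("ocrad", path)) and word_like_fraction(run_ocr("gocr", path)) on the same image gives two numbers to compare. Whether a cleaner transcript translates into better classification is a separate question that this sketch does not answer.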
