Richard,

> I am looking at Fuzzy ocr to detect more image spam and I had a couple
> of questions;

FuzzyOCR does not detect image spam per se; it detects spam text in an image. To classify image spam, you could consider Image Cerberus, which classifies images based on their metadata (size, presence of text, etc.).

> 1) Is this being used? Does it detect image spam, or should I be
> looking at something else?

Yes. No, maybe. I am running it, but it does not do a very good job of extracting the text from the images. It then uses its own list of keywords to detect spam. To me that is the biggest problem: it should pass the text back to SpamAssassin and let SA rules decide what to do with it.

> 2) I'm getting some horny date spam coming through with just
> images and text inside an image at the bottom. My bayes seems to be
> scoring this with -1.90 Bayes_00. I keep sending this to my database
> as spam but I'm not sure how many I need to feed it and I don't get
> much. Are there any other means of feeding bayes with image spam (or
> any spam really) from a source on the internet? Or is that a bad idea
> since that's not my spam?

The ideal plugin would be able to look at a picture and decide that it's a horny date :) I remember we once had a student who wanted to work on classifying pictures by the amount of flesh, to decide whether it was a naked picture or not. But I don't think he ever succeeded.

> 3) If I use Fuzzy OCR on FreeBSD, how does it get updated?

I doubt FuzzyOCR ever gets updated, on FreeBSD or elsewhere.

> 4) I installed it from the ports and I had to install tesseract
> or I got a dependency warning message. Now I still get a warning -
> warn: FuzzyOcr: Cannot find executable for gifinter - Is this normal?
> How should I omit this error since I can't find gifinter in the ports
> tree?

gifinter used to be part of /usr/ports/graphics/giflib but the NEWS file mentions that:

Version 5.0.1
=============

Retirements
-----------

* gifinter is gone. Use convert -interlace from the ImageMagick suite.

In my case, I still have an old gifinter executable lying around, but I think you could configure FuzzyOcr.cf with an appropriate line of the form:

    focr_bin_gifinter /usr/local/bin/convert -interlace

plus the needed parameters.

Best regards,

Olivier
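
P.S. If the config option turns out not to accept extra parameters after the binary path, a small wrapper script might be another way to go. What follows is only an untested sketch under several assumptions: the name gifinter-wrapper is made up, I am assuming FuzzyOCR calls gifinter as a filter (GIF file as the first argument or on stdin, result on stdout), and I am assuming it wants an interlaced result (use -interlace None for the opposite direction).

```shell
# Create a hypothetical stand-in for the retired gifinter binary.
# Assumption: the caller passes a GIF path as $1 (or pipes it on stdin)
# and reads the converted GIF from stdout.
cat > ./gifinter-wrapper <<'EOF'
#!/bin/sh
# "${1:--}" falls back to "-" (stdin) when no file argument is given.
# "gif:-" tells ImageMagick to write GIF data to stdout.
exec convert "${1:--}" -interlace Line gif:-
EOF
chmod +x ./gifinter-wrapper
```

After copying the script somewhere suitable (for example /usr/local/bin, again just an assumption), the line in the FuzzyOCR configuration would then point at the wrapper instead of at convert:

    focr_bin_gifinter /usr/local/bin/gifinter-wrapper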