Re: FuzzyOcr question
Is decoder (Chris) still developing FuzzyOCR?

Regards,

--
--[ UxBoD ]--
// PGP Key: curl -s http://www.splatnix.net/uxbod.asc | gpg --import
// Fingerprint: F57A 0CBD DD19 79E9 1FCC A612 CB36 D89D 2C5A 3A84
// Keyserver: www.keyserver.net Key-ID: 0x2C5A3A84
// Phone: +44 845 869 2749 SIP Phone: [EMAIL PROTECTED]

----- Original Message -----
From: NFN Smith [EMAIL PROTECTED]
To: users@spamassassin.apache.org
Sent: 14 January 2008 17:35:30 o'clock (GMT) Europe/London
Subject: FuzzyOcr question

A couple of months ago, I updated FuzzyOcr to the current package version supported in Debian Stable (2.3b-1).

In the meantime, I notice that when there are hits on FuzzyOcr, the SpamAssassinReport.txt attachment is showing that I am getting hits on FuzzyOcr, and the number of points scored by hits, but in the Description, I'm getting only BODY:, and no listing of which words were actually hit. e.g.,

2.0 FUZZY_OCR BODY:

I'm not finding anything in docs or FuzzyOcr.cf that seems to govern this one, and for debugging purposes, I'd really like to know what terms are getting hits or not.

What am I missing?

Smith

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Re: FuzzyOcr question
> Is decoder (Chris) still developing FuzzyOCR?

I haven't seen any changes recently, nor any discussion on the FuzzyOCR mailing list. But then I haven't seen a lot of OCR spams going by since the stock spams cut down in volume a while back.

I'd say it's a good tool to keep around just to keep them from coming back!

Loren
Re: FuzzyOcr question
Loren Wilton wrote:
> > Is decoder (Chris) still developing FuzzyOCR?
>
> I haven't seen any changes recently, nor any discussion on the FuzzyOCR mailing list. But then I haven't seen a lot of OCR spams going by since the stock spams cut down in volume a while back.
>
> I'd say it's a good tool to keep around just to keep them from coming back!

The volume of graphical spam that needs FuzzyOCR is pretty limited on my spamtraps, although a couple of weeks ago, I saw a couple of bursts of pump-and-dump. However, there's still a slow (but steady) volume of pillz spammers, and occasional watches and OEM getting through.

On the Pillz, there's one that looks like a Yambo one, and I finally tweaked my terms list enough that I'm getting a couple of FuzzyOcr points on that, but not quite enough to force a rejection. There's also one pillz that's pretty offensive (but fairly infrequent) -- it got a couple of points, but I'd really like to get enough hits on that one to force rejection, so that my users don't see it.

Thus, I'd like to get a verification of what terms are actually getting hits.

Smith
Re: FuzzyOcr question
NFN Smith wrote:
[snip]
> On the Pillz, there's one that looks like a Yambo one, and I finally tweaked my terms list enough that I'm getting a couple of FuzzyOcr points on that, but not quite enough to force a rejection. There's also one pillz that's pretty offensive (but fairly infrequent) -- it got a couple of points, but I'd really like to get enough hits on that one to force rejection, so that my users don't see it.
>
> Thus, I'd like to get a verification of what terms are actually getting hits.

To do that you should save the spam and run it through spamassassin with the '-D FuzzyOcr -x -t' parameters to see which words match and which don't.

The FuzzyOcr.words file defines the words FuzzyOcr is looking for; most of us have a customized file: add words to that file, tweak factors, etc. It all depends on which country/language you are using it for.

BTW the recommended version (on FuzzyOcr's site) is 3.5.1 with some patches from the SVN repository. The version you have has some issues with recent SpamAssassin versions, like the one about the report being empty or not formatted.

--
René Berber
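For anyone who wants to script the check René describes, here is a minimal Python sketch. It runs a saved message through spamassassin with the flags quoted above and keeps only the output lines that mention FuzzyOcr. The function names and the idea of filtering by the string "FuzzyOcr" are illustrative assumptions, not part of FuzzyOcr or SpamAssassin; the exact wording of the debug lines depends on your installed versions, so both output streams are captured rather than assuming one.

```python
import subprocess


def build_command(debug_area="FuzzyOcr"):
    """Return the argv for the invocation suggested in the thread.

    -D <area> limits debug output to that area, -x skips creating user
    preferences, and -t runs in test mode and appends a report.
    """
    return ["spamassassin", "-D", debug_area, "-x", "-t"]


def fuzzyocr_lines(output_text):
    """Keep only lines that mention FuzzyOcr (case-insensitive match).

    This is a plain substring filter; it makes no assumption about the
    exact format of the debug lines.
    """
    return [
        line for line in output_text.splitlines()
        if "fuzzyocr" in line.lower()
    ]


def scan(message_path):
    """Feed a saved message to spamassassin on stdin and return the
    FuzzyOcr-related lines from its combined stderr and stdout."""
    with open(message_path, "rb") as fh:
        proc = subprocess.run(
            build_command(),
            stdin=fh,
            capture_output=True,
            check=False,  # inspect the output even on a non-zero exit
        )
    text = (
        proc.stderr.decode("utf-8", "replace")
        + proc.stdout.decode("utf-8", "replace")
    )
    return fuzzyocr_lines(text)
```

Calling `scan("saved-spam.eml")` (the file name is a placeholder) on a machine with spamassassin and the plugin installed should list whatever the plugin reports for that message, which can then be compared against the entries in FuzzyOcr.words.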
RE: FuzzyOCR question
> I'm brainstorming here tonight and I'm curious about something. When you're using FuzzyOCR, is it called for every message that goes through SA, or just ones with gif attachments?

FuzzyOcr is invoked on every image in a message whenever the message itself doesn't reach a score threshold by other means. I.e., if a spam is detected as such before running FuzzyOcr, the latter is not invoked.

---
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100

MAI inviare una e-mail a:
NEVER send an e-mail to:
[EMAIL PROTECTED]

> Steven Lake
> Owner/Technical Writer
> Raiden's Realm
> www.raiden.net
> A friendly web community
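The rule Giampaolo describes can be sketched in a few lines of Python. This only illustrates the decision as stated in his reply; it is not the plugin's actual code. The function name, the parameters, and the choice to treat a score exactly equal to the threshold as "reached" are all assumptions.

```python
def should_run_ocr(score_so_far, threshold, image_count):
    """Decide whether the OCR step would run for a message.

    score_so_far: score accumulated from other rules (hypothetical input)
    threshold:    score at which the message already counts as detected
                  (hypothetical value, set by the administrator)
    image_count:  number of image attachments found in the message
    """
    if image_count == 0:
        # Nothing to scan, so the OCR step has no work to do.
        return False
    # Skip OCR when other rules have already pushed the score to the
    # threshold; a score equal to the threshold is treated as reached.
    return score_so_far < threshold


# Example: an image spam that other rules scored at 2.0, threshold 5.0.
print(should_run_ocr(2.0, 5.0, image_count=1))
```

The practical upshot is that the comparatively expensive image scan is only paid for messages that the cheaper rules could not already classify.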