RE: uribl not working properly with .gg TLD
Ah, I understand now why they are treated differently. I've never delved into the details of that module. Blacklisting might be a good idea!

Thanks
Dave

Giampaolo Tomassoni-2 wrote:
> > What I am asking is why a reference to http://qwerty.ru.gg generates a URI lookup for ru.gg (i.e. missing out the first component), whereas a reference to http://qwerty.ru.com generates a URI lookup for qwerty.ru.com.
> >
> > Dave
>
> Because the ru.gg second-level domain is not in the TWO_LEVEL_DOMAINS variable defined in Mail::SpamAssassin::Util::RegistrarBoundaries, while ru.com is.
>
> If you mean that ru.gg should be there too, please note that qwerty.ru.gg is a third-level domain of ru.gg, which is assigned to webme.com. So I don't see any need to discriminate qwerty.ru.gg from ru.gg.
>
> Further, I would personally blacklist the whole .gg ccTLD, since their whois service is ridiculous.
>
> Giampaolo

--
View this message in context: http://old.nabble.com/uribl-not-working-properly-with-.gg-TLD-tp29159353p29170299.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
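For anyone following along, the behaviour explained in this message can be sketched roughly as below. This is a Python illustration only, not SpamAssassin's Perl code: the function name lookup_domain and the two-entry table are assumptions made for the example, and the real TWO_LEVEL_DOMAINS list in Mail::SpamAssassin::Util::RegistrarBoundaries is far longer. The only facts taken from the thread are that ru.com is in the list and ru.gg is not.

```python
# Hypothetical stand-in for the TWO_LEVEL_DOMAINS table. Per the thread,
# ru.com is listed and ru.gg is not; everything else here is illustrative.
TWO_LEVEL_DOMAINS = {"ru.com", "co.uk"}


def lookup_domain(host):
    """Return the part of `host` that would be queried against a URIBL."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    last_two = ".".join(labels[-2:])
    # If the last two labels form a known registry suffix, the registered
    # domain is three labels long; otherwise it is just the last two labels.
    keep = 3 if last_two in TWO_LEVEL_DOMAINS else 2
    return ".".join(labels[-keep:])


print(lookup_domain("qwerty.ru.gg"))   # ru.gg
print(lookup_domain("qwerty.ru.com"))  # qwerty.ru.com
```

Under these assumptions, adding "ru.gg" to the set would make the first call return qwerty.ru.gg, which is the change the original question was implicitly asking about.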
uribl not working properly with .gg TLD
I'm running SpamAssassin version 3.3.0 and we received some spam recently which contained a link to a .ru.gg domain. While investigating whether it was listed in any of the URIBLs, I discovered that if a message contains a link to http://qwerty.ru.gg, spamassassin only looks up the domain ru.gg. Here's a snippet from the log:

Jul 14 07:55:54.785 [3269] dbg: async: timing: 0.026 . DNSBL:dob.sibl.support-intelligence.net:ru.gg
Jul 14 07:55:54.785 [3269] dbg: async: timing: 0.027 . DNSBL:multi.uribl.com.:ru.gg

However, if I edit the message, change the link to http://qwerty.ru.com and run it through spamassassin again, then the URIBL lookups are done for the full domain name:

Jul 14 08:52:49.412 [16122] dbg: async: timing: 0.287 . DNSBL:dob.sibl.support-intelligence.net:qwerty.ru.com
Jul 14 08:52:49.412 [16122] dbg: async: timing: 0.290 . DNSBL:multi.uribl.com.:qwerty.ru.com

This can't be right, can it? It looks like the gg top-level domain isn't being handled properly. Any ideas?

Dave

--
View this message in context: http://old.nabble.com/uribl-not-working-properly-with-.gg-TLD-tp29159353p29159353.html
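If you want to check which name was actually queried for each list, the debug lines quoted in this message can be picked apart with a small script. The following is a Python sketch that assumes the "DNSBL:<zone>:<name>" layout seen in the snippet above; the function name queried_names is made up for the example, and other SpamAssassin versions may format these lines differently.

```python
import re

# Matches the "DNSBL:<zone>:<queried name>" tail of the async timing lines
# shown in the post. The layout is assumed from that snippet only.
DNSBL_RE = re.compile(r"DNSBL:(?P<zone>[^:\s]+):(?P<name>\S+)")


def queried_names(log_text):
    """Return (zone, queried name) pairs found in debug output."""
    return [(m["zone"], m["name"]) for m in DNSBL_RE.finditer(log_text)]


sample = """\
Jul 14 07:55:54.785 [3269] dbg: async: timing: 0.026 . DNSBL:dob.sibl.support-intelligence.net:ru.gg
Jul 14 07:55:54.785 [3269] dbg: async: timing: 0.027 . DNSBL:multi.uribl.com.:ru.gg
"""

for zone, name in queried_names(sample):
    print(zone, "->", name)
```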
RE: uribl not working properly with .gg TLD
What I am asking is why a reference to http://qwerty.ru.gg generates a URI lookup for ru.gg (i.e. missing out the first component), whereas a reference to http://qwerty.ru.com generates a URI lookup for qwerty.ru.com.

Dave

Giampaolo Tomassoni-2 wrote:
> I don't see why you believe qwerty.ru.gg == qwerty.ru.com.
>
> .gg is a ccTLD (for the Bailiwick of Guernsey, according to http://en.wikipedia.org/wiki/.gg).
>
> Giampaolo

--
View this message in context: http://old.nabble.com/uribl-not-working-properly-with-.gg-TLD-tp29159353p29159839.html
Re: FuzzyOCR
Sorry this reply is a bit late, but the problem is a bug in FuzzyOCR. When a message has multiple images, it ends up appending to the text file instead of replacing it. The bug is in the routine open_on_specific_fd in Misc.pm:

    $fname =~ s/ *// and $flags |= O_CREAT|O_WRONLY;

should be

    $fname =~ s/ *// and $flags |= O_CREAT|O_WRONLY|O_TRUNC;

(and you have to add O_TRUNC to the import list at the top of the module too). I logged this as ticket 555 on the FuzzyOCR website.

Having fixed that, I'm not sure that FuzzyOCR is helping much. Also, I've lowered the FUZZY_OCR_WRONG_EXTENSION score, as it was occasionally firing multiple times on non-spam.

Dave

Bowie Bailey wrote:
> I've had FuzzyOCR running for quite a while. Today I found a false positive for it that is a bit strange. The message has seven images. FuzzyOCR claims to have found the word service in five of them (and counted it 10 times for a score of 6.5). However, I can only see the word in one of the images and only three of the seven images have any text at all.
>
> Is there a problem here?
>
> Is FuzzyOCR still useful? It doesn't seem to hit a lot for me.
>
> %OFMAIL: 1.18  %OFSPAM: 3.41  %OFHAM: 0.26
>
> --
> Bowie

--
View this message in context: http://www.nabble.com/FuzzyOCR-tp19672684p20581027.html
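To show why the missing O_TRUNC matters, here is a small Python sketch of the same open flags. It is not FuzzyOCR code: the helper names write_text and read_text and the file contents are invented for the illustration. Without O_TRUNC, a shorter write over an existing file leaves the tail of the old contents in place, which would be consistent with text from one image being seen again when a later image is scanned.

```python
import os
import tempfile


def write_text(path, text, truncate):
    """Write `text` to `path` using explicit open(2)-style flags."""
    flags = os.O_CREAT | os.O_WRONLY
    if truncate:
        flags |= os.O_TRUNC  # discard any existing contents first
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, text.encode())
    finally:
        os.close(fd)


def read_text(path):
    with open(path) as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ocr.txt")

    # Two writes without O_TRUNC: the second overwrites only the start.
    write_text(path, "first image: service service", truncate=False)
    write_text(path, "second", truncate=False)
    without_trunc = read_text(path)

    # Same two writes with O_TRUNC: the old contents are discarded.
    write_text(path, "first image: service service", truncate=True)
    write_text(path, "second", truncate=True)
    with_trunc = read_text(path)

print(repr(without_trunc))  # 'secondimage: service service'
print(repr(with_trunc))     # 'second'
```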