RE: uribl not working properly with .gg TLD

2010-07-15 Thread DaveAtJLA

Ah, I understand now why they are treated differently. I'd never delved into
the details of that module.

Blacklisting might be a good idea!

Thanks

Dave


Giampaolo Tomassoni-2 wrote:
 
 What I am asking is why a reference to http://qwerty.ru.gg generates a URI
 lookup for ru.gg (i.e. missing out the first component), whereas a reference
 to http://qwerty.ru.com generates a URI lookup for qwerty.ru.com.
 
 Dave
 
 Because the ru.gg second-level domain is not in the TWO_LEVEL_DOMAINS
 variable defined in Mail::SpamAssassin::Util::RegistrarBoundaries, while
 ru.com is.
 
 If you mean that ru.gg should be there too, please note that qwerty.ru.gg
 is a third-level domain of ru.gg, which is assigned to webme.com. So I don't
 see any need to distinguish qwerty.ru.gg from ru.gg.
 
 Further, I would personally blacklist the whole .gg ccTLD, since their whois
 service is ridiculous.
 
 Giampaolo
  
 
 

-- 
View this message in context: 
http://old.nabble.com/uribl-not-working-properly-with-.gg-TLD-tp29159353p29170299.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
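Giampaolo's explanation quoted above can be illustrated with a simplified Perl sketch. It assumes the URIBL lookup name is the hostname trimmed to its registrar boundary: the last two labels by default, or the last three when the last two are listed as a two-level suffix. The %TWO_LEVEL hash and trim_to_registrar_boundary() are hypothetical stand-ins, not the actual code or data of Mail::SpamAssassin::Util::RegistrarBoundaries, and the real module may handle more cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, abridged stand-in for TWO_LEVEL_DOMAINS: suffixes whose
# third-level names are registered separately. Per the thread, ru.com is
# listed in the real module and ru.gg is not.
my %TWO_LEVEL = ('ru.com' => 1);

# Hypothetical helper: trim a hostname to the name that would be looked up
# in a URIBL, keeping three labels when the last two form a listed
# two-level suffix and two labels otherwise.
sub trim_to_registrar_boundary {
    my ($host) = @_;
    my @labels = split /\./, lc $host;

    # Names with two or fewer labels are already at the boundary.
    return join('.', @labels) if @labels <= 2;

    my $last_two = join '.', @labels[-2, -1];
    my $keep     = $TWO_LEVEL{$last_two} ? 3 : 2;
    return join '.', @labels[-$keep .. -1];
}

for my $host (qw(qwerty.ru.gg qwerty.ru.com)) {
    printf "%s -> %s\n", $host, trim_to_registrar_boundary($host);
}
```

Run as-is, it prints "qwerty.ru.gg -> ru.gg" and "qwerty.ru.com -> qwerty.ru.com", which matches the lookups in Dave's logs.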



uribl not working properly with .gg TLD

2010-07-14 Thread DaveAtJLA

I'm running SpamAssassin version 3.3.0 and we received some spam recently
which contained a link to a .ru.gg domain. While investigating whether it
was listed in any of the URIBLs I discovered that if a message contains a
link to http://qwerty.ru.gg, spamassassin only looks up the domain ru.gg
- here's a snippet from the log:

Jul 14 07:55:54.785 [3269] dbg: async: timing: 0.026 . DNSBL:dob.sibl.support-intelligence.net:ru.gg
Jul 14 07:55:54.785 [3269] dbg: async: timing: 0.027 . DNSBL:multi.uribl.com.:ru.gg

However, if I edit the message, change the link to http://qwerty.ru.com and
run it through spamassassin again, then the URIBL lookups are done for the
full domain name: 

Jul 14 08:52:49.412 [16122] dbg: async: timing: 0.287 . DNSBL:dob.sibl.support-intelligence.net:qwerty.ru.com
Jul 14 08:52:49.412 [16122] dbg: async: timing: 0.290 . DNSBL:multi.uribl.com.:qwerty.ru.com

This can't be right, can it? It looks like the gg top-level domain isn't
being handled properly. Any ideas?

Dave




RE: uribl not working properly with .gg TLD

2010-07-14 Thread DaveAtJLA

What I am asking is why a reference to http://qwerty.ru.gg generates a URI
lookup for ru.gg (i.e. missing out the first component), whereas a reference
to http://qwerty.ru.com generates a URI lookup for qwerty.ru.com.

Dave


Giampaolo Tomassoni-2 wrote:
 
 I don't see why you believe qwerty.ru.gg == qwerty.ru.com.
 
 .gg is a ccTLD (for the Bailiwick of Guernsey, according to
 http://en.wikipedia.org/wiki/.gg).
 
 
 Dave
 
 Giampaolo
 
 
 




Re: FuzzyOCR

2008-11-19 Thread DaveAtJLA

Sorry this reply is a bit late, but the problem is a bug in FuzzyOCR. When a
message has multiple images, it ends up appending to the text file instead
of replacing it. The bug is in the open_on_specific_fd routine in Misc.pm:

$fname =~ s/ *// and $flags |= O_CREAT|O_WRONLY;

should be

$fname =~ s/ *// and $flags |= O_CREAT|O_WRONLY|O_TRUNC;

(and you have to add O_TRUNC to the import list at the top of the module
too).
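To show why the missing O_TRUNC matters, here is a minimal Perl sketch; it is not FuzzyOCR's code. It assumes the flags built in open_on_specific_fd end up in a sysopen-style call, as the O_* constants suggest, and write_with_flags() and slurp() are hypothetical helpers. Strictly, without O_TRUNC (or O_APPEND) a reopened file is not emptied, so a shorter write leaves the tail of an earlier, longer one in place, which has the same effect of text from one image leaking into the next.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_WRONLY O_TRUNC);
use File::Temp qw(tempdir);

# Hypothetical helper: open $path with the given sysopen flags and write
# $text, as a per-image OCR step might do with a shared scratch file.
sub write_with_flags {
    my ($path, $flags, $text) = @_;
    sysopen(my $fh, $path, $flags) or die "sysopen $path: $!";
    print {$fh} $text;
    close $fh or die "close $path: $!";
}

# Hypothetical helper: read a whole file into a string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    return $content;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/ocr.txt";

# Without O_TRUNC the file is not emptied on open: the second, shorter
# write overwrites only the start, and the tail of the first text survives.
write_with_flags($file, O_CREAT | O_WRONLY, "first image text\n");
write_with_flags($file, O_CREAT | O_WRONLY, "second\n");
print "without O_TRUNC: ", slurp($file);

unlink $file or die "unlink $file: $!";

# With O_TRUNC the file is emptied on every open, so only the latest text remains.
write_with_flags($file, O_CREAT | O_WRONLY | O_TRUNC, "first image text\n");
write_with_flags($file, O_CREAT | O_WRONLY | O_TRUNC, "second\n");
print "with O_TRUNC: ", slurp($file);
```

Run as-is, the first result is "second" followed by the leftover "mage text" from the first write; the second is just "second".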

I logged this as ticket 555 on the FuzzyOCR website.

Having fixed that, I'm not sure that FuzzyOCR is helping much. Also, I've
lowered the FUZZY_OCR_WRONG_EXTENSION score, as it was occasionally firing
multiple times on non-spam.

Dave



Bowie Bailey wrote:
 
 I've had FuzzyOCR running for quite a while.  Today I found a false
 positive for it that is a bit strange.
 
 The message has seven images.  FuzzyOCR claims to have found the word
 service in five of them (and counted it 10 times for a score of 6.5).
 However, I can only see the word in one of the images and only three of
 the seven images have any text at all.  Is there a problem here?
 
 Is FuzzyOCR still useful?  It doesn't seem to hit a lot for me.
 
   %OFMAIL: 1.18
   %OFSPAM: 3.41
   %OFHAM:  0.26
 
 --
 Bowie
 
 
