On Tue, 5 Dec 2017 16:25:28 -0500
Alex wrote:

> Hi, I have the following rule that is used to detect some of the less
> common URIs:
> 
> uri        URI_RARE_TLD
> m;://[^/]+\.(?:work|space|club|science|pub|red|blue|green|link|ninja|lol|xyz|faith|review|download|top|global|(?:web)?site|tech|party|pro|bid|trade|win|moda|news|online|xxx|health|bot|cw|date)(?:/|$);i
> describe   URI_RARE_TLD     URI refers to rarely-nonspam TLD
> 
> The problem is that it is hitting patterns that aren't necessarily
> URIs. This one matches on ".SPACE"
> 
> TIX400 ROH B.W.SPACE SHUTTLE IN
...
> Should I submit a bug, 

It's been discussed before. If bare hostnames weren't parsed as URIs,
spammers could just leave off the protocol and avoid URI lists.


> or does someone have other suggestions on how
> to handle this?


It's a reason to exercise caution in scoring such rules. It's one of the
reasons why, when I suggested rewriting his rules as metarules, I
suggested this:


meta  ADDR_RARE_TLD     __REPTO_RARE_TLD || __FROM_RARE_TLD

meta  URI_RARE_TLD      __URI_RARE_TLD && !ADDR_RARE_TLD
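
For context, here is a rough sketch of how the subrules behind those two
metarules might be defined. The subrule definitions are not shown in this
thread, so the header tests, the shortened TLD list, and the scores below
are all assumptions for illustration, not the rules actually in use. The
idea is that a message whose From or Reply-To address already sits in one
of these TLDs is counted once by ADDR_RARE_TLD, and URI_RARE_TLD only fires
when the body URI is the sole evidence.

```
# Hypothetical subrules; the TLD list is shortened for illustration.
header   __FROM_RARE_TLD    From:addr =~ /\.(?:work|space|club|xyz|top)$/i
header   __REPTO_RARE_TLD   Reply-To:addr =~ /\.(?:work|space|club|xyz|top)$/i
uri      __URI_RARE_TLD     m;://[^/]+\.(?:work|space|club|xyz|top)(?:/|$);i

# Metarules as suggested above.
meta     ADDR_RARE_TLD      __REPTO_RARE_TLD || __FROM_RARE_TLD
describe ADDR_RARE_TLD      Sender or reply-to address in rarely-nonspam TLD

meta     URI_RARE_TLD       __URI_RARE_TLD && !ADDR_RARE_TLD
describe URI_RARE_TLD       URI refers to rarely-nonspam TLD

# Placeholder scores; kept low because bare text such as "B.W.SPACE"
# can be picked up as a URI.
score    ADDR_RARE_TLD      1.0
score    URI_RARE_TLD       0.5
```

Running "spamassassin --lint" after adding anything like this should catch
syntax errors before the rules go live.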
