On Tue, 5 Dec 2017 16:25:28 -0500
Alex wrote:

> Hi, I have the following rule that is used to detect some of the less
> common URIs:
>
> uri URI_RARE_TLD m;://[^/]+\.(?:work|space|club|science|pub|red|blue|green|link|ninja|lol|xyz|faith|review|download|top|global|(?:web)?site|tech|party|pro|bid|trade|win|moda|news|online|xxx|health|bot|cw|date)(?:/|$);i
> describe URI_RARE_TLD URI refers to rarely-nonspam TLD
>
> The problem is that it is hitting patterns that aren't necessarily
> URIs. This one matches on ".SPACE"
>
> TIX400 ROH B.W.SPACE SHUTTLE IN ...
>
> Should I submit a bug,

It's been discussed before. Not doing that would mean that spammers
could just leave off the protocol and avoid URI lists.

> or does someone have other suggestions on how
> to handle this?

It's a reason to exercise caution in scoring such rules. It's one of the
reasons why, when I suggested rewriting his rules as metarules, I
suggested this:

meta ADDR_RARE_TLD __REPTO_RARE_TLD || __FROM_RARE_TLD
meta URI_RARE_TLD  __URI_RARE_TLD && !ADDR_RARE_TLD
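
For anyone wanting to try this, here is a minimal sketch of how the
pieces could fit together. Only the two meta lines come from the
suggestion above; the three subrule definitions, the shortened TLD
list, the ADDR_RARE_TLD description and both scores are assumptions
for illustration and would need adjusting to your own mail flow.

```
# Hypothetical subrules: the thread names them but does not define them.
# The TLD list is shortened here; substitute the full alternation.
header    __FROM_RARE_TLD   From:addr =~ /\.(?:work|space|club|xyz|top)$/i
header    __REPTO_RARE_TLD  Reply-To:addr =~ /\.(?:work|space|club|xyz|top)$/i
uri       __URI_RARE_TLD    m;://[^/]+\.(?:work|space|club|xyz|top)(?:/|$);i

# Meta rules as suggested in the thread.
meta      ADDR_RARE_TLD     __REPTO_RARE_TLD || __FROM_RARE_TLD
describe  ADDR_RARE_TLD     From or Reply-To address uses rarely-nonspam TLD
meta      URI_RARE_TLD      __URI_RARE_TLD && !ADDR_RARE_TLD
describe  URI_RARE_TLD      URI refers to rarely-nonspam TLD

# Placeholder scores (assumed, not tested against any corpus).
score     ADDR_RARE_TLD     1.0
score     URI_RARE_TLD      0.5
```

With this layout a message whose From or Reply-To address is in one of
the listed TLDs hits ADDR_RARE_TLD only. URI_RARE_TLD is left for
messages where just a URI matched, which includes schemeless text like
B.W.SPACE, so it can be given the more cautious score.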