> On Jan 29, 2014, at 9:53 AM, "Andy Jezierski" <ajezier...@stepan.com> wrote:
> 
> I've been noticing a lot of spam getting through with the same traits, a 
> bunch of random words within brackets.  They all seem to come after the 
> </body> or the </html> tag.  Anyone much more knowledgeable than me care to 
> assist with a rule to detect them?

What about something like:

rawbody HTML_NONSENSE_TAGS /(?:<[A-Za-z0-9]{4,}>\s*){10,}/

This will hit on 10 or more consecutive tags separated by nothing but 
whitespace. Only bare single-word tags of four or more alphanumeric 
characters will hit (no attributes, no closing slash), so this should 
minimize FPs from heavy formatting such as nested divs.

Completely untested, use at your own risk (but post back and tell us how well 
it worked).
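
If you want a rough sanity check of the pattern before loading it into SpamAssassin, something like the Python sketch below may help. It assumes Python's re module treats this particular pattern the same way Perl does, and it does not reproduce how SpamAssassin decodes or splits the raw body. The function name and the sample strings are made up for illustration.

```python
import re

# Same pattern as the rawbody rule, compiled with Python's re module.
# Assumption: Perl and Python agree on the syntax used in this pattern.
NONSENSE_TAGS = re.compile(r"(?:<[A-Za-z0-9]{4,}>\s*){10,}")


def looks_like_nonsense_tags(raw_body: str) -> bool:
    """Return True if raw_body has 10+ consecutive bare single-word tags."""
    return NONSENSE_TAGS.search(raw_body) is not None


# Hypothetical samples, not taken from real mail.
words = ["apple", "river", "stone", "cloud", "table", "green",
         "horse", "paper", "light", "music", "ocean", "chair"]
spam_tail = "</body></html>\n" + " ".join(f"<{w}>" for w in words)
legit_html = '<div class="a"><div><span><p>Hello</p></span></div></div>'

print(looks_like_nonsense_tags(spam_tail))   # True
print(looks_like_nonsense_tags(legit_html))  # False
```

Running it over a few saved spam and ham bodies would give a first idea of the FP rate, but only a real corpus run inside SpamAssassin tells you how the rule behaves.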

--- Amir
thumbed via iPhone
