Trying to detect bogus end tags

Loren Wilton 28 Feb 2004 03:46:37 -0000

I'm trying to come up with a way to detect bogus end tags, and so far I'm
not having much luck.


What I'm specifically trying to catch is something like this:

</table>
</belch></huntsville></delusion></wilma></boswell></attune>
</vasectomy></centum></surf></yeasty></molt></autocollimate>
</acrobat></harvest></gage></flagrant></fumble></nowadays>
</BODY>
</HTML>

Now, it looks like there is an html_tag_balance eval that would catch the
fact that there is no "<belch" to match the "</belch>" in the above hunk of
spam, if only there were some way that I could feed "belch" into the eval.
I can detect end tags easily enough with a regexp, but I can't find any way
to pull the matched tag name out and feed it to the eval routine from within
an SA rule definition.
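Outside of SpamAssassin's rule syntax, the check itself is simple: walk the tags in document order, remember every start-tag name you've seen, and flag any end tag whose name hasn't appeared yet. The sketch below does exactly that in plain Python. It is not SA rule syntax and not the html_tag_balance eval. The function name `bogus_end_tags` and the tag regex are my own assumptions for illustration, and hooking something like this into SA would need a plugin, which I'm not showing here.

```python
from __future__ import annotations

import re

# Hypothetical tag matcher: "<" or "</", then a tag name, then whitespace, "/" or ">".
# Things like "<!DOCTYPE" or "<!--" are skipped because "!" isn't a letter.
TAG = re.compile(r"<(/?)([a-z][a-z0-9]*)(?=[\s/>])", re.IGNORECASE)


def bogus_end_tags(html: str) -> list[str]:
    """Return end-tag names (lowercased) with no matching start tag earlier in the text."""
    seen: set[str] = set()
    bogus: list[str] = []
    for m in TAG.finditer(html):  # finditer walks the tags in document order
        is_end, name = m.group(1) == "/", m.group(2).lower()
        if is_end:
            if name not in seen:
                bogus.append(name)  # e.g. "</belch>" with no "<belch" before it
        else:
            seen.add(name)
    return bogus
```

Run over the spam above, this reports belch, huntsville, delusion and the rest, while the legitimate </table>, </BODY> and </HTML> pass as long as their start tags appear earlier in the message. It is a heuristic: tag-like text inside comments or attribute values will also be counted.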

Alternatively, is there a way to write a regexp that will let me look backward
for <belch once I have found </belch>?  I can't figure this one out
either.
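One single-regex workaround, shown here in Python rather than as an SA rule, is to reverse the text. Python's `re` only allows fixed-width lookbehind, so it can't search backward an arbitrary distance, but in the reversed string "everything before </belch>" becomes "everything after it", which an ordinary negative lookahead with a backreference can scan. This is a swapped-in trick, not anything SpamAssassin provides, and the name `bogus_end_tags_regex` is hypothetical.

```python
from __future__ import annotations

import re

# In the reversed text "</belch>" reads ">hcleb/<" and "<belch" reads "hcleb<".
# The lookahead (?!.*\b\1<) therefore scans everything that came BEFORE the end tag
# in the original text, looking for a start tag with the same name. The \b keeps
# "<belchx" from counting as "<belch"; IGNORECASE also makes \1 case-insensitive.
REVERSED_BOGUS_END = re.compile(
    r">\s*([a-z0-9]*[a-z])/<(?!.*\b\1<)", re.IGNORECASE | re.DOTALL
)


def bogus_end_tags_regex(html: str) -> list[str]:
    """Find end tags with no earlier '<name' using one regex on the reversed text."""
    rev = html[::-1]
    # Un-reverse each captured name and lowercase it.
    names = [m.group(1)[::-1].lower() for m in REVERSED_BOGUS_END.finditer(rev)]
    names.reverse()  # matches come out last-first; restore document order
    return names
```

The negative lookahead rescans the remainder of the text for every end tag, so this is quadratic in the worst case. On message-sized input that is probably tolerable, but the single forward pass that tracks seen start tags is cheaper and easier to read.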

Thanks,
        Loren
