On Fri, 2004-02-27 at 19:46, Loren Wilton wrote:
> I'm trying to come up with a way to detect bogus end tags, and so far I'm
> not having much luck.
> 
> What I'm specifically trying to catch are things like
> 
> </table>
> </belch></huntsville></delusion></wilma></boswell></attune>
> </vasectomy></centum></surf></yeasty></molt></autocollimate>
> </acrobat></harvest></gage></flagrant></fumble></nowadays>
> </BODY>
> </HTML>

The list of valid HTML tags is finite. You could try something like:

rawbody BOGUS_HTML_TAG    /<\/(?!(list|of|valid|tags|...)>)[a-z]+>/i
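
As a rough sketch of the same check outside SpamAssassin, here it is in Python using a negative lookahead, (?!...), which asserts about the text that follows the "</". Python's syntax for this construct is the same as Perl's. The VALID_TAGS list and the find_bogus_end_tags name are hypothetical, and the list is deliberately incomplete; a real rule would need every valid HTML element name.

```python
import re

# Hypothetical, deliberately incomplete whitelist (an assumption for this
# sketch); a real rule would need the full list of valid HTML tag names.
VALID_TAGS = ["html", "body", "table", "tr", "td", "p", "a", "b", "i", "div", "span"]

# (?!...) is a negative lookahead: immediately after "</" the match is
# refused if a valid tag name followed by ">" comes next. Keeping the ">"
# inside the lookahead means only whole tag names are whitelisted.
BOGUS_END_TAG = re.compile(
    r"</(?!(?:" + "|".join(VALID_TAGS) + r")>)[a-z]+>",
    re.IGNORECASE,
)


def find_bogus_end_tags(text):
    """Return every end tag in text whose name is not in VALID_TAGS."""
    return BOGUS_END_TAG.findall(text)
```

Because the ">" sits inside the lookahead, a name that merely starts with a valid tag (for example "</tablet>") is still reported as bogus.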

Here is where additive scoring would be of benefit. I wouldn't want to
score high on a handful of bogus tags, but a dozen or more should score
fairly high.
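
To illustrate the idea, here is a minimal Python sketch that counts bogus end tags and maps the count to a score. This is not how SpamAssassin scores rules; the tag list, the thresholds, the score values, and the bogus_tag_score name are all assumptions picked only to show "a handful scores low, a dozen or more scores high".

```python
import re

# Hypothetical, incomplete whitelist (assumption for illustration only).
VALID_TAGS = ["html", "body", "table", "tr", "td", "p", "a", "b", "i"]

BOGUS_END_TAG = re.compile(
    r"</(?!(?:" + "|".join(VALID_TAGS) + r")>)[a-z]+>",
    re.IGNORECASE,
)

# Hypothetical (minimum count, score) steps, checked from highest to lowest.
SCORE_STEPS = [(12, 3.0), (6, 1.0), (1, 0.1)]


def bogus_tag_score(text):
    """Return a score that grows with the number of bogus end tags."""
    count = len(BOGUS_END_TAG.findall(text))
    for minimum, score in SCORE_STEPS:
        if count >= minimum:
            return score
    return 0.0
```

With these made-up steps, one or two stray tags barely register, while a block like the one quoted above lands in the top bracket.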

N.B.: I'm still having trouble wrapping my brain around zero-length
assertions - does the above look right? 

--
John Hardin  KA7OHZ                           
Internal Systems Administrator/Guru               voice: (425) 672-1304
Apropos Retail Management Systems, Inc.             fax: (425) 672-0192
-----------------------------------------------------------------------
  Failure to plan ahead on someone else's part does not constitute an
  emergency on my part.
                                  - David W. Barts in a.s.r
