On Thu, Jun 24, 2010 at 8:20 AM, Benjamin M. Schwartz <bmsch...@fas.harvard.edu> wrote:
> On 06/24/2010 11:04 AM, Kornel Lesinski wrote:
>> If you mean "parsing" with regular expressions, then I think that's a bad
>> practice and shouldn't be encouraged.
>
> Worldwide, regarding HTML, I'm sure there is 100 times more regular
> expression processing code than full-on lexing code. Most code that
> processes HTML is embedded in scripts, doing some small special-purpose
> operation. Those regular expressions aren't going away. Helping them
> break less is a noble cause.
Actually, if we could make regex-based "parsing" break more, it would probably be a net positive for the world. Regexes are the source of so many holes in "validation"-type scripts.

In any case, XML doesn't require > to be escaped in attribute values, and HTML doesn't appear to either. In practice, > is used in attribute values, so declaring it verboten wouldn't be helpful.

~TJ
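To make the point about unescaped > in attribute values concrete, here is a minimal sketch in Python (standard library only). It assumes a typical naive tag-stripping regex, <[^>]*>, and a made-up sample anchor; neither comes from any particular script. The regex ends the "tag" at the first > it sees, even inside a quoted attribute, while the stdlib html.parser.HTMLParser tokenizer respects the quotes and reads the attribute value intact.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup: '>' appears unescaped inside a quoted
# attribute value, which XML permits and HTML parsers accept.
SAMPLE = '<a title="x > y" href="/next">Next</a>'

# Hypothetical naive "tag" regex: '<', then anything except '>', then '>'.
NAIVE_TAG = re.compile(r"<[^>]*>")


def strip_tags_naive(markup):
    """Delete every span the naive regex believes is a tag."""
    return NAIVE_TAG.sub("", markup)


class AttrCollector(HTMLParser):
    """Record (tag name, attribute dict) for each start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


def parse_attrs(markup):
    """Tokenize markup with the stdlib parser and return its start tags."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    # The regex stops at the '>' inside title="...", so attribute
    # debris (' y" href="/next">') leaks into the "text".
    print(strip_tags_naive(SAMPLE))
    # The tokenizer follows the quoting rules and keeps 'x > y' whole.
    print(parse_attrs(SAMPLE))
```

This only shows that a quote-aware tokenizer handles the case that trips the regex; it says nothing about how any specific "validation" script behaves.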