On Thu, 2010-06-24 at 09:01 -0700, Tab Atkins Jr. wrote:
> On Thu, Jun 24, 2010 at 8:20 AM, Benjamin M. Schwartz
> <bmsch...@fas.harvard.edu> wrote:
> > On 06/24/2010 11:04 AM, Kornel Lesinski wrote:
> >> If you mean "parsing" with regular expressions, then I think that's a bad
> >> practice and shouldn't be encouraged.
> >
> > Worldwide, regarding HTML, I'm sure there is 100 times more regular
> > expression processing code than full-on lexing code. Most code that
> > processes HTML is embedded in scripts, doing some small special-purpose
> > operation. Those regular expressions aren't going away. Helping them
> > break less is a noble cause.
>
> Actually, if we could make regex-based "parsing" break more, it would
> probably be a net positive for the world. Regexes are the source of
> so many holes in "validation"-type scripts.
>
> In any case, XML doesn't require > to be escaped in attribute values,
> and HTML doesn't appear to either. In practice, > is used in
> attribute values, so declaring it verboten wouldn't be helpful.
>
> ~TJ
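
For concreteness, here is a rough sketch of the case quoted above, where an unescaped > sits inside an attribute value. The sample tag and both patterns are made up for illustration and are not taken from anyone's real code. The second pattern only copes with quoted attribute values; it is still not an HTML parser and makes no attempt at comments, script blocks or malformed markup.

```python
import re

# Hypothetical sample tag: the title attribute contains an unescaped ">".
html = '<a title="a > b" href="/x">link</a>'

# Naive pattern: assumes the first ">" it meets closes the tag.
naive = re.compile(r'<a\b[^>]*>')

# More careful pattern: consumes whole quoted attribute values
# before looking for the ">" that closes the tag.
careful = re.compile(r'<a\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>')

naive_match = naive.match(html).group(0)
careful_match = careful.match(html).group(0)

print(naive_match)    # <a title="a >
print(careful_match)  # <a title="a > b" href="/x">
```

The naive pattern stops in the middle of the title attribute, while the second one reaches the real end of the opening tag.
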
Just to point out: regexes aren't the problem, and people who blame the issue on regular expressions are as bad as the people writing the dodgy regexes. The problem is badly written expressions, not the tool itself.

The same arguments come up whenever regular expressions are suggested as a means of validating email addresses. It's possible to do, but some of the people who write them don't really think the problem through.

[/end rant]

Thanks,
Ash
http://www.ashleysheridan.co.uk