Re: a simple unicode question

Gabriel Genellina Thu, 22 Oct 2009 14:02:22 -0700

On Thu, 22 Oct 2009 17:08:21 -0300, <[email protected]> wrote:

On 10/22/2009 03:23 AM, Gabriel Genellina wrote:

On Wed, 21 Oct 2009 15:14:32 -0300, <[email protected]> wrote:

On Oct 21, 4:59 am, Bruno Desthuilliers <bruno.[email protected]> wrote:

beSTEfar wrote:
(snip)
 > When parsing strings, use Regular Expressions.

And now you have _two_ problems <g>

For some simple parsing problems, Python's string methods are powerful
enough to make REs overkill. And for any complex enough parsing (any
recursive construct for example - think XML, HTML, any programming
language etc), REs are just NOT enough by themselves - you need a full
blown parser.


But keep in mind that many XML, HTML, etc parsing problems
are restricted to a subset where you know the nesting depth
is limited (often to 0 or 1), and for that large set of
problems, RE's *are* enough.


I don't think so. Nesting isn't the only problem. RE's cannot handle
comments, for example. And you must support unquoted attributes, single
and double quotes, any attribute ordering, empty tags, arbitrary
whitespace...

If you don't, you are not reading XML (or HTML), only a specific file
format that resembles XML but actually isn't.
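
To make the list above concrete, here is a minimal sketch. The <item> element, its attributes, and the pattern are hypothetical, made up for illustration and not taken from the thread. It shows a regular expression written against one sample line missing several equivalent spellings of the same element, and matching inside a comment:

```python
import re

# Hypothetical pattern, written against one sample line only:
#   <item id="1" name="foo"/>
NAIVE = re.compile(r'<item id="(\d+)" name="(\w+)"/>')

# Five spellings of the same element; all mean the same thing in XML.
variants = [
    '<item id="1" name="foo"/>',        # the form the pattern was written for
    "<item id='1' name='foo'/>",        # single quotes
    '<item name="foo" id="1"/>',        # attributes in the other order
    '<item  id = "1"\n name="foo" />',  # extra whitespace and a line break
    '<item id="1" name="foo"></item>',  # empty element written with an end tag
]

matches = [bool(NAIVE.search(v)) for v in variants]
print(matches)  # [True, False, False, False, False]

# The pattern also matches markup that has been commented out.
commented = '<!-- <item id="9" name="old"/> -->'
false_positive = NAIVE.search(commented) is not None
print(false_positive)  # True
```

Each miss can be patched in the pattern, but every patch makes it longer and harder to read, and the comment case needs separate handling.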


OK, then let me rephrase my point as: in the real world it is often
not necessary to parse XML in its full generality; parsing, as you
put it, "a specific file format that resembles XML" is all that is
really needed.

Given that using a real XML parser like ElementTree is as easy as (or
even easier than) building a regular expression, and more robust, and
more likely to survive small changes in the input format, why use the
worse solution?
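
As a rough illustration, assuming a made-up <items>/<item> document (the element and attribute names are hypothetical), a few lines of the standard library's xml.etree.ElementTree cope with quoting, attribute order, whitespace, and comments without any special handling:

```python
import xml.etree.ElementTree as ET

# Hypothetical input mixing quote styles, attribute order, whitespace,
# both spellings of an empty element, and a commented-out entry.
doc = """<items>
  <!-- <item id="9" name="old"/> -->
  <item id="1" name="foo"/>
  <item name='bar' id='2'/>
  <item  id = "3"
         name="baz" ></item>
</items>"""

root = ET.fromstring(doc)

# iter("item") walks every <item> element; the default parser drops
# comments, so the commented-out entry never shows up.
items = [(el.get("id"), el.get("name")) for el in root.iter("item")]
print(items)  # [('1', 'foo'), ('2', 'bar'), ('3', 'baz')]
```

Note this covers well-formed XML only; HTML-style unquoted attributes would make ET.fromstring raise a ParseError, and for such input an HTML parser is the better fit.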

RE's are good at solving some problems, but parsing XML isn't one of those.

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
