On Thu, Apr 26, 2012 at 12:10 PM, Vinay Sajip <vinay_sa...@yahoo.co.uk> wrote:
> Following recent changes in html.parser, the Python 3 port of Django I'm
> working on has started failing while parsing HTML.
>
> The reason appears to be that Django uses some module-level data in
> html.parser, for example tagfind, which is a regular expression pattern.
> This has changed recently (Ezio changed it in ba4baaddac8d).
>
> Now tagfind (and other such patterns) are not marked as private (though not
> documented), but should they be? The following script (tagfind.py):
>
>    import html.parser as Parser
>
>    data = '<select name="stuff">'
>
>    m = Parser.tagfind.match(data, 1)
>    print('%r -> %r' % (Parser.tagfind.pattern, data[1:m.end()]))
>
> gives different results on 3.2 and 3.3:
>
>    $ python3.2 tagfind.py
>    '[a-zA-Z][-.a-zA-Z0-9:_]*' -> 'select'
>    $ python3.3 tagfind.py
>    '([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\\s|/(?!>))*' -> 'select '
>
> The trailing space later causes a mismatch with the end tag, and leads to the
> errors. Django's use of the tagfind pattern is in a subclass of HTMLParser, in
> an overridden parse_starttag method.
>
> Do we need to indicate more strongly that data like tagfind are private? Or
> has the change introduced inadvertent breakage, requiring a fix in Python?

I think both. It looks like tagfind wasn't meant to be exported, but it
should have been marked as such. And I think it would behoove us to
reduce random failures in important 3rd party libraries by keeping the
old version around (but mark it as deprecated with an explanatory
comment, and submit a Django fix to stop using it).
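
To make the breakage concrete, here is a small sketch that compares the
two patterns from the quoted tagfind.py output directly with re, so it
does not depend on which html.parser is installed. The names tagfind_32
and tagfind_33 are made up for this illustration; only the pattern
strings come from the output above.

```python
import re

# Pattern strings copied from the quoted tagfind.py output.
# The names tagfind_32 / tagfind_33 are illustrative only.
tagfind_32 = re.compile(r'[a-zA-Z][-.a-zA-Z0-9:_]*')
tagfind_33 = re.compile(r'([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\s|/(?!>))*')

data = '<select name="stuff">'

old = tagfind_32.match(data, 1)
new = tagfind_33.match(data, 1)

# Code that slices up to m.end() sees the trailing whitespace that the
# newer pattern consumes; group(1) still holds just the tag name.
old_tag = data[1:old.end()]
new_whole = data[1:new.end()]
new_tag = new.group(1)

print('%r %r %r' % (old_tag, new_whole, new_tag))
```

So a caller that relies on m.end() changes behavior, while one that
uses group(1) of the newer pattern would still get the bare tag name.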

Also the module should be updated to use _tagfind internally (and
likewise for other accidental exports).
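
Roughly what I have in mind, as a sketch only (this is not the actual
html.parser source, and parse_tag_name is a hypothetical helper that
stands in for the parser's internal use of the pattern):

```python
import re

# Private pattern for the parser's own use (the 3.3 form quoted above).
_tagfind = re.compile(r'([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\s|/(?!>))*')

# Deprecated: this name was exported by accident. It keeps the 3.2
# pattern so existing third-party code continues to work; new code
# should not use it.
tagfind = re.compile(r'[a-zA-Z][-.a-zA-Z0-9:_]*')


def parse_tag_name(rawdata, i):
    """Hypothetical helper: return the lowercased tag name at rawdata[i].

    rawdata[i] is assumed to be the '<' that opens a start tag.
    """
    match = _tagfind.match(rawdata, i + 1)
    if match is None:
        return None
    # group(1) is the name without the trailing whitespace or slashes.
    return match.group(1).lower()
```

With this layout the internal code only touches _tagfind, and the old
public name keeps its 3.2 behavior until callers have moved off it.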

Traditionally we've been really lax about this stuff. We should strive
to clarify the exact boundaries of our APIs.

-- 
--Guido van Rossum (python.org/~guido)