On Thu, Apr 26, 2012 at 12:10 PM, Vinay Sajip <vinay_sa...@yahoo.co.uk> wrote:
> Following recent changes in html.parser, the Python 3 port of Django I'm
> working on has started failing while parsing HTML.
>
> The reason appears to be that Django uses some module-level data in
> html.parser, for example tagfind, which is a regular expression pattern.
> This has changed recently (Ezio changed it in ba4baaddac8d).
>
> Now tagfind (and other such patterns) are not marked as private (though not
> documented), but should they be? The following script (tagfind.py):
>
>     import html.parser as Parser
>
>     data = '<select name="stuff">'
>
>     m = Parser.tagfind.match(data, 1)
>     print('%r -> %r' % (Parser.tagfind.pattern, data[1:m.end()]))
>
> gives different results on 3.2 and 3.3:
>
>     $ python3.2 tagfind.py
>     '[a-zA-Z][-.a-zA-Z0-9:_]*' -> 'select'
>     $ python3.3 tagfind.py
>     '([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\\s|/(?!>))*' -> 'select '
>
> The trailing space later causes a mismatch with the end tag, and leads to the
> errors. Django's use of the tagfind pattern is in a subclass of HTMLParser, in
> an overridden parse_starttag method.
>
> Do we need to indicate more strongly that data like tagfind are private? Or
> has the change introduced inadvertent breakage, requiring a fix in Python?

I think both. Looks like it wasn't meant to be exported. But it should
have been marked as such. And I think it would behoove us to reduce
random failures in important 3rd party libraries by keeping the old
version around (but mark it as deprecated with an explanatory comment,
and submit a Django fix to stop using it). Also the module should be
updated to use _tagfind internally (and likewise for other accidental
exports).

Traditionally we've been really lax about this stuff. We should strive
to clarify the exact boundaries of our APIs better.

-- 
--Guido van Rossum (python.org/~guido)