On 26.04.2012 21:10, Vinay Sajip wrote:
> Following recent changes in html.parser, the Python 3 port of Django I'm
> working on has started failing while parsing HTML.
>
> The reason appears to be that Django uses some module-level data in
> html.parser, for example tagfind, which is a regular expression pattern.
> This has changed recently (Ezio changed it in ba4baaddac8d).
>
> Now tagfind (and other such patterns) are not marked as private (though not
> documented), but should they be? The following script (tagfind.py):
>
> import html.parser as Parser
>
> data = '<select name="stuff">'
>
> m = Parser.tagfind.match(data, 1)
> print('%r -> %r' % (Parser.tagfind.pattern, data[1:m.end()]))
>
> gives different results on 3.2 and 3.3:
>
> $ python3.2 tagfind.py
> '[a-zA-Z][-.a-zA-Z0-9:_]*' -> 'select'
> $ python3.3 tagfind.py
> '([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\\s|/(?!>))*' -> 'select '
>
> The trailing space later causes a mismatch with the end tag, and leads to the
> errors. Django's use of the tagfind pattern is in a subclass of HTMLParser, in
> an overridden parse_starttag method.
>
> Do we need to indicate more strongly that data like tagfind are private? Or
> has the change introduced inadvertent breakage, requiring a fix in Python?

Since it's a module-level constant without a leading underscore, IMO it was
okay for Django to use it, even if not documented.

In this case, especially since we actually have evidence of someone using the
constant, I would keep it as-is and use a new (underscored, this time) name for
the new pattern.

And yes, I think that we do need to indicate private-ness of module-level data.

Georg
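
For illustration, here is a minimal sketch of the suggestion above: the public name keeps the 3.2 pattern, and the 3.3 pattern moves to a new underscored name. Both patterns are copied from the quoted output. The name `_tagfind_new` is hypothetical, chosen only for this example, and is not taken from html.parser.

```python
import re

# Public name keeps the 3.2 pattern, so existing callers such as
# Django's HTMLParser subclass would see no change in behaviour.
tagfind = re.compile(r'[a-zA-Z][-.a-zA-Z0-9:_]*')

# Hypothetical private name (an assumption for this sketch) holding the
# 3.3 pattern, which also consumes whitespace after the tag name.
_tagfind_new = re.compile(r'([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\s|/(?!>))*')

data = '<select name="stuff">'

old = tagfind.match(data, 1)
new = _tagfind_new.match(data, 1)

print(repr(data[1:old.end()]))  # 'select'
print(repr(data[1:new.end()]))  # 'select ' (trailing space included)
print(repr(new.group(1)))       # 'select' (tag name only, via the group)
```

With the new pattern, code that slices up to `m.end()` picks up the trailing space, while `m.group(1)` still yields the bare tag name. Keeping the old pattern under the old name avoids forcing that change on external users.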