Ezio Melotti added the comment:

I would still do a benchmark, for these reasons:
1) IIRC rawdata might be the whole document (or at least everything that has 
not been parsed yet);
2) the '>' is very likely to be found.

This situation is fairly different from the one presented in #17170, where the 
strings are short and the character is not present in the majority of the 
strings.
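
A rough sketch of how the two situations could be compared is below. The 
helper names and the synthetic data are made up for illustration and are not 
taken from _markupbase; a real benchmark should use actual documents.

```python
import timeit


def make_document(n_tags=20000):
    """Build one large HTML-like string, standing in for rawdata (assumed shape)."""
    return "".join("<p class='x'>item %d</p>\n" % i for i in range(n_tags))


def make_short_strings(n=20000):
    """Build many short strings that never contain '>' (the #17170-like case)."""
    return ["word%d" % i for i in range(n)]


def scan_document(rawdata):
    """Jump from one '>' to the next with str.find and count the hits."""
    count = 0
    pos = rawdata.find(">")
    while pos != -1:
        count += 1
        pos = rawdata.find(">", pos + 1)
    return count


def scan_short_strings(strings):
    """Call str.find once per short string and count the (rare) hits."""
    return sum(1 for s in strings if s.find(">") != -1)


if __name__ == "__main__":
    rawdata = make_document()
    strings = make_short_strings()
    t_doc = timeit.timeit(lambda: scan_document(rawdata), number=5)
    t_short = timeit.timeit(lambda: scan_short_strings(strings), number=5)
    print("large document, '>' common: %.4fs" % t_doc)
    print("short strings, '>' absent:  %.4fs" % t_short)
```

The absolute numbers only mean something when the same script is run before 
and after a patch on the same machine.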

Profiling and improving html.parser (and hence _markupbase) was already on my 
todo list (even if admittedly not anywhere near the top :), so writing a 
benchmark for it might be useful for further enhancements too.

(Note: HTMLParser is already fairly fast, parsing ~1.3MB/s according to 
http://www.crummy.com/2012/02/06/0, but I've never done anything to make it 
even faster, so there might still be room for improvements.)
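
For the overall throughput, something along these lines could serve as a 
starting point. This is only a sketch: CountingParser and measure_throughput 
are hypothetical names, the input is synthetic, and the rate is computed from 
the number of characters, which equals bytes only for ASCII input.

```python
import time
from html.parser import HTMLParser


class CountingParser(HTMLParser):
    """Minimal subclass that only counts start tags (illustrative)."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1


def measure_throughput(document):
    """Parse document once; return (start tag count, millions of chars per second)."""
    parser = CountingParser()
    start = time.perf_counter()
    parser.feed(document)
    parser.close()
    # Guard against a zero interval on very small inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return parser.start_tags, len(document) / elapsed / 1e6


if __name__ == "__main__":
    doc = "<div><a href='#'>link</a> some text</div>\n" * 20000
    tags, rate = measure_throughput(doc)
    print("%d start tags, %.2f M chars/s" % (tags, rate))
```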

----------
type: enhancement -> performance

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17183>
_______________________________________