Hi,

In short, the Unicode implementation was rewritten in Python 3.3 for PEP 393. It's not surprising that minor details like singletons differ. You should not use "is" to compare strings in Python, or your program may fail on other Python implementations (like PyPy, IronPython, or Jython) or even on a different CPython version.
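
To illustrate the point about "is" versus "==", here is a minimal sketch. The helper name same_text is hypothetical (it is not part of html2text or CPython); the only thing the sketch relies on is that "==" compares string values, while the result of "is" depends on which objects the implementation happens to reuse.

```python
def same_text(a, b):
    """Compare two strings by value (hypothetical helper for illustration)."""
    return a == b


# Strings built at runtime may or may not be the same object as an equal literal.
empty_literal = ""
one_char_literal = "a"
empty_built = "abc"[3:]   # empty string produced by slicing
one_char_built = "abc"[:1]  # single character produced by slicing

# Value comparison gives the same answer on every implementation and version.
print(same_text(empty_built, empty_literal))        # True
print(same_text(one_char_built, one_char_literal))  # True

# Identity comparison is an implementation detail: the result is printed here
# only to show that it can differ, and nothing should depend on it.
print(empty_built is empty_literal)
print(one_char_built is one_char_literal)
```

The last two lines may print True or False depending on the interpreter and its version, which is exactly why code should not rely on them.
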
Anyway, you spotted a missed optimization: it's now "fixed" in Python 3.3 and 3.4 by the following commits. Copy/paste of the CIA IRC bot:

19:30 < irker555> cpython: Victor Stinner 3.3 * 82517:3dd2fa78fb89 / Objects/unicodeobject.c: _PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character http://hg.python.org/cpython/rev/3dd2fa78fb89
19:30 < irker032> cpython: Victor Stinner default * 82518:fa59a85b373f / Objects/unicodeobject.c: (Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character http://hg.python.org/cpython/rev/fa59a85b373f

Victor

2013/3/6 Amaury Forgeot d'Arc <amaur...@gmail.com>:
>> So, in the end, I have went the long way and bisected cpython to
>> find the commit which broke my tests, and it seems that the
>> culprit is http://hg.python.org/cpython/rev/123f2dc08b3e so it is
>> clearly something Unicode related.
>>
>> Unfortunately, it really doesn't tell me what exactly is broken
>> (is it a known regression) and if there is known workaround.
>> Could anybody suggest a way how to find bugs on
>> http://bugs.python.org related to some particular commit (plain
>> search for 123f2dc0 didn’t find anything).
>
>
> I strongly suspect an incorrect usage of the "is" operator:
> https://github.com/mcepl/html2text/blob/master/html2text.py#L95
> Identity of strings is not guaranteed...
>
> Does it change something if you use "==" instead?
>
> --
> Amaury Forgeot d'Arc
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev