New issue 2441: Unicode surrogate codepoints in string literals combined when using unittest module
https://bitbucket.org/pypy/pypy/issues/2441/unicode-surrogate-codepoints-in-string

byllyfish:

When running unit tests under PyPy3, I sometimes see unicode literals containing surrogate pairs combined into a single non-BMP character. Here is `test_surrogate.py`:

```python
import unittest

class TestSurrogate(unittest.TestCase):
    def test_surrogate(self):
        s = '\ud800\udc00'
        if len(s) != 2:
            raise ValueError(s.encode('raw-unicode-escape'))
```

The first time I run it, it works fine.

```bash
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 -m unittest test_surrogate
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
```

When I run the test a second time, it fails. The surrogate pair has been replaced.

```bash
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 -m unittest test_surrogate
E
======================================================================
ERROR: test_surrogate (test_surrogate.TestSurrogate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_surrogate.py", line 7, in test_surrogate
    raise ValueError(s.encode('raw-unicode-escape'))
ValueError: b'\\U00010000'

----------------------------------------------------------------------
Ran 1 test in 0.010s

FAILED (errors=1)
```

The failures continue until I touch the file. After that, the test will succeed the first time, then fail subsequently. If I touch the file and run pypy3 with -B (don't write .py[co] files on import), all the test runs succeed.

N.B. The problem does NOT occur under normal conditions. I've only seen it using unittest.

```
# This small program always works fine!
s = '\ud800\udc00'
if len(s) != 2:
    raise ValueError(s.encode('raw-unicode-escape'))
```

I am running on Mac OS X 10.11.6. Please let me know if you can reproduce this.
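
For anyone trying to reproduce this, here is a minimal sketch of two checks that may help narrow it down. It is not part of the failing test. The helper name `combine_surrogates` is made up for illustration. The first check shows that the reported value `b'\\U00010000'` is exactly what the pair `'\ud800\udc00'` decodes to when treated as UTF-16. The second check round-trips the literal through `marshal`, which is the serialization used for cached bytecode files. That the bytecode cache is involved is only an assumption, based on the observation that `-B` avoids the failure.

```python
import marshal


def combine_surrogates(high, low):
    """Return the code point that a UTF-16 surrogate pair encodes.

    Hypothetical helper, used only to illustrate the arithmetic.
    """
    return 0x10000 + ((ord(high) - 0xD800) << 10) + (ord(low) - 0xDC00)


s = '\ud800\udc00'

# The pair from the test maps to U+10000, which matches the value
# shown in the ValueError of the failing run.
combined = combine_surrogates(s[0], s[1])
print(hex(combined))
print(chr(combined).encode('raw-unicode-escape'))

# Round-trip the literal through marshal. If the two lengths differ,
# the surrogates were merged during serialization or deserialization.
restored = marshal.loads(marshal.dumps(s))
print(len(s), len(restored))
```

If the two lengths printed on the last line differ on an affected interpreter, that would point at the marshal step rather than at unittest itself.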

```
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 --version
Python 3.3.5 (619c0d5af0e5, Oct 08 2016, 22:08:19)
[PyPy 5.5.0-alpha0 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
```

_______________________________________________
pypy-issue mailing list
pypy-issue@python.org
https://mail.python.org/mailman/listinfo/pypy-issue