New issue 2997: ''.join(somestring) is buggy in presence of non-ascii characters https://bitbucket.org/pypy/pypy/issues/2997/join-somestring-is-buggy-in-presence-of
Antonio Cuni:

The following snippet prints a weird result on the latest pypy3 (nightly):

```
#-*- encoding: utf-8 -*-
def dump(s):
    print(" len():", len(s))
    print(" repr():", repr(s))
    print(" chars:", [ord(ch) for ch in s])

x = "a = 'à'"
y = ''.join(x)
print("x == y: ", x == y)
print("x:")
dump(x)
print()
print("y: ")
dump(y)
```

```
$ ./pypy3 foo.py
x == y:  True
x:
 len(): 7
 repr(): "a = 'à'"
 chars: [97, 32, 61, 32, 39, 224, 39]

y: 
 len(): 8
 repr(): "a = 'à'"
 chars: [97, 32, 61, 32, 39, 224, 39, 208]
```

Note that `x == y` evaluates to `True` even though the two strings differ in length, and that `y` has an extra character (208) which `repr()` does not print. The value 208 seems to be non-deterministic, so I suspect an off-by-one error that makes the code read past the end of the string.

This is the root cause of the `\u0000` reported in issue #2983.

_______________________________________________
pypy-issue mailing list
pypy-issue@python.org
https://mail.python.org/mailman/listinfo/pypy-issue
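A minimal regression check in the spirit of the report, assuming only that `''.join(s)` must return a string identical to `s` code point by code point, and that equal strings must have equal lengths. The function name `check_join_roundtrip` and the sample strings are hypothetical illustrations, not taken from PyPy's test suite.

```python
from typing import List

# Hypothetical regression check (not PyPy's actual test): joining a str
# with '' must reproduce it exactly, including non-ASCII code points.
def check_join_roundtrip(s: str) -> List[str]:
    """Return a list of problems found when comparing ''.join(s) with s."""
    y = ''.join(s)
    problems = []
    if len(y) != len(s):
        problems.append(f"length mismatch: {len(s)} != {len(y)}")
    if [ord(ch) for ch in y] != [ord(ch) for ch in s]:
        problems.append("code points differ")
    # Equality must agree with length: equal strings cannot differ in length.
    if y == s and len(y) != len(s):
        problems.append("x == y although lengths differ")
    return problems


samples = ["a = 'à'", "à", "plain ascii", "mixed é, ü and 日本", ""]
for sample in samples:
    result = check_join_roundtrip(sample)
    print(repr(sample), "OK" if not result else result)
```

On an interpreter without the bug every sample should report `OK`; on the affected pypy3 nightly, the `"a = 'à'"` case would be expected to report a length mismatch.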