[issue25709] greek alphabet bug it is very disturbing...

2015-11-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Yes, I have come to the same conclusion as random832. String objects have a "fast path" for concatenation, and on this path the cached UTF-8 representation is not cleared. Pickle is one of the simplest ways to reproduce this issue. Maybe it can be reproduced with compile() or ty

[issue25709] greek alphabet bug it is very disturbing...

2015-11-23 Thread random832
random832 added the comment: I can't reproduce without pickle. I did some further digging, though, and it *looks like*... 1. Pickle causes the built-in UTF-8 representation of a string to be populated, whereas encode('utf-8') does not. Can anyone think of any other operations that do this? 2.

[issue25709] greek alphabet bug it is very disturbing...

2015-11-23 Thread Eryk Sun
Eryk Sun added the comment: unicode_modifiable in Objects/unicodeobject.c should return 0 if there's cached PyUnicode_UTF8 data. In this case PyUnicode_Append won't operate in place but instead concatenate a new string. -- nosy: +eryksun
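To make the proposed rule concrete, here is a toy model in pure Python. It is only a sketch and not CPython code: `ToyStr`, `append`, and the `check_cache` flag are hypothetical names invented for illustration. The model assumes a string object that lazily caches its UTF-8 form and an append operation that may modify the object in place, which is the situation described in this thread.

```python
class ToyStr:
    """Toy stand-in for a str object with a lazily cached UTF-8 form.

    Hypothetical illustration only; this is not how CPython is written.
    """

    def __init__(self, text=""):
        self.text = text
        self.utf8_cache = None  # filled on first request

    def as_utf8(self):
        # Return the cached bytes if present, otherwise encode and cache.
        if self.utf8_cache is None:
            self.utf8_cache = self.text.encode("utf-8")
        return self.utf8_cache

    def modifiable(self, check_cache):
        # With check_cache=True, a populated cache makes the object
        # non-modifiable, mirroring the rule suggested above.
        return not (check_cache and self.utf8_cache is not None)


def append(s, extra, check_cache):
    """Append in place when allowed, otherwise build a new object."""
    if s.modifiable(check_cache):
        s.text += extra  # in-place "fast path": the cache is left untouched
        return s
    return ToyStr(s.text + extra)  # new object starts with an empty cache


# Without the check, the cache goes stale after an in-place append.
buggy = ToyStr("\xe0")
buggy.as_utf8()
buggy = append(buggy, "\xe0", check_cache=False)
stale = buggy.as_utf8()

# With the check, the append produces a fresh object and fresh bytes.
fixed = ToyStr("\xe0")
fixed.as_utf8()
fixed = append(fixed, "\xe0", check_cache=True)
fresh = fixed.as_utf8()

print(stale, fresh)
```

In this model the unchecked path keeps returning the UTF-8 bytes of the one-character string even though the text now has two characters, while the checked path returns bytes that match the text.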

[issue25709] greek alphabet bug it is very disturbing...

2015-11-23 Thread random832
random832 added the comment: I've looked at the raw bytes [through a ctypes pointer to id(s)] of a string affected by the issue, and decoded enough to be able to tell that the bad string has an incorrect UTF-8 length and data, which pickle presumably relies on. HEADlength..hash...

[issue25709] greek alphabet bug it is very disturbing...

2015-11-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a reproducer without IDLE. It looks like pickle is the culprit.
>>> import pickle
>>> s = ''
>>> for i in range(5):
...     s += chr(0xe0)
...     print(len(s), s, s.encode(), repr(s))
...     print(' ', pickle.dumps(s))
...
1 à b'\xc3\xa0' 'à'
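The same idea can be turned into a small self-checking script. This is a sketch rather than the test attached to the issue: the helper name `check_pickle_roundtrip` is hypothetical, and the assumption is that on an affected interpreter the pickled data stops matching the string after the second concatenation, while on an unaffected one the list of mismatches stays empty.

```python
import pickle


def check_pickle_roundtrip(char="\xe0", count=5):
    """Grow a string one character at a time and pickle it at every step.

    Returns a list of (length, restored_value) pairs for every step where
    the unpickled value differs from the original string.
    """
    s = ""
    mismatches = []
    for _ in range(count):
        s += char  # may take the in-place concatenation path
        restored = pickle.loads(pickle.dumps(s))
        if restored != s:
            mismatches.append((len(s), restored))
    return mismatches


mismatches = check_pickle_roundtrip()
print("round trip OK" if not mismatches else mismatches)
```

An empty result only shows that this particular pattern round-trips correctly on the interpreter it was run on.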

[issue25709] greek alphabet bug it is very disturbing...

2015-11-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Confirmed on IDLE.
>>> s = ''
>>> for i in range(5):
	s += '\xe0'
	print(s)
à
àà
àà
àà
àà
>>> s = ''
>>> for i in range(5):
	s += chr(0xe0)
	print(s)
à
àà
àà
àà
àà
>>> s = ''
>>> for i in range(5):
	s += '

[issue25709] greek alphabet bug it is very disturbing...

2015-11-23 Thread Steven D'Aprano
Steven D'Aprano added the comment: I'm afraid I'm unable to replicate this bug report in Python 3.4. If you are able to replicate it, can you tell us the exact version number of Python you are using? Also, which operating system are you using? -- nosy: +steven.daprano

[issue25709] greek alphabet bug it is very disturbing...

2015-11-23 Thread Árpád Kósa
New submission from Árpád Kósa: One of my students found this bug. For ASCII characters it works as you expect, but for the Greek alphabet it behaves unexpectedly. The program works correctly in Python 3.2.x, but in 3.4.x and 3.5 it gives an erroneous result. -- files: greekbug.py messages: 25