[issue42593] Consistency in unicode string multiplication with an integer

2020-12-07 Thread syl-nktaylor
syl-nktaylor added the comment: The build did run, even though memset uses fillchar without the explicit cast, so I assumed an implicit cast took place; the original cast can of course be kept. With this build, my sample tests for 1-byte, 2-byte and 4-byte chars also ran OK, so

[issue42593] Consistency in unicode string multiplication with an integer

2020-12-07 Thread STINNER Victor
STINNER Victor added the comment:
+Py_UCS4 fill_char = PyUnicode_READ(char_size, PyUnicode_DATA(str), 0);
+memset(to, fill_char, len);
The second parameter of memset() is a byte (8-bit "octet"). You cannot pass Py_UCS4 to memset(), it doesn't work.
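To illustrate the point about memset(), here is a minimal standalone sketch. It is not CPython code: uint32_t stands in for Py_UCS4, and both helper names are hypothetical. memset() converts its second argument to unsigned char and its third argument counts bytes, so a wide fill character is reduced to its low 8 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not CPython code: fills `count` 4-byte units with
   memset() to show what goes wrong. memset() converts its second argument
   to unsigned char, so only the low 8 bits of `ch` are written, and that
   byte is repeated across every byte of the buffer. */
static void fill_with_memset(uint32_t *dst, uint32_t ch, size_t count)
{
    assert(dst != NULL);
    /* The size argument is in bytes, not in characters. */
    memset(dst, (int)ch, count * sizeof *dst);
}

/* Hypothetical helper: stores the full 32-bit value in each unit. */
static void fill_with_loop(uint32_t *dst, uint32_t ch, size_t count)
{
    assert(dst != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i] = ch;
    }
}
```

With ch = 0x20AC, fill_with_memset() leaves 0xACACACAC in every unit, while fill_with_loop() stores 0x20AC as intended. A 1-byte fill character is the only case where the two agree.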

[issue42593] Consistency in unicode string multiplication with an integer

2020-12-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: BTW, CPython does not use UTF-8 or UTF-16 encoding in the internal representation of strings. It uses Latin1, UCS2 and UCS4 (UTF-32). What do the benchmarks show? Is your code always faster, and by how much? If it is slower for some data, for which data, and by how much?
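The difference from UTF-8 and UTF-16 is that each of these three representations is fixed-width: every character in a given string takes the same number of bytes, chosen from the largest code point in the string. A rough sketch of that selection rule follows. The function name is hypothetical and this is not a CPython API; it only mirrors the 1-byte, 2-byte and 4-byte cases mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython API: returns the number of bytes per
   character that a fixed-width representation needs for the given code
   points (1 for Latin1, 2 for UCS2, 4 for UCS4). */
static int storage_width(const uint32_t *code_points, size_t count)
{
    assert(code_points != NULL || count == 0);

    /* Find the largest code point in the string. */
    uint32_t max_cp = 0;
    for (size_t i = 0; i < count; i++) {
        if (code_points[i] > max_cp) {
            max_cp = code_points[i];
        }
    }

    if (max_cp < 0x100) {
        return 1;   /* fits in Latin1 */
    }
    if (max_cp < 0x10000) {
        return 2;   /* fits in UCS2 */
    }
    return 4;       /* needs UCS4 */
}
```

Under this rule a single character above U+00FF widens the storage of the whole string, which is why a fill routine has to handle all three widths.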

[issue42593] Consistency in unicode string multiplication with an integer

2020-12-07 Thread syl-nktaylor
New submission from syl-nktaylor: In https://github.com/python/cpython/blob/master/Objects/unicodeobject.c#L12930, unicode_repeat does string multiplication with an integer in 3 different ways: 1) one memset call, for utf-8 when string size is 1 2) linear 'for' loops, for utf-16 and utf-32 wh