Hello!
Please help me solve a problem with Cyrillic (UTF-8) text in PyPy.
Environment: Windows 10
Python 3.9.10 (b332b321bbaa72bffb0207da5b7fe4c38047d3b2, Mar 16 2022, 16:03:21)
[PyPy 7.3.9 with MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>> print ("АБВвба")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 8: surrogates not allowed
In the Visual Studio Code terminal with PyPy 7.3.9, the same statement runs but the output is garbled:
print ("АБВвба")
╨Р╨С╨Т╨▓╨▒╨░   (garbled, not the expected output)
Python 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print ("АБВвба")
АБВвба   (expected output)
Because of this behavior, every operation on strings containing Cyrillic
characters gives incorrect results under PyPy.
Is it possible to solve this problem?
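
For reference, both symptoms can be reproduced in plain Python. This is only a sketch under one assumption: that the console was using code page 866, which the sessions above do not state. Under that assumption, the garbled text is what UTF-8 bytes look like when displayed as cp866, and '\udc80' is what the cp866 byte for "А" becomes when read as UTF-8 with the surrogateescape error handler. The helper names garble_output and garble_input are hypothetical and exist only for this illustration.

```python
TEXT = "АБВвба"


def garble_output(text, console_codepage="cp866"):
    """Program writes UTF-8 bytes; the console shows them using its own code page.

    The cp866 default is an assumption, not something confirmed by the report.
    """
    return text.encode("utf-8").decode(console_codepage)


def garble_input(text, console_codepage="cp866"):
    """Console sends code-page bytes; the interpreter decodes them as UTF-8.

    With errors="surrogateescape", every undecodable byte is kept as a lone
    surrogate in the range U+DC80..U+DCFF instead of raising immediately.
    """
    return text.encode(console_codepage).decode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # ascii() keeps this demo printable on any console encoding.
    print(ascii(garble_output(TEXT)))

    typed = garble_input(TEXT)
    print(ascii(typed))

    try:
        # Strict UTF-8 encoding refuses lone surrogates.
        typed.encode("utf-8")
    except UnicodeEncodeError as exc:
        print("encode failed:", exc.reason)
```

If the assumption holds, the problem lies in the mismatch between the console code page and the encoding the interpreter expects, not in the Cyrillic strings themselves.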
I tried calling "setlocale" by analogy with C, assuming that the code is
translated to C, but this does not work in PyPy:
from locale import setlocale, LC_ALL
setlocale(LC_ALL, "ru_RU.UTF-8")
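
Before changing the locale, it may help to check which encodings the interpreter actually selected. The following is a minimal sketch using only the standard library. The names describe_encodings and utf8_writer are hypothetical, and whether wrapping the output stream cures the PyPy behavior reported here is untested.

```python
import io
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for console and file-system I/O."""
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", write_through=True)


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)

    # Possible experiment (assumption: sys.stdout exposes a .buffer attribute):
    # out = utf8_writer(sys.stdout.buffer)
    # out.write("АБВвба\n")
```

Running this under both PyPy and CPython in the same console and comparing the reported values would show where the two interpreters disagree.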
Maybe a locale check needs to be added to the PyPy sources, similar to this C program:
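#include <stdio.h>
#include <conio.h>
#include <locale.h>

int main(void)
{
    /* Request a Russian UTF-8 locale from the C runtime before printing. */
    setlocale(LC_ALL, "ru_RU.UTF-8");
    printf("АБВвба");
    _getch();  /* Wait for a key press so the console window stays open. */
    return 0;
}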
I can't do this myself: I have only just started learning C and don't yet have
enough knowledge, but I really want to learn.
Interestingly, when this program is compiled with GCC (MinGW or MinGW64), the
Cyrillic output is incorrect, but when it is compiled with Clang (LLVM),
everything works correctly.
Thank you in advance for any answer!
With great respect,
Max.