[pypy-dev] Please help to solve the problem with Cyrillic UTF-8

Максим Кужильный via pypy-dev Thu, 25 Aug 2022 19:48:21 -0700

Hello!
Please help me solve a problem with Cyrillic UTF-8 text in PyPy.
I am on Windows 10.
 
Python 3.9.10 (b332b321bbaa72bffb0207da5b7fe4c38047d3b2, Mar 16 2022, 16:03:21)
[PyPy 7.3.9 with MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>> print ("АБВвба")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 8: surrogates not allowed
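
One way an error like this can arise is sketched below. It assumes, and the message does not confirm this, that the console hands the interpreter cp866 bytes and that they are decoded as UTF-8 with the surrogateescape error handler. Under that assumption the letter А is the single byte 0x80, which is not valid UTF-8 and becomes the lone surrogate '\udc80', and index 8 of the typed line is the first Cyrillic letter.

```python
# Assumption: the console code page is cp866 (not stated in the message).
source = 'print ("АБВвба")'
raw = source.encode("cp866")          # the bytes the console would send

# Decoding as UTF-8 with surrogateescape maps each undecodable byte
# to a lone surrogate in the range U+DC80..U+DCFF.
escaped = raw.decode("utf-8", errors="surrogateescape")
first_bad = escaped[8]                # first Cyrillic position in the line

# Re-encoding strictly as UTF-8 then fails on the first surrogate.
reason = None
position = None
try:
    escaped.encode("utf-8")
except UnicodeEncodeError as exc:
    reason = exc.reason
    position = exc.start
```

If this reading is right, the failure happens while the input is decoded, before print is ever called.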
 
In Visual Studio Code with PyPy 7.3.9:
print ("АБВвба")
╨Р╨С╨Т╨▓╨▒╨░  (garbled, not the expected output)
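
The garbled line above is what the UTF-8 bytes of the string look like when they are displayed in code page 866. The short sketch below reproduces it; that the terminal in question really uses cp866 is an assumption based on this match, not something the message states.

```python
text = "АБВвба"

# Assumption: the program writes UTF-8 bytes, but the terminal
# interprets them as cp866.
garbled = text.encode("utf-8").decode("cp866")

# The mix-up is reversible, so no data is lost, only displayed wrongly.
restored = garbled.encode("cp866").decode("utf-8")
```

Here garbled equals the string shown in the Visual Studio Code output, and restored equals the original text.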
 
Python 3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print ("АБВвба")
АБВвба  (the expected output)
 
Because of this behavior, every operation on strings containing Cyrillic
characters gives incorrect results.
Is there a way to fix this?
I tried calling "setlocale", by analogy with C and on the assumption that the
code is translated to C, but it has no effect in PyPy:
from locale import setlocale, LC_ALL
setlocale(LC_ALL, "ru_RU.UTF-8")
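
Calling setlocale does not change the encoding of text streams that already exist, so it can help to look at what the interpreter actually uses. The sketch below is a diagnostic, not a confirmed fix for PyPy 7.3.9; the helper names describe_encodings and reconfigure_stdout_utf8 are made up for illustration.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings the interpreter uses for console I/O."""
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


def reconfigure_stdout_utf8():
    """Try to switch stdout to UTF-8; return True if the call was made.

    Assumption: sys.stdout is a TextIOWrapper with reconfigure()
    (available since Python 3.7); otherwise nothing is changed.
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8", errors="replace")
    return True
```

Comparing the result of describe_encodings() under PyPy and under CPython on the same console would show whether the two interpreters pick different encodings.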
 
Perhaps a locale check needs to be added to the PyPy sources, along these lines:
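#include <stdio.h>
#include <conio.h>   /* _getch(), Windows-specific */
#include <locale.h>

int main(void)
{
    /* Select the Russian UTF-8 locale before printing. */
    setlocale(LC_ALL, "ru_RU.UTF-8");
    printf("АБВвба");
    _getch();        /* wait for a key press before exiting */
    return 0;
}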
I can't do this myself: I have only just started learning C and don't know
enough yet, but I really want to learn.
Interestingly, when this program is compiled with GCC (MinGW or MinGW64), the
Cyrillic output is incorrect, but when it is compiled with Clang (LLVM),
everything works correctly.
 
Thank you in advance for any answer!
With great respect,
Max.
