Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github
On Fri, Nov 11, 2022 at 8:16 PM Eryk Sun wrote:

> If sys.std* are console files, then in Python 3.6+, sys.std*.buffer.raw will
> be _io._WindowsConsoleIO

> io.TextIOWrapper uses locale.getpreferredencoding(False) as the default
> encoding

Thank you for your replies - checking the sys.stdout.buffer.raw value is what finally helped me understand. Turns out, the Windows agent is redirecting the output of all python commands to a file, so sys.stdout is a file using the locale encoding of cp1252, instead of being a stream using encoding utf8.

I wrote up a gist with my findings to hopefully help out some other poor soul in the future:
https://gist.github.com/NodeJSmith/e7e37f2d3f162456869f015f842bcf15

--
https://mail.python.org/mailman/listinfo/python-list
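The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the gist; the helper name `describe_stream` is made up for this example. On a Windows console the raw type is expected to be `_WindowsConsoleIO`, while a redirected file or pipe shows up as `FileIO`.

```python
import sys


def describe_stream(stream):
    """Summarize what a text stream is attached to (hypothetical helper)."""
    buffer = getattr(stream, "buffer", None)
    # Buffered streams expose the underlying object as .raw; fall back to the
    # buffer itself for objects such as io.BytesIO that have no .raw.
    raw = getattr(buffer, "raw", buffer)
    return {
        "encoding": getattr(stream, "encoding", None),
        "raw_type": type(raw).__name__,
        "isatty": stream.isatty(),
    }


if __name__ == "__main__":
    # Report on stderr so the report itself cannot trip over stdout's encoding.
    print(describe_stream(sys.stdout), file=sys.stderr)
```

Running this once directly in a terminal and once with stdout redirected to a file should make the difference between the two cases visible.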
Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github
On Sat, Nov 12, 2022 at 11:53 AM Inada Naoki wrote:
>
> On Sat, Nov 12, 2022 at 10:21 AM 12Jessicasmith34
> <12jessicasmit...@gmail.com> wrote:
> >
> > Two questions: any idea why this would be happening in this situation?
> > AFAIK, stdout *is* a console when these images are running the python
> > process. Second - is there a way I can check the locale and code page
> > values that you mentioned? I assume I could call GetACP using ctypes, but
> > maybe there is a simpler way?
> >
>
> Maybe, python doesn't write to console in this case.
>
> python -(pipe)-> PowerShell -> Console
>
> In this case, python uses ACP for writing to pipe.
> And PowerShell uses OutputEncoding for reading from pipe.
>
> If you want to use UTF-8 on PowerShell in Windows,
>
> * Set PYTHONUTF8=1 (Python uses UTF-8 for writing into pipe).
> * Set `$OutputEncoding =
>   [System.Text.Encoding]::GetEncoding('utf-8')` in PowerShell profile.
>

I forgot [Console]::OutputEncoding. This is what PowerShell uses when reading from pipe. So PowerShell profile should be:

$OutputEncoding = [Console]::OutputEncoding = [System.Text.Encoding]::UTF8

--
Inada Naoki
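The writer/reader mismatch on a pipe can be illustrated from Python alone. In this sketch a parent Python process stands in for PowerShell as the reader of the pipe; that substitution is an assumption made for illustration, and the names `CHILD_CODE` and `run_child_utf8` are hypothetical. The child is forced into UTF-8 mode with `PYTHONUTF8=1`, and the parent then decodes the same bytes two ways.

```python
import os
import subprocess
import sys

TEXT = "\u2514\u2500\u2500 ID:"  # the box-drawing string from the failing log

# ASCII-only child program, so the command line itself has no encoding issues.
CHILD_CODE = "print('\\u2514\\u2500\\u2500 ID:')"


def run_child_utf8():
    """Run a child Python with PYTHONUTF8=1; return the raw bytes it wrote to the pipe."""
    env = dict(os.environ, PYTHONUTF8="1")
    # PYTHONIOENCODING would override the sys.std* encoding, so drop it.
    env.pop("PYTHONIOENCODING", None)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    raw = run_child_utf8()
    # Reader and writer agree on UTF-8: the text survives.
    print(ascii(raw.decode("utf-8").rstrip()))
    # Reader assumes cp1252 instead: mojibake rather than the original text.
    print(ascii(raw.decode("cp1252", errors="replace").rstrip()))
```

The output is passed through `ascii()` so the demonstration itself cannot fail on a cp1252 stdout.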
Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github
On Sat, Nov 12, 2022 at 10:21 AM 12Jessicasmith34 <12jessicasmit...@gmail.com> wrote:
>
> Two questions: any idea why this would be happening in this situation? AFAIK,
> stdout *is* a console when these images are running the python process.
> Second - is there a way I can check the locale and code page values that you
> mentioned? I assume I could call GetACP using ctypes, but maybe there is a
> simpler way?
>

Maybe, python doesn't write to console in this case.

python -(pipe)-> PowerShell -> Console

In this case, python uses ACP for writing to pipe.
And PowerShell uses OutputEncoding for reading from pipe.

If you want to use UTF-8 on PowerShell in Windows,

* Set PYTHONUTF8=1 (Python uses UTF-8 for writing into pipe).
* Set `$OutputEncoding = [System.Text.Encoding]::GetEncoding('utf-8')` in PowerShell profile.

Regards,
--
Inada Naoki
Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github
On 11/11/22, 12Jessicasmith34 <12jessicasmit...@gmail.com> wrote:
>
> any idea why this would be happening in this situation? AFAIK, stdout
> *is* a console when these images are running the python process.

If sys.std* are console files, then in Python 3.6+, sys.std*.buffer.raw will be _io._WindowsConsoleIO. The latter presents itself to Python code as a UTF-8 file stream, but internally it uses UTF-16LE with the wide-character API functions ReadConsoleW() and WriteConsoleW().

> is there a way I can check the locale and code page values that you
> mentioned? I assume I could call GetACP using ctypes, but maybe
> there is a simpler way?

io.TextIOWrapper uses locale.getpreferredencoding(False) as the default encoding. Actually, in 3.11+ it uses locale.getencoding() unless UTF-8 mode is enabled, which is effectively the same as locale.getpreferredencoding(False). On Windows this calls GetACP() and formats the result as "cp%u" (e.g. "cp1252").
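Both ways of checking the values mentioned above can be sketched as follows. The function names are hypothetical. The `ctypes` calls assume the Win32 functions `GetACP`, `GetConsoleCP` and `GetConsoleOutputCP` in kernel32, and the console values may come back as 0 when the process has no console attached.

```python
import locale
import sys


def default_text_encoding():
    """Encoding io.TextIOWrapper falls back to when none is specified."""
    return locale.getpreferredencoding(False)


def windows_code_pages():
    """Return the process and console code pages on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes

    kernel32 = ctypes.windll.kernel32
    return {
        # Same "cp<number>" form that the locale module reports on Windows.
        "process_acp": f"cp{kernel32.GetACP()}",
        # Console code pages are separate from the process code page.
        "console_input_cp": kernel32.GetConsoleCP(),
        "console_output_cp": kernel32.GetConsoleOutputCP(),
    }


if __name__ == "__main__":
    print("default text encoding:", default_text_encoding())
    print("windows code pages:", windows_code_pages())
```

In most cases `locale.getpreferredencoding(False)` alone is enough; the `ctypes` variant is mainly useful for comparing the process code page with the console code pages side by side.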
Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github
> If stdout isn't a console (e.g. a pipe), it defaults to using the process
> code page (i.e. CP_ACP), such as legacy code page 1252 (extended Latin-1).

First off, really helpful information, thank you. That was the exact background I was missing.

Two questions: any idea why this would be happening in this situation? AFAIK, stdout *is* a console when these images are running the python process. Second - is there a way I can check the locale and code page values that you mentioned? I assume I could call GetACP using ctypes, but maybe there is a simpler way?
Re: Strange UnicodeEncodeError in Windows image on Azure DevOps and Github
On 11/10/22, Jessica Smith <12jessicasmit...@gmail.com> wrote:
>
> Weird issue I've found on Windows images in Azure Devops Pipelines and
> Github actions. Printing Unicode characters fails on these images because,
> for some reason, the encoding is mapped to cp1252. What is particularly
> weird about the code page being set to 1252 is that if you execute "chcp"
> it shows that the code page is 65001.

If stdout isn't a console (e.g. a pipe), it defaults to using the process code page (i.e. CP_ACP), such as legacy code page 1252 (extended Latin-1).

You can override just sys.std* to UTF-8 by setting the environment variable `PYTHONIOENCODING=UTF-8`. You can override all I/O to use UTF-8 by setting `PYTHONUTF8=1`, or by passing the command-line option `-X utf8`.

Background

The locale system in Windows supports a common system locale, plus a separate locale for each user. By default the process code page is based on the system locale, and the thread code page (i.e. CP_THREAD_ACP) is based on the user locale. The default locale of the Universal C runtime combines the user locale with the process code page. (This combination may be inconsistent.)

In Windows 10 and later, the default process and thread code pages can be configured to use CP_UTF8 (65001). Applications can also override them to UTF-8 in their manifest via the "ActiveCodePage" setting. In either case, if the process code page is UTF-8, the C runtime will use UTF-8 for its default locale encoding (e.g. "en_uk.utf8").

Unlike some frameworks, Python has never used the console input code page or output code page as a locale encoding. Personally, I wouldn't want Python to default to that old MS-DOS behavior. However, I'd be in favor of supporting a "console" encoding that's based on the console input code page that's returned by GetConsoleCP(). If the process doesn't have a console session, the "console" encoding would fall back on the process code page from GetACP().
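The three overrides mentioned above can be compared with a small experiment. This is a sketch only; `piped_stdout_encoding` is a hypothetical helper that starts a child interpreter with its stdout attached to a pipe and reports the encoding the child chose.

```python
import codecs
import os
import subprocess
import sys

CHILD_CODE = "import sys; print(sys.stdout.encoding)"


def piped_stdout_encoding(extra_env=None, extra_args=()):
    """Return the normalized sys.stdout.encoding of a child Python writing to a pipe."""
    # Start from the current environment without any inherited overrides.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONIOENCODING", "PYTHONUTF8")}
    env.update(extra_env or {})
    out = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    ).stdout
    # Encoding names are ASCII; codecs.lookup() normalizes spelling and case.
    return codecs.lookup(out.decode("ascii").strip()).name


if __name__ == "__main__":
    print("default          :", piped_stdout_encoding())
    print("PYTHONIOENCODING :", piped_stdout_encoding({"PYTHONIOENCODING": "UTF-8"}))
    print("PYTHONUTF8=1     :", piped_stdout_encoding({"PYTHONUTF8": "1"}))
    print("-X utf8          :", piped_stdout_encoding(extra_args=["-X", "utf8"]))
```

On a machine like the affected runners, the first line would be expected to report the process code page (such as cp1252) and the other three utf-8.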
Strange UnicodeEncodeError in Windows image on Azure DevOps and Github
Hello,

Weird issue I've found on Windows images in Azure DevOps Pipelines and GitHub Actions. Printing Unicode characters fails on these images because, for some reason, the encoding is mapped to cp1252. What is particularly weird about the code page being set to 1252 is that if you execute "chcp" it shows that the code page is 65001.

At the end of this email are the cleaned-up logs from GH Actions. The actions are very simple: print out Unicode characters using echo to prove the characters can be printed to the console. The rest of the commands are in Python, and they include printing out the "encoding" attribute of sys.stdout, as well as printing sys.flags and sys.getfilesystemencoding(). Then print the same Unicode characters using print, which causes a UnicodeEncodeError because the character isn't in the cp1252 charmap.

I've also uploaded the logs to pastebin here: https://pastebin.com/ExzGRHav

I also uploaded a screenshot to imgur, since the logs are not the easiest to read. https://imgur.com/a/dhvLWOJ

I'm trying to determine why this issue only happens on these images - I can replicate it on multiple versions of Python (from 3.7 to 3.9 at least, haven't tried more), but I can't replicate this on my own machines.

There are a few issues on GH regarding this issue but they seem to stay open since they are hard to replicate. Here are the ones I have stumbled upon while researching this.

https://github.com/databrickslabs/dbx/issues/455
https://github.com/PrefectHQ/prefect/issues/5754
https://github.com/pallets/click/issues/2121

Any insight or ideas on how to test and validate the cause would be great. I'm pulling my hair out trying to find the root cause of this - not because it really matters to any of my processes but because it is weird and broken.
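The failure itself can be reproduced without the hosted runner by printing to an in-memory stream that uses cp1252. This is a minimal sketch; `try_print` is a hypothetical helper and the in-memory stream only stands in for the redirected stdout.

```python
import io

TEXT = "\u2514\u2500\u2500 ID:"  # the box-drawing string used in the workflow


def try_print(text, encoding):
    """Print text to an in-memory stream; return the bytes written or the error raised."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return stream.buffer.getvalue()


if __name__ == "__main__":
    # ascii() keeps this report printable even when stdout itself is cp1252.
    print(ascii(try_print(TEXT, "cp1252")))  # an exception object
    print(ascii(try_print(TEXT, "utf-8")))   # the encoded bytes
```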
Thanks for any help,
Jessica

Begin Logs:

2022-11-10T23:54:51.7272453Z Requested labels: windows-latest
2022-11-10T23:54:51.7272494Z Job defined at: NodeJSmith/wsl_home/.github/workflows/blank.yml@refs/heads/main
2022-11-10T23:54:51.7272514Z Waiting for a runner to pick up this job...
2022-11-10T23:54:52.3387510Z Job is waiting for a hosted runner to come online.
2022-11-10T23:55:04.8574435Z Job is about to start running on the hosted runner: Hosted Agent (hosted)
2022-11-10T23:55:15.8332600Z Current runner version: '2.298.2'
2022-11-10T23:55:15.8366947Z ##[group]Operating System
2022-11-10T23:55:15.8367650Z Microsoft Windows Server 2022
2022-11-10T23:55:15.8367954Z 10.0.20348
2022-11-10T23:55:15.8368389Z Datacenter
2022-11-10T23:55:15.8368696Z ##[endgroup]
2022-11-10T23:55:15.8369023Z ##[group]Runner Image
2022-11-10T23:55:15.8369654Z Image: windows-2022
2022-11-10T23:55:15.8369931Z Version: 20221027.1
2022-11-10T23:55:15.8370539Z Included Software: https://github.com/actions/runner-images/blob/win22/20221027.1/images/win/Windows2022-Readme.md
2022-11-10T23:55:15.8371174Z Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20221027.1
2022-11-10T23:55:15.8371622Z ##[endgroup]
2022-11-10T23:55:15.8371955Z ##[group]Runner Image Provisioner
2022-11-10T23:55:15.8372277Z 2.0.91.1
2022-11-10T23:55:15.8372514Z ##[endgroup]
2022-11-10T23:55:16.3619998Z ##[group]Run echo " └── ID:"
2022-11-10T23:55:16.3620626Z echo " └── ID:"
2022-11-10T23:55:16.3927292Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2022-11-10T23:55:16.3927894Z ##[endgroup]
2022-11-10T23:55:32.9958751Z └── ID:
2022-11-10T23:55:34.0835652Z ##[group]Run chcp
2022-11-10T23:55:34.0836104Z chcp
2022-11-10T23:55:34.0878901Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2022-11-10T23:55:34.0879350Z ##[endgroup]
2022-11-10T23:55:34.4878247Z Active code page: 65001
2022-11-10T23:55:34.7917219Z ##[group]Run python -c "import sys; print('sys.stdout.encoding', sys.stdout.encoding); print('sys.flags',sys.flags);print('sys.getfilesystemencoding',sys.getfilesystemencoding())"
2022-11-10T23:55:34.7918148Z python -c "import sys; print('sys.stdout.encoding', sys.stdout.encoding); print('sys.flags',sys.flags);print('sys.getfilesystemencoding',sys.getfilesystemencoding())"
2022-11-10T23:55:34.7960873Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2022-11-10T23:55:34.7961202Z ##[endgroup]
2022-11-10T23:55:36.2324642Z sys.stdout.encoding cp1252
2022-11-10T23:55:36.2325910Z sys.flags sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0, dev_mode=False, utf8_mode=0)
2022-11-10T23:55:36.2327055Z sys.getfilesystemencoding utf-8
2022-11-10T23:55:36.4553957Z ##[group]Run python -c "print('└── ID:')"
2022-11-10T23:55:36.4554395Z python -c "print('└── ID:')"
2022-11-10T23:55:36.4595413Z shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
2022-11-10T23:55:36.4595740Z ##[endgroup]
2022-11-10T23:55:36.8739309Z Traceback (most recent call last):
2022-11-10T23:55:37.1316425Z File