[issue45232] ascii codec is used by default when LANG is not set

2021-09-17 Thread Eryk Sun


Eryk Sun added the comment:

Python 3.7+ doesn't need UTF-8 mode to be enabled explicitly in this case on
POSIX systems. If the LC_CTYPE locale is the "POSIX" or "C" locale, and "C"
locale coercion is not disabled via LC_ALL or PYTHONCOERCECLOCALE=0, the
interpreter tries to coerce the LC_CTYPE locale to "C.UTF-8", "C.utf8", or
"UTF-8". If these attempts fail, or if coercion is disabled, the interpreter
automatically enables UTF-8 mode, unless that is also explicitly disabled. For
example:

$ unset LANG
$ unset LC_ALL
$ unset PYTHONCOERCECLOCALE
$ unset PYTHONUTF8
$ python -c 'import locale; print(locale.getpreferredencoding())'
UTF-8

$ PYTHONCOERCECLOCALE=0 python -c 'import locale; print(locale.getpreferredencoding())'
UTF-8

$ PYTHONUTF8=0 python -c 'import locale; print(locale.getpreferredencoding())'
UTF-8

$ PYTHONCOERCECLOCALE=0 PYTHONUTF8=0 python -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968
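The same four cases can also be checked from Python itself. This is a sketch that assumes Python 3.7+ (for sys.flags.utf8_mode); child_encoding() is a hypothetical helper, not a library API. It starts a fresh interpreter with LANG, LC_ALL, LC_CTYPE, PYTHONCOERCECLOCALE and PYTHONUTF8 removed from the environment, applies the given overrides, and reports the preferred encoding under its canonical codec name (via codecs.lookup, so ANSI_X3.4-1968 shows up as ascii) together with the UTF-8 mode flag.

```python
import os
import subprocess
import sys

# Program run in the child: print the normalized preferred encoding
# and whether UTF-8 mode is active (sys.flags.utf8_mode, Python 3.7+).
CHILD = (
    "import codecs, locale, sys; "
    "print(codecs.lookup(locale.getpreferredencoding(False)).name, "
    "sys.flags.utf8_mode)"
)

# Variables that influence locale coercion (PEP 538) and UTF-8 mode (PEP 540).
LOCALE_VARS = ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONCOERCECLOCALE", "PYTHONUTF8")


def child_encoding(overrides):
    """Hypothetical helper: start a fresh interpreter with LOCALE_VARS unset,
    apply `overrides`, and return (codec name, utf8_mode flag)."""
    env = {k: v for k, v in os.environ.items() if k not in LOCALE_VARS}
    env.update(overrides)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    name, flag = result.stdout.split()
    return name, int(flag)


if __name__ == "__main__":
    # The four cases from the shell session above.
    cases = [
        {},
        {"PYTHONCOERCECLOCALE": "0"},
        {"PYTHONUTF8": "0"},
        {"PYTHONCOERCECLOCALE": "0", "PYTHONUTF8": "0"},
    ]
    for overrides in cases:
        print(overrides, child_encoding(overrides))
```

Results for the locale-dependent cases depend on the platform and on which locales are installed, so they may not match the session above everywhere; an explicit {"PYTHONUTF8": "1"} is expected to report utf-8 with the flag set.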

--
nosy: +eryksun

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45232] ascii codec is used by default when LANG is not set

2021-09-17 Thread Marc-Andre Lemburg


Change by Marc-Andre Lemburg:


--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed


[issue45232] ascii codec is used by default when LANG is not set

2021-09-17 Thread Olivier Delhomme

Olivier Delhomme added the comment:

>> Hi Marc-Andre,
>>
>> Please note that setting PYTHONUTF8 with "export PYTHONUTF8=1":
>>
>> * Is external to the program and user-dependent
>> * Does not seem to work in my use case:
>>
>>   $ unset LANG
>>   $ export PYTHONUTF8=1
>>   $ python3
>>   Python 3.6.4 (default, Jan 11 2018, 16:45:55)
>>   [GCC 4.8.5] on linux
>>   Type "help", "copyright", "credits" or "license" for more information.
>>   >>> machaine='help me if you can'
>>     File "<stdin>", line 0
>>
>>       ^
>>   SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
> 
> UTF-8 mode is only supported in Python 3.7 and later:
> 
> https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-pep540

Oh. Thanks.

$ unset LANG
$ export PYTHONUTF8=1
$ python3
Python 3.7.5 (default, Dec 24 2019, 08:52:13)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='help me if you can'
>>>

From within the code:

$ unset LANG
$ unset PYTHONUTF8
$ python3
Python 3.7.5 (default, Dec 24 2019, 08:52:13)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['PYTHONUTF8'] = '1'
>>> machaine='help me if you can'
>>>

Even better:

$ unset LANG
$ unset PYTHONUTF8
$ python3
Python 3.7.5 (default, Dec 24 2019, 08:52:13)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='help me if you can'
>>>

Works as expected. Thank you very much. You can close this bug report.

Regards,

Olivier.

--


[issue45232] ascii codec is used by default when LANG is not set

2021-09-17 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 17.09.2021 15:45, Olivier Delhomme wrote:
>
> Olivier Delhomme added the comment:
>
> Hi Marc-Andre,
>
> Please note that setting PYTHONUTF8 with "export PYTHONUTF8=1":
>
> * Is external to the program and user-dependent
> * Does not seem to work in my use case:
>
>   $ unset LANG
>   $ export PYTHONUTF8=1
>   $ python3
>   Python 3.6.4 (default, Jan 11 2018, 16:45:55)
>   [GCC 4.8.5] on linux
>   Type "help", "copyright", "credits" or "license" for more information.
>   >>> machaine='help me if you can'
>     File "<stdin>", line 0
>
>       ^
>   SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

UTF-8 mode is only supported in Python 3.7 and later:

   https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-pep540
-- 
Marc-Andre Lemburg
eGenix.com

--


[issue45232] ascii codec is used by default when LANG is not set

2021-09-17 Thread Olivier Delhomme

Olivier Delhomme added the comment:

Hi Marc-Andre,

Please note that setting PYTHONUTF8 with "export PYTHONUTF8=1":

* Is external to the program and user-dependent
* Does not seem to work in my use case:

  $ unset LANG
  $ export PYTHONUTF8=1
  $ python3
  Python 3.6.4 (default, Jan 11 2018, 16:45:55)
  [GCC 4.8.5] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> machaine='help me if you can'
    File "<stdin>", line 0

      ^
  SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)


Regards,

Olivier.

--


[issue45232] ascii codec is used by default when LANG is not set

2021-09-17 Thread Marc-Andre Lemburg


Marc-Andre Lemburg added the comment:

Yes, this is intended. ASCII is used as a fallback when Python
cannot determine the I/O encoding to use during startup. This is
also why later changes to the environment have no effect on
it: by then, the encoding has already been determined.
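A minimal sketch of that point, assuming Python 3.7+ (where sys.flags.utf8_mode exists): assigning to os.environ inside a running interpreter leaves its own UTF-8 mode flag untouched, because the decision was made at startup, while a new interpreter started afterwards inherits the variable and does enable the mode.

```python
import os
import subprocess
import sys

# UTF-8 mode as decided when this interpreter started (Python 3.7+).
before = sys.flags.utf8_mode

# Changing the environment now only affects processes started afterwards.
os.environ["PYTHONUTF8"] = "1"
after = sys.flags.utf8_mode  # same value as before

# A new interpreter reads PYTHONUTF8 during its own startup and inherits
# the modified os.environ by default.
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
child_mode = int(child.stdout)

print(before, after, child_mode)  # the child reports 1
```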

You can force UTF-8 by enabling UTF-8 mode:

export PYTHONUTF8=1

This makes Python use UTF-8 regardless of the LANG
environment variable.
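Regarding the objection raised in this thread that the variable is external to the program: one possible workaround, offered here as an assumption rather than an established recipe, is for a script to re-execute itself in UTF-8 mode before doing any I/O. utf8_reexec_command() and ensure_utf8_mode() are hypothetical names. The sketch uses the documented -X utf8 interpreter option (Python 3.7+), also sets PYTHONUTF8=1 so that child processes inherit the mode, and calls os.execve, which replaces the current process. It assumes the script was started as "python script.py ...", since sys.argv does not record interpreter options such as -c, -m or -u.

```python
import os
import sys


def utf8_reexec_command():
    """Hypothetical helper: return (executable, argv, env) for restarting the
    current script in UTF-8 mode, or None if UTF-8 mode is already active."""
    if getattr(sys.flags, "utf8_mode", 0):
        return None  # already running in UTF-8 mode
    if sys.version_info < (3, 7):
        # UTF-8 mode (PEP 540) does not exist before 3.7; a restart would not help.
        raise RuntimeError("UTF-8 mode requires Python 3.7 or later")
    # PYTHONUTF8=1 is inherited by any child processes the script starts.
    env = dict(os.environ, PYTHONUTF8="1")
    # -X utf8 enables the mode on the new interpreter's command line.
    # Assumes a "python script.py args..." invocation: sys.argv does not
    # record interpreter options such as -c, -m or -u.
    argv = [sys.executable, "-X", "utf8"] + sys.argv
    return sys.executable, argv, env


def ensure_utf8_mode():
    """Hypothetical helper: replace this process with one in UTF-8 mode if needed."""
    command = utf8_reexec_command()
    if command is not None:
        executable, argv, env = command
        os.execve(executable, argv, env)  # does not return on success
```

Calling ensure_utf8_mode() as the first statement of the script's entry point would restart it once in UTF-8 mode; the restarted process sees the flag set and continues normally. On Python 3.6 it raises instead, since UTF-8 mode does not exist there.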

--
nosy: +lemburg


[issue45232] ascii codec is used by default when LANG is not set

2021-09-17 Thread Olivier Delhomme

New submission from Olivier Delhomme:

$ python3 --version
Python 3.6.4

Setting LANG to en_US.UTF8 works like a charm:

$ export LANG=en_US.UTF8
$ python3
Python 3.6.4 (default, Jan 11 2018, 16:45:55)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='help me if you can'
>>> print('{}'.format(machaine))
help me if you can


Unsetting the LANG shell variable makes the program fail:

$ unset LANG
$ python3
Python 3.6.4 (default, Jan 11 2018, 16:45:55)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> machaine='help me if you can'
  File "<stdin>", line 0

    ^
SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
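The error itself is an ordinary decoding failure that can be reproduced without a terminal. In this minimal sketch the word "chaîne" (written with an escape) stands in for the non-ASCII text that was typed, since the session line as archived is plain ASCII. Its "î" is encoded in UTF-8 as the two bytes 0xC3 0xAE, so decoding with the ASCII codec fails on the lead byte 0xC3, as in the traceback, while decoding as UTF-8 succeeds.

```python
# UTF-8 bytes for the word "chaîne"; "î" (U+00EE) becomes 0xC3 0xAE.
data = "cha\u00eene".encode("utf-8")

try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    # Same message shape as the SyntaxError above, but at position 3.
    ascii_error = str(exc)
else:
    ascii_error = None

utf8_text = data.decode("utf-8")  # decodes cleanly

print(ascii_error)
# ascii() escapes non-ASCII characters, so this prints even on an ASCII-only terminal.
print(ascii(utf8_text))
```

It prints "'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)", the same message as in the traceback, at position 3 instead of 10 because the input differs.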


Setting LANG inside the program does not change this behavior:

$ unset LANG
$ python3
Python 3.6.4 (default, Jan 11 2018, 16:45:55)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['LANG'] = 'en_US.UTF8'
>>> machaine='help me if you can'
  File "<stdin>", line 0

    ^
SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)


Is this expected behavior? How can I force a UTF-8 codec?

--
components: Interpreter Core
messages: 402046
nosy: od-cea
priority: normal
severity: normal
status: open
title: ascii codec is used by default when LANG is not set
type: behavior
versions: Python 3.6
