On 26. 08. 21 9:54, Marc-Andre Lemburg wrote:
On 26.08.2021 06:07, Christopher Barker wrote:
I'm working on a PR now. It seems there is little support for keeping the
python2 content in the docs, so I'm re-writing it as though it was never there.
If someone wants to add a note about Python 2, of course that can be added 
later.

Note that "moving the Python 2 content to a section at the end" is not all that
straightforward, as it is pretty mixed in with the text at this point.

But now a question -- the current text reads:

"Code in the core Python distribution should always use UTF-8"

and then:

"In the standard library, non-default encodings should be used only for
test purposes or when a comment or docstring needs to mention an author
name that contains non-ASCII characters ..."

I *think* that's a remnant of the Py2 ASCII encoding days -- but I wanted to
make sure. A bit later on, it says:

"The following policy is prescribed for the
standard library ... In addition, string literals and comments must also be in
ASCII."

For Python 2 code we mandated ASCII for the stdlib, with some exceptions
using the source code encoding for testing purposes or in case e.g.
Martin von Löwis or Marc-André Lemburg wanted to put his name into the code
without escaping part of it ;-)

Note that Python 2 defaults to ASCII as source code encoding.

With UTF-8 as standard source code encoding, this is no longer
necessary.

So the second quote can be changed to "In the standard library, non-default
source code encodings should be used only for test purposes ...".
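As a rough illustration of what "default" versus "non-default" source code encoding means in Python 3, the sketch below asks the standard library's tokenize.detect_encoding() which encoding a source file announces. The helper name declared_encoding and the two sample sources are made up for this example; they are not taken from the stdlib or the PEP.

```python
import io
import tokenize


def declared_encoding(source_bytes: bytes) -> str:
    """Return the encoding Python would use to decode this source (hypothetical helper)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# No declaration: Python 3 falls back to UTF-8.
plain = "author = 'Marc-André Lemburg'\n".encode("utf-8")

# An explicit declaration, as a test file exercising a non-default encoding might have.
legacy = "# -*- coding: latin-1 -*-\nauthor = 'Marc-André Lemburg'\n".encode("latin-1")

print(declared_encoding(plain))
print(declared_encoding(legacy))
```

The first source needs no declaration at all, which is the situation the rewritten text assumes for stdlib code.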

Is that still correct for string literals and comments? And what about 
docstrings?

It seems to me that if we really are utf-8, then there is no need for those
"textual" elements to be ASCII, i.e. they can still contain non-ASCII characters,
and escaping those makes things less readable, not more.

So I think that section should now read:

"""
Source File Encoding
--------------------

Code in the core Python distribution should always use UTF-8, and should not
have an encoding declaration.

In the standard library, non-UTF-8 encodings should be used only for
test purposes.

I think the above should be limited to Python code. In C or other
source files you may well still need a source code encoding.

The following policy is prescribed for the standard library (see PEP
3131): All identifiers in the Python standard library MUST use
ASCII-only identifiers, and SHOULD use English words wherever feasible
(in many cases, abbreviations and technical terms are used which aren't
English). In comments and docstrings, authors whose names are not
based on the Latin alphabet (latin-1, ISO/IEC 8859-1 character set)
MUST provide a transliteration of their names in this character set.

Open source projects with a global audience are encouraged to adopt a
similar policy.
"""
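For what it's worth, the "ASCII-only identifiers" rule in the proposed text can be checked mechanically, while string literals, comments and docstrings are left alone. The following is only a sketch under that reading of the policy; the function name non_ascii_identifiers and the sample source are invented for illustration and are not an existing stdlib check.

```python
import ast


def non_ascii_identifiers(source: str) -> list:
    """Return the sorted non-ASCII identifiers in Python source text (hypothetical helper)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    # String literals, comments and docstrings are deliberately not inspected.
    return sorted(name for name in names if not name.isascii())


sample = (
    "naïve = 1\n"
    "def löwis(x):\n"
    "    return x\n"
    "author = 'Marc-André Lemburg'  # non-ASCII in a literal is not flagged\n"
)
print(non_ascii_identifiers(sample))  # ['löwis', 'naïve']
```

Only the two identifiers are reported; the author name inside the string literal passes, which matches the distinction the proposed wording draws.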

But maybe we do want to keep comments, docstrings and literals as ASCII with
escapes?

No need for the stdlib, since UTF-8 is widely accepted by now
and why should people with non-ASCII names not be able to write
their true name ?

You may have noted that I rarely do... the reason is that in the
past, the accent on the "e" caused me too many problems. Perhaps
one of these days, I'll go back to adding it again :-)

I would drop the weirdly specific "(latin-1, ISO/IEC 8859-1 character set)" note, and only keep "based on the Latin alphabet". The Ł in Łukasz's name is not in latin-1, and I don't think it needs different treatment than German or French names. (As opposed to a Russian or Chinese name, where an average English speaker isn't able to type an approximation of the name on their keyboard.)
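
The latin-1 point is easy to verify with a few lines of Python. This is just a quick sketch; fits_latin1 is a made-up helper name, and the names are the ones mentioned in this thread.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO/IEC 8859-1 (hypothetical helper)."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


for name in ("Martin von Löwis", "Marc-André Lemburg", "Łukasz"):
    print(name, fits_latin1(name))
```

The ö and é encode fine, while Ł (U+0141) does not, even though all three names are written in the Latin alphabet.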

- Peťa Viktorin

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E6B6INCC5IH5477XF5BGXPC3GPIEER5R/
Code of Conduct: http://python.org/psf/codeofconduct/
