[Python-Dev] Draft PEP: Remove wstr from Unicode

2020-06-18 Thread Inada Naoki
PEP: 
Title: Remove wstr from Unicode
Author: Inada Naoki  
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Jun-2020
Python-Version: TBD

Abstract
========

PEP 393 deprecated some Unicode APIs and introduced the ``wchar_t *wstr``
and ``Py_ssize_t wstr_length`` members in the Unicode implementation to keep
those deprecated APIs backward compatible. [1]_

This PEP proposes removing ``wstr`` and ``wstr_length``, together with the
deprecated APIs that use these members.


Motivation
==========

Memory usage
------------

``str`` is one of the most used types in Python.  Even the simplest ASCII
strings have a ``wstr`` member, which consumes 8 bytes on 64-bit systems.
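
As a rough illustration of that per-object cost, the following Python sketch
prints the platform pointer size, which is what a ``wchar_t *`` member such as
``wstr`` occupies, next to the size of an empty ``str``. The variable names are
illustrative only. The ``sys.getsizeof`` result differs between CPython
versions and builds, and whether the object still contains ``wstr`` depends on
the interpreter, so no specific output is claimed.

```python
import struct
import sys

# Size of a C pointer on this platform. ``wstr`` is a ``wchar_t *``,
# so this is the per-string cost of carrying that member.
pointer_size = struct.calcsize("P")

# Total size of an empty str object as reported by the interpreter.
# The exact number depends on the CPython version and build.
empty_str_size = sys.getsizeof("")

print(f"pointer size: {pointer_size} bytes")
print(f"empty str:    {empty_str_size} bytes")
print(f"one pointer relative to an empty str: {pointer_size / empty_str_size:.0%}")
```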


Runtime overhead
----------------

To support legacy Unicode objects created by
``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a
``PyUnicode_READY()`` check.

When we drop support for legacy Unicode objects, we can reduce this overhead
too.


Simplicity
----------

Supporting legacy Unicode objects makes the Unicode implementation complex.
Until we drop them, it is very hard to try other Unicode implementations,
such as the UTF-8 based implementation in PyPy.


Specification
=============

Affected APIs
-------------

The ``wstr`` and ``wstr_length`` members are removed from the Unicode
implementation.

Macros and functions to be removed:

* PyUnicode_GET_SIZE
* PyUnicode_GET_DATA_SIZE
* Py_UNICODE_WSTR_LENGTH
* PyUnicode_AS_UNICODE
* PyUnicode_AS_DATA
* PyUnicode_AsUnicode
* PyUnicode_AsUnicodeAndSize


Behaviors to be removed:

* PyUnicode_FromUnicode -- ``PyUnicode_FromUnicode(NULL, size)`` where
  ``size > 0`` raises RuntimeError instead of creating a legacy Unicode
  object. Although this API is deprecated by PEP 393, it will be kept
  when ``wstr`` is removed. The API itself will be removed later.

* PyUnicode_FromStringAndSize -- Like PyUnicode_FromUnicode,
  ``PyUnicode_FromStringAndSize(NULL, size)`` raises RuntimeError
  instead of creating a legacy Unicode object.

* PyArg_ParseTuple, PyArg_ParseTupleAndKeywords -- The 'u', 'u#', 'Z', and
  'Z#' formats will be removed.


Deprecation
-----------

All APIs to be removed should have a compiler deprecation warning
(e.g. ``Py_DEPRECATED(3.3)``) from Python 3.9. [2]_

All APIs to be changed should raise DeprecationWarning for the behavior to be
removed. Note that ``PyUnicode_FromUnicode`` has both a compiler deprecation
warning and a runtime DeprecationWarning. [3]_, [4]_
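
Maintainers who want to find such runtime warnings during testing can escalate
``DeprecationWarning`` to an error with the standard ``warnings`` module. The
following is a minimal Python sketch. ``call_legacy_api`` is a hypothetical
stand-in for a call into an extension that reaches one of the deprecated code
paths; here it simply issues the warning itself, and the warning message is
made up for the example.

```python
import warnings


def call_legacy_api():
    """Hypothetical stand-in for extension code hitting a deprecated path."""
    warnings.warn("legacy Unicode API used", DeprecationWarning, stacklevel=2)
    return "result"


def triggers_deprecation(func):
    """Return True if calling ``func`` emits a DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate DeprecationWarning to an exception inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False


print(triggers_deprecation(call_legacy_api))       # True
print(triggers_deprecation(lambda: "no warning"))  # False
```

The same effect can be applied to a whole test run with
``python -W error::DeprecationWarning``.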


Plan
----

All deprecations will be implemented in Python 3.10.
Some deprecations will be backported to Python 3.9.

Actual removal will happen in Python 3.12.


Alternative Ideas
=================

Advanced Schedule
-----------------

Backport warnings to 3.9, and do the removal in the early development phase
of Python 3.11. If many third-party packages are broken by this change, we
will revert the change and go back to the regular schedule.

Pros: There is a chance to remove ``wstr`` in Python 3.11. Even if we need
to revert it, third-party maintainers will have more time to prepare for the
removal, and we can get feedback from the community early.

Cons: Adding warnings during the beta period may cause some confusion. Note
that we need to avoid the warning from CPython core and stdlib.


Use hashtable to store wstr
---------------------------

Store the ``wstr`` in a hashtable instead of in the Unicode structure.

Pros: We can reduce memory usage as early as Python 3.10. We can have
a longer timeline to remove the ``wstr``.

Cons: This implementation will increase the complexity of the Unicode
implementation.


References
==========
A collection of URLs used as references through the PEP.

.. [1] PEP 393 -- Flexible String Representation
   (https://www.python.org/dev/peps/pep-0393/)

.. [2] GH-20878 -- Add Py_DEPRECATED to deprecated unicode APIs
   (https://github.com/python/cpython/pull/20878)

.. [3] GH-20933 -- Raise DeprecationWarning when creating legacy Unicode
   (https://github.com/python/cpython/pull/20933)

.. [4] GH-20927 -- Raise DeprecationWarning for getargs with 'u', 'Z'
   (https://github.com/python/cpython/pull/20927)

Copyright
=========

This document has been placed in the public domain.


-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Accepting PEP 618: zip(strict=True)

2020-06-18 Thread Ethan Furman

On 06/16/2020 04:07 PM, Guido van Rossum wrote:


> I am hereby accepting PEP 618.


Congratulations, Brandt!

--
~Ethan~
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WNSD27K4LLZBPP7OBDPZLDVFS5UGPOQ3/


[Python-Dev] Re: Accepting PEP 618: zip(strict=True)

2020-06-18 Thread Serhiy Storchaka

17.06.20 11:42, Victor Stinner wrote:

> zip(strict=True) should help to write more reliable code. Maybe it's
> time to review stdlib code to check if some functions would deserve
> the addition of strict=True? I had a look and found a few suspicious
> uses of zip(). But I'm not sure if we want to make these functions
> stricter.


I did have such a plan:

1. Add the zip_equal() builtin and replace all calls of zip() with
   zip_equal().

2. Run tests and revert zip_equal() back to zip() until tests pass.

3. Manually review all remaining zip_equal() calls and leave only those
   which are absolutely correct.

4. Replace zip_equal() with zip(strict=True).

It would be easier if we added a new function instead of a new keyword
argument to the existing function.
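
For readers following along, a zip_equal() helper of the kind mentioned in
step 1 could be sketched in Python roughly as below. This is an illustrative
assumption, not code from any actual patch: it uses a private sentinel with
itertools.zip_longest() to detect a length mismatch, and the error message
is made up for the example.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an iterable that ran out early


def zip_equal(*iterables):
    """Yield tuples like zip(), but raise ValueError on unequal lengths."""
    for items in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in items):
            raise ValueError("zip_equal() arguments have different lengths")
        yield items


print(list(zip_equal("ab", [1, 2])))  # [('a', 1), ('b', 2)]
try:
    list(zip_equal("abc", [1, 2]))
except ValueError as exc:
    print("error:", exc)
```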

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WLCNG5OHMU7DKG3FFGHPXZKPANINZHG7/


[Python-Dev] Re: Accepting PEP 618: zip(strict=True)

2020-06-18 Thread Eric Fahlgren
On Thu, Jun 18, 2020 at 8:06 AM Serhiy Storchaka wrote:

> It would be easier if we added a new function instead of a new keyword
> argument to the existing function.
>

We've implemented the new zip in our sitecustomize.py, and think the
keyword makes it easier.  I've instructed our development staff to examine
all uses of zip as they come across them and add either "strict=True" or
"strict=False" when they've determined which is appropriate.  Any zip calls
without an explicit "strict=" will be deemed "unknown" and requiring
further investigation.
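
One way to list the "unknown" call sites mechanically is to scan source
with the standard ast module. The following Python sketch is only an
assumption about how such an audit could look, not the tooling described
above; zip_calls_without_strict is a hypothetical helper name. It only
recognizes direct calls to the bare name zip, and a call that passes
**kwargs is still reported, since no explicit strict= is visible.

```python
import ast


def zip_calls_without_strict(source):
    """Return line numbers of zip(...) calls that pass no ``strict`` keyword."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "zip"
            and not any(kw.arg == "strict" for kw in node.keywords)
        ):
            lines.append(node.lineno)
    return sorted(lines)


sample = (
    "a = zip(x, y)\n"
    "b = zip(x, y, strict=True)\n"
    "c = zip(x, y, strict=False)\n"
)
print(zip_calls_without_strict(sample))  # [1]
```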
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EUREEQNPGPMIIZ4MUQJWUBLN2CZGF53Y/


[Python-Dev] Re: Accepting PEP 618: zip(strict=True)

2020-06-18 Thread Guido van Rossum
On Thu, Jun 18, 2020 at 2:36 PM Eric Fahlgren wrote:

> We've implemented the new zip in our sitecustomize.py, and think the
> keyword makes it easier.  I've instructed our development staff to examine
> all uses of zip as they come across them and add either "strict=True" or
> "strict=False" when they've determined which is appropriate.  Any zip calls
> without an explicit "strict=" will be deemed "unknown" and requiring
> further investigation.
>

That's actually a really nice validation of the choice to use a keyword --
none of the other options debated (which were all variations on "give the
alternate behavior a different name") would offer the opportunity to state
"I've thought about it and it's definitely okay that the iterables have
different lengths at this call site." Sure, in most places this would just
look redundant, but in large corporate code bases that's exactly the kind
of thing that people like to do.

-- 
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XKMPPQLPICJFM6IZYQTABMX6G2QGHIWM/