[Python-Dev] PEP 597: Add optional EncodingWarning

2021-01-31 Thread Inada Naoki
Hi, all.

I updated the PEP 597 yesterday.
Please review it to move it forward.

PEP: https://www.python.org/dev/peps/pep-0597/
Previous thread: https://discuss.python.org/t/3880

Main difference from the previous version:

* Added a new warning category: ``EncodingWarning``.
* Added a dedicated option to enable the warning instead of relying on dev mode.


Abstract
========

Add a new warning category ``EncodingWarning``. It is emitted when
the ``encoding`` option is omitted and the default locale-specific
encoding is used.

The warning is disabled by default. A new ``-X warn_encoding``
command-line option and a ``PYTHONWARNENCODING`` environment variable
enable the warning.


Motivation
==========

Using the default encoding is a common mistake
----------------------------------------------

Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.

For example, ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users cannot install
the package if the UTF-8-encoded ``README.md`` contains even one
non-ASCII character (e.g. an emoji).
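The fix is to always pass ``encoding`` explicitly. A minimal sketch
(the function name and default path here are placeholders, not from
the PEP):

```python
def read_long_description(path="README.md"):
    # encoding="utf-8" avoids depending on the locale encoding,
    # which is often a legacy code page (e.g. cp1252) on Windows.
    with open(path, encoding="utf-8") as f:
        return f.read()
```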

Of the 4,000 most downloaded packages on PyPI, 489 use non-ASCII
characters in their README, and 82 of those cannot be installed from
the source package when the locale encoding is ASCII. [1_] They used
the default encoding to read a README or TOML file.

Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 to be used by default, but the locale
encoding is actually used. [2_]

Even Python experts assume the default encoding is UTF-8.
This creates bugs that happen only on Windows. See [3_] and [4_].

Emitting a warning when the ``encoding`` option is omitted will help
to find such mistakes.


Prepare to change the default encoding to UTF-8
-----------------------------------------------

We chose the locale encoding as the default text encoding in
Python 3.0, but UTF-8 has been adopted very widely since then.

We might change the default text encoding to UTF-8 in the future,
but this change would affect many applications and libraries.
If we started emitting the warning by default, many
``DeprecationWarning`` messages would be emitted, which would be too
noisy.

Although this PEP doesn't propose changing the default encoding,
it will help reduce the number of warnings if we decide to change
the default in the future.


Specification
=============

``EncodingWarning``
-------------------

Add a new ``EncodingWarning`` class, a subclass of ``Warning``. It is
used to warn when the ``encoding`` option is omitted and the default
encoding is locale-specific.


Options to enable the warning
-----------------------------

A ``-X warn_encoding`` option and a ``PYTHONWARNENCODING``
environment variable are added. They are used to enable
``EncodingWarning``.

``sys.flags.encoding_warning`` is also added. The flag indicates
whether ``EncodingWarning`` is enabled.

When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
other modules using them will emit ``EncodingWarning`` when
``encoding`` is omitted.
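The intended behavior can be sketched with a stand-in warning class.
Both ``EncodingWarning`` and ``check_encoding_arg`` below are
illustrative stand-ins, not the real implementation, which hooks the
check into ``io.TextIOWrapper`` itself:

```python
import warnings

class EncodingWarning(Warning):
    """Stand-in for the proposed warning category."""

def check_encoding_arg(encoding, enabled=True):
    # Mirrors the proposed check: warn only when the feature is
    # enabled (sys.flags.encoding_warning in the PEP) and the
    # caller omitted the encoding for text-mode I/O.
    if enabled and encoding is None:
        warnings.warn("'encoding' option is omitted",
                      EncodingWarning, stacklevel=2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_encoding_arg(None)      # omitted -> warns
    check_encoding_arg("utf-8")   # explicit -> silent

print(len(caught), caught[0].category.__name__)  # 1 EncodingWarning
```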


``encoding="locale"`` option
----------------------------

``io.TextIOWrapper`` accepts an ``encoding="locale"`` option. It
behaves the same as the current ``encoding=None``, but
``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when
``encoding="locale"`` is specified.

Add an ``io.LOCALE_ENCODING = "locale"`` constant too. This constant
can be used to avoid the confusing ``LookupError: unknown encoding:
locale`` error when the code accidentally runs on an older Python.

The constant can also be used to test whether the
``encoding="locale"`` option is supported. For example,

.. code-block::

   # Want to suppress an EncodingWarning but still need to support
   # old Python versions.
   locale_encoding = getattr(io, "LOCALE_ENCODING", None)
   with open(filename, encoding=locale_encoding) as f:
       ...


``io.text_encoding()``
----------------------

``io.text_encoding()`` is a helper function for functions that have
an ``encoding=None`` option and pass it to ``io.TextIOWrapper()`` or
``open()``.

A pure Python implementation will look like this::

   def text_encoding(encoding, stacklevel=1):
       """Helper function to choose the text encoding.

       When *encoding* is not None, just return it.
       Otherwise, return the default text encoding (i.e., "locale").

       This function emits EncodingWarning if *encoding* is None and
       sys.flags.encoding_warning is true.

       This function can be used in APIs having an encoding=None
       option that is passed to TextIOWrapper or open.
       But please consider using encoding="utf-8" for new APIs.
       """
       if encoding is None:
           if sys.flags.encoding_warning:
               import warnings
               warnings.warn("'encoding' option is omitted",
                             EncodingWarning, stacklevel + 2)
           encoding = LOCALE_ENCODING
       return encoding
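A library-side sketch of how an API with ``encoding=None`` could
delegate to the helper. ``read_config`` is a made-up function, and
``text_encoding`` here is a simplified stand-alone copy of the
implementation above, minus the warning machinery:

```python
LOCALE_ENCODING = "locale"

def text_encoding(encoding, stacklevel=1):
    # Simplified: the real helper also emits EncodingWarning when
    # sys.flags.encoding_warning is true and encoding is None.
    if encoding is None:
        encoding = LOCALE_ENCODING
    return encoding

def read_config(path, encoding=None):
    # Resolving the default here (not inside open()) lets the
    # warning point at read_config's caller via stacklevel.
    encoding = text_encoding(encoding)
    if encoding == LOCALE_ENCODING:
        encoding = None  # fall back for Pythons without "locale"
    with open(path, encoding=encoding) as f:
        return f.read()
```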

For example, 

[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-01-21 Thread Inada Naoki
Hi, Lemburg.

I want to send the PEP to SC.
I think I wrote all your points in the PEP. Would you review it?

Regards,

On Tue, Aug 4, 2020 at 5:04 PM Inada Naoki  wrote:
>
> On Tue, Aug 4, 2020 at 3:31 PM M.-A. Lemburg  wrote:
> >
> > Hi Inada-san,
> >
> > thanks for attending EuroPython. I won't be back online until
> > next Wednesday. Would it be possible to wait until then to continue
> > the discussion ?
> >
>
> Of course. The PEP is for Python 3.11. We have a lot of time.
>
> Bests,

--
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E2KSOHWSI5H2YAUP7LLLRUABBYAH64BW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP: Deferred Evaluation Of Annotations Using Descriptors

2021-01-18 Thread Inada Naoki
On Tue, Jan 19, 2021 at 8:54 AM Larry Hastings  wrote:
>
> On 1/18/21 3:42 PM, Inada Naoki wrote:
>
> Many type hinting use cases don't need type objects in runtime.
> So I think PEP 563 is better for type hinting user experience.
>
> You mean, in situations where the user doesn't want to import the types, 
> because of heavyweight imports or circular imports?  I didn't think those 
> were very common.
>

Personally, I dislike any runtime overhead caused by type hints. That
is one reason I don't use type hinting much for now.
I don't want to import modules used only in type hints. I don't even
want to import "typing".

I planned to use type hinting once I can drop Python 3.6 support and
use `from __future__ import annotations`.
And I love the lightweight function annotation implementation (*) very much.

(*) https://github.com/python/cpython/pull/23316

I expect we can start to write type hints even in the stdlib, because
it doesn't require extra imports and the overhead becomes very cheap.
Maybe I am in the minority, but I dislike any runtime overhead and
extra dependencies.
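With ``from __future__ import annotations`` (PEP 563), hints are
stored as plain strings, so a module used only in annotations never
has to be importable at runtime. A minimal sketch (``heavy_module``
is a made-up name and is deliberately never imported):

```python
from __future__ import annotations

def load(path: str) -> heavy_module.Dataset:
    # heavy_module is never imported: under PEP 563 the return
    # annotation is stored as the string "heavy_module.Dataset".
    return path.upper()

print(load.__annotations__["return"])  # heavy_module.Dataset
```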

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: PEP: Deferred Evaluation Of Annotations Using Descriptors

2021-01-18 Thread Inada Naoki
On Tue, Jan 19, 2021 at 6:02 AM Larry Hastings  wrote:
>
>
> Oh, okay.  I haven't used the static type checkers, so it's not clear to me 
> what powers they do and don't have.  It was only a minor suggestion anyway.  
> Perhaps PEP 649 will be slightly inconvenient to people exploring their code 
> inside IPython.
>

Not only IPython but many REPLs are affected; Jupyter notebook in
particular behaves the same as IPython.
We can see string annotations even in the CPython REPL via pydoc.

```
>>> def func(a: "Optional[int]") -> "Optional[str]":
... ...
...
>>> help(func)

func(a: 'Optional[int]') -> 'Optional[str]'
```

Since this signature with type hints comes from
inspect.signature(func), all tools using inspect.signature() will be
affected too.
I think Sphinx autodoc will be affected, but I am not sure.
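A quick demonstration of this (a sketch; ``Optional`` is deliberately
never imported, so the annotations exist only as strings):

```python
from __future__ import annotations
import inspect

def func(a: Optional[int]) -> Optional[str]:
    # Under PEP 563, Optional need not be importable at runtime;
    # the hints stay as the strings written in the source.
    return None

sig = inspect.signature(func)
print(sig)  # the string annotations appear quoted in the signature
```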


> Or maybe it'd work if they gated the if statement on running in ipython?
>
> if typing.TYPE_CHECKING or os.path.split(sys.argv[0])[1] == "ipython3":
> import other_mod
>

It is possible for heavy modules, but it cannot avoid circular
imports.
Additionally, there are cases where modules are not importable at
runtime:

* Optional dependencies the user may not have installed.
* Dummy modules having only "pyi" files.

If PEP 563 becomes the default, we can provide a faster way to get
the text signature without eval() of the annotated string. So eval()
performance is not a problem here.
Many type hinting use cases don't need type objects at runtime,
so I think PEP 563 is better for the type hinting user experience.

Regards,

-- 
Inada Naoki  


[Python-Dev] Re: PEP: Deferred Evaluation Of Annotations Using Descriptors

2021-01-16 Thread Inada Naoki
This PEP doesn't cover what happens when __co_annotations__()
fails (e.g. NameError).

Forward references are a major reason for using string annotations,
but not the only reason. There are two others:

* Avoiding imports of heavy modules.
* Avoiding circular imports.

In these cases, this pattern is used:

```
from __future__ import annotations
import typing
from dataclasses import dataclass

if typing.TYPE_CHECKING:
import other_mod  # do not want to import actually

@dataclass
class Foo:
a: other_mod.spam
b: other_mod.ham

def fun(a: other_mod.spam, b: other_mod.ham) -> None: ...
```

Of course, mypy works well with string annotations because it is a
static checker. IPython shows the signature well too:

```
In [3]: sample.Foo?
Init signature: sample.Foo(a: 'other_mod.spam', b: 'other_mod.ham') -> None
Docstring:  Foo(a: 'other_mod.spam', b: 'other_mod.ham')
```

PEP 563 works fine in this scenario. How does PEP 649 work?

Regards,


[Python-Dev] Re: PEP: Deferred Evaluation Of Annotations Using Descriptors

2021-01-12 Thread Inada Naoki
On Wed, Jan 13, 2021 at 1:47 AM Larry Hastings  wrote:
>
> On 1/11/21 5:33 PM, Inada Naoki wrote:
>
> Note that PEP 563 semantics allows more efficient implementation.
> Annotation is just a single constant tuple, not a dict.
> We already have the efficient implementation for Python 3.10.
>
> The efficient implementation in 3.10 can share tuples. If there are
> hundreds of methods with the same signature, annotation is just a
> single tuple, not hundreds of tuples. This is very efficient for auto
> generated codebase. I think this PEP can share the code objects for
> same signature by removing co_firstlineno information too.
>
> That's very clever!  My co_annotations repo was branched from before this 
> feature was added, and I haven't pulled and merged recently.  So I hadn't 
> seen it.
>

Please see this pull request too. It merges co_code and co_consts,
which will save more RAM and reduce the import time of your
implementation.
https://github.com/python/cpython/pull/23056

>
> Additionally, we should include the cost for loading annotations from
> PYC files, because most annotations are "load once, set once".
> Loading "simple code object" from pyc files is not so cheap. It may
> affect importing time of large annotated codebase and memory
> footprints.
>
> I did some analysis in a separate message.  The summary is, the code object 
> for a single annotation costs us 232 bytes; that includes the code object 
> itself, the bytestring for the bytecode, and the bytestring for the lnotab.  
> This grows slowly as you add new parameters; the code object for ten 
> parameters is 360 bytes.
>
> It seems possible to create a hybrid of these two approaches!  Here's my 
> idea: instead of the compiler storing a code object as the annotations 
> argument to MAKE_FUNCTION, store a tuple containing the fields you'd need to 
> recreate the code object at runtime--bytecode, lnotab, names, consts, etc. 
> func_get_annotations would create the code object from that, bind it to a 
> function object, call it, and return the result.  These code-object-tuples 
> would then be automatically shared in the .pyc file and at runtime the same 
> way that 3.10 shares the tuples of stringized annotations today.

It may be a good idea if we can strip most code object members, like
argcount, kwonlyargcount, nlocals, flags, freevars, cellvars,
filename, name, firstlineno, and linetable.
It could then be smaller than in Python 3.9.

>
> That said, I suggest PEP 649's memory consumption isn't an urgent 
> consideration in choosing to accept or reject it.  PEP 649 is competitive in 
> terms of startup time and memory usage with PEP 563, and PEP 563 was accepted 
> and shipped with several versions of Python.
>

I still want a real-world application/library with heavy annotations.
My goal is to use annotations in the stdlib without worrying about
resource usage or import time.
But I agree with you if PEP 649 turns out smaller than in Python 3.9.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: PEP: Deferred Evaluation Of Annotations Using Descriptors

2021-01-11 Thread Inada Naoki
> Performance
> ---
>
> Performance with this PEP should be favorable.  In general,
> resources are only consumed on demand—"you only pay for what you use".
>

Nice!

> There are three scenarios to consider:
>
> * the runtime cost when annotations aren't defined,
> * the runtime cost when annotations are defined but *not* referenced, and
> * the runtime cost when annotations are defined *and* referenced.
>

Note: The first two cases are the major ones. Much code doesn't have
annotations at all, and much code uses annotations only for
documentation or static checking.
In the second scenario, the annotations must be very cheap; their
cost must be comparable with docstrings.
Otherwise, people cannot use annotations freely in a large codebase,
or we must provide an option like -OO to strip annotations.


> We'll examine each of these scenarios in the context of all three
> semantics for annotations: stock, PEP 563, and this PEP.
>
> When there are no annotations, all three semantics have the same
> runtime cost: zero. No annotations dict is created and no code is
> generated for it.  This requires no runtime processor time and
> consumes no memory.
>
> When annotations are defined but not referenced, the runtime cost
> of Python with this PEP should be slightly faster than either
> original Python semantics or PEP 563 semantics.  With those, the
> annotations dicts are built but never examined; with this PEP,
> the annotations dicts won't even be built.  All that happens at
> runtime is the loading of a single constant (a simple code
> object) which is then set as an attribute on an object.  Since
> the annotations are never referenced, the code object is never
> bound to a function, the code to create the dict is never
> executed, and the dict is never constructed.
>

Note that PEP 563 semantics allows more efficient implementation.
Annotation is just a single constant tuple, not a dict.
We already have the efficient implementation for Python 3.10.

The efficient implementation in 3.10 can share tuples. If there are
hundreds of methods with the same signature, annotation is just a
single tuple, not hundreds of tuples. This is very efficient for auto
generated codebase. I think this PEP can share the code objects for
same signature by removing co_firstlineno information too.

Additionally, we should include the cost of loading annotations from
PYC files, because most annotations are "load once, set once".
Loading a "simple code object" from pyc files is not so cheap. It may
affect the import time of a large annotated codebase and its memory
footprint.

I think we need a reference application that has a large codebase and
is highly annotated. But beware: even if the large application is
100% annotated, the libraries it uses are not 100% annotated.
Many libraries are dropping Python 2 support and starting to
annotate. The cost of annotations will become much more important in
the next several years.


[Python-Dev] Re: Enhancement request for PyUnicode proxies

2020-12-29 Thread Inada Naoki
On Mon, Dec 28, 2020 at 7:22 PM Phil Thompson
 wrote:
>
> > So I'm +1 to make Unicode simple by removing PyUnicode_READY(), and -1
> > to make Unicode complicated by adding customizable callback for lazy
> > population.
> >
> > Anyway, I am OK to un-deprecate PyUnicode_READY() and make it no-op
> > macro since Python 3.12.
> > But I don't know how many third-parties use it properly, because
> > legacy Unicode objects are very rare already.
>
> For me lazy population might not be enough (as I'm not sure precisely
> what you mean by it). I would like to be able to use my foreign unicode
> thing to be used as the storage.
>
> For example (where text() returns a unicode object with a foreign
> kind)...
>
> some_text = an_editor.text()
> more_text = another_editor.text()
>
> if some_text == more_text:
>  print("The text is the same")
>
> ...would not involve any conversions at all.

So you mean a custom internal representation of an exact Unicode
object?

Then I am even more strongly -1, sorry.
I cannot believe its merits are bigger than the costs of its
complexity.
If a third party wants to use a completely different internal
representation, it must not be a unicode object at all.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Enhancement request for PyUnicode proxies

2020-12-28 Thread Inada Naoki
On Mon, Dec 28, 2020 at 10:53 PM Antoine Pitrou  wrote:
>
> On Mon, 28 Dec 2020 11:07:46 +0900
> Inada Naoki  wrote:
> >
> > Additionally, if we introduce the customizable lazy str object, it's
> > very easy to release GIL during basic Unicode operations. Many third
> > parties may assume PyUnicode_Compare doesn't release GIL if both
> > operands are Unicode objects.
>
> 1) You have to prove such "many third parties" exist.  I've written my
> share of C extension code and I don't remember assuming that
> PyUnicode_Compare doesn't release the GIL.
>

It is my fault that I said "many", but I just pointed out a possible
backward incompatibility. Why do I have to prove it?

> 2) Even if there is such third party code, it is clearly making
> assumptions about undocumented implementation details. It is therefore
> ok to break it in new versions of CPython.
>

But it should be considered carefully, because these APIs have not
released the GIL for a long time.
And this type of change does not cause just a simple crash, but very
rare undefined behavior in complex multithreaded applications.
For example, borrowed references in the caller can be changed to
other objects of the same size because memory blocks are reused.
That is very difficult to notice and reproduce.

> However, I agree that having to call PyUnicode_READY() before calling
> C unicode APIs is probably an obscure detail that few people remember
> about.

If we provide a custom callback and call it in PyUnicode_READY(),
many Unicode APIs using PyUnicode_READY() will change from having
predictable behavior to "may run arbitrary code" behavior. That is an
obscure detail too.

Regards,

-- 
Inada Naoki  


[Python-Dev] Re: Enhancement request for PyUnicode proxies

2020-12-28 Thread Inada Naoki
On Mon, Dec 28, 2020 at 8:52 PM Phil Thompson
 wrote:
>
>
> I would have thought that an object was defined by its behaviour rather
> than by any particular implementation detail.
>

As I understand it, the policy "an object is defined by its
behavior..." doesn't mean "put an unlimited amount of implementation
behind one concrete type."
The policy means APIs shouldn't limit input to one concrete type
without a reason. In other words, duck typing and structural
subtyping are good.

For example, we could try making io.TextIOWrapper accept not only
Unicode objects (including subclasses) but any object implementing
some protocol.
We already have __index__ for integers and the buffer protocol for
bytes-like objects. Those are examples of the policy.
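The ``__index__`` protocol illustrates this: any object can opt in to
being used as a sequence index without subclassing ``int``. A minimal
sketch (``PageNumber`` is a made-up class):

```python
class PageNumber:
    """Not an int subclass, but usable wherever an index is needed."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called by list indexing, slicing, hex(), etc.
        return self.value

chapters = ["intro", "body", "appendix"]
print(chapters[PageNumber(1)])  # body
```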

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Enhancement request for PyUnicode proxies

2020-12-27 Thread Inada Naoki
On Sun, Dec 27, 2020 at 8:20 PM Ronald Oussoren via Python-Dev
 wrote:
>
> On 26 Dec 2020, at 18:43, Guido van Rossum  wrote:
>
> On Sat, Dec 26, 2020 at 3:54 AM Phil Thompson via Python-Dev 
>  wrote:
>>
>
> That wouldn’t be a solution for code using the PyUnicode_* APIs of course, 
> nor Python code explicitly checking for the str type.
>
> In the end a new string “kind” (next to the 1, 2 and 4 byte variants) where 
> callbacks are used to provide data might be the most pragmatic.  That will 
> still break code peaking directly in the the PyUnicodeObject struct, but 
> anyone doing that should know that that is not a stable API.
>

I had a similar idea for lazy loading or lazy decoding of Unicode
objects.
But I rejected the idea and proposed to deprecate PyUnicode_READY()
because of the balance between merits and complexity:

* Simplifying the Unicode object may leave more room for
optimization, because Unicode is the essential type for Python. Since
Python is a dynamic language, a huge number of str comparisons happen
at runtime compared with static languages like Java and Rust.
* Third parties may forget to check PyErr_Occurred() after APIs like
PyUnicode_Contains or PyUnicode_Compare when the author knows all
operands are exact Unicode types.

Additionally, if we introduce a customizable lazy str object, it's
very easy to release the GIL during basic Unicode operations. Many
third parties may assume PyUnicode_Compare doesn't release the GIL if
both operands are Unicode objects. It would produce bugs that are
hard to find and reproduce.

So I'm +1 on making Unicode simple by removing PyUnicode_READY(), and
-1 on making Unicode complicated by adding a customizable callback
for lazy population.

Anyway, I am OK with un-deprecating PyUnicode_READY() and making it a
no-op macro since Python 3.12.
But I don't know how many third parties use it properly, because
legacy Unicode objects are already very rare.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Ideas for improving the contribution experience

2020-11-25 Thread Inada Naoki
On Sat, Oct 17, 2020 at 7:31 AM Tal Einat  wrote:
>
>
> 4. Encourage core devs to dedicate some of their time to working through 
> issues/PRs which are "ignored" or "stalled". This would require first 
> generating reliable lists of issues/PRs in such states. This could be in 
> various forms, such as predefined GitHub/b.p.o. queries, a dedicated 
> web-page, a periodic message similar to b.p.o.'s "weekly summary" email, or 
> dedicated tags/labels for issues/PRs. (Perhaps prioritize "stalled" over 
> "ignored".)
>

I looked at random pull requests during the core sprint and merged a
dozen of them.
But it is difficult to find pull requests where I can do anything.
See these lists:

open + awaiting merge: https://github.com/python/cpython/labels/awaiting%20merge
open + awaiting core review:
https://github.com/python/cpython/labels/awaiting%20merge

* Some PRs are already "owned" by a core developer; we just need to
wait for them (although the assignee is not set in most cases).
* Some PRs are under discussion in a b.p.o. issue.
* Some PRs require expertise in specific areas.

etc...

Significant effort is required to find PRs I can help with. My ideas:

* Recommend setting "assignee" more aggressively.
  * "Assign me" when you will review/work on it later.
  * Check "assigned to me" when you have time.
  * Feel free to un-assign.
* Add "whose help is wanted" to the issue title.
  * E.g. "needs compiler experts", "xml", "http", "mail", etc...

It will help me find PRs I can help with in my spare time.

Regards,



--
Inada Naoki  


[Python-Dev] Re: When to remove BytesWarning?

2020-11-10 Thread Inada Naoki
On Wed, Nov 11, 2020 at 10:19 AM John Hagen  wrote:
>
> I admit I am in a very small minority however. That being said, I have 
> discovered
> a few minor bugs in my code or in third party libraries over the years using 
> -bb.
> But I would understand still wanting to remove this feature to lower
> maintenance burden.

Which warning helped you? str(bytes)? bytes == str? Or bytes == int?

I am not much concerned about removing the str(bytes) warning anytime
soon.
Only the bytes == warning is a significant maintenance burden.
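For context, the bytes == str case: the comparison silently evaluates
to False by default, and only `python -b` / `-bb` surfaces it as a
BytesWarning (a minimal sketch):

```python
data = b"hello"  # e.g. read from a socket or a binary file

# Under a default interpreter this is silently False; under
# `python -bb` the comparison raises BytesWarning instead.
print(data == "hello")           # False
print(data.decode() == "hello")  # True -- compare after decoding
```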

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Please do not remove random bits of information from the tutorial

2020-11-08 Thread Inada Naoki
On Mon, Nov 9, 2020 at 9:20 AM Terry Reedy  wrote:
>
> >
> > Again, I did it following not only a brief exchange on the b.p.o., but
> > also long discussion in Python-dev.
>
> When I act on an issue on the basis of a pydev discussion, I hopefully
> remember to avoid such after-the-fact concerns by mentioning the thread
> with a title and brief summary.  Example: "In pydev thread '', 3 of
> 4 coredevs agreed with this change." (Made up numbers do not refer to
> any particular issue.)  FWIW, I agreed with this particular change also.
>

OK. Since checking all the mails in the long thread is a tedious job,
I will pick some out and leave a comment on the b.p.o.


>  > exception chaining is described well in other place
>
> If on the ball, I would have mentioned this also, and where.
>

https://docs.python.org/3/library/exceptions.html#built-in-exceptions
This is referenced from the revised tutorial, the python-dev thread, and the b.p.o. issue.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Please do not remove random bits of information from the tutorial

2020-11-08 Thread Inada Naoki
On Mon, Nov 9, 2020 at 3:46 AM Riccardo Polignieri via Python-Dev
 wrote:
>
> Hi Inada,
>
> > Note that the discussion not only in the b.p.o thread, but in this
> > mailing list too.
> > Please read this long thread.
>
> I have, but I don't know if I want to step in...
> I do have, indeed, opinions (and some expertise) on the matter, but
> I've never really thought about it in depth...
>

Then you know the deletion came after a long discussion, not just a
short exchange on b.p.o.


>
> In any case, I think that any changes to the tutorial should not be done
> in the same way as the rest of the documentation, but ideally only
> through an editorial, centralized, process.
>

Agree. The tutorial should be written with special care.
When editing library or language references, editors can focus on one
section and on technical correctness.
On the other hand, when editing tutorials, editors should read the
whole chapter and understand its story, keeping in mind that new
Python users may be learning Python by reading the tutorial.


> In the meantime, I would insist that any changes should be made
> very carefully,

Agree.

> especially when we delete something.
>

I can't agree. The same care is required when adding something too.


> > For the record, when I removed the __cause__ from the tutorial, I
> > believe I was careful and conservative enough.
> > I think from exc syntax is not new Python users should know.
> > Documenting implicit chaining is enough for 99% use cases, and from
> > None covers 0.99% of the rest.
>
> As I said, I have no strong opinion on this particular case.
> Rather, I am slightly concerned about the method in itself - that a deletion
> may occur following only a brief exchange on the bug tracker.
>

Again, I did it following not only a brief exchange on the b.p.o.,
but also a long discussion on Python-Dev.


> For what it's worth, however, I would have kept the passing mention on
> __cause__, and would have added a passing mention on __context__ too.
> It's not what I would write today in a tutorial for the "modern" beginner,
> but it's certainly more *consistent* with what the tutorial is right now.

I don't think so. I agree that the tutorial is a "syntax showcase"
for now; many minor or expert-level syntaxes are described.
But the tutorial isn't a "special attribute showcase".
It doesn't cover all special attributes or describe how the Python
interpreter uses the special attributes under the hood.
So removing __cause__ made the tutorial more consistent as it is now.


>
> > So I considered removing explicit chaining (e.g. from exc) from the
> > section too.
>
> See... this is what really concerns me. At some point someone may decide
> out of the blue to remove an entire important concept from the tutorial
> because "it's just noise for a beginner".

You are ignoring what I said!
I said I *considered* removing "explicit" chaining (e.g. `from exc`),
but I also said "implicit chaining is enough for 99% of use cases".
I definitely never tried to "remove an entire important concept from
the tutorial."
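For readers who haven't used the feature, here is a minimal sketch of the two cases above — implicit chaining and `from None`. The code is my own illustration, not the tutorial's example:

```python
# Implicit chaining: raising inside an except block records the
# exception being handled in __context__ automatically.
try:
    try:
        1 / 0
    except ZeroDivisionError:
        raise ValueError("could not compute the ratio")
except ValueError as err:
    assert isinstance(err.__context__, ZeroDivisionError)
    assert err.__cause__ is None  # no explicit `from` was used

# `from None` suppresses the chained traceback for the rare case
# where the original exception is just noise to the caller.
try:
    try:
        1 / 0
    except ZeroDivisionError:
        raise ValueError("could not compute the ratio") from None
except ValueError as err:
    assert err.__suppress_context__ is True
```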


> But very often the documentation is very terse and the tutorial is the only 
> place
> where some concepts are presented in a discursive way ... even if not
> "beginner friendly".
>

As you know, exception chaining is described well elsewhere, which is
why it was removed. I promise I won't remove something from the
tutorial without checking that it is described well in other places.

Regards,

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B55E5SWJ5XZZ6LOTA4VG76MAYMJ3DMO7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Please do not remove random bits of information from the tutorial

2020-11-07 Thread Inada Naoki
Hi, Riccardo.

I'm the one who removed __cause__ from the tutorial, and I have worked
hard on the Japanese translation of the official Python documentation.


On Sun, Nov 8, 2020 at 3:07 AM Riccardo Polignieri via Python-Dev
 wrote:
>
> This morning I noticed this new commit, referring to bpo-42179: "Remove 
> mention of __cause__" (https://bugs.python.org/issue42179). From reading this 
> thread, it turns out that a minor confusion in the wording, about __cause__ 
> and __context__, very quickly turned into the decision to completely remove 
> any reference to __cause__, the reason being that "generally speaking (...) 
> we should *reduce* some details from tutorial".
>

Note that the discussion happened not only in the b.p.o. thread, but
on this mailing list too.
Please read this long thread:
https://mail.python.org/archives/list/python-dev@python.org/thread/MXMEFFYB6JXAKSS36SZ7DX4ASP6APWFP/

>
> We all know that the tutorial is "too difficult", not really suitable for 
> beginners (perhaps not even Python beginners!). Having spent a couple of 
> weeks translating it as recently as this spring, I am well aware of this. At 
> times, it feels more like a demo or a features' showcase: a feeling 
> reinforced, by the way, often by the *recent* additions like the special 
> parameters syntax 
> (https://docs.python.org/3/tutorial/controlflow.html#special-parameters).
>

Agree.

> However, unless someone undertakes a sweeping rewriting of the tutorial (and 
> until they do), I think it would be unwise to start cherry-picking the 
> occasional bit of information to expunge or "simplify" from time to time, 
> without an overall guideline.
>

I expect the documentation working group will discuss such a guideline.

> Reason is, right now the tutorial is packed with informations 
> (beginner-unfriendly as they might be) that you would be hard pressed to find 
> elsewhere in the documentation: see the aforementioned section on special 
> parameters; see the maddening chapter on Classes (and especially the 
> exposition on scopes and namespaces); see, of course, the floating point 
> appendix; and I could go on.
>

Agree. There are tons of beginner-unfriendly bits in the tutorial.


> My concern here is that if you start removing or simplifying some 
> "too-difficult-for-a-tutorial" bits of information on an occasional basis, 
> and without too much scrutiny or editorial guidance, you will end up loosing 
> something precious.
>
> Like everyone, I also look forward to an overall rewriting of the tutorial; 
> but in the meantime, I would kindly ask you to be very careful and 
> conservative about deleting information solely for "didactic" reasons.
>

For the record, when I removed the `__cause__` from the tutorial, I
believe I was careful and conservative enough.

I don't think the `from exc` syntax is something new Python users need
to know. Documenting implicit chaining is enough for 99% of use cases,
and `from None` covers 0.99% of the rest.
So I considered removing explicit chaining (e.g. `from exc`) from the
section too. But I kept it, because the tutorial is a "syntax
showcase", even though it adds more noise for beginners.

And deleting `__cause__` was not solely for "didactic" reasons, nor
was it "losing something precious."
As written in the b.p.o. issue, mentioning only `__cause__` already
"loses something precious": we would need to describe `__context__`
and `__suppress_context__` too. But all of them are documented in
library/exceptions.html.
Removing `__cause__` and adding a link to library/exceptions.html
makes more sense than documenting all of them.
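A short sketch (my own illustration, not the tutorial's text) of how the three attributes interact under explicit chaining, as library/exceptions.html documents them:

```python
def read_config():
    try:
        return {}["path"]
    except KeyError as exc:
        # Explicit chaining: `from exc` sets __cause__ and marks the
        # implicit context as suppressed for traceback display.
        raise RuntimeError("config lookup failed") from exc

try:
    read_config()
except RuntimeError as err:
    assert isinstance(err.__cause__, KeyError)  # set by `from exc`
    assert err.__context__ is err.__cause__     # implicit link is still recorded
    assert err.__suppress_context__ is True     # display prefers __cause__
```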

Bests,

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/O3TKYY24BEJHJBROC2KEJJCR6SYZLIKM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Who is target reader of tutorial?

2020-11-05 Thread Inada Naoki
Hi, all.

Since the "How To" guides are not well organized, it is very tempting
to write every detail in the tutorial. I have seen many pull requests
which "improve" the tutorial by describing more details to make it
more complete.

But adding more and more information to the tutorial makes learning
Python by reading it slower. 10+ years ago, Python was far less
popular in Japan, and reading the tutorial was the best way for me to
learn Python.

But now Python is popular, and there are many good free and paid books
and tutorials on the Web. Some of them may be friendlier to new users
than the official tutorial.
So, should the official Python tutorial become a detailed guide to
Python? Or should we keep new users learning Python as the target
readers?

There is an ongoing issue, for example: https://bugs.python.org/issue42179

Exception chaining was added to the tutorial. The current tutorial
mentions the `__cause__` attribute:
https://docs.python.org/3/tutorial/errors.html#exception-chaining

bpo-42179 proposes adding a mention of `__context__` to make the
tutorial more accurate about implicit chaining, and
https://github.com/python/cpython/pull/23160 is the pull request that
mentions `__context__`.

On the other hand, I want to remove the confusion by removing the
mention of `__cause__`, because I don't think `__context__` and
`__cause__` are important for new users.
See https://github.com/python/cpython/pull/23162 for my proposal.

Regards,

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MXMEFFYB6JXAKSS36SZ7DX4ASP6APWFP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Inada Naoki
On Tue, Oct 27, 2020 at 5:23 AM Victor Stinner  wrote:
>
> os.get_exec_path() must modify temporarily warnings filters to ignore
> BytesWarning when it looks for 'PATH' (unicode) or b'PATH' (bytes) in
> the 'env' dictionary which may contain unicode or bytes strings.
> Modifying warnings filters impact all threads which is bad.
>
> I dislike having to workaround this annoying behavior for dict lookup
> when -b or -bb is used.
>

Completely agree with you.
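A simplified sketch of the workaround Victor describes (an illustration only, not CPython's actual code): the lookup must silence BytesWarning around probing both key types, and because warnings filters are process-wide, the change leaks to every thread.

```python
import warnings

def get_search_path(env):
    # Probe both the str and bytes spellings of the key; under -b,
    # the mixed-key comparisons inside the dict lookup can emit
    # BytesWarning, so the filter is temporarily widened -- which
    # affects all threads, not just this one.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", BytesWarning)
        for key in ("PATH", b"PATH"):
            try:
                return env[key]
            except KeyError:
                continue
    return None
```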

> I'm quite sure that almost nobody uses -b or -bb when running their
> test suite or to develop. I expect that nobody uses it. According to
> replies, it seems like porting Python 2 code to Python 3 is the only
> use case. Python 3.9 and older can be used for that, no?
>

I think so. But I became a bit conservative when writing this proposal.


> > When can we remove it? My idea is:
> >
> > 3.10: Deprecate the -b option.
>
> Do you mean writing a message into stderr? Or just deprecate it in the
> documentation?

I meant documentation-only deprecation.

>
> > 3.11: Make the -b option no-op. Bytes warning never emits.
> > 3.12: Remove the -b option.
>
> There is no _need_ to raise an error when -b is used. The -t option
> was kept even after the feature was removed (in Python 3.0 ?). -J
> ("used by Jython" says a comment) is a second command line option
> which is silently ignored.
>

I see.

>
> > BytesWarning will be deprecated in the document, but not to be removed.
>
> I don't see what you mean here. I dislike the idea of deprecating a
> feature without scheduling its removal. I don't see the point of
> deprecating it in this case. I only see that as an annoyance.
>

Documentation-only deprecation is useful for readers: they can know
"I can just ignore this."


> I'm fine with removing the exception. If you don't plan to remove it,
> just leave it unchanged (not deprecated), no?
>

OK, my new proposal is:

3.10: Stop emitting BytesWarning for bytes == unicode case, because
this is the most annoying part.
3.11: Stop emitting BytesWarning in core and stdlib.
4.0: Remove `-b` option, `sys.flags.bytes_warning`, and `BytesWarning`.

Regards,

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OQOCZISWWKRFAFMZJI5GMA3SNEQ2TYIJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: When to remove BytesWarning?

2020-10-26 Thread Inada Naoki
On Mon, Oct 26, 2020 at 4:35 PM Senthil Kumaran  wrote:
>
> On Sat, Oct 24, 2020 at 6:18 AM Christian Heimes  wrote:
>>
>> In my experience it would be useful to keep the bytes warning for
>> implicit representation of bytes in string formatting. It's still a
>> common source of issues in code.
>
> I am with Christian here.

Do you mean you are OK with removing BytesWarning from
`b"abc" == u"def"` and `b"abc" == 42`?

> Still notice a possibility of people running into this because all the 
> Python2 code is not dead yet.
> Perhaps this warning might stay for a long time.
>

I never proposed removing it "now", but in 3.11.
3.10 will enter security-only mode in 2022-04 and reach EOL in 2026-10.
But you can use Python 3.10 after its EOL for porting Python 2 code,
because security fixes are not required while porting.

> > BytesWarning has maintenance costs. It is not huge, but significant.
>
> Should we know by how much so that the proposal of `-b` switch can be 
> weighted against?
>

It is difficult to say "how much". We need to keep in mind that
`a == b` is not safe even for builtin types every time we write a
patch or review a pull request. In particular, when u"foo" and b"bar"
are used as keys of the same dict, a BytesWarning happens only on a
(randomized) hash collision, which makes this kind of bug very hard
to find.
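To make the behavior concrete, here is a small check (illustrative; the exact warning text may vary by version) showing that the comparison is silent by default but warns under `-b`:

```python
import subprocess
import sys

snippet = 'b"abc" == "abc"'

# Default interpreter: the comparison is silently False.
quiet = subprocess.run([sys.executable, "-c", snippet],
                       capture_output=True, text=True)
assert "BytesWarning" not in quiet.stderr

# With -b, the same comparison emits a BytesWarning on stderr
# (and -bb turns it into an error).
warned = subprocess.run([sys.executable, "-b", "-c", snippet],
                        capture_output=True, text=True)
assert "BytesWarning" in warned.stderr
```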

Of course, there are some runtime costs too.

https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Modules/_functoolsmodule.c#L802
https://github.com/python/cpython/blob/fb5db7ec58624cab0797b4050735be865d380823/Objects/codeobject.c#L724
(maybe more, but I'm not sure)

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/52WLUUIL2LS27R5UFYFICJH5OX3ETSTA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] When to remove BytesWarning?

2020-10-23 Thread Inada Naoki
Hi, all.

To avoid BytesWarning, the compiler needs a hack when it stores bytes
and str constants in one dict or set.
BytesWarning has a maintenance cost. It is not huge, but it is
significant.

When can we remove it? My idea is:

3.10: Deprecate the -b option.
3.11: Make the -b option no-op. Bytes warning never emits.
3.12: Remove the -b option.

BytesWarning will be deprecated in the documentation, but not removed.
Users who want to use the -b option during 2->3 conversion will need
to use Python 3.10 or earlier for a while.

Regards,

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XBIZSPXCSH4KHPX7A6W7XB3H26LLNZQ4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2020-08-04 Thread Inada Naoki
On Tue, Aug 4, 2020 at 3:31 PM M.-A. Lemburg  wrote:
>
> Hi Inada-san,
>
> thanks for attending EuroPython. I won't be back online until
> next Wednesday. Would it be possible to wait until then to continue
> the discussion ?
>

Of course. The PEP is for Python 3.11. We have a lot of time.

Bests,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YSRQWCOGXHFL6BOYBAFGW72YOTRII5AR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2020-08-03 Thread Inada Naoki
Hi, Lemburg.

Thank you for organizing EuroPython 2020.
I enjoyed watching some sessions from home.

I think the current PEP 624 covers all your points and is ready for
Steering Council discussion.
Would you like to review the PEP before then?

Regards,


On Thu, Jul 9, 2020 at 8:19 AM Inada Naoki  wrote:
>
> On Thu, Jul 9, 2020 at 5:46 AM M.-A. Lemburg  wrote:
> > - the fact that the encode APIs encoding from a Unicode buffer
> >   to a bytes object; this is an important fact, since the removal
> >   removes access to this codec functionality for extensions
> >
> > - PyUnicode_AsEncodedString() is not a proper alternative, since
> >   it requires to create a temporary PyUnicode object, which is
> >   inefficient and wastes memory
>
> I wrote your points in the "Alternative Idea > Replace Py_UNICODE*
> with Py_UCS4* "
> section. I wrote "User can encode UCS-4 string in C without creating
> Unicode object." in it.
>
> https://www.python.org/dev/peps/pep-0624/#replace-py-unicode-with-py-ucs4
>
> Note that the current Py_UNICODE* encoder APIs create temporary
> PyUnicode objects.
> They are inefficient and wastes memory now. Py_UNICODE* may be UTF-16 on some
> platforms (e.g. Windows) and builtin codecs don't support UTF-16 input.
>
>
> >
> > - the maintenance effect mentioned in the PEP does not really
> >   materialize, since the underlying functionality still exists
> >   in the codecs - only access to the functionality is removed
> >
>
> In the same section, I described the maintenance cost as below.
>
> * Other Python implementations may not have builtin codec for UCS-4.
> * If we change the Unicode internal representation to UTF-8, we need
> to keep UCS-4 support only for these APIs.
>
> > - keeping just the generic PyUnicode_Encode() API would be a
> >   compromise
> >
> > - if we remove the codec specific PyUnicode_Encode*() APIs, why
> >   are we still keeping the specisl PyUnicde_Decode*() APIs ?
> >
>
> OK, I will add "Discussions" section. (I don't like "FAQ" because some 
> question
> are important even if it is not "frequently" asked.)
>
> Quick answer is:
>
> * They are stable ABI. (Py_UNICODE is excluded from stable ABI).
> * Decoding from char* is more common and generic use case than encoding from
>   Py_UNICODE*.
> * Other Python implementations using UTF-8 as internal representation
> can implement
>   it easily.
>
> But I'm not opposite to remove it (especially for minor UTF-7 codec).
> It is just out of scope of this PEP.
>
>
> > - the deprecations were just done because the Py_UNICODE data
> >   type was replaced by a hybrid type. Using this as an argument
> >   for removing functionality is not really good practice, when
> >   these are ways to continue exposing the functionality using other
> >   data types.
>
> I hope the "Replace Py_UNICODE* with Py_UCS4* " section describe this.
>
> Regards,
>
> --
> Inada Naoki  



-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LXS6SXGX3HADR2GHWWC3C4Q3UGN4M2CR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 626: Precise line numbers for debugging and other tools.

2020-07-22 Thread Inada Naoki
On Wed, Jul 22, 2020 at 10:53 PM Antoine Pitrou  wrote:
>
>
> Le 22/07/2020 à 15:48, Inada Naoki a écrit :
> > On Wed, Jul 22, 2020 at 8:51 PM Antoine Pitrou  wrote:
> >>
> >>>
> >>> I don't think all attempts are failed.  Note that current CPython includes
> >>> some optimization already.
> >>
> >> The set of compile-time optimizations has almost not changed since at
> >> least 15 years ago.
> >>
> >
> > Constant folding is rewritten and unused constants are removed from 
> > co_consts.
> > That's one of what Victor did his project.
>
> Constant folding is not a new optimization, so this does not contradict
> what I said.  Also, constant folding is not precluded by Mark's
> proposal, AFAIK.
>

Yes, this is too off-topic. Please stop it.

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ARPS3HD4AGATCX4ZXSO2QIRICE4FFE7O/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 626: Precise line numbers for debugging and other tools.

2020-07-22 Thread Inada Naoki
On Wed, Jul 22, 2020 at 8:51 PM Antoine Pitrou  wrote:
>
> >
> > I don't think all attempts are failed.  Note that current CPython includes
> > some optimization already.
>
> The set of compile-time optimizations has almost not changed since at
> least 15 years ago.
>

Constant folding was rewritten, and unused constants are now removed
from co_consts. That's one of the things Victor did in his project.

> > And I think there are some potential optimization if we can limit some
> > debugging/introspecting features, like some C variables are "optimzed away"
> > in gdb when
> > we use -O option.
>
> You can think it, but where's the proof?  Or at least the design
> document for these optimizations?  How do you explain that Victor's
> attempt at static optimization failed?
>

I have some opinions about it (in particular, PHP 7.x achieved a
significant performance improvement without a JIT; I envy it). But I
don't have time to prove them, and it is too off-topic because it is
not related to precise line numbers. Please forget what I said about
blocking future optimizations.

My idea was just merging code blocks, but it is not worthwhile, and it
is not related to execution speed.

On the other hand, if we cannot remove lnotab, it is still worth
considering avoiding two lnotabs in -O mode. The memory overhead of
lnotab is not negligible.

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5T6ORCSTFS4EBQTYB7XLUJZ37C2WZECP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 626: Precise line numbers for debugging and other tools.

2020-07-22 Thread Inada Naoki
On Wed, Jul 22, 2020 at 6:12 PM Antoine Pitrou  wrote:
>
> >
> > But if we merge two equal code blocks, we can not produce precise line
> > numbers, can we?
> > Is this inconsistent microoptimization that real optimization harder?
> > This optimization must be prohibited in future Python?
>
> All attempts to improve Python performance by compile-time
> bytecode optimizations have more or less failed (the latter was
> Victor's, AFAIR).  Is there still interest in pursuing that avenue?
>
> Regards
>
> Antoine.
>

I don't think all attempts have failed.  Note that current CPython
already includes some optimizations. If they had all failed, we should
remove them to make the compiler simpler.

And I think there are some potential optimizations if we can limit
some debugging/introspection features, similar to how some C variables
are "optimized away" in gdb when we use the -O option.

Regards,
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BMWA4JES4FFFXMNZAVWCCRJD5NQCPMAK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 626: Precise line numbers for debugging and other tools.

2020-07-21 Thread Inada Naoki
On Wed, Jul 22, 2020 at 3:43 AM Mark Shannon  wrote:
>
> On 18/07/2020 9:20 am, Inada Naoki wrote:
> > It seems great improvement, but I am worrying about performance.
> >
> > Adding more attributes to the code object will increase memory usage
> > and importing time. Is there some estimation of the overhead?
>
> Zero overhead (approximately).
> We are just replacing one compressed table with another at the C level.
> The other attributes are computed.
>
> >
> > And I am worrying precise tracing blocks future advanced bytecode 
> > optimization.
> > Can we omit precise tracing and line number information when
> > optimization (`-O`) is enabled?
>
> I don't think that is a good idea.
> Performing any worthwhile performance optimization requires that we can
> reason about the behavior of programs.
> Consistent behavior makes that much easier.
> Inconsistent "micro optimizations" make real optimizations harder.
>
> Cheers,
> Mark.
>

Is tracing output included in the program behavior?

For example, if two code blocks are completely equal:

if a == 1:
   very very
   long
   code block
elif a == 2:
   very very
   long
   code block

This code can be translated into like this (pseudo code):

if a == 1:
goto block1
if a == 2:
goto block1
block1:
very very
long
code block

But if we merge two equal code blocks, we cannot produce precise line
numbers, can we?
Is this the kind of inconsistent micro-optimization that makes real
optimization harder? Must this optimization be prohibited in future
Python?

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/F7J337VCPPS47QYSNKSQ2CXGRNQTAYJG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 626: Precise line numbers for debugging and other tools.

2020-07-18 Thread Inada Naoki
It seems like a great improvement, but I am worried about performance.

Adding more attributes to the code object will increase memory usage
and import time. Is there any estimation of the overhead?

And I worry that precise tracing blocks future advanced bytecode
optimizations. Can we omit precise tracing and line number information
when optimization (`-O`) is enabled?

Regards,

On Fri, Jul 17, 2020 at 11:49 PM Mark Shannon  wrote:
>
> Hi all,
>
> I'd like to announce a new PEP.
>
> It is mainly codifying that Python should do what you probably already
> thought it did :)
>
> Should be uncontroversial, but all comments are welcome.
>
> Cheers,
> Mark.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/BMX32UARJFY3PZZYKRANS6RCMR2XBVVM/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AXA6MDF7R63LV5MULB2K5MJ4MNI3ZDK6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: I plan to accept PEP 623 "Remove wstr from Unicode" next week

2020-07-14 Thread Inada Naoki
Thank you, Victor, for acting as PEP-Delegate and for accepting it.

Reducing 8 bytes per str object (or 16 bytes per non-ASCII str object)
will be a significant win for all Python users.
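The per-object effect can be eyeballed with `sys.getsizeof` (the exact byte counts below are version- and build-dependent, so this only illustrates where the saving lands):

```python
import sys

# Every str instance pays a fixed header cost; PEP 623 removes the
# wstr pointer (8 bytes on 64-bit builds) from that header, plus a
# cached wchar_t* buffer for strings that ever populated it.
for s in ("", "hello", "\u3053\u3093\u306b\u3061\u306f"):
    print(repr(s), sys.getsizeof(s), "bytes")
```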

On Wed, Jul 15, 2020 at 8:20 AM Victor Stinner  wrote:
>
> Hi,
>
> I have the pleasure of announcing that I accept the PEP 623 "Remove wstr from 
> Unicode", congratulations INADA-san!
>
> I see this PEP as a good way to better communicate on incoming backward 
> incompatible C API changes. The PEP is a good document to explain the 
> Motivation, the Rationale and also to list affected C functions. It can also 
> be used and referenced in What's New in Python documents.
>
> INADA-san: I let you update the PEP status. You may also announce the PEP 
> approval on the capi-sig mailing list.
>
> Victor
>
> Le mer. 8 juil. 2020 à 10:56, Victor Stinner  a écrit :
>>
>> Hi,
>>
>> As the PEP delegate of the PEP 623, I plan to accept PEP 623 "Remove
>> wstr from Unicode" next week. As far as I know, all previous remarks
>> have been taken in account.
>>
>> https://www.python.org/dev/peps/pep-0623/
>>
>> I worked with INADA-san to adjust his PEP 623 plan:
>>
>> * DeprecationWarning warnings will be emitted as soon as Python 3.10
>> to help developers detect the deprecation at runtime, rather than only
>> announcing the deprecation with compiler warnings and in the
>> documentation.
>>
>> * Developers have two Python releases (3.10 and 3.11) with these
>> runtime warnings before functions are removed
>>
>> * The PEP lists all APIs which will be removed in Python 3.12.
>>
>> * The PEP gives links to past discussions and issues.
>>
>> About the "size > 0" condition in "PyUnicode_FromUnicode(NULL, size)
>> and PyUnicode_FromStringAndSize(NULL, size) emit DeprecationWarning
>> when size > 0". INADA-san made sure that Cython avoids
>> PyUnicode_FromUnicode(NULL, 0) to create an empty string: it's already
>> fixed! The fix will be part of the next Cython 0.29.x release (it
>> should be 0.29.21). But it will take time until popular extension
>> modules using Cython will distribute a new release with updated
>> generated C code.
>>
>> INADA-san checked popular PyPI projects. The majority of projects
>> impacted by the PEP are using Cython and so are easy to fix: just
>> regenerate C code with the fixed Cython. He added: "A few projects,
>> pyScss and Genshi are not straightforward. But it is not too hard and
>> I will help them." We have time before Python 3.12 final to update
>> these projects.
>>
>> The PEP 623 is backward incompatible on purpose. If needed, it remains
>> possible to use a single code base working on Python 2.7 and Python
>> 3.12 using #ifdef. But Python 3.12 will not be released before 2023:
>> three years after Python 2 end of life, so I think that it's
>> reasonable for extension modules to consider dropping Python 2 support
>> to implement the PEP 623 (stop using these deprecated C APIs).
>>
>> Victor
>> --
>> Night gathers, and now my watch begins. It shall not end until my death.
>
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/YAX5DP7IBDJOEAAFJRND26OIRR6Y6APA/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GZVMT72UKN774O4LV6HERNFSRVP6S65N/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2020-07-09 Thread Inada Naoki
On Thu, Jul 9, 2020 at 10:13 PM Jim J. Jewett  wrote:
>
> Unless I'm missing something, part of M.-A. Lemburg's objection is:
>
> 1.  The wchar_t type is itself an important interoperability story in C.  
> (I'm not sure if this includes the ability, at compile time, to define 
> wchar_t as either of two widths.)
>

Of course.  But wchar_t* is not the only way to use Unicode in C.
These days, UTF-8 is the most common way to use Unicode in C (outside
of Java, .NET, and the Windows API).
So the importance of the wchar_t* APIs is relative, not absolute.

In other words, why don't we have an encode API that takes UTF-8 input
directly? Is there any evidence that wchar_t* is much more important
than UTF-8?


> 2.  The ability to work directly with wchar_t without a round-trip in/out of 
> python format is an important feature that CPython has provided for C 
> integrators.
>

Note that the current API *does* do the round-trip. For example:
https://github.com/python/cpython/blob/61bb24a270d15106decb1c7983bf4c2831671a75/Objects/unicodeobject.c#L5631-L5644

Users cannot use the API without initializing the Python VM, and they
cannot avoid the time and space cost of the round-trip. So removing
these APIs doesn't take away any capability.


> 3.  The above support can be kept even without the wchar_t* member ... so 
> saving the extra space on each string instance does not require dropping this 
> support.
>

This is why I split PEP 623 and PEP 624.  I never said removing the
wchar_t* member is the motivation for PEP 624.

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UOE2ZYNSB7UEUTEGH27LB5IWPDYO5IDY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2020-07-08 Thread Inada Naoki
On Thu, Jul 9, 2020 at 5:46 AM M.-A. Lemburg  wrote:
> - the fact that the encode APIs encoding from a Unicode buffer
>   to a bytes object; this is an important fact, since the removal
>   removes access to this codec functionality for extensions
>
> - PyUnicode_AsEncodedString() is not a proper alternative, since
>   it requires to create a temporary PyUnicode object, which is
>   inefficient and wastes memory

I covered your points in the "Alternative Idea > Replace Py_UNICODE*
with Py_UCS4*" section, where I wrote "User can encode UCS-4 string in
C without creating Unicode object."

https://www.python.org/dev/peps/pep-0624/#replace-py-unicode-with-py-ucs4

Note that the current Py_UNICODE* encoder APIs already create
temporary PyUnicode objects, so they are inefficient and waste memory
now. Py_UNICODE* may be UTF-16 on some platforms (e.g. Windows), and
the builtin codecs don't support UTF-16 input.


>
> - the maintenance effect mentioned in the PEP does not really
>   materialize, since the underlying functionality still exists
>   in the codecs - only access to the functionality is removed
>

In the same section, I described the maintenance cost as follows:

* Other Python implementations may not have a builtin codec for UCS-4.
* If we change the Unicode internal representation to UTF-8, we would
  need to keep UCS-4 support only for these APIs.

> - keeping just the generic PyUnicode_Encode() API would be a
>   compromise
>
> - if we remove the codec specific PyUnicode_Encode*() APIs, why
>   are we still keeping the specisl PyUnicde_Decode*() APIs ?
>

OK, I will add a "Discussions" section. (I don't like "FAQ" because some
questions are important even if they are not "frequently" asked.)

Quick answers:

* They are stable ABI (Py_UNICODE is excluded from the stable ABI).
* Decoding from char* is a more common and generic use case than encoding
  from Py_UNICODE*.
* Other Python implementations using UTF-8 as the internal representation
  can implement it easily.

But I'm not opposed to removing them (especially the minor UTF-7 codec).
It is just out of the scope of this PEP.


> - the deprecations were just done because the Py_UNICODE data
>   type was replaced by a hybrid type. Using this as an argument
>   for removing functionality is not really good practice, when
>   these are ways to continue exposing the functionality using other
>   data types.

I hope the "Replace Py_UNICODE* with Py_UCS4*" section describes this.

Regards,

-- 
Inada Naoki  


[Python-Dev] Re: PEP 622: Structural Pattern Matching -- followup

2020-07-08 Thread Inada Naoki
On Wed, Jul 8, 2020 at 6:14 PM Chris Angelico  wrote:
>
>
> These two I would be less averse to, but the trouble is that they make
> the semantics a bit harder to explain. "Dotted names are looked up if
> not already looked up, otherwise they use the same object from the
> previous lookup". If you have (say) "case
> socket.AddressFamily.AF_INET", does it cache "socket",
> "socket.AddressFamily", or both?
>

I meant "it is an implementation detail" and "users must not rely on side
effects of attribute access."


-- 
Inada Naoki  


[Python-Dev] Re: PEP 622: Structural Pattern Matching -- followup

2020-07-08 Thread Inada Naoki
Since this is a very new system, can we add some restrictions that allow
more aggressive optimization than for regular Python code?

# Class Pattern

Example:

match val:
case Point(0, y): ...
case Point(x, 0): ...
case Point(x, y): ...

* Can the VM look up "Point" only once per `match` execution, instead of
  three times?
* Can the VM cache "Point" at the first execution and never look it up
  again (e.g. when the function is executed many times)?


# Constant value pattern

Example:

match val:
case Sides.SPAM: ...
case Sides.EGGS: ...

* Can the VM look up "Sides" only once, instead of twice?
* Can the VM cache the values of "Sides.SPAM" and "Sides.EGGS" for the
  next execution?

Regards,

-- 
Inada Naoki  


[Python-Dev] PEP 624: Remove Py_UNICODE encoder APIs

2020-07-07 Thread Inada Naoki
Hi, folks.

Since the previous discussion was suspended without consensus, I wrote
a new PEP for it. (Thank you Victor for reviewing it!)

This PEP looks very similar to PEP 623 "Remove wstr from Unicode",
but for encoder APIs, not for Unicode object APIs.

URL (not available yet): https://www.python.org/dev/peps/pep-0624/

---

PEP: 624
Title: Remove Py_UNICODE encoder APIs
Author: Inada Naoki 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Jul-2020
Python-Version: 3.11


Abstract


This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in
Python 3.11:

* ``PyUnicode_Encode()``
* ``PyUnicode_EncodeASCII()``
* ``PyUnicode_EncodeLatin1()``
* ``PyUnicode_EncodeUTF7()``
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
* ``PyUnicode_EncodeUTF32()``
* ``PyUnicode_EncodeUnicodeEscape()``
* ``PyUnicode_EncodeRawUnicodeEscape()``
* ``PyUnicode_EncodeCharmap()``
* ``PyUnicode_TranslateCharmap()``
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``

.. note::

   `PEP 623 <https://www.python.org/dev/peps/pep-0623/>`_ proposes to remove
   Unicode object APIs relating to ``Py_UNICODE``. On the other hand, this
   PEP does not relate to Unicode objects. These PEPs are split because they
   have different motivations and need separate discussions.


Motivation
==

In general, reducing the number of APIs that have been deprecated for
a long time and have few users is a good idea: it not only improves
the maintainability of CPython, but also helps API users and other
Python implementations.


Rationale
=

Deprecated since Python 3.3
---

``Py_UNICODE`` and the APIs using it have been deprecated since Python 3.3.


Inefficient
---

All of these APIs are implemented using ``PyUnicode_FromWideChar``,
so they are inefficient when users want to encode a Unicode object.


Not used widely
---

When searching the top 4000 PyPI packages [1]_, only pyodbc uses
these APIs:

* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``

pyodbc uses these APIs to encode a Unicode object into a bytes object,
so it is easy to fix. [2]_


Alternative APIs


There are alternative APIs that accept ``PyObject *unicode`` instead of
``Py_UNICODE *``. Users can migrate to them:


=======================================  ========================================
Deprecated API                           Alternative APIs
=======================================  ========================================
``PyUnicode_Encode()``                   ``PyUnicode_AsEncodedString()``
``PyUnicode_EncodeASCII()``              ``PyUnicode_AsASCIIString()`` \(1)
``PyUnicode_EncodeLatin1()``             ``PyUnicode_AsLatin1String()`` \(1)
``PyUnicode_EncodeUTF7()``               \(2)
``PyUnicode_EncodeUTF8()``               ``PyUnicode_AsUTF8String()`` \(1)
``PyUnicode_EncodeUTF16()``              ``PyUnicode_AsUTF16String()`` \(3)
``PyUnicode_EncodeUTF32()``              ``PyUnicode_AsUTF32String()`` \(3)
``PyUnicode_EncodeUnicodeEscape()``      ``PyUnicode_AsUnicodeEscapeString()``
``PyUnicode_EncodeRawUnicodeEscape()``   ``PyUnicode_AsRawUnicodeEscapeString()``
``PyUnicode_EncodeCharmap()``            ``PyUnicode_AsCharmapString()`` \(1)
``PyUnicode_TranslateCharmap()``         ``PyUnicode_Translate()``
``PyUnicode_EncodeDecimal()``            \(4)
``PyUnicode_TransformDecimalToASCII()``  \(4)
=======================================  ========================================

Notes:

(1)
   ``const char *errors`` parameter is missing.

(2)
   There is no public alternative API. But user can use generic
   ``PyUnicode_AsEncodedString()`` instead.

(3)
   ``const char *errors, int byteorder`` parameters are missing.

(4)
   There is no direct replacement, but ``Py_UNICODE_TODECIMAL``
   can be used instead. CPython itself uses
   ``_PyUnicode_TransformDecimalAndSpaceToASCII`` for converting
   from Unicode to numbers.


Plan


Python 3.9
--

Add ``Py_DEPRECATED(3.3)`` to the following APIs. This change has
already been committed [3]_. All other APIs have already been marked
``Py_DEPRECATED(3.3)``.

* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``

Document all APIs as "will be removed in version 3.11".


Python 3.11
---

The following APIs are removed:

* ``PyUnicode_Encode()``
* ``PyUnicode_EncodeASCII()``
* ``PyUnicode_EncodeLatin1()``
* ``PyUnicode_EncodeUTF7()``
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
* ``PyUnicode_EncodeUTF32()``
* ``PyUnicode_EncodeUnicodeEscape()``
* ``PyUnicode_EncodeRawUnicodeEscape()``
* ``PyUnicode_EncodeCharmap()``
* ``PyUnicode_TranslateCharmap()``
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``


Alternative ideas
=

Instead of just removing the deprecated APIs, we may be able to reuse
their names with different signatures.


Mak

[Python-Dev] Tips: Searching deprecated API usage.

2020-07-04 Thread Inada Naoki
Hi, folks.

Now that 3.9 has entered beta, I am searching for deprecated APIs we can
remove in Python 3.10.
I want to share how I check whether an API is used.

Please let me know if you know an easier and better approach to finding
deprecated API usage.

## Sourcegraph

GitHub code search is not powerful enough and there is a lot of noise
(e.g. many people copy the CPython source code).
On the other hand, Sourcegraph searches only major repositories
and has powerful filtering.
This is an example search for `PyEval_ReleaseLock`:

https://sourcegraph.com/search?q=PyEval_ReleaseLock+file:.*%5C.%28cc%7Ccxx%7Ccpp%7Cc%29+-file:ceval.c+-file:pystate.c=literal=yes


## Top 4000 packages

You can download a list of the top 4000 PyPI packages in JSON format
from this site:
https://hugovk.github.io/top-pypi-packages/

I used this script to download sdist packages from the JSON file.
https://github.com/methane/notes/blob/master/2020/wchar-cache/download_sdist.py

Note that this script doesn't download packages without an sdist (e.g.
packages that ship only a universal wheel), because I was searching for
Python/C API usage.
We can reduce the pain of the removal by fixing most of the top 4000
packages.
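As a sketch, the JSON dump from that site can be filtered with a few lines. The ``rows``/``project``/``download_count`` field names reflect the dump format at the time of writing and are an assumption here; ``top_package_names`` is an invented helper:

```python
import json

def top_package_names(raw: str, limit: int = 10) -> list:
    """Extract package names from a top-pypi-packages style JSON dump."""
    data = json.loads(raw)
    return [row["project"] for row in data["rows"][:limit]]

# Tiny inline sample in the same shape as the real dump:
sample = ('{"rows": [{"project": "urllib3", "download_count": 1},'
          ' {"project": "six", "download_count": 1}]}')
print(top_package_names(sample))  # ['urllib3', 'six']
```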


Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Stable ABI question.

2020-07-03 Thread Inada Naoki
On Fri, Jul 3, 2020 at 6:23 PM Victor Stinner  wrote:
>
>
> So it seems possible to fix JEP and pydevd-pycharm. IMHO it's fine to
> remove PyEval_ReleaseLock() in Python 3.10. The deprecation warning is
> there since Python 3.2.
>

While PyEval_AcquireLock is deprecated, PyEval_ReleaseLock is not
deprecated yet in C.
https://github.com/python/cpython/blob/master/Include/ceval.h#L132

Maybe we can uncomment Py_DEPRECATED in 3.10 and remove it from the
header file in 3.12.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Stable ABI question.

2020-07-02 Thread Inada Naoki
On Thu, Jul 2, 2020 at 7:28 PM Victor Stinner  wrote:
>
> Hi,
>
> Last time I looked at PyEval_AcquireLock(), it was used in the wild,
> but I don't recall exactly where, sorry :-( Before removing the
> functions, I suggest to first notify impacted projects of the incoming
> removal, and maybe even propose a fix.

Thank you for suggestion.

I grepped for PyEval_AcquireLock in the top 4000 packages and confirmed
it is not used.
But I had not checked PyEval_ReleaseLock because I thought it is only
used paired with PyEval_AcquireLock.

Actually, PyEval_ReleaseLock is used in three packages:

pydevd-pycharm-202.5103.19/pydevd_attach_to_process/windows/attach.cpp
330:DEFINE_PROC(releaseLock, PyEval_Lock*, "PyEval_ReleaseLock", -160);

jep-3.9.0/src/main/c/Jep/pyembed.c
836:PyEval_ReleaseLock();

ptvsd-4.3.2.zip/ptvsd-4.3.2/src/ptvsd/_vendored/pydevd/pydevd_attach_to_process/windows/attach.cpp
330:DEFINE_PROC(releaseLock, PyEval_Lock*, "PyEval_ReleaseLock", -160);


I will keep PyEval_ReleaseLock.

>
> Getting rid of PyEval_AcquireLock() and PyEval_ReleaseLock() in JEP
> doesn't seem trivial. This project uses subinterpreters and uses/used
> daemon threads.
>

I think they use only PyEval_ReleaseLock().  Do they use
PyEval_AcquireLock() too?

Regards,
--
Inada Naoki  


[Python-Dev] Re: Recent PEP-8 change

2020-07-02 Thread Inada Naoki
On Thu, Jul 2, 2020 at 6:40 PM Chris Angelico  wrote:
>
> True, but "inclusive" isn't just about the people *writing*. If you
> write your comments in French, and someone else uses Turkish, another
> uses Japanese, and still another opts for Hebrew, it becomes nearly
> impossible for anyone to *read* those comments. Standardizing on a
> single language ensures that everyone can read the comments in a
> single, consistent language.
>

Thank you for mentioning Japanese.
I totally agree with you. Readability counts, not writability.

I am not good at English. I could not live in an English-only world.
I don't understand many proverbs and idioms. I might not even be able
to buy food!

But I can read technical documents like RFCs. The English used in RFCs
is very clear to me. I don't know whether the English in RFCs is
S English or not, but I believe it is very inclusive for engineers
around the world.

I don't think I can write such clear English without help. But having
such a goal is inclusive for non-native English readers.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Stable ABI question.

2020-07-01 Thread Inada Naoki
Thanks.  I will do it.

On Wed, Jul 1, 2020 at 5:50 PM Serhiy Storchaka  wrote:
>
> 01.07.20 04:35, Inada Naoki wrote:
> > Hi, folks.
> >
> > I found PyEval_AcquireLock and PyEval_ReleaseLock are deprecated since
> > Python 3.2.
> > But the same time, stable ABI is defined in Python 3.2 too.
> > The deprecated APIs are stable ABI too because `ceval.h` is not
> > excluded from the stable ABI PEP.
> >
> > As far as my understanding, we can not remove them until Python 4.0. Am I 
> > right?
> >
> > I will add comment like this.
> > /* This is a part of stable ABI. Do not remove until Python 4.0 */
>
> We can remove them from public headers, but should keep their
> implementation and export their names to preserve the binary compatibility.



-- 
Inada Naoki  


[Python-Dev] Re: Plan to remove Py_UNICODE APis except PEP 623.

2020-07-01 Thread Inada Naoki
On Thu, Jul 2, 2020 at 5:20 AM M.-A. Lemburg  wrote:
>
>
> The reasoning here is the same as for decoding: you have the original
> data you want to process available in some array and want to turn
> this into the Python object.
>
> The path Victor suggested requires always going via a Python Unicode
> object, but that it very expensive and not really an appropriate
> way to address the use case.
>

But the current PyUnicode_Encode* APIs call `PyUnicode_FromWideChar`
internally; they are not direct APIs anymore.

Additionally, pyodbc, the only user of the encoder APIs, did
PyUnicode_EncodeUTF16(PyUnicode_AsUnicode(unicode), ...).
That is very inefficient: Unicode object -> Py_UNICODE* -> Unicode
object -> bytes object.

And as many others have already said, most of the C world uses UTF-8
for Unicode representation in C, not wchar_t.

So I don't want to undeprecate the current APIs.


> As an example application, think of a database module which provides
> the Unicode data as Py_UNICODE buffer.

Py_UNICODE is deprecated.  So I assume you are talking about wchar_t.


> You want to write this as UTF-8
> data to a file or a socket, so you have the PyUnicode_EncodeUTF8() API
> decode this for you into a bytes object which you can then write out
> using the Python C APIs for this.

PyUnicode_FromWideChar + PyUnicode_AsUTF8AndSize is better than
PyUnicode_EncodeUTF8.

PyUnicode_EncodeUTF8 allocates a temporary Unicode object anyway, so it
needs to allocate a Unicode object *and* a char* buffer for the UTF-8
data. On the other hand, PyUnicode_AsUTF8AndSize can just expose the
internal data when the string is plain ASCII. Since ASCII strings are
very common, this is an effective optimization.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: PEP 620: Hide implementation details from the C API

2020-06-30 Thread Inada Naoki
On Tue, Jun 30, 2020 at 6:45 AM Raymond Hettinger  wrote:
>
>
> > Converting macros to static inline functions should only impact very few
> > C extensions which use macros in unusual ways.
>
> These should be individually verified to make sure they actually get inlined 
> by the compiler.  In https://bugs.python.org/issue39542 about nine PRs were 
> applied without review or discussion.  One of those, 
> https://github.com/python/cpython/pull/18364 , converted PyType_Check() to 
> static inline function but I'm not sure that it actually does get inlined.  
> That may be the reason named tuple attribute access slowed by about 25% 
> between Python 3.8 and Python 3.9.¹  Presumably, that PR also affected every 
> single type check in the entire C codebase and will affect third-party 
> extensions as well.
>

I confirmed the performance regression, although the difference is 12%,
and I found the commit that caused the regression:

https://github.com/python/cpython/commit/45ec5b99aefa54552947049086e87ec01bc2fc9a
https://bugs.python.org/issue40170

The regression is not caused by a "static inline" function failing to be
inlined by the compiler.
The commit changed PyType_HasFeature to always call the regular function
PyType_GetFlags.

Regards,
-- 
Inada Naoki  


[Python-Dev] Stable ABI question.

2020-06-30 Thread Inada Naoki
Hi, folks.

I found that PyEval_AcquireLock and PyEval_ReleaseLock have been
deprecated since Python 3.2.
But at the same time, the stable ABI was also defined in Python 3.2.
The deprecated APIs are part of the stable ABI because `ceval.h` is not
excluded from the stable ABI PEP.

As far as I understand, we cannot remove them until Python 4.0. Am I right?

I will add a comment like this:
/* This is a part of stable ABI. Do not remove until Python 4.0 */

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Plan to remove Py_UNICODE APis except PEP 623.

2020-06-29 Thread Inada Naoki
On Mon, Jun 29, 2020 at 6:51 PM Victor Stinner  wrote:
>
>
> I understand that these ".. deprecated" markups will be added to 3.8
> and 3.9 documentation, right?
>

They are already documented as "Deprecated since version 3.3, will be
removed in version 4.0".
I am proposing s/4.0/3.10/ in the 3.8 and 3.9 documents.

> For each function, I would be nice to suggest a replacement function.
> For example, PyUnicode_EncodeMBCS() (Py_UNICODE*) can be replaced with
> PyUnicode_EncodeCodePage() using code_page=CP_ACP (PyObject*).

Of course.

> > ## PyUnicode_EncodeDecimal
> >
> > It is not documented.  It has not been deprecated by Py_DEPRECATED.
> > Plan: Add Py_DEPRECATED in Python 3.9 and remove it in 3.11.
>
> I understood that the replacement function is the private
> _PyUnicode_TransformDecimalAndSpaceToASCII() function. This function
> is used by complex, float and int types to convert a string into a
> number.
>

Should we make it public?

>
> > ## _PyUnicode_ToLowercase, _PyUnicode_ToUppercase
> >
> > They are not deprecated by PEP 393, but bpo-12736.
> >
> > They are documented as deprecated, but don't have ``Py_DEPRECATED``.
> >
> > Plan: Add Py_DEPRECATED in 3.9, and remove them in 3.11.
> >
> > Note: _PyUnicode_ToTitlecase has Py_DEPRECATED. It can be removed in 3.10.
>
> bpo-12736 is "Request for python casemapping functions to use full not
> simple casemaps per Unicode's recommendation". IMHO the replacement
> function is to call lower() and method() of a Python str object.
>

We have private functions: _PyUnicode_ToTitleFull, _PyUnicode_ToLowerFull,
and _PyUnicode_ToUpperFull.
I am not sure whether we should make them public too.

> If you change the 3.9 documentation, please also update 3.8 doc.
>

I see.

-- 
Inada Naoki  


[Python-Dev] Re: Plan to remove Py_UNICODE APis except PEP 623.

2020-06-29 Thread Inada Naoki
Many existing public APIs don't have a `const char *errors` argument.
As there are very few users, we can ignore that limitation.

On the other hand, some encodings have special options:

* UTF-16 and UTF-32: the `int byteorder` parameter.
* UTF-7: `int base64SetO, int base64WhiteSpace`.

So PyUnicode_AsEncodedString cannot replace them.
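As a Python-level illustration of how the byteorder choice is usually expressed without a separate parameter: the named codec variants spell out the byte order, covering the common cases of the C-level `int byteorder` argument:

```python
# Byteorder selection via codec names instead of an `int byteorder` argument.
assert "A".encode("utf-16-le") == b"A\x00"    # explicit little-endian, no BOM
assert "A".encode("utf-16-be") == b"\x00A"    # explicit big-endian, no BOM
assert "A".encode("utf-16")[:2] in (b"\xff\xfe", b"\xfe\xff")  # native order, BOM first
print("codec-name byteorder selection works")
```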

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Plan to remove Py_UNICODE APis except PEP 623.

2020-06-29 Thread Inada Naoki
On Mon, Jun 29, 2020 at 12:17 AM Inada Naoki  wrote:
>
>
> More aggressive idea: override current PyUnicode_EncodeXXX() apis.
> Change from `Py_UNICODE *object` to `PyObject *unicode`.
>

This is a list of PyUnicode_Encode* usage in the top 4000 packages:
https://gist.github.com/methane/0f97391c9dbf5b53a818aa39a8285a29

scandir uses PyUnicode_EncodeMBCS only in an `#if PY_MAJOR_VERSION < 3 &&
defined(MS_WINDOWS)` block, so it is a false positive.

Cython has prototypes of these APIs. pyodbc uses PyUnicode_EncodeUTF16
and PyUnicode_EncodeUTF8, but pyodbc converts a Unicode object into a
bytes object, so the current API is very inefficient for it.

That's all.
Now I think it is safe to override the deprecated APIs with the private
APIs that accept a Unicode object:

* _PyUnicode_EncodeUTF7 -> PyUnicode_EncodeUTF7
* _PyUnicode_AsUTF8String -> PyUnicode_EncodeUTF8
* _PyUnicode_EncodeUTF16 -> PyUnicode_EncodeUTF16
* _PyUnicode_EncodeUTF32 -> PyUnicode_EncodeUTF32
* _PyUnicode_AsLatin1String -> PyUnicode_EncodeLatin1
* _PyUnicode_AsASCIIString -> PyUnicode_EncodeASCII
* _PyUnicode_EncodeCharmap -> PyUnicode_EncodeCharmap

-- 
Inada Naoki  


[Python-Dev] Re: Plan to remove Py_UNICODE APis except PEP 623.

2020-06-28 Thread Inada Naoki
On Sun, Jun 28, 2020 at 11:24 PM Inada Naoki  wrote:
>
>
> So how about making them public, instead of undeprecate Py_UNICODE* encode 
> APIs?
>
> 1. Add PyUnicode_AsXXXBytes public APIs in Python 3.10.
>Current private APIs can become macro (e.g. #define
> _PyUnicode_AsAsciiString PyUnicode_AsAsciiBytes),
>or deprecated static inline function.
> 2. Remove Py_UNICODE* encode APIs in Python 3.12.
>

More aggressive idea: override the current PyUnicode_EncodeXXX() APIs,
changing `Py_UNICODE *object` to `PyObject *unicode`.

This idea might look crazy, but the PyUnicode_EncodeXXX APIs have been
deprecated for a long time and there are only a few users.
I grepped 3874 source packages from the top 4000 downloaded packages
(126 packages are wheel-only).

$ rg -w PyUnicode_EncodeASCII
Cython-0.29.20/Cython/Includes/cpython/unicode.pxd
424:bytes PyUnicode_EncodeASCII(Py_UNICODE *s, Py_ssize_t size,
char *errors)

$ rg -w PyUnicode_EncodeLatin1
Cython-0.29.20/Cython/Includes/cpython/unicode.pxd
406:bytes PyUnicode_EncodeLatin1(Py_UNICODE *s, Py_ssize_t size,
char *errors)

$ rg -w PyUnicode_EncodeUTF7
(no output)

$ rg -w PyUnicode_EncodeUTF8
subprocess32-3.5.4/_posixsubprocess_helpers.c
38:return PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(unicode),

pyodbc-4.0.30/src/params.cpp
1932:bytes = PyUnicode_EncodeUTF8(source, cb, "strict");

pyodbc-4.0.30/src/cnxninfo.cpp
45:Object bytes(PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(p),
PyUnicode_GET_SIZE(p), 0));
50:Object bytes(PyUnicode_Check(p) ?
PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(p), PyUnicode_GET_SIZE(p),
0) : 0);

Cython-0.29.20/Cython/Includes/cpython/unicode.pxd
304:bytes PyUnicode_EncodeUTF8(Py_UNICODE *s, Py_ssize_t size, char *errors)

Note that subprocess32 is a Python 2-only project. Only pyodbc-4.0.30
uses this API:
https://github.com/mkleehammer/pyodbc/blob/b4ea03220dd8243e452c91689bef34823b2f7d8f/src/params.cpp#L1926-L1942
https://github.com/mkleehammer/pyodbc/blob/master/src/cnxninfo.cpp#L45

Anyway, the current PyUnicode_EncodeXXX APIs are not commonly used.
I don't think they are worth undeprecating.

Regards,

-- 
Inada Naoki  


[Python-Dev] Re: Plan to remove Py_UNICODE APis except PEP 623.

2020-06-28 Thread Inada Naoki
Hi, Lemburg.

Thank you for the quick response.

>
> We can't just remove access to one half of a codec (the decoding
> part) without at least providing an alternative for C extensions
> to use.
>
> Py_UNICODE can be removed from the API, but only if there are
> alternative APIs which C extensions can use to the same effect.
>
> Given PEP 393, this would be APIs which use wchar_t instead of
> Py_UNICODE.
>

The decoding part is implemented as `const char *` -> `PyObject *`
(Unicode object). I think this is reasonable since `const char *` is a
perfect abstraction for the encoded string.

For the encoding part, `wchar_t *` is not a perfect abstraction for the
(decoded) Unicode string: converting from a Unicode object into
`wchar_t *` is not zero-cost. For encoders, I think `PyObject *`
(Unicode object) -> `PyObject *` (bytes object) is a better signature
than `wchar_t *` -> `PyObject *` (bytes object).

* Unicode object is more important than `wchar_t *` in Python.
* All PyUnicode_EncodeXXX APIs are implemented with PyUnicode_FromWideChar.

For example, we have these private encode APIs:

* PyObject* _PyUnicode_AsAsciiString(PyObject *unicode, const char *errors)
* PyObject* _PyUnicode_AsLatin1String(PyObject *unicode, const char *errors)
* PyObject* _PyUnicode_AsUTF8String(PyObject *unicode, const char *errors)
* PyObject* _PyUnicode_EncodeUTF16(PyObject *unicode, const char *errors, int byteorder)
* ...

So how about making them public, instead of undeprecating the Py_UNICODE*
encoder APIs?

1. Add PyUnicode_AsXXXBytes public APIs in Python 3.10.
   The current private APIs can become macros (e.g. #define
   _PyUnicode_AsAsciiString PyUnicode_AsAsciiBytes) or deprecated
   static inline functions.
2. Remove the Py_UNICODE* encoder APIs in Python 3.12.

Regards,

-- 
Inada Naoki  


[Python-Dev] Plan to remove Py_UNICODE APis except PEP 623.

2020-06-27 Thread Inada Naoki
Hi, all.

I proposed PEP 623 to remove Unicode APIs deprecated by PEP 393.

In this thread, I am proposing the removal of the Py_UNICODE (not
Unicode object) APIs deprecated by PEP 393.
Please reply with any comments.


## Undocumented, have Py_DEPRECATED

There is no problem with removing them in Python 3.10. I will just do it.

* Py_UNICODE_str*** functions -- already removed in
https://github.com/python/cpython/pull/21164
* PyUnicode_GetMax()


## Documented and have Py_DEPRECATED

* PyLong_FromUnicode
* PyUnicode_AsUnicodeCopy
* PyUnicode_Encode
* PyUnicode_EncodeUTF7
* PyUnicode_EncodeUTF8
* PyUnicode_EncodeUTF16
* PyUnicode_EncodeUTF32
* PyUnicode_EncodeUnicodeEscape
* PyUnicode_EncodeRawUnicodeEscape
* PyUnicode_EncodeLatin1
* PyUnicode_EncodeASCII
* PyUnicode_EncodeCharmap
* PyUnicode_TranslateCharmap
* PyUnicode_EncodeMBCS

These APIs are documented. The documentation has a ``.. deprecated:: 3.3
4.0`` directive.
They have also been `Py_DEPRECATED` since Python 3.6.

Plan: Change the documentation to ``.. deprecated:: 3.0 3.10`` and remove
them in Python 3.10.


## PyUnicode_EncodeDecimal

It is not documented.  It has not been deprecated by Py_DEPRECATED.

Plan: Add Py_DEPRECATED in Python 3.9 and remove it in 3.11.


## PyUnicode_TransformDecimalToASCII

It is documented, but doesn't have a ``deprecated`` directive. It is not
deprecated by Py_DEPRECATED.

Plan: Add Py_DEPRECATED and ``deprecated 3.3 3.11`` directive in 3.9,
and remove it in 3.11.


## _PyUnicode_ToLowercase, _PyUnicode_ToUppercase

They are not deprecated by PEP 393, but bpo-12736.
They are documented as deprecated, but don't have ``Py_DEPRECATED``.

Plan: Add Py_DEPRECATED in 3.9, and remove them in 3.11.

Note: _PyUnicode_ToTitlecase has Py_DEPRECATED. It can be removed in 3.10.


-- 
Inada Naoki  


[Python-Dev] Re: Draft PEP: Remove wstr from Unicode

2020-06-24 Thread Inada Naoki
On Tue, Jun 23, 2020 at 6:31 PM Victor Stinner  wrote:
>
> On Tue, Jun 23, 2020 at 04:02, Inada Naoki  wrote:
> > Legacy unicode representation is using wstr so legacy unicode support
> > is removed with wstr.
> > PyUnicode_READY() will be no-op when wstr is removed.  We can remove
> > calling of PyUnicode_READY() since then.
> >
> > I think we can deprecate PyUnicode_READY() when wstr is removed.
>
> Would it be possible to rewrite the plan differently (merge
> Specification sections) to list changes per Python version? Something
> like:
>

OK, I have rewritten the PEP:
https://github.com/python/peps/pull/1462

>
> Also, some functions are already deprecated. Would you mind listing
> them in the PEP? I fail to keep track of the status of each function.
>

Do you mean APIs related to Py_UNICODE, but not to wstr or the legacy
Unicode object? (e.g. PyLong_FromUnicode, PyUnicode_Encode, etc.)

We can remove them on a one-by-one basis.

* Most APIs can be removed in 3.10.
* Some APIs can be undeprecated by changing Py_UNICODE to wchar_t.
* Some APIs need more discussion (e.g. PyUnicodeEncodeError_Create,
  PyUnicodeTranslateError_Create).

Since they are independent of wstr and the legacy Unicode object,
I don't want to handle them in this PEP.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Draft PEP: Remove wstr from Unicode

2020-06-22 Thread Inada Naoki
On Tue, Jun 23, 2020 at 6:58 AM Victor Stinner  wrote:
>
> Hi INADA-san,
>
> First of all, thanks for writing down a PEP!
>
> On Thu, Jun 18, 2020 at 11:42, Inada Naoki  wrote:
> > To support legacy Unicode objects created by
> > ``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a
> > ``PyUnicode_READY()`` check.
>
> I don't see PyUnicode_READY() removal in the specification section.
> When can we remove these calls and the function itself?
>

The legacy Unicode representation uses wstr, so legacy Unicode support
is removed together with wstr.
PyUnicode_READY() will be a no-op when wstr is removed.  We can remove
the calls to PyUnicode_READY() after that.

I think we can deprecate PyUnicode_READY() when wstr is removed.

>
> > Supporting the legacy Unicode object makes the Unicode implementation
> > complex.  Until we drop the legacy Unicode object, it is very hard to try
> > other Unicode implementations, like the UTF-8 based implementation in PyPy.
>
> I'm not sure if it should be in the scope of the PEP or not, but there
> are also other C API functions which are too close to the PEP 393
> concrete implementation. For example, I'm not sure that
> PyUnicode_MAX_CHAR_VALUE(str) would be relevant/efficient if Python
> str is reimplemented to use UTF-8 internally. Should we deprecate it
> as well? Do you think that it should be addressed in a separated PEP?
>

I don't like optimizations that rely heavily on the CPython
implementation, but I think it is too early to deprecate it.
We should just recommend the UTF-8 based approach.


> In fact, a large part of the Unicode C API is based on the current
> implementation of the Python str type. For example, I'm not sure that
> PyUnicode_New(size, max_char) would still make sense if we change the
> code to store strings as UTF-8 internally.
>
> In an ideal world, I would prefer to have a "string builder" API, like
> the current _PyUnicodeWriter C API, to create a string, and only never
> allow to modify a string in-place.

I completely agree with you.  But the current _PyUnicodeWriter is tightly
coupled with PEP 393, and it is not UTF-8 based.  I am not sure that
we should make it public and stable starting from Python 3.10.

I think we should recommend ``PyUnicode_FromStringAndSize(utf8, utf8_len)``
for now, to avoid coupling too tightly with PEP 393.
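As a rough Python-level analogue of that recommendation (illustrative only, not the C API itself): accumulate the data as UTF-8 bytes in a mutable buffer, then create the immutable string in a single step at the end.

```python
# Build up text as UTF-8 bytes, then create the str object once at the end,
# mirroring the recommended C pattern of calling
# PyUnicode_FromStringAndSize(utf8, utf8_len) on a finished buffer.
parts = bytearray()
for chunk in (b"hello", b", ", b"world"):
    parts += chunk  # cheap in-place growth of the mutable buffer
s = bytes(parts).decode("utf-8")  # one-shot creation of the immutable str
print(s)  # hello, world
```

This keeps string objects immutable from birth, which is exactly what makes alternative internal representations possible.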

Regards,

-- 
Inada Naoki  


[Python-Dev] Draft PEP: Remove wstr from Unicode

2020-06-18 Thread Inada Naoki
PEP: 
Title: Remove wstr from Unicode
Author: Inada Naoki  
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Jun-2020
Python-Version: TBD

Abstract
========

PEP 393 deprecated some Unicode APIs and introduced ``wchar_t *wstr``
and ``Py_ssize_t wstr_length`` in the Unicode implementation, for backward
compatibility with these deprecated APIs. [1]_

This PEP plans the removal of ``wstr`` and ``wstr_length``, together with
the deprecated APIs that use these members.


Motivation
==========

Memory usage
------------

``str`` is one of the most used types in Python.  Even the simplest ASCII
strings have a ``wstr`` member, which consumes 8 bytes on 64-bit systems.
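A quick way to see the per-instance overhead from Python (the exact numbers vary by CPython version and build, so treat them as illustrative, not normative):

```python
import sys

# Every str instance pays a fixed header cost; on 64-bit builds that still
# carry the wstr pointer, 8 of those bytes are the (usually unused) wstr slot.
empty = sys.getsizeof("")
ascii_one = sys.getsizeof("a")
print(empty, ascii_one)  # exact values depend on CPython version/build
```

Multiplied across the millions of strings a typical program creates, one pointer per instance is a measurable amount of memory.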


Runtime overhead
----------------

To support legacy Unicode objects created by
``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a
``PyUnicode_READY()`` check.

When we drop support for the legacy Unicode object, we can remove this
overhead too.


Simplicity
----------

Supporting the legacy Unicode object makes the Unicode implementation
complex.  Until we drop the legacy Unicode object, it is very hard to try
other Unicode implementations, like the UTF-8 based implementation in PyPy.


Specification
=============

Affected APIs
-------------

From the Unicode implementation, the ``wstr`` and ``wstr_length`` members
are removed.

Macros and functions to be removed:

* PyUnicode_GET_SIZE
* PyUnicode_GET_DATA_SIZE
* Py_UNICODE_WSTR_LENGTH
* PyUnicode_AS_UNICODE
* PyUnicode_AS_DATA
* PyUnicode_AsUnicode
* PyUnicode_AsUnicodeAndSize


Behaviors to be removed:

* PyUnicode_FromUnicode -- ``PyUnicode_FromUnicode(NULL, size)`` where
  ``size > 0`` causes RuntimeError instead of creating a legacy Unicode
  object. Although this API is deprecated by PEP 393, it will be kept
  when ``wstr`` is removed, and will be removed later.

* PyUnicode_FromStringAndSize -- Like PyUnicode_FromUnicode,
  ``PyUnicode_FromStringAndSize(NULL, size)`` causes RuntimeError
  instead of creating a legacy Unicode object.

* PyArg_ParseTuple, PyArg_ParseTupleAndKeywords -- the 'u', 'u#', 'Z', and
  'Z#' formats will be removed.


Deprecation
-----------

All APIs to be removed should have a compiler deprecation warning
(e.g. ``Py_DEPRECATED(3.3)``) from Python 3.9. [2]_

All APIs to be changed should raise DeprecationWarning for the behavior to be
removed. Note that ``PyUnicode_FromUnicode`` has both the compiler deprecation
warning and the runtime DeprecationWarning. [3]_, [4]_.
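For illustration, this is roughly how the runtime side of such a deprecation behaves at the Python level (``legacy_api`` here is a made-up stand-in, not the actual C implementation):

```python
import warnings

def legacy_api():
    # Runtime counterpart of the compile-time Py_DEPRECATED annotation:
    # the caller gets a DeprecationWarning when the API is actually used.
    warnings.warn("legacy_api() is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_api()

print(result)                       # ok
print(caught[0].category.__name__)  # DeprecationWarning
```

The compiler warning catches uses at build time of extensions; the runtime warning catches uses that only happen when a particular code path executes.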


Plan
----

All deprecations will be implemented in Python 3.10.
Some deprecations will be backported to Python 3.9.

Actual removal will happen in Python 3.12.


Alternative Ideas
=================

Advanced Schedule
-----------------

Backport the warnings in 3.9, and do the removal in the early development
phase of Python 3.11. If many third-party packages are broken by this change,
we will revert the change and go back to the regular schedule.

Pros: There is a chance to remove ``wstr`` in Python 3.11. Even if we need
to revert it, third-party maintainers can have more time to prepare for the
removal and we can get feedback from the community early.

Cons: Adding warnings during the beta period will cause some confusion. Note
that we need to avoid the warning from CPython core and the stdlib.


Use hashtable to store wstr
---------------------------

Store ``wstr`` in a hashtable, instead of in the Unicode structure.

Pros: We can reduce memory usage starting from Python 3.10, and we can
have a longer timeline for removing ``wstr``.

Cons: This implementation will increase the complexity of Unicode
implementation.


References
==========
A collection of URLs used as references throughout the PEP.

.. [1] PEP 393 -- Flexible String Representation
   (https://www.python.org/dev/peps/pep-0393/)

.. [2] GH-20878 -- Add Py_DEPRECATED to deprecated unicode APIs
   (https://github.com/python/cpython/pull/20878)

.. [3] GH-20933 -- Raise DeprecationWarning when creating legacy Unicode
   (https://github.com/python/cpython/pull/20933)

.. [4] GH-20927 -- Raise DeprecationWarning for getargs with 'u', 'Z' #20927
   (https://github.com/python/cpython/pull/20927)

Copyright
=========

This document has been placed in the public domain.


-- 
Inada Naoki  


[Python-Dev] Re: When can we remove wchar_t* cache from string?

2020-06-17 Thread Inada Naoki
On Mon, Jun 15, 2020 at 4:25 PM Serhiy Storchaka  wrote:
>
> I have a plan for a more gradual removal of this feature. I created a PR
> which adds several compile options, so Python can be built in one of
> three modes:
>
> 1. Support wchar_t* cache and use it. It is the current mode.
>
> 2. Support wchar_t* cache, but do not use it internally in CPython. It
> can be used to test whether getting rid of the wchar_t* cache can have
> negative effects.
>
> 3. Do not support wchar_t* cache. It is binary incompatible build. Its
> purpose is to allow authors of third-party libraries to prepare to
> future breakage.
>
[snip]
> https://github.com/python/cpython/pull/12409

I like your pull request, although I am not sure option 2 is really needed.

With your compile-time option, we can remove wstr in an early alpha stage
(e.g. Python 3.11a1), and bring it back if it breaks too many packages.


> The plan is:
>
> 1. Add support of the above compile options. Unfortunately I did not
> have time to do this before feature freeze in 3.9, but maybe make an
> exception?
> 2. Make option 2 default.
> 3. Remove option 1.
> 4. Enable compiler deprecations for all legacy C API. Currently they are
> silenced for the C API used internally.
> 5. Make legacy C API always failing.
> 6. Remove legacy C API from header files.
>
> There is a long way to steps 5 and 6. I think 3.11 is too early.
>

Note that the compiler deprecation (4) was approved by Łukasz Langa,
so Python 3.9 will have the compiler deprecations.
https://github.com/python/cpython/pull/20878#issuecomment-644830032

On the other hand, PyArg_ParseTuple(AndKeywords) with the u/Z formats
doesn't have any deprecation yet.  I'm not sure we can backport the
runtime DeprecationWarning to 3.9, because we need to fix Argument
Clinic too.  (Serhiy's pull request fixes Argument Clinic.)

Regards,

-- 
Inada Naoki  


[Python-Dev] Re: When can we remove wchar_t* cache from string?

2020-06-16 Thread Inada Naoki
On Wed, Jun 17, 2020 at 4:16 AM Steve Dower  wrote:
>
> On 16Jun2020 1641, Inada Naoki wrote:
> > * This change doesn't affect pure Python packages.
> > * Most of the rest use Cython.  Since I already reported an issue to Cython,
> >   regenerating with a new Cython release fixes them.
>
> The precedent set in our last release with tp_print was that
> regenerating Cython releases was too much to ask.
>
> Unless we're going to overrule that immediately, we should leave
> everything there and give users/developers a full release cycle with
> updated Cython version to make new releases without causing any breakage.
>

We have one year for 3.10 and two years for 3.11.

Additionally, unlike the case of tp_print, we don't need to wait until all
of them are regenerated.
Cython used deprecated APIs in two cases:

* Cython used PyUnicode_FromUnicode(NULL, 0) to create an empty string.
  Many packages are affected, but we can keep it working even when we
  remove wstr.
  https://github.com/cython/cython/pull/3677

* Cython used PyUnicode_FromUnicode() in very minor cases.  Only a few
  packages are affected.
  https://github.com/cython/cython/issues/3678

So we only need to ask a few projects to regenerate with Cython >= 0.29.21.

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: When can we remove wchar_t* cache from string?

2020-06-16 Thread Inada Naoki
On Tue, Jun 16, 2020 at 9:30 PM Victor Stinner  wrote:
>
> On Tue, Jun 16, 2020 at 10:42, Inada Naoki  wrote:
> > Hmm,  Is there any chance to add DeprecationWarning in Python 3.9?
>
> In my experience, more and more projects are running their test suite
> with -Werror, which is a good thing. Introducing a new warning is
> likely to "break" many of these projects. For example, in Fedora, we
> run the test suite when we build a package. If a test fails, the
> package build fails and we have to decide to either ignore the failing
> tests (not good) or find a solution to repair the tests (update the
> code base to new C API functions).
>

But Python 3.9 is still in the beta phase, and we have enough time to get
feedback.  If the new warning causes unacceptable breakage, we can remove
it in the RC phase.

>
> > It is an interesting idea, but I think it is too complex.
> > Fixing all packages on PyPI would be a better approach.
>
> It's not the first time that we have to take such a decision. "Fixing
> all PyPI packages" is not possible. Python core developers are limited
> in number, so we can only port a very small number of packages. Breaking
> packages on purpose forces developers to upgrade their code base, so it
> should work better than deprecation warnings. But it is likely to make
> some people unhappy.
>

OK, my terminology was wrong.  Not all, but most of the living packages.

* This change doesn't affect pure Python packages.
* Most of the rest use Cython.  Since I already reported an issue to Cython,
  regenerating with a new Cython release fixes them.
* Most of the rest support PEP 393 already.

So I expect only a few percent of active packages will be affected.

This is a list of uses of the deprecated APIs in the top 4000 packages,
except PyArg_ParseTuple(AndKeywords).
Files generated by Cython are excluded, but most of the remaining hits
are still false positives (e.g. inside `#if PY2`).
https://github.com/methane/notes/blob/master/2020/wchar-cache/deprecated-use

I have filed some issues and sent some pull requests already after I
created this thread.

> Having a separated hash table would prevent to break many PyPI
> packages by continuing to provide the backward compatibility. We can
> even consider to disable it by default, but provide a temporary option
> to opt-in for backward compatibility. For example, "python3.10 -X
> unicode_compat".
>
> I proposed sys.set_python_compat_version(version) in the rejected PEP
> 606, but this PEP was too broad:
> https://www.python.org/dev/peps/pep-0606/
>
> The question is if it's worth it to pay the maintenance burden on the
> Python side, or to drop backward compatibility if it's "too
> expensive".
>
> I understood that your first motivation is to reduce PyASCIObject
> structure size. Using a hash table, the overhead would only be paid by
> users of the deprecated functions. But it requires to keep the code
> and so continue to maintain it. Maybe I missed some drawbacks.
>

Memory usage is the most important motivation.  But the runtime cost of
PyUnicode_READY and the maintenance cost of legacy Unicode matter too.

I will reconsider your idea.  But I still feel that helping the many third
parties is the most constructive way.

Regards,

-- 
Inada Naoki  


[Python-Dev] Re: When can we remove wchar_t* cache from string?

2020-06-16 Thread Inada Naoki
On Tue, Jun 16, 2020 at 12:35 AM Victor Stinner  wrote:
>
> Hi INADA-san,
>
> IMO Python 3.11 is too early because we don't emit a
> DeprecationWarning on every single deprecation function.
>
> 1) Emit a DeprecationWarning at runtime (ex: Python 3.10)
> 2) Wait two Python releases: see
> https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421
> 3) Remove the deprecated feature (ex: Python 3.12)
>

Hmm, is there any chance to add a DeprecationWarning in Python 3.9?

* They have been deprecated in the documentation since Python 3.3 (2012).
* As far as I can tell from grepping PyPI sdist sources, two years may be
  enough to remove them.
* We can postpone the schedule anyway.

> I don't understand if *all* deprecated functions are causing
> implementation issues, or only a few of them?

Of course.  I meant only the APIs using PyASCIIObject.wstr.
As far as I know:

* PyUnicode_AS_DATA
* PyUnicode_AS_UNICODE
* PyUnicode_AsUnicode
* PyUnicode_AsUnicodeAndSize
* PyUnicode_FromUnicode(NULL, size)
* PyUnicode_FromStringAndSize(NULL, size)
* PyUnicode_GetSize
* PyUnicode_GET_SIZE
* PyUnicode_GET_DATA_SIZE
* PyUnicode_WSTR_LENGTH
* PyArg_ParseTuple and PyArg_ParseTupleAndKeywords with format 'u' or 'Z'.

>
> PyUnicode_AS_UNICODE() initializes PyASCIIObject.wstr if needed, and
> then return PyASCIIObject.wstr. I don't think that PyASCIIObject.wstr
> can be called "a cache": there are functions relying on this member.
>

OK, I will call it wstr instead of the wchar_t* cache.

> On the other hand, PyUnicode_FromUnicode(str, size) is basically a
> wrapper to PyUnicode_FromWideChar(): it doesn't harm to keep this
> wrapper to ease migration. Only PyUnicode_FromUnicode(NULL, size) is
> causing troubles, right?

You're right.

>
> Is there a list of deprecated functions and is it possible to group
> them in two categories: must be removed and "can be kept for a few
> more releases"?
>
> If the intent is to reduce Python memory footprint, PyASCIIObject.wstr
> can be moved out of PyASCIIObject structure, maybe we can imagine a
> WeakDict. It would map a Python str object to its wstr member (wchar_*
> string). If the Python str object is removed, we can release the wstr
> string. The technical problem is that it is not possible to create a
> weak reference to a Python str. We may insert code in
> unicode_dealloc() to delete manually the wstr in this case. Maybe a
> _Py_hashtable_t of pycore_hashtable.h could be used for that.
>

It is an interesting idea, but I think it is too complex.
Fixing all packages on PyPI would be a better approach.

> Since this discussion is on-going for something like 5 years in
> multiple bugs.python.org issues and email threads, maybe it would help
> to have a short PEP describing issues of the deprecated functions,
> explain the plan to migrate to the new functions, and give a schedule
> of the incompatible changes. INADA-san: would you be a candidate to
> write such PEP?
>

OK, I will try to write it.

-- 
Inada Naoki  


[Python-Dev] Re: When can we remove wchar_t* cache from string?

2020-06-14 Thread Inada Naoki
On Sat, Jun 13, 2020 at 8:20 PM Inada Naoki  wrote:
>
> On Sat, Jun 13, 2020 at 20:12, Kyle Stanley  wrote:
>>
>> > Additionally, raise DeprecationWarning runtime when these APIs are used.
>>
>> So, just to clarify, current usage of these 7 unicode APIs does not emit any 
>> warnings and would only start doing so in 3.10?
>
> They have already been deprecated in C; the compiler emits a warning.
>
> This additional proposal adds a runtime warning before removal.
>

I'm sorry, I was wrong.  Py_DEPRECATED(3.3) is commented out for some APIs,
so Python 3.8 doesn't show warnings for them.
I want to uncomment them in Python 3.9.
https://github.com/python/cpython/pull/20878

As far as I grepped, most PyPI packages that use the deprecated APIs do so
because Cython generates such code.
Updating Cython will fix them.
Some of them are straightforward, and I have already created issues or sent
pull requests.

A few projects, pyScss and Genshi, are not straightforward.  But it is
not too hard and I will help them.

I still think two years are enough for the removal.

Regards,


[Python-Dev] Re: When can we remove wchar_t* cache from string?

2020-06-13 Thread Inada Naoki
On Sat, Jun 13, 2020 at 20:12, Kyle Stanley  wrote:

> > Additionally, raise DeprecationWarning runtime when these APIs are used.
>
> So, just to clarify, current usage of these 7 unicode APIs does not emit
> any warnings and would only start doing so in 3.10?
>

They have already been deprecated in C; the compiler emits a warning.

This additional proposal adds a runtime warning before removal.


> In this case, it might be okay to remove in 3.11 since they've been
> deprecated for an exceptionally long period and appear to have a clear
> transition path. But, 3.12 would be safer for removal, and I don't think it
> would be much of an additional burden on our end to keep them around for
> one extra version.
>

I am trying to find and remove uses of these APIs in PyPI packages.
We will postpone the removal if the migration is slow.
But let's set the goal to 3.11 for now.



[Python-Dev] Re: When can we remove wchar_t* cache from string?

2020-06-13 Thread Inada Naoki
On Fri, Jun 12, 2020 at 5:32 PM Inada Naoki  wrote:
>
>
> My proposal is: schedule the removal for Python 3.11, but we will postpone
> the removal if we cannot remove its usage by then.
>

Additionally, raise DeprecationWarning runtime when these APIs are used.

-- 
Inada Naoki  


[Python-Dev] Re: When can we remove wchar_t* cache from string?

2020-06-13 Thread Inada Naoki
On Sat, Jun 13, 2020 at 1:36 AM MRAB  wrote:
>
> > * Most of them are `PyUnicode_FromUnicode(NULL, 0);`
> >   * We may be able to keep PyUnicode_FromUnicode, but raise an error
> >     when length > 0.
> >
> I think it would be strange to keep PyUnicode_FromUnicode but complain
> unless length == 0. If it's going to be removed, then remove it and
> suggest a replacement for that use-case, such as PyUnicode_FromString
> with a NULL argument. (I'm not sure if PyUnicode_FromString will accept
> NULL, but if it currently doesn't, then maybe it should in future be
> treated as being equivalent to PyUnicode_FromString("").)

Of course, there is an API to create an empty string: PyUnicode_New(0, 0).
But since Cython is using `PyUnicode_FromUnicode(NULL, 0)`,
keeping it working for some versions will mitigate the breaking change.
Note that we can remove the wchar_t cache while keeping it working.

Anyway, this is an idea for mitigation.  If all maintained packages fix it
before Python 3.11, the mitigation is not needed.

Regards,
-- 
Inada Naoki  


[Python-Dev] When can we remove wchar_t* cache from string?

2020-06-12 Thread Inada Naoki
Hi, all.

Py_UNICODE has been deprecated since PEP 393 (Flexible string representation).

The wchar_t* cache in the string object is used only by deprecated APIs.
It wastes 1 word (8 bytes on a 64-bit machine) per string instance.

The deprecated APIs are documented as "Deprecated since version 3.3,
will be removed in version 4.0."
See https://docs.python.org/3/c-api/unicode.html#deprecated-py-unicode-apis

But when PEP 393 was implemented, no one expected that 3.10 would ever be
released.  Can we reschedule the removal?

My proposal is: schedule the removal for Python 3.11, but we will postpone
the removal if we cannot remove its usage by then.

I grepped the use of the deprecated APIs from top 4000 PyPI packages.

result: 
https://github.com/methane/notes/blob/master/2020/wchar-cache/deprecated-use
step: https://github.com/methane/notes/blob/master/2020/wchar-cache/README.md

I noticed:

* Most of them are generated by Cython.
  * I reported it to Cython, so Cython 0.29.21 will fix them.  I expect
    more than 1 year between Cython 0.29.21 and Python 3.11rc1.
* Most of them are `PyUnicode_FromUnicode(NULL, 0);`
  * We may be able to keep PyUnicode_FromUnicode, but raise an error when length > 0.

Regards,

--
Inada Naoki  


[Python-Dev] Re: My take on multiple interpreters (Was: Should we be making so many changes in pursuit of PEP 554?)

2020-06-09 Thread Inada Naoki
On Tue, Jun 9, 2020 at 10:28 PM Petr Viktorin  wrote:
>
> Relatively recently, there is an effort to expose interpreter creation &
> finalization from Python code, and also to allow communication between
> them (starting with something rudimentary, sharing buffers). There is
> also a push to explore making the GIL per-interpreter, which ties in to
> moving away from process-global state. Both are interesting ideas, but
> (like banishing global state) not the whole motivation for
> changes/additions.
>

Some changes for the per-interpreter GIL don't help subinterpreters much.
For example, isolating the memory allocator, including free lists and
constants, between subinterpreters makes each subinterpreter fatter.
I assume Mark is talking about such changes.

Now Victor is proposing to move the dict free list into the per-interpreter
state, and the code looks good to me.  This is a change for the
per-interpreter GIL, but not for subinterpreters.
https://github.com/python/cpython/pull/20645

Should we commit this change to the master branch?
Or should we create another branch for such changes?

Regards,
-- 
Inada Naoki  


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-01-24 Thread Inada Naoki
On Fri, Jan 24, 2020 at 5:24 PM Victor Stinner  wrote:
>
> You're right that it's not only about list.count().
>
> list.count(), list.index(), tuple.count() and tuple.index() all
> consider that two elements are equal if id(x) == id(y):
>

FWIW, (list|tuple).__eq__ and (list|tuple).__contains__ use it too.
It is very important for comparing recursive sequences.

>>> x = []
>>> x.append(x)
>>> y = [x]
>>> z = [x]
>>> y == z
True
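The identity shortcut is also visible with NaN, which is never equal to itself; a small illustration of the behavior discussed above (on CPython, where the shortcut applies):

```python
nan = float("nan")

assert nan != nan               # NaN is not equal to itself...
assert nan in [nan]             # ...but __contains__ checks identity first
assert [nan] == [nan]           # list.__eq__ uses the same shortcut per element
assert [nan] != [float("nan")]  # distinct NaN objects: identity and == both fail
```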

-- 
Inada Naoki  


[Python-Dev] Re: Windows - rebuilding grammar files

2020-01-14 Thread Inada Naoki
His patch has already been merged.
See https://github.com/python/cpython/pull/12654

On Tue, Jan 14, 2020 at 6:20 PM Abdur-Rahmaan Janhangeer
 wrote:
>
> Greetings list,
>
> On windows i want to play with the grammar file, but according to this 
> article:
>
> > For Windows, there is no officially supported way of running pgen. However, 
> > you can clone my fork and run build.bat --regen from within the PCBuild 
> > directory.
>
> But i don't want to work from his fork. Is there an official way of 
> regenerating
> the files? I just want to have my own keywords. As far as i've understood, 
> rebuilding won't do it.
>
> Yours,
>
> Abdur-Rahmaan Janhangeer
> Mauritius



-- 
Inada Naoki  


[Python-Dev] Re: Adding a scarier warning to object.__del__?

2020-01-03 Thread Inada Naoki
I don't want to recommend atexit or weakref because they have their own
pitfalls.

I think a context manager should be considered first when people want
to use __del__.
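A minimal sketch of that advice (the ``Resource`` class is made up for illustration, not stdlib API): a context manager makes cleanup explicit and deterministic, with none of the shutdown-ordering surprises of __del__.

```python
class Resource:
    """Cleanup via a context manager instead of __del__."""
    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            self.log.append("closed")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs deterministically when the with-block exits, even on error --
        # no interpreter-shutdown surprises like with __del__.
        self.close()
        return False

events = []
with Resource(events) as r:
    events.append("working")
print(events)  # ['working', 'closed']
```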

On Wed, Jan 1, 2020 at 9:35 AM Yonatan Zunger  wrote:
>
> Hey everyone,
>
> I just encountered yet another reason to beware of __del__: when it's called 
> during interpreter shutdown, for reasons which are kind of obvious in 
> retrospect, if it calls notify() on a threading.Condition, the waiting thread 
> may or may not ever actually receive it, and so if it does that and then 
> tries to join() the thread the interpreter may hang in a hard-to-debug way.
>
> This isn't something that can reasonably be fixed, and (like in most cases 
> involving __del__) there's a very simple fix of using weakref.finalize 
> instead. My question for the dev list: How would people feel about changing 
> the documentation for the method to more bluntly warn people against using 
> it, and refer them to weakref.finalize and/or atexit.register as an 
> alternative? The text already has an undertone of "lasciate ogni speranza, 
> voi ch'entrate" but it may be helpful to be more explicit to avoid people 
> getting caught in unexpected pitfalls.
>
> Yonatan



-- 
Inada Naoki  


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-20 Thread Inada Naoki
On Sat, Dec 21, 2019 at 3:17 AM Tim Peters  wrote:
>
> [Wes Turner ]
> >> How slow and space-inefficient would it be to just implement the set 
> >> methods
> >> on top of dict?
>
> [Inada Naoki ]
> > Speed:  Dict doesn't cache the position of the first item.  Calling
> > next(iter(D)) repeatedly is O(N) in worst case.
> > ...
>
> See also Raymond's (only) message in this thread.  We would also lose
> low-level speed optimizations specific to the current set
> implementation.

I read this thread, and I understand all of it.

I just meant that the performance of next(iter(D)) is the most critical part
when you implement an ordered set on top of the current dict and use it as a queue.

This code should be O(N), but it's O(N^2) if q is implemented on top
of the dict.

   while q:
   item = q.popleft()

Sorry for the confusion.
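To make the point concrete, here is a toy sketch (the class and method names are mine, purely for illustration) of a queue backed by a plain dict, where popleft goes through next(iter(d)):

```python
class DictQueue:
    """Toy ordered 'set' queue backed by a plain dict (illustration only)."""

    def __init__(self, items=()):
        self._d = dict.fromkeys(items)

    def push(self, item):
        self._d[item] = None

    def popleft(self):
        # next(iter(...)) re-scans the dict's internal entry table from
        # index 0 on every call; dict does not cache where the first live
        # entry is, so a drained queue degrades to O(N) per pop.
        item = next(iter(self._d))
        del self._d[item]
        return item

    def __len__(self):
        return len(self._d)

q = DictQueue([3, 1, 2])
q.push(5)
assert [q.popleft() for _ in range(len(q))] == [3, 1, 2, 5]
```

The FIFO order works because dicts preserve insertion order; only the per-pop cost is the problem.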


>
> And the current set implementation (like the older dict
> implementation) never needs to "rebuild the table from scratch" unless
> the cardinality of the set keeps growing.

That is a bit misleading.  If "the cardinality of the set" means len(S),
the set still requires rebuilds, at low frequency, if its items are random.

Anyway, it is a smaller problem than next(iter(D)) because the
amortized cost is O(1).
The current dict is not optimized for queue usage.

Regards,
-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DFJQOW225OSRPFDHBJ3UBYRRMZ52AXDH/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-19 Thread Inada Naoki
On Fri, Dec 20, 2019 at 4:15 PM Wes Turner  wrote:
>
> How slow and space-inefficient would it be to just implement the set methods 
> on top of dict?

Speed:  Dict doesn't cache the position of the first item.  Calling
next(iter(D)) repeatedly is O(N) in worst case.
Space:  It wastes 8 bytes per member.

>
> Do dicts lose insertion order when a key is deleted? AFAIU, OrderedDict do 
> not lose insertion order on delete.

Dict keeps insertion order after deletion too.

>Would this limit the utility of an ordered set as a queue? What set methods 
>does a queue need to have?

I want O(1) D.popleft().  (The name is borrowed from deque; popfirst()
might be a better name.)

--
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/T4DT5P7CZE7A4J6XNOT6QGC5UDS4INTR/


[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Inada Naoki
On Mon, Dec 16, 2019 at 6:25 PM Inada Naoki  wrote:
>
> +1 for 1 and 2.
>

If we find it breaks some software, we can fall back to the regular
deprecation workflow.
Python 3.9 is still far from beta.  That's why I'm +1 on these proposals.

-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HWNLBBHSVB5NRQC6ESQQNCQQ2EYUMW27/


[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

2019-12-16 Thread Inada Naoki
On Sun, Dec 15, 2019 at 11:07 PM Serhiy Storchaka  wrote:
>
> I propose several changes:
>
> 1. Forbids calling str() without object if encoding or errors are
> specified. It is very unlikely that this can break a real code, so I
> propose to make it an error without a deprecation period.
>
> 2. Make the first parameter of str(), bytes() and bytearray()
> positional-only. Originally this feature was an implementation artifact:
> before 3.6 parameters of a C implemented function should be either all
> positional-only (if used PyArg_ParseTuple), or all keyword (if used
> PyArg_ParseTupleAndKeywords). So str(), bytes() and bytearray() accepted
> the first parameter by keyword. We already made similar changes for
> int(), float(), etc: int(x=42) no longer works.
>
> Unlikely str(object=object) is used in a real code, so we can skip a
> deprecation period for this change too.
>

+1 for 1 and 2.

> 3. Make encoding required if errors is specified in str(). This will
> reduce the number of possible combinations, makes str() more similar to
> bytes() and bytearray() and simplify the mental model: if encoding is
> specified, then we decode, and the first argument must be a bytes-like
> object, otherwise we convert an object to a string using __str__.

-0.

We can omit `encoding="utf-8"` in bytes.decode() because the default
encoding is always UTF-8.

>>> x = "おはよう".encode()
>>> x.decode(errors="strict")
'おはよう'

So allowing `bytes(o, errors="replace")` instead of making encoding
mandatory also makes sense to me.
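To make the current behavior concrete, a quick sketch (these are real, current CPython semantics: the encoding defaults to UTF-8 when only errors is supplied):

```python
x = "おはよう".encode()  # encodes with the default UTF-8 codec

# bytes.decode: encoding defaults to UTF-8 even when only errors is given.
assert x.decode(errors="strict") == "おはよう"

# str() mirrors this today: errors= alone implies the default encoding...
assert str(x, errors="strict") == "おはよう"

# ...while proposal 3 would require spelling the encoding out:
assert str(x, "utf-8", "strict") == "おはよう"
```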

Regards,
-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MDOU2IZ5YTCRS7VMR6DPHSQGSKGKDBFZ/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-15 Thread Inada Naoki
On Mon, Dec 16, 2019 at 1:33 PM Guido van Rossum  wrote:
>
> Actually, for dicts the implementation came first.
>

I tried implementing an ordered set.  Here is the implementation.
https://github.com/methane/cpython/pull/23

Regards,
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SGDD47GTMS7OGIEZTLLXEYHABL5OS4EN/


[Python-Dev] Re: Deprecating the "u" string literal prefix

2019-12-04 Thread Inada Naoki
>
> Is this a serious question? Many things were removed in moving from
> Python 2 to Python 3.

We remove cruft not only between 2 and 3;
we remove it regularly.

https://docs.python.org/3/whatsnew/3.8.html#api-and-feature-removals
https://docs.python.org/3/whatsnew/3.7.html#api-and-feature-removals
https://docs.python.org/3/whatsnew/3.6.html#removed
https://docs.python.org/3/whatsnew/3.5.html#removed

We need to avoid major breakage,
but we accept small breakages in every minor release.
And removing the u-prefix would be a major breakage for now.


> Maybe I missed something. Python used to pride itself on keeping old
> code working. When hash randomization was introduced, it was decided to
> be off by default in Python 2 because even though people shouldn't have
> counted on the order of dicts, they were counting on it, and turning on
> hash randomization would break code. So there we decided to keep things
> insecure because otherwise code would break, even "wrong" code.

It is not a good example, because we didn't release Python 2.8.
Hash randomization might have been enabled by default if we had released Python 2.8.

Regards,

-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WTEWGFFMJV7A6M5XEE7DFDF2QIQNVFUR/


[Python-Dev] Re: Deprecating the "u" string literal prefix

2019-12-03 Thread Inada Naoki
On Wed, Dec 4, 2019 at 11:49 AM Ned Batchelder  wrote:
>
> On 12/3/19 8:13 PM, Inada Naoki wrote:
> > I think it is too early to determine when to remove it.
> > Even only talking about it causes blaming war.
>
> Has anyone yet given a reason to remove it?

Note that "never" is included in "when".
I wasn't promoting its removal.  I just said let's stop discussing it.


> It will change working code
> into broken code.  Why do that?

Of course, everyone in this thread understands that.
No one proposes removing it now.

On the other hand, we regularly remove parts of the Python language
and the stdlib to keep Python clean.
Every removal breaks some code.  That's why we have a deprecation period.

Currently, the u-prefix is very widely used.  It shouldn't be removed anytime soon.
And I agree that we shouldn't raise DeprecationWarning for now.

But what about 5, 10, or 20 years from now?  No one knows.
Let's stop discussing it.  It can not be productive.

Instead, we can do:

* Don't recommend u-prefix except in Python 2&3 code.
* Provide a tool to remove the u-prefix.

Regards,
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EVKCCO5KMOGEEFMSSY2PZRVGT2LDOB5K/


[Python-Dev] Re: Deprecating the "u" string literal prefix

2019-12-03 Thread Inada Naoki
I think it is too early to decide when to remove it.
Even just talking about it starts a blame war.

BTW, I think 2to3 can help to move from 2&3 code to 3-only code.

* The "future" fixer can remove legacy futures.  But it seems to remove
  all futures, including "annotations", which should be kept.
* The "unicode" fixer can be used to remove the u-prefix.  But I'm not sure yet.

Are there any other constructs that are used for writing 2&3 code and
will be removed someday in the future?
And is there any tool that can convert code using the "six" library to
normal Python 3 code?

Regards,

On Wed, Dec 4, 2019 at 2:29 AM Serhiy Storchaka  wrote:
>
> The 'u" string literal prefix was removed in 3.0 and reintroduced in 3.3
> to help writing the code compatible with Python 2 and 3 [1]. After the
> dead of Python 2.7 we will remove some deprecated features kept for
> compatibility with 2.7. When we are going to deprecate and remove the
> "u" prefix?
>
> [1] https://www.python.org/dev/peps/pep-0414/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EMFDKQ57JVWUZ6TPZM5VTFW7EUKVYAOY/



-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FDA2UHFY4PBZBWCUKU5HKM73KDKB7FT6/


[Python-Dev] Re: small improvement idea for the CSV module

2019-10-31 Thread Inada Naoki
On Wed, Oct 30, 2019 at 11:55 PM Oz Tiram  wrote:
>
> Hi Steve,
>
> Thanks for your reply.  Dataclasses provide a cleaner API than DictRow
> (you can access `row.id` instead of `row["id"]`).
> However, dataclasses still use the built-in `__dict__` instead of `__slots__`.
>
> This means that the users reading large files won't see the suggested memory 
> improvements.
>

FWIW, there is a memory improvement thanks to the key-sharing dictionary.
See PEP 412 [1].
I have an idea for utilizing key-sharing dictionaries in DictReader, but I have
not implemented it yet.

[1]: https://www.python.org/dev/peps/pep-0412/
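A quick sketch of the key-sharing effect PEP 412 describes (the class and attribute names are arbitrary; exact byte counts are CPython implementation details, so only the equality is asserted):

```python
import sys

class Row:
    def __init__(self, a, b):
        self.a = a
        self.b = b

r1 = Row(1, 2)
r2 = Row(3, 4)

# Instances of the same class with the same attribute set share one
# keys table ("a", "b"); each instance only stores its own values.
assert sys.getsizeof(r1.__dict__) == sys.getsizeof(r2.__dict__)

# A freshly built plain dict with the same keys carries its own keys table:
plain = {"a": 1, "b": 2}
print(sys.getsizeof(r1.__dict__), sys.getsizeof(plain))
```

The same idea could apply to DictReader: every row dict has the same keys (the CSV header), so a shared keys table would save memory per row.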
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/R4XMTCZKTJG32HJYTIZO7XJQMJBJMWQ3/


[Python-Dev] Re: How official binaries are built?

2019-10-17 Thread Inada Naoki
Thank you for your response.
And I'm sorry for missing this; Gmail marked it as spam.

On Tue, Oct 15, 2019 at 6:20 PM Ned Deily  wrote:
>
> We currently do not use those options to build the binaries for the 
> python.org macOS installers.  The main reason is that the Pythons we provide 
> are built to support a wide-range of macOS releases and to do so safely we 
> build the binaries on the oldest version of macOS supported by that 
> installer.  So, for example, the 10.9+ installer variant is built on a 10.9 
> system.  Some of the optimization features either aren't available or are 
> less robust on older build tools.

It makes sense.

>   And I believe it is more important for the python.org macOS installers to 
> continue to provide a single installer that is usable on many systems and can 
> be used in a broad range of applications and by a broad range of users rather 
> than trying to optimize performance for a specific application: you can 
> always build your own Python.
>
> As far as what other distributors of Python for macOS do, what we do 
> shouldn't necessarily constrain them.  I don't see any problem with Homebrew 
> optimizing for a particular user's installation.  I see that MacPorts, 
> another distributor of Python on macOS, provides a non-default variant that 
> uses --enable-optimizations.
>
> https://github.com/macports/macports-ports/blob/master/lang/python37/Portfile
>
> --
>   Ned Deily
>   n...@python.org -- []
>


-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GN7AYLJGZKS3HEINE5AU7WMK3RHAYBHN/


[Python-Dev] Re: How official binaries are built?

2019-10-15 Thread Inada Naoki
On Tue, Oct 15, 2019 at 10:57 PM Victor Stinner  wrote:
>
> Hi Inada-san,
>
> You can query the sysconfig module to check how Python has been built.

Thank you for pointing it out.  It seems the official macOS binary doesn't use
the --enable-optimizations and --with-lto options...

Python 3.8.0 (v3.8.0:fa919fdf25, Oct 14 2019, 10:23:27)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sysconfig
>>> sysconfig.get_config_var('PY_CFLAGS')
'-Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common
-dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g'
>>> sysconfig.get_config_var('PY_CFLAGS_NODIST')
'-std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter
-Wno-missing-field-initializers -Wstrict-prototypes
-Werror=implicit-function-declaration
-I/Users/sysadmin/build/v3.8.0/Include/internal'
>>> sysconfig.get_config_var('PY_LDFLAGS_NODIST')
''
>>> sysconfig.get_config_var('PY_LDFLAGS')
'-arch x86_64 -g'



-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VUA2AG6J2QG2GK52DZ4VKZMRUTKJ4DWO/


[Python-Dev] How official binaries are built?

2019-10-15 Thread Inada Naoki
Hi, all.

I want Homebrew to use the `--enable-optimizations` and `--with-lto` options
for building Python.  But the maintainer said:

> Given this is not a default option, probably not, unless it is done in 
> upstream (“official”) binaries.

https://github.com/Homebrew/homebrew-core/pull/45337

Are these options used for the official macOS binaries?
Is there official information about the build steps of the official binaries?

Regards,
-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EILILECNTLTW4VCBCPW37R4QRU7ZBDEU/


[Python-Dev] Re: Is "%zd" format is portable now?

2019-08-05 Thread Inada Naoki
Thank you for confirming it!

Currently, Python defines PY_FORMAT_SIZE_T as:

#ifndef PY_FORMAT_SIZE_T
# if SIZEOF_SIZE_T == SIZEOF_INT && !defined(__APPLE__)
# define PY_FORMAT_SIZE_T ""
# elif SIZEOF_SIZE_T == SIZEOF_LONG
# define PY_FORMAT_SIZE_T "l"
# elif defined(MS_WINDOWS)
# define PY_FORMAT_SIZE_T "I"
# else
# error "This platform's pyconfig.h needs to define PY_FORMAT_SIZE_T"
# endif
#endif

https://github.com/python/cpython/blob/1213123005d9f94bb5027c0a5256ea4d3e97b61d/Include/pyport.h#L158-L168

This can be changed to this:

#ifndef PY_FORMAT_SIZE_T
/* "z" is defined in C99 and is portable enough.  We can use "%zd" instead of
   "%" PY_FORMAT_SIZE_T "d" now.
*/
# define PY_FORMAT_SIZE_T "z"
#endif


On Mon, Aug 5, 2019 at 6:05 PM Michael  wrote:
>
> OK - a too simple program - run on AIX 5.3 - seems to indicate that XLC
> (version 11, so quite old!) accepts "%zd".
>
> If someone would be kind enough to mail me a better example of what
> needs to be verfied - I am happy to compile and publish the results.
>
> root@x065:[/data/prj/aixtools/tests]cat printf-test.c
> #include <stdio.h>
>
> main()
> {
> printf("Hello World - testing for %%zd support\n");
> printf("%%zd of 1001 == %zd\n", 1001);
> printf("\nTest complete\n");
> }
>
> root@x065:[/data/prj/aixtools/tests]./a.out
> Hello World - testing for %zd support
> %zd of 1001 == 1001
>
> Test complete
>
> fyi: AIX 4.3
> https://sites.ualberta.ca/dept/chemeng/AIX-43/share/man/info/C/a_doc_lib/libs/basetrf2/snprintf.htm
>
> and AIX 7.2
> https://www.ibm.com/support/knowledgecenter/ssw_aix_72/p_bostechref/printf.html
>
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UIIERFFTJIVQMOOZTZDLAILAAXCU7XSK/


[Python-Dev] Re: Is "%zd" format is portable now?

2019-08-01 Thread Inada Naoki
On Thu, Aug 1, 2019 at 10:21 PM Victor Stinner  wrote:
>
> Hi INADA-san,
>
> Is it supported on macOS, FreeBSD, AIX, Android, etc.?
>
> My notes on platforms supported by Python:
> https://pythondev.readthedocs.io/platforms.html
>
> For example, xlc C compiler seems to be commonly used on AIX. I don't
> know how is its C99 support.
>
> Can we write an unit test somewhere to ensure that %zd works as expected?
>
> Victor
>

I don't know about AIX either.  I googled, but I can not even find a man page
for snprintf(3) on AIX.

I'm frustrated that I wasted a few hours reading PDFs and searching, but I
can not find any official documentation about snprintf(3).
I feel it's impossible to support such platforms...

Except for AIX, I believe all platforms support size_t and %zd because
it's a very basic C99 feature.

Regards,
--
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/O7H4FBLDQBHSKGSEJQ2TU7IRNKUAPJDV/


[Python-Dev] Is "%zd" format is portable now?

2019-08-01 Thread Inada Naoki
Hi,

snprintf() in VC 2010 did not support the "%zd" format.
So we can not use it in PyOS_snprintf, PySys_WriteStdout, etc...

But VC has supported the "%zd" format since (at least) VC 2015.
See 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/format-specification-syntax-printf-and-wprintf-functions?view=vs-2015

"%zd" is a standard C99 feature.  Can we assume it is portable now?

Regards,
-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CAXKWESUIWJNJFLLXXWTQDUWTN3F7KOU/


[Python-Dev] Re: Comparing dict.values()

2019-07-25 Thread Inada Naoki
On Fri, Jul 26, 2019 at 2:03 PM Random832  wrote:
>
>
> Items also sometimes contains unhashable types, and some methods simply fail 
> in that case. I suggest that this precedent provides a way forward - 
> implement the entire intuitive "contains the same amount of each value" 
> algorithm [more or less Counter(obj1) == Counter(obj2)], and have this fail 
> naturally, throwing e.g. an exception "TypeError: unhashable type: 'list'" if 
> any of the values are unhashable in the same way that trying to perform 
> certain set operations on an items view does.

-1.  What is the motivation for this?
In this case, I don't think "I found missing parts so I want to
implement it for consistency"
is enough reason to implement it.

I want a real-world application which requires it.
Without a strong use case, I think the discussion is just a waste of time.
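For reference, the "same amount of each value" comparison described in the quoted proposal amounts to something like this sketch (the helper name is mine, not a proposed API):

```python
from collections import Counter

def values_equal(d1, d2):
    """True if both dicts contain the same amount of each value.

    Raises TypeError for unhashable values (e.g. lists), mirroring
    how set operations on items views fail in that case."""
    return Counter(d1.values()) == Counter(d2.values())

# Same multiset of values, different keys and order:
assert values_equal({"a": 1, "b": 2}, {"x": 2, "y": 1})
# Different multiplicities:
assert not values_equal({"a": 1}, {"a": 1, "b": 1})
```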

Regards,
-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CZ3K6BZ3AAWABO4456YP6BYFMSXZVHAM/


[Python-Dev] Re: Long-term deprecation policy

2019-07-17 Thread Inada Naoki
On Tue, Jul 16, 2019 at 11:07 PM Jeroen Demeyer  wrote:
>
> On 2019-07-16 15:33, Inada Naoki wrote:
> >> We currently have a deprecation policy saying that functions deprecated
> >> in version N cannot be removed before version N+2. That's a reasonable
> >> policy but some deprecation purists insist that it MUST (instead of MAY)
> >> be removed in version N+2. Following this reasoning, we cannot deprecate
> >> something that we cannot remove.
> >
> > Really?  Any example?
>
> * https://bugs.python.org/issue29548#msg287775
>

OK, the stable ABI is special.  We can not remove anything from it until
Python 4, per PEP 384.
In 2017, Python 4 looked like "forever".  But now we should treat Python 4
as a real possibility in the 2020s, not in the year 3000 or 4000.

I think we should deprecate APIs in the stable ABI just like non-stable
APIs if we want to remove them after Python 4.0.

-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WR4SE2GFZHW2CFD6C7Z4BI7A2FWK5SMQ/


[Python-Dev] Re: Long-term deprecation policy

2019-07-16 Thread Inada Naoki
On Tue, Jul 16, 2019 at 6:46 PM Jeroen Demeyer  wrote:
>
> I have seen multiple discussions where somebody wants to deprecate a
> useless function but somebody else complains that we cannot do that
> because the function in question cannot be removed (because of backwards
> compatibility). See https://bugs.python.org/issue29548 for an example.
>

FWIW, we didn't have a deprecation macro in 2017.
Now we have one, and I'm +1 on deprecating it.

Especially, I want to enforce Py_SSIZE_T_CLEAN support (removing int support
for the # format) in the early 2020s (3.10 or 3.11).

But PyEval_CallFunction and PyEval_CallMethod don't respect Py_SSIZE_T_CLEAN.
We need a breaking behavior change for them, and we already raise a runtime
deprecation warning.
I think we should add a compile-time warning too, regardless of whether # is used.


> We currently have a deprecation policy saying that functions deprecated
> in version N cannot be removed before version N+2. That's a reasonable
> policy but some deprecation purists insist that it MUST (instead of MAY)
> be removed in version N+2. Following this reasoning, we cannot deprecate
> something that we cannot remove.

Really?  Any example?


>
> Personally, I think that this reasoning is flawed: even if we cannot
> *remove* a function, we can still *deprecate* it.

I totally agree with you.  Nothing is wrong with a long deprecation period.

> That way, we send a
> message that the function shouldn't be used anymore. And it makes it
> easier to remove it in the (far) future: if the function was deprecated
> for a while, we have a valid reason to remove it. The longer it was
> deprecated, the less likely it is to be still used, which makes it
> easier to remove eventually.
>
> So I suggest to embrace such long-term deprecations, where we deprecate
> something without planning in advance when it will be removed. This is
> actually how most other open source projects that I know handle
> deprecations.
>
> I'd like to know the opinion of the Python core devs here.
>

FWIW, there is PendingDeprecationWarning for things that are discouraged
but not yet deprecated and will be deprecated in the future.
But I prefer simple deprecation.

Regards,
-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4KDQ3QNO4YL6FWX6Q6YQVKPINJ7QK2DG/


[Python-Dev] Re: Optimizing pymalloc (was obmalloc

2019-07-10 Thread Inada Naoki
> Mean +- std dev: [python-master] 199 ms +- 1 ms -> [python] 182 ms +-
> 4 ms: 1.10x faster (-9%)
...
> I will try to split pymalloc_alloc and pymalloc_free to smaller functions.

I did it and pymalloc is now as fast as mimalloc.

$ ./python bm_spectral_norm.py --compare-to=./python-master
python-master: . 199 ms +- 1 ms
python: . 176 ms +- 1 ms

Mean +- std dev: [python-master] 199 ms +- 1 ms -> [python] 176 ms +-
1 ms: 1.13x faster (-11%)

I filed a new issue for this: https://bugs.python.org/issue37543
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HKV6TQAHHLLLK4JS5F5JQ26MGWPLOD2M/


[Python-Dev] Re: Optimizing pymalloc (was obmalloc

2019-07-10 Thread Inada Naoki
On Wed, Jul 10, 2019 at 5:18 PM Neil Schemenauer  wrote:
>
> On 2019-07-09, Inada Naoki wrote:
> > PyObject_Malloc inlines pymalloc_alloc, and PyObject_Free inlines 
> > pymalloc_free.
> > But compiler doesn't know which is the hot part in pymalloc_alloc and
> > pymalloc_free.
>
> Hello Inada,
>
> I don't see this on my PC.  I'm using GCC 8.3.0.  I have configured
> the build with --enable-optimizations.

I didn't use PGO, and that's why GCC didn't know which part is hot.
Maybe pymalloc performance is similar to mimalloc when PGO is used,
but I have not confirmed it.

While Linux distributions use PGO, some people use non-PGO Python
(Homebrew, pyenv, etc...).  So better performance without PGO is worthwhile.

Regards,
-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LKU5FDWGWHHEBUMTNZ5ME23RC73B5JIF/


[Python-Dev] Optimizing pymalloc (was obmalloc

2019-07-09 Thread Inada Naoki
On Tue, Jul 9, 2019 at 5:29 PM Inada Naoki  wrote:
>
> On Tue, Jul 9, 2019 at 9:46 AM Tim Peters  wrote:
> >
> >> I was more intrigued by your first (speed) comparison:
> >
> > > - spectral_norm: 202 ms +- 5 ms -> 176 ms +- 3 ms: 1.15x faster (-13%)
> >
> > Now _that's_ interesting ;-)  Looks like spectral_norm recycles many
> > short-lived Python floats at a swift pace.  So memory management
> > should account for a large part of its runtime (the arithmetic it does
> > is cheap in comparison), and obmalloc and mimalloc should both excel
> > at recycling mountains of small objects.  Why is mimalloc
> > significantly faster?
>
> Totally agree.  I'll investigate this next.
>

I compared the "perf" output of mimalloc and pymalloc, and I succeeded in
optimizing pymalloc!

$ ./python bm_spectral_norm.py --compare-to ./python-master
python-master: . 199 ms +- 1 ms
python: . 182 ms +- 4 ms

Mean +- std dev: [python-master] 199 ms +- 1 ms -> [python] 182 ms +-
4 ms: 1.10x faster (-9%)

mimalloc uses many small static (inline) functions.
On the other hand, pymalloc_alloc and pymalloc_free are large functions
containing the slow/rare paths.

PyObject_Malloc inlines pymalloc_alloc, and PyObject_Free inlines pymalloc_free.
But the compiler doesn't know which part is hot in pymalloc_alloc and
pymalloc_free, so gcc failed to choose the right code to inline.  The
remaining parts of pymalloc_alloc and pymalloc_free are called as real
functions, and many push/pop instructions are executed because they
contain complex logic.

So I tried using LIKELY/UNLIKELY macros to teach the compiler the hot paths.
But I still need to use "static inline" for pymalloc_alloc and
pymalloc_free [1].
The generated assembly is well optimized: the hot code is at the top of
PyObject_Malloc [2] and PyObject_Free [3].
But there is a lot of code duplication in PyObject_Malloc,
PyObject_Calloc, etc...

[1] https://github.com/python/cpython/pull/14674/files
[2] 
https://gist.github.com/methane/ab8e71c00423a776cb5819fa1e4f871f#file-obmalloc-s-L232-L274
[3] 
https://gist.github.com/methane/ab8e71c00423a776cb5819fa1e4f871f#file-obmalloc-s-L2-L32

I will try to split pymalloc_alloc and pymalloc_free into smaller functions.

Except above, there is one more important difference.

pymalloc returns a free pool to the freepool list as soon as the pool
becomes empty.
On the other hand, mimalloc does not return a "page" (similar to a "pool"
in pymalloc) every time it becomes empty [4], so it can avoid rebuilding
the linked list of free blocks.
I think pymalloc should apply the same optimization.

[4] 
https://github.com/microsoft/mimalloc/blob/1125271c2756ee1db1303918816fea35e08b3405/src/page.c#L365-L375

BTW, which is the proper name: pymalloc or obmalloc?

Regards,
-- 
Inada Naoki  
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YWNWHGTJUMZ4D34DPRFXECF7O7GRJK2M/


[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

2019-07-09 Thread Inada Naoki
On Tue, Jul 9, 2019 at 9:46 AM Tim Peters  wrote:
>
> >  At last, all size classes has1~3 used/cached memory blocks.
>
> No doubt part of it, but hard to believe it's most of it.  If the loop
> count above really is 10240, then there's only about 80K worth of
> pointers in the final `buf`.

You are right.  list.append is not the major memory consumer in the
"large" size class (8KiB+1 ~ 512KiB).  There are several causes of
large allocations:

* bm_logging uses StringIO.seek(0); StringIO.truncate() to reset the buffer.
  So the internal buffer of StringIO becomes a Py_UCS4 array instead of a list
  of strings from the 2nd loop onward.  This buffer uses the same growth
  policy as list: `size + (size >> 8) + (size < 9 ? 3 : 6)`.
  Actually, when I use the `-n 1` option, memory usage is only 9MiB.
* The intern dict.
* Many modules are loaded, and FileIO.readall() is used to read pyc files.
  This creates and deletes bytes objects of various sizes.
* The logging module uses several regular expressions.  `b'\0' * 0xff00` is
  used in sre_compile.
  https://github.com/python/cpython/blob/master/Lib/sre_compile.py#L320


>
> But does it really matter? ;-)  mimalloc "should have" done MADV_FREE
> on the pages holding the older `buf` instances, so it's not like the
> app is demanding to hold on to the RAM (albeit that it may well show
> up in the app's RSS unless/until the OS takes the RAM away).
>

mimalloc doesn't call madvise for each free().  Each size class
keeps a 64KiB "page",
and several OS pages (4KiB) in that "page" are committed but not used.

I dumped all "mimalloc page" stat.
https://paper.dropbox.com/doc/mimalloc-on-CPython--Agg3g6XhoX77KLLmN43V48cfAg-fFyIm8P9aJpymKQN0scpp#:uid=671467140288877659659079=memory-usage-of-logging_format

For example:

bin  block_size  used  capacity  reserved
 29        2560     1        22        25   (14 pages are committed, 2560 bytes are in use)
 29        2560    14        25        25   (16 pages are committed, 2560*14 bytes are in use)
 29        2560    11        25        25
 31        3584     1         5        18   (5 pages are committed, 3584 bytes are in use)
 33        5120     1         4        12
 33        5120     2        12        12
 33        5120     2        12        12
 37       10240     3        11       409
 41       20480     1         6       204
 57      327680     1         2        12

* Committed pages can be estimated roughly by `ceil(block_size * capacity / 4096)`.

There are dozens of unused memory blocks and committed pages in each size class.
This causes 10MiB+ of memory usage overhead on the logging_format and logging_simple
benchmarks.
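The committed-page estimate can be checked against a few rows of the dump above with a short sketch (the helper name is mine, not part of mimalloc):

```python
import math

def committed_pages(block_size, capacity, os_page=4096):
    # Rough number of committed 4KiB OS pages for one mimalloc "page":
    # enough OS pages to cover `capacity` blocks of `block_size` bytes.
    return math.ceil(block_size * capacity / os_page)

# Rows from the dump: (block_size, capacity) -> committed OS pages
print(committed_pages(2560, 22))  # 14, matching the first row
print(committed_pages(2560, 25))  # 16, matching the second row
print(committed_pages(3584, 5))   # 5, matching the bin-31 row
```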


>> I was more intrigued by your first (speed) comparison:
>
> > - spectral_norm: 202 ms +- 5 ms -> 176 ms +- 3 ms: 1.15x faster (-13%)
>
> Now _that's_ interesting ;-)  Looks like spectral_norm recycles many
> short-lived Python floats at a swift pace.  So memory management
> should account for a large part of its runtime (the arithmetic it does
> is cheap in comparison), and obmalloc and mimalloc should both excel
> at recycling mountains of small objects.  Why is mimalloc
> significantly faster?
[snip]
>  obmalloc's `address_in_range()` is definitely a major overhead in its
> fastest `free()` path, but then mimalloc has to figure out which
> thread is doing the freeing (looks cheaper than address_in_range, but
> not free).  Perhaps the layers of indirection that have been wrapped
> around obmalloc over the years are to blame?  Perhaps mimalloc's
> larger (16x) pools and arenas let it stay in its fastest paths more
> often?  I don't know why, but it would be interesting to find out :-)

Totally agree.  I'll investigate this next.

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MXEE2NOEDAP72RFVTC7H4GJSE2CHP3SX/


[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

2019-07-08 Thread Inada Naoki
On Thu, Jul 4, 2019 at 11:32 PM Inada Naoki  wrote:
>
> On Thu, Jul 4, 2019 at 8:09 PM Antoine Pitrou  wrote:
> >
> > Ah, interesting.  Were you able to measure the memory footprint as well?
> >
>
> Hmm, it is not good.  mimalloc uses MADV_FREE so it may affect some
> benchmarks.  I will look into it later.
>
> ```
> $ ./python  -m pyperf compare_to pymalloc-mem.json mimalloc-mem.json -G
> Slower (60):
> - logging_format: 10.6 MB +- 384.2 kB -> 27.2 MB +- 21.3 kB: 2.58x
> slower (+158%)
> - logging_simple: 10028.4 kB +- 371.2 kB -> 22.2 MB +- 24.9 kB: 2.27x
> slower (+127%)

I think I understand why mimalloc uses more than twice the memory of
pymalloc + glibc malloc in the logging_format and logging_simple benchmarks.

These two benchmarks do something like this:

buf = [] # in StringIO
for _ in range(10*1024):
buf.append("important: some important information to be logged")
s = "".join(buf)  # StringIO.getvalue()
s.splitlines()

mimalloc uses a size-segregated allocator for blocks up to ~512KiB, and the
size class is determined by the top three bits.
On the other hand, list increases its capacity by a factor of about 9/8, so
the next size class is used on each realloc.  In the end, every size class
has 1~3 used/cached memory blocks.
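A small sketch of this interaction.  The list growth policy below follows CPython's
Objects/listobject.c; the size-class binning is a hypothetical approximation of
"top three bits are significant", not mimalloc's exact scheme:

```python
def list_new_allocated(size):
    # CPython's list over-allocation policy: roughly 9/8 growth plus padding.
    return size + (size >> 3) + (3 if size < 9 else 6)

def size_class(nbytes):
    # Hypothetical binning: round up so only the top three bits remain set.
    shift = max(nbytes.bit_length() - 3, 0)
    mask = (1 << shift) - 1
    return (nbytes + mask) & ~mask

# Simulate 10*1024 appends of 8-byte pointers, recording the size class
# of the buffer each time a realloc happens.
allocated, classes = 0, []
for size in range(1, 10 * 1024 + 1):
    if size > allocated:
        allocated = list_new_allocated(size)
        classes.append(size_class(allocated * 8))

# Most reallocs land in a different size class, so the old blocks pile up
# as cached-but-unused memory across many classes.
print(len(classes), len(set(classes)))
```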

This is almost the worst case for mimalloc.  In more complex applications,
there may be more chances to reuse memory blocks.

In complex or huge applications, this overhead will become relatively small.
Its speed is attractive.

But for memory efficiency, pymalloc + jemalloc / tcmalloc may be better for
common cases.

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ORVLH5FAEO7LVE7SK44TQR6XK4YRRZ7L/


[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

2019-07-04 Thread Inada Naoki
I found that the calibrated loop count is not stable, so memory usage differs
a lot in some benchmarks.
In particular, RAM usage of the logging benchmark is strongly related to the loop count:

$ PYTHONMALLOC=malloc LD_PRELOAD=$HOME/local/lib/libmimalloc.so
./python bm_logging.py simple --track-memory --fast --inherit-environ
PYTHONMALLOC,LD_PRELOAD -v
Run 1: calibrate the number of loops: 512
- calibrate 1: 12.7 MB (loops: 512)
Calibration: 1 warmup, 512 loops
Run 2: 0 warmups, 1 value, 512 loops
- value 1: 12.9 MB
Run 3: 0 warmups, 1 value, 512 loops
- value 1: 12.9 MB
...

$ PYTHONMALLOC=malloc LD_PRELOAD=$HOME/local/lib/libmimalloc.so
./python bm_logging.py simple --track-memory --fast --inherit-environ
PYTHONMALLOC,LD_PRELOAD -v -l1024
Run 1: 0 warmups, 1 value, 1024 loops
- value 1: 21.4 MB
Run 2: 0 warmups, 1 value, 1024 loops
- value 1: 21.4 MB
Run 3: 0 warmups, 1 value, 1024 loops
- value 1: 21.4 MB
...

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QBXLRFXDD5TLLDATV2PWE2QNLLDWRVXY/


[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

2019-07-04 Thread Inada Naoki
r_contest: 9296.0 kB +- 11.9 kB -> 9996.8 kB +- 20.7 kB: 1.08x
slower (+8%)
- sympy_sum: 62.2 MB +- 20.8 kB -> 66.5 MB +- 21.1 kB: 1.07x slower (+7%)
- python_startup: 7946.6 kB +- 20.4 kB -> 8210.2 kB +- 16.6 kB: 1.03x
slower (+3%)
- python_startup_no_site: 7409.0 kB +- 18.3 kB -> 7574.6 kB +- 21.8
kB: 1.02x slower (+2%)
```

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YPMZIKREWIV7SNFIUI7U6AFXVA2T6CL2/


[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

2019-07-04 Thread Inada Naoki
On Tue, Jun 25, 2019 at 5:49 AM Antoine Pitrou  wrote:
>
>
> For the record, there's another contender in the allocator
> competition now:
> https://github.com/microsoft/mimalloc/
>
> Regards
>
> Antoine.

It's a very strong competitor!

$ ./python -m pyperf compare_to pymalloc.json mimalloc.json -G --min-speed=3
Faster (14):
- spectral_norm: 202 ms +- 5 ms -> 176 ms +- 3 ms: 1.15x faster (-13%)
- unpickle: 19.7 us +- 1.9 us -> 17.6 us +- 1.3 us: 1.12x faster (-11%)
- json_dumps: 17.1 ms +- 0.2 ms -> 15.7 ms +- 0.2 ms: 1.09x faster (-8%)
- json_loads: 39.0 us +- 2.6 us -> 36.2 us +- 1.1 us: 1.08x faster (-7%)
- crypto_pyaes: 162 ms +- 1 ms -> 150 ms +- 1 ms: 1.08x faster (-7%)
- regex_effbot: 3.62 ms +- 0.04 ms -> 3.38 ms +- 0.01 ms: 1.07x faster (-7%)
- pickle_pure_python: 689 us +- 53 us -> 650 us +- 5 us: 1.06x faster (-6%)
- scimark_fft: 502 ms +- 2 ms -> 478 ms +- 2 ms: 1.05x faster (-5%)
- float: 156 ms +- 2 ms -> 149 ms +- 1 ms: 1.05x faster (-5%)
- pathlib: 29.0 ms +- 0.5 ms -> 27.7 ms +- 0.4 ms: 1.05x faster (-4%)
- mako: 22.4 ms +- 0.1 ms -> 21.6 ms +- 0.2 ms: 1.04x faster (-4%)
- scimark_sparse_mat_mult: 6.21 ms +- 0.04 ms -> 5.99 ms +- 0.08 ms:
1.04x faster (-3%)
- xml_etree_parse: 179 ms +- 5 ms -> 173 ms +- 3 ms: 1.04x faster (-3%)
- sqlalchemy_imperative: 42.0 ms +- 0.9 ms -> 40.7 ms +- 0.9 ms: 1.03x
faster (-3%)

Benchmark hidden because not significant (46): ...
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CTIOESA4NQSWSXH5SZ5D6D7YITDGK33S/


[Python-Dev] Re: PyAPI_FUNC() is needed to private APIs?

2019-07-01 Thread Inada Naoki
On Sun, Jun 30, 2019 at 12:26 AM Nick Coghlan  wrote:
>
>
> Hence Jeroen's point: if something is useful enough for Cython to want to use 
> it, it makes to provide a public API for it that hides any internal 
> implementation details that may not remain stable across releases.
>

I wanted to discuss only when PyAPI_FUNC() is needed,
not which functions should be public.

But FYI, we have already moved _PyObject_GetMethod to the private
cpython API.

> We don't expect most Cython code to be regenerated for different versions - 
> we only expect it to be recompiled, as with any other extension.
>

We don't make some unstable APIs public, to avoid breaking packages.
But it seems Cython chose performance over stable source code.
It seems "regenerate the source code when it breaks" is Cython's policy.
That is out of our control.

For example, the FastCall APIs were not public because we didn't think they
were stable yet and they could change in future versions.  But Cython used them.
Luckily, they weren't broken much.  But that is just luck.

I hope Cython provides an option to produce more stable source code for projects
distributing generated C source code instead of a binary wheel
or Cython source code.

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FJ5TDFJLWZ4SRB7IYFUBKTYBI525NUIK/


[Python-Dev] Re: _Py_Identifier should support non-ASCII string?

2019-06-21 Thread Inada Naoki
OK.  I have already started optimizing PyUnicode_FromString().

It was 2x slower than _PyUnicode_FromASCII.
But it can be made only 1.5x slower than _PyUnicode_FromASCII.

And as a bonus, `b"foo".decode()` becomes 10% faster too.

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3HL4M5MLA2KUIZCV6AFFXL67ZKMDSXTV/


[Python-Dev] Re: _Py_Identifier should support non-ASCII string?

2019-06-20 Thread Inada Naoki
On Fri, Jun 21, 2019 at 6:55 AM David Mertz  wrote:
>
> This change would break two lovely functions I wrote this week.
>

Are your lovely functions in CPython, or in your project?

_Py_Identifier is still private and undocumented,
but I expect someone will start using it outside of CPython.
So I wanted to discuss whether we really need to support non-ASCII before that.

But it seems I am already too late.

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MBUULS7DR2RUH26TIWAYP67YQNQJGKH2/


[Python-Dev] Re: _Py_Identifier should support non-ASCII string?

2019-06-20 Thread Inada Naoki
On Fri, Jun 21, 2019 at 1:28 AM Victor Stinner  wrote:
>
> Le jeu. 20 juin 2019 à 11:15, Inada Naoki  a écrit :
> > Can we change _PyUnicode_FromId to use _PyUnicode_FromASCII?
>
> How would a developer detect a mistake (non-ASCII) character? Does
> _PyUnicode_FromASCII() raise an exception, even in release mode?

No.  That's one of the reasons why _PyUnicode_FromASCII is much faster
than PyUnicode_FromString().

>
> The function is only called once (that's the whole purpose of the
> Py_IDENTIFER() API. Is it really worth it?
>

Maybe not, at least for now.
I just want to ask whether it is really worth supporting non-ASCII identifiers
before we make the API public or 3rd parties start using it widely.

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GCC5YDIPAIUT67EVKLGJ3INA24FRHNR5/


[Python-Dev] Re: _Py_Identifier should support non-ASCII string?

2019-06-20 Thread Inada Naoki
On Fri, Jun 21, 2019 at 1:23 AM Steve Dower  wrote:
>
> What benefit would this provide?

It is faster, of course.

The whole benefit is not significant for now.
But I'm thinking of making _Py_Identifier public (a CPython API) in the future.
Once we make it public, breaking changes become hard.

So I want to confirm whether it is intended that _Py_Identifier supports non-ASCII.
If it's not intended, being more strict means more possibility of
optimization in the future.


> And why is a non-ASCII identifier not
> practical?
>

Hm, I might be wrong about the nuance of the word "practical".
I meant it's very uncommon.   That's because:

* While we allow non-ASCII identifiers in Python, we don't use them in core.
* Most usages of _Py_Identifier create a C variable named like PyId_foo.
* There is _Py_static_string, which doesn't create a variable named like PyId_foo,
  but all usages in CPython are ASCII only, and _Py_Identifier is still private.

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PTJEUFDKEIUSUFUCEQWWVRCERVQENQFA/


[Python-Dev] _Py_Identifier should support non-ASCII string?

2019-06-20 Thread Inada Naoki
Hi, all.

Both PyUnicode_FromString() and _Py_Identifier
support UTF-8 for now.  We can not change PyUnicode_FromString()
for backward compatibility,
so we have _PyUnicode_FromASCII instead.

But _Py_Identifier is still private, and non-ASCII identifiers
are not practical.
Can we change _PyUnicode_FromId to use _PyUnicode_FromASCII?

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CHFLUCGEWBGXMBRX2E6OSE3BZ3ZGUNKS/


[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-17 Thread Inada Naoki
On Mon, Jun 17, 2019 at 6:14 PM Antoine Pitrou  wrote:
> But it's not enabled by default...  And there are reasons for that (see
> the manpage I quoted).

Uh, then if people want to use huge pages, they need to enable them system-wide,
or we need to add madvise in obmalloc.c.

> > In web applications, it's common to one Python worker process
> > use 50~100+ MiB RSS.  2MB arena seems reasonable for those
> > applications.
>
> Perhaps, but what is the problem you are trying to solve?  Do you have
> evidence that memory management of those 50-100 MB is costly?
>

I just meant we may be able to utilize THP if we provide a large-arena option.
In other words, *if* we provide a configure option to increase arena & pool size,
a 2MB arena seems reasonable to me.  That's all I wanted to say here.

I didn't mean that utilizing THP is the main motivation for increasing arena size.
People who want to use huge pages may have a problem that huge pages solve.
But I don't have such a problem.

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/S3GELVBIRQS5LOA72OIPPUNPNZGEUV2J/


[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-17 Thread Inada Naoki
On Mon, Jun 17, 2019 at 5:18 PM Antoine Pitrou  wrote:
>
> On Mon, 17 Jun 2019 11:15:29 +0900
> Inada Naoki  wrote:
> >
> > Increasing pool size is one obvious way to fix these problems.
> > I think 16KiB pool size and 2MiB (huge page size of x86) arena size is
> > a sweet spot for recent web servers (typically, about 32 threads, and
> > 64GiB), but there is no evidence about it.
>
> Note that the OS won't give a huge page automatically, because memory
> management becomes much more inflexible then.

When we use contiguous 2MB of pages, Linux may use a huge page
implicitly when Transparent Huge Pages are enabled.

Then, if we munmap one page of the 2MB, the kernel will split
the huge page into many small pages again.
I don't know whether this really happens in Python applications,
but using a 2MB arena would reduce the risk of the performance penalty
of page splitting.
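As a rough illustration, here is a minimal sketch of allocating a 2MB arena and
hinting the kernel to back it with a huge page, using Python's mmap module as a
stand-in for the C mmap/madvise calls an allocator would make.  MADV_HUGEPAGE
only exists on Linux, so the hint is applied opportunistically:

```python
import mmap

ARENA_SIZE = 2 * 1024 * 1024  # 2MiB, the x86-64 huge page size

def alloc_arena(size=ARENA_SIZE):
    # Anonymous private mapping, like obmalloc's mmap-based arena allocation.
    arena = mmap.mmap(-1, size)
    # Hint the kernel that this region is a good THP candidate.
    # The hint is best-effort: skip it where MADV_HUGEPAGE is unavailable.
    if hasattr(mmap, "MADV_HUGEPAGE"):
        try:
            arena.madvise(mmap.MADV_HUGEPAGE)
        except OSError:
            pass  # THP disabled system-wide; the mapping still works
    return arena

arena = alloc_arena()
arena[0] = 0xFF  # the arena is usable either way
```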

In web applications, it's common for one Python worker process to
use 50~100+ MiB RSS.  A 2MB arena seems reasonable for those
applications.

I am not proposing making this the default.
I just meant this may be a reasonable configuration for many Python
users who build medium~large web apps.

Regards,

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VGXTBA6JBYOMXVTFVZ7NEINLPBQ4ZYFT/


[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-16 Thread Inada Naoki
obmalloc is very good at allocating small (~224 byte) memory blocks.
But the current SMALL_REQUEST_THRESHOLD (512) seems too large to me.

```
>>> pool_size = 4096 - 48  # 48 is pool header size
>>> for bs in range(16, 513, 16):
... n,r = pool_size//bs, pool_size%bs + 48
... print(bs, n, r, 100*r/4096)
...
16 253 48 1.171875
32 126 64 1.5625
48 84 64 1.5625
64 63 64 1.5625
80 50 96 2.34375
96 42 64 1.5625
112 36 64 1.5625
128 31 128 3.125
144 28 64 1.5625
160 25 96 2.34375
176 23 48 1.171875
192 21 64 1.5625
208 19 144 3.515625
224 18 64 1.5625
240 16 256 6.25
256 15 256 6.25
272 14 288 7.03125
288 14 64 1.5625
304 13 144 3.515625
320 12 256 6.25
336 12 64 1.5625
352 11 224 5.46875
368 11 48 1.171875
384 10 256 6.25
400 10 96 2.34375
416 9 352 8.59375
432 9 208 5.078125
448 9 64 1.5625
464 8 384 9.375
480 8 256 6.25
496 8 128 3.125
512 7 512 12.5
```

There are two problems here.

First, pool overhead is at most about 3.5% up to 224 bytes.
But it becomes 6.25% at 240 bytes, 8.6% at 416 bytes, 9.4% at 464
bytes, and 12.5% at 512 bytes.

Second, some size classes have the same number of memory blocks.
Classes 272 and 288 both have 14 blocks; 320 and 336 both have 12 blocks.
This reduces the utilization of pools, and the problem becomes bigger on 32-bit platforms.

Increasing the pool size is one obvious way to fix these problems.
I think a 16KiB pool size and a 2MiB arena size (the huge page size on x86) is
a sweet spot for recent web servers (typically about 32 threads and
64GiB RAM), but there is no evidence for it.
We need a reference application and scenario to benchmark;
pyperformance is not good for measuring the memory usage of complex
applications.

```
>>> header_size = 48
>>> pool_size = 16*1024
>>> for bs in range(16, 513, 16):
... n = (pool_size - header_size) // bs
... r = (pool_size - header_size) % bs + header_size
... print(bs, n, r, 100 * r / pool_size)
...
16 1021 48 0.29296875
32 510 64 0.390625
48 340 64 0.390625
64 255 64 0.390625
80 204 64 0.390625
96 170 64 0.390625
112 145 144 0.87890625
128 127 128 0.78125
144 113 112 0.68359375
160 102 64 0.390625
176 92 192 1.171875
192 85 64 0.390625
208 78 160 0.9765625
224 72 256 1.5625
240 68 64 0.390625
256 63 256 1.5625
272 60 64 0.390625
288 56 256 1.5625
304 53 272 1.66015625
320 51 64 0.390625
336 48 256 1.5625
352 46 192 1.171875
368 44 192 1.171875
384 42 256 1.5625
400 40 384 2.34375
416 39 160 0.9765625
432 37 400 2.44140625
448 36 256 1.5625
464 35 144 0.87890625
480 34 64 0.390625
496 32 512 3.125
512 31 512 3.125
```

Another way to fix these problems is to shrink SMALL_REQUEST_THRESHOLD
to 256 and trust that malloc works well for medium-size memory blocks.

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AG6UUPKFXYOTZALFV7XD7EUV62SHOI3P/


[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-14 Thread Inada Naoki
103 ms  | 104 ms: 1.01x slower (+1%)  
> |
> +-+-+-+
>
> Not significant (34): 2to3; chameleon; chaos; deltablue;
> django_template; dulwich_log; fannkuch; float; go; html5lib;
> logging_format; logging_silent; logging_simple; mako; nbody;
> pathlib; pickle; pidigits; python_startup; raytrace; regex_compile;
> regex_effbot; scimark_lu; scimark_monte_carlo; scimark_sor;
> sqlalchemy_declarative; sqlite_synth; sympy_integrate; sympy_sum;
> sympy_str; telco; unpack_sequence; unpickle; xml_etree_iterparse
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/ZDFEYEC4P5JVRKJL5NIFJ7PZYJYZ3IMR/



-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HHIGGO3Q44CM2AICKNP677B5MWOA2BIG/


[Python-Dev] Re: PyAPI_FUNC() is needed to private APIs?

2019-06-13 Thread Inada Naoki
On Fri, Jun 14, 2019 at 12:29 AM Jeroen Demeyer  wrote:
>
>
> I think that the opposite is true actually: the reason that people
> access internals is because there is no public API doing what they want.
> Having more public API should *reduce* the need for accessing internals.
>
> For example, _PyObject_GetMethod is not public API but it's useful
> functionality. So Cython is forced to reinvent _PyObject_GetMethod (i.e.
> copy verbatim that function from the CPython sources), which requires
> accessing internals.

This became off topic... but:

In the case of _PyObject_GetMethod, I agree that it means we don't provide
*some* useful API.  But it doesn't mean _PyObject_GetMethod is the missing
useful API.

We don't provide a method-calling API that uses the same optimization as
LOAD_METHOD.   It might look like this:

/* methname is Unicode, nargs > 0, and args[0] is self. */
PyObject_VectorCallMethod(PyObject *methname, PyObject **args,
Py_ssize_t nargs, PyObject *kwds)

(Would you try adding this?  Or may I?)

Anyway, do you think _PyObject_GetMethod is useful even after we provide
PyObject_VectorCallMethod?

I'd like to put _PyObject_GetMethod in private/ (with PyAPI_FUNC) in 3.9,
for users like Cython.

If you want to make it public in 3.9, please create a new thread.
Let's discuss how it is useful, and whether the current name and signature
are good enough to make it public.

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Y5AG7N7YKO3GQ6H223JTIBMERGKSV7W5/

