[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-04-03 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 4216dce04b7d3f329beaaafc82a77c4ac6cf4d57 by Inada Naoki in branch 
'main':
bpo-47000: Make `io.text_encoding()` respects UTF-8 mode (GH-32003)
https://github.com/python/cpython/commit/4216dce04b7d3f329beaaafc82a77c4ac6cf4d57


--

___
Python tracker 
<https://bugs.python.org/issue47000>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-30 Thread Inada Naoki


Inada Naoki  added the comment:

> Please see https://bugs.python.org/issue47000#msg415769 for what Victor
> suggested.

Of course, I read it.

> In particular, the locale module uses the "no underscore" convention.
> Not sure whether it's good to start using snake case now, but I'm also
> not against it.

Victor didn't mention a "no underscore" convention.
I just wanted to see what others prefer. I will remove the underscore.

> I would like to reiterate my concern with the "locale" encoding, though.
>
> As mentioned earlier, I believe it adds too much magic. It would be better
> to leave this in the hands of the applications and not try to guess
> the correct encoding.

I don't recommend that users use the "locale" encoding.
I strongly recommend considering "utf-8" instead.
But the "locale" encoding is needed when users don't want to change the 
behavior of their current application.
It has already been accepted in PEP 597.

> It's better to expose easy to use APIs to access the various different
> settings and point users to those rather than try to do a best effort
> guess... explicit is better than implicit.

In some cases, users need to decide "not to change the encoding for now".
If we don't provide "locale", it's difficult to change the default encoding to 
UTF-8.

> After all, Mojibake potentially corrupts important data, without the
> alerting the user and that's not really what we should be after (e.g.
> UTF-8 is valid Latin-1 in most cases and this is a real problem we often
> run into in Germany with our Umlauts).

Changing the default encoding will temporarily increase this risk.
But once the default encoding has changed to UTF-8, this risk will be reduced 
overwhelmingly.
Most popular text editors, including VSCode, Atom, Sublime Text, and 
Notepad.exe, use UTF-8 by default.
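
The Mojibake concern quoted above can be reproduced in a few lines. This is only a minimal sketch; the sample name is an arbitrary illustration, not data from this issue.

```python
# UTF-8 bytes decode "successfully" as Latin-1, so the corruption is silent.
text = "Müller"
utf8_bytes = text.encode("utf-8")

garbled = utf8_bytes.decode("latin-1")   # wrong encoding, but no exception
print(garbled)                           # MÃ¼ller

# The reverse mistake usually fails loudly, which is the safer failure mode.
latin1_bytes = text.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)                     # True
```

Reading Latin-1 data as UTF-8 tends to raise an error, while reading UTF-8 data as Latin-1 silently produces wrong text.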

--

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-30 Thread Inada Naoki


Inada Naoki  added the comment:

@vstinner Since UTF-8 mode affects `locale.getpreferredencoding(False)`, I need 
to decide on an alternative API in PEP 686.

If there are no objections, I will choose `locale.get_encoding()` for the 
current locale encoding (ACP on Windows).

See https://github.com/python/peps/pull/2470/files

--

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-24 Thread Inada Naoki


Inada Naoki  added the comment:

OK. Cache efficiency is dropped from the list of motivations.
The current motivations are:

* Memory saving (currently, 4 BytesObjects (= 32 bytes of ob_shash) per code 
object).
* Make bytes objects immutable
  * Share objects among multiple interpreters.
  * CoW efficiency.

I am closing this issue for now, because it is only about deprecating direct 
access to ob_shash.

After Python 3.12 reaches beta, we will reconsider whether we should remove 
ob_shash or keep it.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-24 Thread Inada Naoki

Inada Naoki  added the comment:

> I guess not much difference in benchmarks.
> But if put a bytes object into multiple dicts/sets, and len(bytes_key) is 
> large, it will take a long time. (1 GiB 0.40 seconds on i5-11500 DDR4-3200)
> The length of bytes can be arbitrary,so computing time may be very different.

I don't think calculating hash() for large bytes is a common use case.
Rare use cases may not justify adding 8 bytes to a basic type, especially when 
users expect it to be compact.

Balance is important. A microbenchmark for a specific case doesn't guarantee a 
good balance.
So I want real-world examples. Do you know of any popular libraries that 
depend on hash(bytes) performance?


> Is it possible to let code objects use other types? In addition to ob_hash, 
> maybe the extra byte \x00 at the end can be saved.

Of course, it is possible. But it needs a large refactoring around code 
objects, including the pyc cache file format.
I will try it before 3.13.

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-23 Thread Inada Naoki


Inada Naoki  added the comment:

First of all, this is just deprecating direct access to `ob_shash`. Users 
will need to use `PyObject_Hash()` instead.
We have not made the final decision about removing it. We are just making 
sure that we can remove it in Python 3.13.

RAM and cache efficiency are not the only motivations for this.
There is a discussion about (1) increasing CoW efficiency, and (2) sharing data 
between subinterpreters after the per-interpreter GIL.
Removing ob_shash will help with both, especially (2).

But if we stop using bytes objects in code objects by Python 3.13, there is no 
need to remove ob_shash.


> If put a bytes object into multiple dicts/sets, the hash need to be computed 
> multiple times. This seems a common usage.

Doesn't it cost only a few milliseconds?
I posted remove-bytes-hash.patch in this issue. Would you measure how this 
affects whole-application performance rather than microbenchmarks?

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-23 Thread Inada Naoki


Inada Naoki  added the comment:

I am not sure whether we really need "locale encoding at Python startup".

For this issue, I don't want to change `encoding="locale"` behavior except to 
ignore UTF-8 mode. So what I want is the "current locale encoding", or the 
ANSI code page on Windows.

On the other hand, I know Eryk wants to support locales on Windows. So 
`locale.get_encoding()` might return the current locale encoding (not the ANSI 
code page) even on Windows.
If so, I will use `sys.getlocaleencoding()` to implement `encoding="locale"` to 
keep using the ANSI code page, instead of adding yet another "get locale 
encoding" function.

--
nosy: +eryksun

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-23 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 894d0ea5afa822c23286e9e68ed80bb1122b402d by Inada Naoki in branch 
'main':
bpo-46864: Suppress deprecation warnings for ob_shash. (GH-32042)
https://github.com/python/cpython/commit/894d0ea5afa822c23286e9e68ed80bb1122b402d


--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Inada Naoki


Inada Naoki  added the comment:

Average RAM capacity doesn't grow as the number of CPU cores grows.
Additionally, L1+L2 cache is a really limited resource compared to CPU or RAM.

Bytes objects are used for co_code, which is hot. So cache efficiency is 
important.

Would you give us a more realistic (or real-world) example where caching the 
bytes hash is important?

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-22 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +30157
pull_request: https://github.com/python/cpython/pull/32068

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-22 Thread Inada Naoki


Inada Naoki  added the comment:

> * sys.getfilesystemencoding(): Python filesystem encoding, return "UTF-8" if 
> the Python UTF-8 Mode is enabled

Yes, although PYTHONLEGACYWINDOWSFSENCODING takes priority.


> * locale.getencoding(): Get the locale encoding, LC_CTYPE locale encoding or 
> the Windows ANSI code page, *read at Python startup*. Ignore the Python UTF-8 
> Mode.

I proposed `locale.get_encoding()` in PEP 686. I will remove the underscore if 
you don't like it.


> * locale.getencoding(current=True): Get the *current* locale encoding. The 
> difference with locale.getencoding() is that on Unix, it gets the LC_CTYPE 
> locale encoding at each call.

Hmm, I didn't add it to PEP 686 because it is not related to UTF-8 mode or 
EncodingWarning.

Since `locale.getencoding()` returns the locale encoding at startup, how about 
this idea?

* sys.getlocaleencoding() -- Get the locale encoding read at Python startup.
* locale.getencoding() -- Get the current locale encoding.

Note that we have `sys.getdefaultencoding()` and `sys.getfilesystemencoding()`. 
`sys.getlocaleencoding()` looks consistent with them.
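
To compare the existing functions named in this thread on a given machine, a small sketch like the following may help. It calls only functions that already exist; `sys.getlocaleencoding()` is a proposal here and is not called, and `locale.getencoding()` is looked up defensively because older versions do not have it.

```python
import locale
import sys

# Default str <-> bytes encoding; always "utf-8" on Python 3.
default_encoding = sys.getdefaultencoding()

# Encoding used for file names.
fs_encoding = sys.getfilesystemencoding()

# Affected by UTF-8 mode, which is the problem discussed in this issue.
preferred = locale.getpreferredencoding(False)

# Only available on newer Python versions, so do not assume it exists.
getencoding = getattr(locale, "getencoding", None)
locale_encoding = getencoding() if getencoding is not None else None

print("UTF-8 mode:                  ", bool(sys.flags.utf8_mode))
print("sys.getdefaultencoding():    ", default_encoding)
print("sys.getfilesystemencoding(): ", fs_encoding)
print("getpreferredencoding(False): ", preferred)
print("locale.getencoding():        ", locale_encoding)
```

Running it once normally and once with `-X utf8` shows which values follow UTF-8 mode on that platform.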

--

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Inada Naoki


Inada Naoki  added the comment:

Since the hash is randomized, using hash(bytes) for such a use case is not 
recommended. Users should use stable hash functions instead.

I agree that there are a few use cases where this change causes a performance 
regression. But they are really few compared to the overhead of adding 8 bytes 
to all bytes instances.
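
As a rough illustration of a "stable hash function", here is a sketch using `hashlib` from the standard library. The helper name `stable_key` and the choice of SHA-256 are assumptions for the example, not something decided in this issue.

```python
import hashlib


def stable_key(data: bytes) -> str:
    """Return a digest that is identical across processes and runs."""
    return hashlib.sha256(data).hexdigest()


payload = b"example payload"

# hash() of bytes is salted per process unless PYTHONHASHSEED is fixed,
# so this value is not suitable for anything that outlives the process.
print(hash(payload))

# The digest depends only on the bytes themselves.
print(stable_key(payload))
```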

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-21 Thread Inada Naoki


Inada Naoki  added the comment:

From Python 3.13, yes. It will be a bit slower.

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-21 Thread Inada Naoki


Inada Naoki  added the comment:

I'm sorry. Maybe ccache hid the warning from me.

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-21 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +30132
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/32042

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46126] Unittest output drives developers to avoid docstrings

2022-03-20 Thread Inada Naoki


Inada Naoki  added the comment:

> As you can see, the location of the failing test in the log is masked, and 
> instead the description is present.

Could you elaborate?

```
test_index_empty (idlelib.idle_test.test_text.MockTextTest)
Failing test with bad description. ... ERROR
(snip)
==
ERROR: test_index_empty (idlelib.idle_test.test_text.MockTextTest)
Failing test with bad description.
--
```

I can see `test_index_empty (idlelib.idle_test.test_text.MockTextTest)` in both 
places. What is masked?

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46126>
___



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-19 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +30091
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32003

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue47009] Streamline list.append for the common case

2022-03-15 Thread Inada Naoki


Inada Naoki  added the comment:

Thank you. I agree that inlining is worthwhile.

But we have already inlined too many functions in ceval, and there is an issue 
caused by it... (bpo-45116)

--

___
Python tracker 
<https://bugs.python.org/issue47009>
___



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-15 Thread Inada Naoki


Inada Naoki  added the comment:

I created another topic related to this issue.
https://discuss.python.org/t/add-legacy-text-encoding-option-to-make-utf-8-default/14281

If we add another option (e.g. legacy_text_encoding), we do not need to change 
UTF-8 mode behavior.

--

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue35228] Index search in CHM help crashes viewer

2022-03-14 Thread Inada Naoki


Inada Naoki  added the comment:

I know chm is handy. But Microsoft has already abandoned it.
I think we should stop providing chm.

--

___
Python tracker 
<https://bugs.python.org/issue35228>
___



[issue47009] Streamline list.append for the common case

2022-03-14 Thread Inada Naoki


Inada Naoki  added the comment:

Hmm. Would you measure the benefit of inlining and the benefit of skipping 
incref/decref separately?

If the benefit of inlining is very small, making _PyList_AppendTakeRef() a 
regular internal API looks better to me.

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue47009>
___



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-13 Thread Inada Naoki


Inada Naoki  added the comment:

I created a related topic on discuss.python.org.
https://discuss.python.org/t/jep-400-utf-8-by-default-and-future-of-python/14246

If we recommend `PYTHONUTF8` as opt-in "UTF-8 by default", `encoding="locale"` 
should use the locale encoding in UTF-8 mode.
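
For readers who want to check what opting in looks like, here is a small sketch that starts a child interpreter with UTF-8 mode enabled through `-X utf8` (setting `PYTHONUTF8=1` in the environment is the other way to enable it). The helper name `utf8_mode_flag` is made up for this example.

```python
import subprocess
import sys


def utf8_mode_flag(*interpreter_args: str) -> str:
    """Run a child interpreter and report its sys.flags.utf8_mode value."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print(utf8_mode_flag("-X", "utf8"))     # 1
print(utf8_mode_flag("-X", "utf8=0"))   # 0
```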

If we don't change `PYTHONUTF8` behavior, we need yet another option for opt-in 
"UTF-8 by default".

--

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue39829] __len__ called twice in the list() constructor

2022-03-13 Thread Inada Naoki


Inada Naoki  added the comment:

Thanks.

--

___
Python tracker 
<https://bugs.python.org/issue39829>
___



[issue39829] __len__ called twice in the list() constructor

2022-03-13 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue39829>
___



[issue39829] __len__ called twice in the list() constructor

2022-03-13 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 2153daf0a02a598ed5df93f2f224c1ab2a2cca0d by Crowthebird in branch 
'main':
bpo-39829: Fix `__len__()` is called twice in list() constructor (GH-31816)
https://github.com/python/cpython/commit/2153daf0a02a598ed5df93f2f224c1ab2a2cca0d
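
A quick way to observe the behavior this changeset is about is to count the calls. This is only a sketch; `CountingSequence` is a made-up helper, and the printed count depends on whether the interpreter includes the fix.

```python
class CountingSequence:
    """Iterable that records how often list() asks for its length."""

    def __init__(self, items):
        self._items = list(items)
        self.len_calls = 0

    def __len__(self):
        self.len_calls += 1
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


seq = CountingSequence(range(3))
result = list(seq)

print(result)          # [0, 1, 2]
print(seq.len_calls)   # number of __len__ calls; varies by Python version
```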


--

___
Python tracker 
<https://bugs.python.org/issue39829>
___



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-12 Thread Inada Naoki


New submission from Inada Naoki :

Currently, `encoding="locale"` is just a shortcut for 
`encoding=locale.getpreferredencoding(False)`.

`encoding="locale"` means "the locale encoding should be used here, even if 
the Python default encoding is changed to UTF-8".

I am not sure whether UTF-8 mode will become the default or not.
But some users want to use UTF-8 mode to change the default encoding in their 
Python environments without waiting for the Python default encoding to change.

So I think `encoding="locale"` should use the real locale encoding (ACP on 
Windows) regardless of whether UTF-8 mode is enabled.

Currently, UTF-8 mode affects `_Py_GetLocaleEncoding()`, so it is difficult 
to make `encoding="locale"` ignore UTF-8 mode.
Is it safe to use `locale.getlocale(locale.LC_CTYPE)[1] or "UTF-8"`?
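
Written out as a function, the expression in question looks like this. It is a sketch for discussion, not the proposed implementation; the helper name is made up, and the `ValueError` handling is an extra assumption, added because `locale.getlocale()` can raise it for locale names it cannot parse.

```python
import locale


def current_locale_encoding() -> str:
    """Return the LC_CTYPE encoding, falling back to UTF-8 when unknown."""
    try:
        encoding = locale.getlocale(locale.LC_CTYPE)[1]
    except ValueError:
        # getlocale() could not parse the locale name.
        encoding = None
    return encoding or "UTF-8"


print(current_locale_encoding())
```

The name returned by `getlocale()` is not necessarily spelled the same way as the one from `locale.getpreferredencoding(False)`, which is one reason the question is open.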

--
components: Unicode
messages: 415028
nosy: ezio.melotti, methane, vstinner
priority: normal
severity: normal
status: open
title: Make encoding="locale" uses locale encoding even in UTF-8 mode is 
enabled.
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue47000>
___



[issue39829] __len__ called twice in the list() constructor

2022-03-10 Thread Inada Naoki


Inada Naoki  added the comment:

> Changes compared here: 
> https://github.com/python/cpython/compare/main...thatbirdguythatuknownot:patch-17


Looks good to me. Would you create a pull request?

--

___
Python tracker 
<https://bugs.python.org/issue39829>
___



[issue43574] Regression in overallocation for literal list initialization in v3.9+

2022-03-07 Thread Inada Naoki


Inada Naoki  added the comment:

Related issue: https://twitter.com/nedbat/status/1489233208713437190
The current overallocation strategy is rough. We need to make it smoother.
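
To see the current growth pattern on a particular build, a sketch like this prints the size of a list each time an append changes it. `growth_steps` is a hypothetical helper, and the exact numbers depend on the Python version and platform.

```python
import sys


def growth_steps(n: int) -> list:
    """Return (length, size in bytes) each time an append changes the size."""
    items = []
    steps = [(0, sys.getsizeof(items))]
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != steps[-1][1]:
            steps.append((len(items), size))
    return steps


for length, size in growth_steps(64):
    print(length, size)
```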

--
versions: +Python 3.11 -Python 3.9

___
Python tracker 
<https://bugs.python.org/issue43574>
___



[issue43574] Regression in overallocation for literal list initialization in v3.9+

2022-03-07 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue43574>
___



[issue39829] __len__ called twice in the list() constructor

2022-03-07 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue39829>
___



[issue46925] Document dict behavior when setting equal but not identical key

2022-03-06 Thread Inada Naoki


Inada Naoki  added the comment:

I don't know much about Java, but Java's WeakHashMap is the same as Python's 
WeakKeyDictionary.
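
On the Python side, the surprising case is a key compared by value rather than by identity. A minimal sketch, with a made-up `Name` class; the result shown is what CPython gives, where the entry is dropped as soon as the last strong reference to the key goes away.

```python
import gc
import weakref


class Name:
    """Key compared by value, so an equal key can be recreated at will."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Name) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


registry = weakref.WeakKeyDictionary()
original = Name("spam")
registry[original] = "payload"

lookalike = Name("spam")             # equal to the key, but another object
found_before = lookalike in registry

del original                         # drop the only strong reference
gc.collect()                         # make the collection explicit
found_after = lookalike in registry

print(found_before, found_after)     # True False
```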

https://docs.oracle.com/javase/9/docs/api/java/util/WeakHashMap.html

"""
This class is intended primarily for use with key objects whose equals methods 
test for object identity using the == operator. Once such a key is discarded it 
can never be recreated, so it is impossible to do a lookup of that key in a 
WeakHashMap at some later time and be surprised that its entry has been 
removed. This class will work perfectly well with key objects whose equals 
methods are not based upon object identity, such as String instances. With such 
recreatable key objects, however, the automatic removal of WeakHashMap entries 
whose keys have been discarded may prove to be confusing.
"""

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46925>
___



[issue23882] unittest discovery doesn't detect namespace packages when given no parameters

2022-03-06 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue23882>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-05 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 2d8b764210c8de10893665aaeec8277b687975cd by Inada Naoki in branch 
'main':
bpo-46864: Deprecate PyBytesObject.ob_shash. (GH-31598)
https://github.com/python/cpython/commit/2d8b764210c8de10893665aaeec8277b687975cd


--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-05 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46906] Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.

2022-03-03 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46906>
___



[issue46906] Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.

2022-03-03 Thread Inada Naoki


Inada Naoki  added the comment:

OK. From a quick grep, I found that only msgpack and bitstruct use these APIs.
That is not enough users to make them public.

--

___
Python tracker 
<https://bugs.python.org/issue46906>
___



[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"

2022-03-02 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 4f74052b455a54ac736f38973693aeea2ec14116 by Inada Naoki in branch 
'main':
bpo-40116: dict: Add regression test for iteration order. (GH-31550)
https://github.com/python/cpython/commit/4f74052b455a54ac736f38973693aeea2ec14116


--

___
Python tracker 
<https://bugs.python.org/issue40116>
___



[issue46906] Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.

2022-03-02 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +29769
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31649

___
Python tracker 
<https://bugs.python.org/issue46906>
___



[issue46903] Crash when setting attribute with string subclass as the name (--with-pydebug)

2022-03-02 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46903>
___



[issue46906] Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.

2022-03-02 Thread Inada Naoki


New submission from Inada Naoki :

Original issue: https://github.com/msgpack/msgpack-python/issues/497

_PyFloat_(Pack|Unpack)(4|8) is a very nice API for serializers like msgpack.
Converting double and float into char[] is not trivial, and these APIs do it in 
a very efficient way.

And these APIs don't reveal CPython's internal structure. They just convert 
double and float into char[].

So please keep these APIs public for libraries like msgpack.
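
At the Python level, the same conversion is available through the `struct` module. The sketch below shows what "convert double and float into char[]" means in practice; it illustrates the byte layout and is not a replacement for the C API. The helper names are made up, and big-endian order is chosen because that is what msgpack uses on the wire.

```python
import struct


def pack_double(value: float) -> bytes:
    """Encode as big-endian IEEE 754 binary64 (8 bytes)."""
    return struct.pack(">d", value)


def pack_float(value: float) -> bytes:
    """Encode as big-endian IEEE 754 binary32 (4 bytes)."""
    return struct.pack(">f", value)


print(pack_double(1.0).hex())   # 3ff0000000000000
print(pack_float(1.0).hex())    # 3f800000

# Round trip back to a Python float.
print(struct.unpack(">d", pack_double(1.0))[0])   # 1.0
```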

--
components: C API
messages: 414401
nosy: methane, vstinner
priority: normal
severity: normal
status: open
title: Make _PyFloat_(Pack|Unpack)(4|8) cpython API, not internal.
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46906>
___



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-03-01 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46845>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-03-01 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 9833bb91e4d5c2606421d9ec2085f5c2dfb6f72c by Inada Naoki in branch 
'main':
bpo-46845: Reduce dict size when all keys are Unicode (GH-31564)
https://github.com/python/cpython/commit/9833bb91e4d5c2606421d9ec2085f5c2dfb6f72c


--

___
Python tracker 
<https://bugs.python.org/issue46845>
___



[issue45373] ./configure --enable-optimizations should enable LTO

2022-03-01 Thread Inada Naoki


Inada Naoki  added the comment:

Can we use --lto=thin when available?
And can we not use --lto when building the profiling Python?

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue45373>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-02-26 Thread Inada Naoki


Inada Naoki  added the comment:

With ob_shash removed:

```
## small key
$ ./python -m pyperf timeit --compare-to ../cpython/python -s 'd={b"foo":1, b"bar":2, b"buzz":3}' -- 'b"key" in d'
/home/inada-n/work/python/cpython/python: . 23.2 ns +- 1.7 ns
/home/inada-n/work/python/remove-bytes-hash/python: . 40.0 ns +- 1.5 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 23.2 ns +- 1.7 ns -> [/home/inada-n/work/python/remove-bytes-hash/python] 40.0 ns +- 1.5 ns: 1.73x slower

## large key
$ ./python -m pyperf timeit --compare-to ../cpython/python -s 'd={b"foo":1, b"bar":2, b"buzz":3};k=b"key"*100' -- 'k in d'
/home/inada-n/work/python/cpython/python: . 22.3 ns +- 1.2 ns
/home/inada-n/work/python/remove-bytes-hash/python: . 108 ns +- 2 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 22.3 ns +- 1.2 ns -> [/home/inada-n/work/python/remove-bytes-hash/python] 108 ns +- 2 ns: 4.84x slower
```


I will reconsider the removal before removing the cache.
We have changed the code object too often. If Python 3.13 doesn't use so many 
bytes objects, we don't need to remove the hash to save some RAM.

--
Added file: https://bugs.python.org/file50649/remove-bytes-hash.patch

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-26 Thread Inada Naoki


Inada Naoki  added the comment:

I added _PyDict_FromItems() to the PR.
It checks whether all keys are Unicode before creating the dict.
_PyDict_NewPresized() just returns a general-purpose dict. But it isn't used 
from the CPython core. It is just kept for compatibility (for Cython).

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{"k1":1, "k2":2, "k3":3, "k4":4, "k5":5, "k6":6}'
/home/inada-n/work/python/cpython/python: . 198 ns +- 5 ns
/home/inada-n/work/python/dict-compact/python: . 213 ns +- 6 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 198 ns +- 5 ns -> [/home/inada-n/work/python/dict-compact/python] 213 ns +- 6 ns: 1.07x slower
```

The overhead of checking key types is not so large.
Additionally, we can remove some code from ceval.c.

--

___
Python tracker 
<https://bugs.python.org/issue46845>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-02-26 Thread Inada Naoki


Inada Naoki  added the comment:

> But some programs can still work with encoded bytes instead of strings. In 
> particular os.environ and os.environb are implemented as dict of bytes on 
> non-Windows.

This change doesn't affect os.environ.

os.environ[key] does `key.encode(sys.getfilesystemencoding(), 
"surrogateescape")` internally, so the encoded key doesn't have a cached hash.
In addition, the dict (`self._data`) has its own hash cache, so it doesn't use 
the hash cached in the bytes objects.

On the other hand, this change will affect `os.environb[key]` if the same key 
is used repeatedly.
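
The point about os.environ can be seen from Python. This sketch only mimics the key encoding described above; `encode_env_key` is a made-up helper, not the real `os` module code.

```python
import sys


def encode_env_key(key: str) -> bytes:
    """Encode a key the way the comment above describes (illustration only)."""
    return key.encode(sys.getfilesystemencoding(), "surrogateescape")


first = encode_env_key("HOME")
second = encode_env_key("HOME")

print(first == second)   # True: same contents
print(first is second)   # typically False on CPython: a new object per call
```

Because every lookup builds a new bytes object, a hash cached on the previous object cannot be reused anyway.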

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-26 Thread Inada Naoki


Inada Naoki  added the comment:

In most cases, the first PyDict_SetItem decides which format should be used.

But _PyDict_NewPresized() can be a problem. It creates a hash table before 
inserting the first key, when 5 < (expected size) < 87382.

In the CPython code base, _PyDict_NewPresized() is called from three places:

1. call.c: Building the kwargs dict -- all keys should be Unicode.
2. ceval.c: BUILD_MAP and BUILD_CONST_KEY_MAP -- there is no guarantee that all 
keys are Unicode.


The current pull request assumes the dict has Unicode-only keys. So building a 
dict from non-Unicode keys becomes slower.

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{(1,2):3, 
(4,5):6, (7,8):9, (10,11):12, (13,14):15, (16,17):18}'
/home/inada-n/work/python/cpython/python: . 233 ns +- 1 ns
/home/inada-n/work/python/dict-compact/python: . 328 ns +- 
6 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 233 ns +- 1 ns -> 
[/home/inada-n/work/python/dict-compact/python] 328 ns +- 6 ns: 1.41x slower
```

There are some approaches to fix this problem:

1. Don't use _PyDict_NewPresized() in BUILD_MAP, BUILD_CONST_KEY_MAP

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{(1,2):3, 
(4,5):6, (7,8):9, (10,11):12, (13,14):15, (16,17):18}'
/home/inada-n/work/python/cpython/python: . 233 ns +- 1 ns
/home/inada-n/work/python/dict-compact/python: . 276 ns +- 
1 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 233 ns +- 1 ns -> 
[/home/inada-n/work/python/dict-compact/python] 276 ns +- 1 ns: 1.18x slower
```

I think this performance regression is at an acceptable level.

2. Add an argument `unicode` to _PyDict_NewPresized(). -- Breaks some 3rd party 
code using internal APIs.
3. Add a new internal C API such as _PyDict_NewPresizedUnicodeKey(). -- Most 
conservative.
4. Add a new internal C API that creates a dict from keys and values for extreme 
performance, like this:

// Create a new dict from keys and values.
// Items are received as `{keys[i*keys_offset]: values[i*values_offset] for i 
// in range(length)}`.
// When distinct=1, this function skips checking for duplicate keys.
// So pass distinct=0 unless you can guarantee that there are no duplicate keys.
PyObject *
PyDict_FromKeysAndValues(PyObject **keys, Py_ssize_t keys_offset, PyObject 
**values, Py_ssize_t values_offset, Py_ssize_t length, int distinct)
{
}
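
To make the offset semantics of option 4 concrete, here is a pure-Python model of the comprehension given in the comment. It is only a sketch: the function name is hypothetical, it ignores the duplicate-key flag, and a C pointer such as `values = stack + 1` is modelled with a slice.

```python
def dict_from_keys_and_values(keys, keys_offset, values, values_offset, length):
    """Pure-Python model of the proposed (hypothetical) C helper."""
    return {keys[i * keys_offset]: values[i * values_offset] for i in range(length)}


# Separate contiguous arrays: stride 1 for both keys and values.
print(dict_from_keys_and_values(["a", "b", "c"], 1, [1, 2, 3], 1, 3))

# Interleaved key/value pairs in one array: keys start at index 0,
# values start at index 1, and both advance with stride 2.
stack = ["x", 10, "y", 20]
print(dict_from_keys_and_values(stack, 2, stack[1:], 2, 2))
```

Having two independent offsets lets one function serve both layouts without copying the items first.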

--

___
Python tracker 
<https://bugs.python.org/issue46845>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-02-26 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +29721
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31598

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-02-26 Thread Inada Naoki


New submission from Inada Naoki :

Code objects now have more and more bytes attributes.
To reduce the RAM used by code objects, I want to remove ob_shash (cached hash 
value) from the bytes object.

Sets and dicts have their own hash cache.
Unless the same bytes object is checked against dicts/sets many times, this 
doesn't cause a big performance loss.

--
components: Interpreter Core
messages: 414083
nosy: methane
priority: normal
severity: normal
status: open
title: Deprecate ob_shash in BytesObject
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-25 Thread Inada Naoki


Inada Naoki  added the comment:

> Do you propose to
> 1. Only use StringKeyDicts when non-string keys are not possible?  (Where
> would this be?)
> 2. Switch to a normal dict when a non-string key is added?  (But likely
> not switch back when the last non-string key is removed.)
> 3. Deprecate and remove the option to add non-string keys to namespace
> dicts?  (Proposed and rejected at least once as not gaining much.)

2. We already do such a hack for key-sharing dicts.
And yes, deleting a non-string key doesn't switch back: a `d[0]=0; del d[0]`
loop must be amortized O(1).
Only dict.clear() switches back.

--

___
Python tracker 
<https://bugs.python.org/issue46845>
___



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-24 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +29686
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31564

___
Python tracker 
<https://bugs.python.org/issue46845>
___



[issue46606] Large C stack usage of os.getgroups() and os.setgroups()

2022-02-24 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset ad6c7003e38a9f8bdf8d865fb5fa0f3c03690315 by Inada Naoki in branch 
'main':
bpo-46606: Remove redundant +1. (GH-31561)
https://github.com/python/cpython/commit/ad6c7003e38a9f8bdf8d865fb5fa0f3c03690315


--

___
Python tracker 
<https://bugs.python.org/issue46606>
___



[issue43364] Windows: Make UTF-8 mode more accessible

2022-02-24 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue43364>
___



[issue46606] Large C stack usage of os.getgroups() and os.setgroups()

2022-02-24 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +29684
pull_request: https://github.com/python/cpython/pull/31561

___
Python tracker 
<https://bugs.python.org/issue46606>
___



[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"

2022-02-23 Thread Inada Naoki


Inada Naoki  added the comment:

PyDict_Keys(), PyDict_Values(), and PyDict_Items() don't respect insertion 
order either.

--

___
Python tracker 
<https://bugs.python.org/issue40116>
___



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-23 Thread Inada Naoki


New submission from Inada Naoki :

Currently, PyDictKeyEntry is 24 bytes (hash, key, and value).

We can drop the hash from the entry when all keys are unicode, because unicode 
objects already cache their hash.

This will cause some performance regression on microbenchmarks because the dict 
needs one more indirect access to compare hash values.

On the other hand, this will reduce RAM usage. Additionally, unlike 
docstrings and annotations, this includes much **hot** RAM. It will make Python 
more cache efficient.

This is work in progress code: https://github.com/methane/cpython/pull/43
The pyperformance result is in the PR too.
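
As a rough illustration of the entry sizes, the following sketch computes them from the platform's pointer and `Py_ssize_t` sizes. It assumes a hash field the same size as `Py_ssize_t` and no padding; the variable names are illustrative, not taken from the patch.

```python
import struct

POINTER_SIZE = struct.calcsize("P")  # size of a PyObject * on this platform
HASH_SIZE = struct.calcsize("n")     # assumed size of the cached hash field

general_entry = HASH_SIZE + 2 * POINTER_SIZE  # hash, key, value
unicode_entry = 2 * POINTER_SIZE              # key and value only

print(general_entry, unicode_entry)
for n in (10, 1000, 100_000):
    saved = n * (general_entry - unicode_entry)
    print(f"{n} entries: about {saved} bytes saved")
```

On a typical 64-bit build this gives 24 bytes for the general entry, matching the figure above, and 16 bytes for the Unicode-only entry.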

--
components: Interpreter Core
messages: 413892
nosy: Mark.Shannon, methane, rhettinger
priority: normal
severity: normal
status: open
title: dict: Use smaller entry for Unicode-key only dict.
type: performance
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46845>
___



[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"

2022-02-23 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +29671
pull_request: https://github.com/python/cpython/pull/31550

___
Python tracker 
<https://bugs.python.org/issue40116>
___



[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"

2022-02-23 Thread Inada Naoki


Inada Naoki  added the comment:

I found a regression caused by GH-28520.

```
class C:
    def __init__(self, n):
        if n:
            self.a = 1
            self.b = 2
            self.c = 3
        else:
            self.c = 1
            self.b = 2
            self.a = 3


o1 = C(True)
o2 = C(False)
print(o2.__dict__)  # {'c': 1, 'b': 2, 'a': 3}

d1 = {}
d1.update(o2.__dict__)  # {'a': 3, 'b': 2, 'c': 1}
print(d1)
```

--

___
Python tracker 
<https://bugs.python.org/issue40116>
___



[issue40255] Fixing Copy on Writes from reference counting and immortal objects

2022-02-21 Thread Inada Naoki


Inada Naoki  added the comment:

All of these optimizations should be disabled by default.

* They will cause leaks when Python is embedded.
* Even for the python command, they will break __del__ and weakref callbacks.

--

___
Python tracker 
<https://bugs.python.org/issue40255>
___



[issue46606] Large C stack usage of os.getgroups() and os.setgroups()

2022-02-21 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46606>
___



[issue46606] Large C stack usage of os.getgroups() and os.setgroups()

2022-02-21 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 74127b89a8224d021fc76f679422b76510844ff9 by Inada Naoki in branch 
'main':
bpo-46606: Reduce stack usage of getgroups and setgroups (GH-31073)
https://github.com/python/cpython/commit/74127b89a8224d021fc76f679422b76510844ff9


--

___
Python tracker 
<https://bugs.python.org/issue46606>
___



[issue46813] Allow developer to resize the dictionary

2022-02-21 Thread Inada Naoki


Inada Naoki  added the comment:

As I commented in https://github.com/faster-cpython/ideas/discussions/288, your 
benchmark is not fair.

Include `{}` and `{}.resize(len(cases))` into the measured function.

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46813>
___



[issue29992] Expose parse_string in JSONDecoder

2022-02-16 Thread Inada Naoki


Inada Naoki  added the comment:

> Generally speaking, parsing some things as decimal or datetime are schema 
> dependent.

Totally agree with this.

> In order to provide maximal flexibility it would be much nicer to have a 
> streaming interface available (like SAX for XML parsing), but that is not 
> what this is.

I think it is too difficult and complicated.
I think a post-processing approach (e.g. dataclass_json, pydantic) is enough.

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue29992>
___



[issue40255] Fixing Copy on Writes from reference counting and immortal objects

2022-02-11 Thread Inada Naoki


Inada Naoki  added the comment:

I think making more objects immortal by default will reduce the gap, although I 
am not sure it can be 2%. (I guess 3%, and I think that is an acceptable gap.)

* Code attributes (contents of co_consts, co_names, etc...) in deep frozen 
modules.
  * only if subinterpreter shares them.
* Statically allocated strings (previously _Py_IDENTIFIER)

To reduce the gap further, we need to reduce Python stack operations in ceval 
in some way.

--

___
Python tracker 
<https://bugs.python.org/issue40255>
___



[issue46688] Add sys.is_interned

2022-02-08 Thread Inada Naoki


Inada Naoki  added the comment:

Thank you. I could not find it because it is too old.

--
resolution:  -> duplicate
stage: patch review -> resolved
status: open -> closed
superseder:  -> Add sys.isinterned()

___
Python tracker 
<https://bugs.python.org/issue46688>
___



[issue46688] Add sys.is_interned

2022-02-08 Thread Inada Naoki


Inada Naoki  added the comment:

I thought sys.is_interned() is needed to implement bpo-46430, but GH-30683 
looks nice to me.
I will close this issue after GH-30683 is merged.

--

___
Python tracker 
<https://bugs.python.org/issue46688>
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46688] Add sys.is_interned

2022-02-08 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +29397
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31227

___
Python tracker 
<https://bugs.python.org/issue46688>
___



[issue46688] Add sys.is_interned

2022-02-08 Thread Inada Naoki


New submission from Inada Naoki :

deepfreeze.py needs to know whether a unicode object is interned.

Ref: https://bugs.python.org/issue46430

--
components: Interpreter Core
messages: 412890
nosy: methane
priority: normal
severity: normal
status: open
title: Add sys.is_interned
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46688>
___



[issue46600] Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call

2022-02-02 Thread Inada Naoki


Inada Naoki  added the comment:

I didn't mean that _Py_abspath is a problem. I just used it to describe why -O0 
and -Og are so different.

We can reduce its stack usage easily, but it is less of a problem than 
_PyEval_EvalFrameDefault.
It is difficult to reduce the stack usage of _PyEval_EvalFrameDefault with -O0.

--

___
Python tracker 
<https://bugs.python.org/issue46600>
___



[issue46606] Large C stack usage of os.getgroups() and os.setgroups()

2022-02-01 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +29257
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31073

___
Python tracker 
<https://bugs.python.org/issue46606>
___



[issue46606] Large C stack usage of os.getgroups() and os.setgroups()

2022-02-01 Thread Inada Naoki


New submission from Inada Naoki :

I checked stack usage for bpo-46600 and found that these two functions use a 
lot of stack.

os_setgroups: 262200 bytes
os_getgroups_impl: 262184 bytes

Both functions have a local variable like this:

gid_t grouplist[MAX_GROUPS];

MAX_GROUPS is defined as:

```
#ifdef NGROUPS_MAX
#define MAX_GROUPS NGROUPS_MAX
#else
/* defined to be 16 on Solaris7, so this should be a small number */
#define MAX_GROUPS 64
#endif
```

NGROUPS_MAX is 65536 and sizeof(gid_t) is 4 on Ubuntu 20.04, so grouplist is 
262144 bytes.

It seems this grouplist exists just to avoid an allocation:

```
} else if (n <= MAX_GROUPS) {
/* groups will fit in existing array */
alt_grouplist = grouplist;
} else {
alt_grouplist = PyMem_New(gid_t, n);
if (alt_grouplist == NULL) {
return PyErr_NoMemory();
}
```

How about just using `#define MAX_GROUPS 64`?
Or should we remove this grouplist, because os.getgroups() is not called so 
frequently?
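
The arithmetic behind these sizes can be checked with a short sketch. The constants are the values quoted in this message for Ubuntu 20.04, not values queried from the running system, and the variable names are illustrative.

```python
NGROUPS_MAX = 65536        # value quoted above for Ubuntu 20.04
SIZEOF_GID_T = 4           # sizeof(gid_t) quoted above
FALLBACK_MAX_GROUPS = 64   # the proposed fixed MAX_GROUPS

current_bytes = NGROUPS_MAX * SIZEOF_GID_T
fallback_bytes = FALLBACK_MAX_GROUPS * SIZEOF_GID_T

print(current_bytes)    # 262144
print(fallback_bytes)   # 256

# Stack not accounted for by grouplist in the two measured functions.
print(262200 - current_bytes, 262184 - current_bytes)
```

So almost all of the measured stack usage comes from the array itself, and a 64-entry array would need only 256 bytes.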

--
components: Library (Lib)
messages: 412335
nosy: methane
priority: normal
severity: normal
status: open
title: Large C stack usage of os.getgroups() and os.setgroups()
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46606>
___



[issue46600] Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call

2022-02-01 Thread Inada Naoki


Inada Naoki  added the comment:

FWIW, it seems -O0 doesn't merge local variables in different paths or lifetimes.

For example, see _Py_abspath

```
if (path[0] == '\0' || !wcscmp(path, L".")) {
   wchar_t cwd[MAXPATHLEN + 1];
   //(snip)
}
//(snip)
wchar_t cwd[MAXPATHLEN + 1];
```

wchar_t is 4 bytes and MAXPATHLEN is 4096 on Linux. So each cwd is 16388 bytes.
-O0 allocates 32856 bytes for it and -Og allocates 16440 bytes for it.
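
A quick check of these numbers, using only the constants quoted in this message (the variable names are illustrative). The larger measured frame is consistent with two separate `cwd` buffers, and the smaller one with a single shared buffer.

```python
MAXPATHLEN = 4096      # value quoted above for Linux
SIZEOF_WCHAR_T = 4     # value quoted above for Linux

cwd_bytes = (MAXPATHLEN + 1) * SIZEOF_WCHAR_T
print(cwd_bytes)              # 16388

# Larger measured frame: room for two buffers plus a small remainder.
print(32856 - 2 * cwd_bytes)  # 80

# Smaller measured frame: room for one buffer plus a small remainder.
print(16440 - cwd_bytes)      # 52
```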

I don't know which specific optimization flag in -Og merges local 
variables, but I think -Og is very important for _PyEval_EvalFrameDefault() 
since it has many local variables in huge switch-case statements.
-Og allocates 312 bytes for it and -O0 allocates 8280 bytes for it.

By the way, clang 13 has the `-fstack-usage` option like gcc, but clang 12 
doesn't have it.
Since Ubuntu 20.04 has only clang 12, I used `-fstack-size-section` and 
https://github.com/mvanotti/stack-sizes to get stack size.

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46600>
___



[issue36346] Prepare for removing the legacy Unicode C API

2022-01-28 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue36346>
___



[issue36346] Prepare for removing the legacy Unicode C API

2022-01-28 Thread Inada Naoki


Inada Naoki  added the comment:

No. I am just waiting for Python 3.11 to become beta.

--

___
Python tracker 
<https://bugs.python.org/issue36346>
___



[issue33205] GROWTH_RATE prevents dict shrinking

2022-01-26 Thread Inada Naoki


Inada Naoki  added the comment:

We do not have *fill* since Python 3.6.
There is a `dk_nentries` instead. But when `insertion_resize()` is called, 
`dk_nentries` is equal to `USABLE_FRACTION(dk_size)` (dk_size is `1 << 
dk_log2_size` for now). So it is different from *fill* in the old dict.

I chose `dk_used*3` as GROWTH_RATE because it reserves more spaces when there 
are dummies than when there is no dummy, as I described in the same comment:

> In case of dict growing without deletion, dk_size is doubled for each resize 
> as current behavior.
> When there are deletion, dk_size is growing aggressively than Python 3.3 
> (used*2 -> used*3).  And it allows dict shrinking after massive deletions.

For example, when current dk_size == 16 and USABLE_FRACTION(dk_size) == 10, new 
dk_size is:

* used = 10 (dummy=0) -> 32 (31.25%)
* used = 9 (dummy=1)  -> 32 (28.125%)
(snip)
* used = 6 (dummy=4)  -> 32 (18.75%)
* used = 5 (dummy=5)  -> 16 (31.25%)
* used = 4 (dummy=6)  -> 16 (25%)
(snip)
* used = 2 (dummy=8)  -> 8 (25%)

As you can see, the dict is more sparse when there are dummies than when there 
is no dummy, except for the used=5/dummy=5 case.
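
The table above can be reproduced with a small model. This is a sketch under stated assumptions: the new size is taken to be the smallest power of two that is at least `used * 3`, with a minimum of 8, and the function name `new_dk_size` is hypothetical rather than a CPython API.

```python
def new_dk_size(used, min_size=8):
    """Model of the resize target: smallest power of two >= used * 3."""
    target = used * 3  # the GROWTH_RATE discussed above
    size = min_size
    while size < target:
        size <<= 1
    return size


for used in (10, 9, 6, 5, 4, 2):
    size = new_dk_size(used)
    print(f"used={used:2d} -> dk_size={size:2d} ({used / size:.3%})")
```

The `used=5` row shows the exception mentioned above: 15 still fits in 16, so the table does not grow and ends up as dense as in the no-dummy case.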

There may be a little room for improvement, especially for the `used=5/dummy=5` 
case. But I am not sure it is worth using a more complex GROWTH_RATE than 
used*3.
Any good ideas?

--

___
Python tracker 
<https://bugs.python.org/issue33205>
___



[issue44723] Codec name normalization breaks custom codecs

2022-01-24 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue44723>
___



[issue46464] concurrent.futures.ProcessPoolExecutor can deadlock when tcmalloc is used

2022-01-24 Thread Inada Naoki


Inada Naoki  added the comment:

> The only way to safely launch worker processes on demand is to spawn a worker 
> launcher process spawned prior to any thread creation that remains idle, with 
> a sole job of spawn new worker processes for us. That sounds complicated. 
> That'd be a feature. Lets go with the bugfix first.

fork is not the only way to launch worker processes. We have spawn. And spawn is 
the default on macOS since Python 3.8.

Simply reverting seems not good for macOS users, since they would need to pay 
the cost of both pre-spawning and spawn.
Can't we just pre-spawn only when fork is used?
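
A minimal sketch of that idea, using only the public multiprocessing context API. The helper `needs_prespawn` is hypothetical and is not how ProcessPoolExecutor is implemented; it only shows how the decision could depend on the start method.

```python
import multiprocessing


def needs_prespawn(ctx):
    """Hypothetical policy: start all workers up front only for fork."""
    return ctx.get_start_method() == "fork"


# List what the policy would decide for each start method on this platform.
for method in multiprocessing.get_all_start_methods():
    ctx = multiprocessing.get_context(method)
    print(method, needs_prespawn(ctx))
```

With such a check, platforms that default to spawn would keep launching workers on demand.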

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46464>
___



[issue46399] Addition of `mapping` attribute to dict views classes has inadvertently broken type-checkers

2022-01-20 Thread Inada Naoki


Inada Naoki  added the comment:

> If we literally ignore the attribute, any usage of `.mapping` will be an 
> error, which basically makes the whole `.mapping` feature useless for 
> statically typed code. It also wouldn't appear in IDE autocompletions.

`.mapping` did not exist in Python 3.0 through 3.9, and it is not a feature 
that many users have long awaited.

See https://bugs.python.org/issue40890#msg370841
Raymond said:

  Traditionally, we do expose wrapped objects:  property() exposes fget,  
partial() exposes func, bound methods expose __func__, ChainMap() exposes maps, 
etc.
  Exposing this attribute would help with introspection, making it possible to 
write efficient functions that operate on dict views.

Type hints are very useful for application code, especially when it is large.
But introspection is very rarely used in such typed code bases. I don't think 
`.mapping` is useful for many users, just like `.fget` of the property.
So adding `# type: ignore` in such lines is the "lesser evil".


> If we add it to `KeysView` and `ValuesView`, library authors will end up 
> using `.mapping` with arguments annotated as `Mapping` or `MutableMapping`, 
> not realizing it is purely a dict thing, not required from an arbitrary 
> mapping object.

It doesn't make sense at all, IMO.
If we really need `.mapping` in typeshed, we should add it to 
`KeysViewWithMapping`.
So mapping classes that don't inherit dict shouldn't be forced to implement 
`.mapping`.


> If we keep `.mapping` in dict but not anywhere else, as described already, it 
> becomes difficult to override .keys() and .values() in a dict subclass. You 
> can't just return a KeysView or a ValuesView. If that was allowed, how should 
> people annotate code that uses `.mapping`? You can't annotate with `dict`, 
> because that also allows subclasses of dict, which might not have a 
> `.mapping` attribute.

`# type: ignore`.


> Yet another option would be to expose `dict_keys` and `dict_values` somewhere 
> where they don't actually exist at runtime. This leads to code like this:
>
> from typing import Any, TYPE_CHECKING
> if TYPE_CHECKING:
> # A lie for type checkers to work.
> from something_that_doesnt_exist_at_runtime import dict_keys, dict_values
> else:
> # Runtime doesn't check type annotations anyway.
> dict_keys = Any
> dict_values = Any
>
> While this works, it isn't very pretty.

What problem does this option solve? `SortedDict.keys()` cannot return 
`dict_keys`.
As far as I understand, your motivation is to make dict subclasses happy with 
type checkers.
But this option doesn't make dict subclasses happy at all.

--

___
Python tracker 
<https://bugs.python.org/issue46399>
___



[issue46399] Addition of `mapping` attribute to dict views classes has inadvertently broken type-checkers

2022-01-19 Thread Inada Naoki


Inada Naoki  added the comment:

In other words,

a. If `.keys()` in all dict subclasses must return subclass of `dict_keys`: 
`dict.keys() -> dict_keys`.
b. If `.keys().mapping` must be accessible for all dict subclasses: Add 
`.mapping` to `KeysView`.
c. If `.keys().mapping` is optional for dict subclasses: typeshed can't add 
`.mapping` to anywhere, AFAIK.

--

___
Python tracker 
<https://bugs.python.org/issue46399>
___



[issue46399] Addition of `mapping` attribute to dict views classes has inadvertently broken type-checkers

2022-01-19 Thread Inada Naoki


Inada Naoki  added the comment:

> I agree with Inada that not every internal type should be exposed, but I 
> would make an exception for the dict views classes due to the fact that dict 
> subclasses are much more common than subclasses of other mappings, such as 
> OrderedDict. I don't think it's *particularly* important to expose the 
> OrderedDict views classes in the same way.

I am afraid that you misread me. I used OrderedDict as one example of a dict 
subclass. I didn't mean that dict_(keys|items|values) shouldn't be exposed 
because I don't want to expose odict_(keys|items|values).

Anyway, OrderedDict was not a good choice to explain my thought because it is a 
builtin type and is defined in typeshed. Instead, I will use 
sortedcontainers.SortedDict as an example.

See 
https://github.com/grantjenks/python-sortedcontainers/blob/dff7ef79a21b3f3ceb6a19868f302f0a680aa243/sortedcontainers/sorteddict.py#L43

It is a dict subclass. Its `keys()` method returns `SortedKeysView`.
`SortedKeysView` is a subclass of `collections.abc.KeysView`. But it is not a 
subclass of `dict_keys`.
If `dict.keys()` in typeshed is defined to return `dict_keys`, doesn't mypy flag 
it as an "incompatible override"?

So I propose that typeshed defines that dict.keys() returns KeysView, not 
dict_keys.
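
The situation can be reproduced at runtime with a small stand-in. `SortedDictLike` and `SortedKeysLike` are hypothetical classes written for this sketch, not the sortedcontainers implementation, and `_mapping` is the private attribute that `collections.abc.MappingView` stores.

```python
from collections.abc import KeysView


class SortedKeysLike(KeysView):
    """Hypothetical stand-in for a view such as SortedKeysView."""

    def __iter__(self):
        return iter(sorted(self._mapping))


class SortedDictLike(dict):
    """Hypothetical dict subclass whose keys() is not a dict_keys."""

    def keys(self):
        return SortedKeysLike(self)


d = SortedDictLike(b=2, a=1)
view = d.keys()
dict_keys = type({}.keys())

print(list(view))                   # ['a', 'b']
print(isinstance(view, KeysView))   # True
print(isinstance(view, dict_keys))  # False
print(hasattr(view, "mapping"))     # False
```

The view satisfies `KeysView` but is neither a `dict_keys` nor an object with `.mapping`, which is why annotating `dict.keys()` as returning `KeysView` keeps such subclasses valid.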

Although subclass of dict is very common, it is very rare that:

* Override `keys()`, and
* Returns `super().keys()`, instead of KeysView (or list), and
* `.keys().mapping` is accessed.

It is a very minor inconvenience that users need to ignore a false positive in 
these very specific cases.
Or do you think this case is much more common than classes like SortedDict?

Note that dict_(keys|items|values) is implementation detail and subclassing it 
doesn't make sense.

Another option is adding more ABC or Protocol that defines `.mapping` attribute.
SortedKeysView can inherit it and implement `.mapping`.

--

___
Python tracker 
<https://bugs.python.org/issue46399>
___



[issue46399] Addition of `mapping` attribute to dict views classes has inadvertently broken type-checkers

2022-01-18 Thread Inada Naoki


Inada Naoki  added the comment:

I am not happy about exposing every internal type. I prefer duck typing.

Like OrderedDict, not all dict subtypes use `dict_keys`, `dict_values`, and 
`dict_items`.
If typeshed annotates dict.keys() as returning `dict_keys`, "incompatible 
override" cannot be avoided.

I prefer:

* Keep the status quo: keys().mapping causes a false positive and users need to 
suppress it. This is not a big problem because `.mapping` is very rarely used.
* Or add `.mapping` to `KeysView`, `ValuesView`, and `ItemsView`. Force every 
dict subclass to implement it.

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46399>
___



[issue45644] Make json.tool soak up input before opening output for writing

2022-01-18 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +methane
nosy_count: 4.0 -> 5.0
pull_requests: +28860
pull_request: https://github.com/python/cpython/pull/30659

___
Python tracker 
<https://bugs.python.org/issue45644>
___



[issue29241] sys._enablelegacywindowsfsencoding() don't apply to os.fsencode and os.fsdecode

2022-01-16 Thread Inada Naoki


Inada Naoki  added the comment:

Mercurial still uses it.
https://www.mercurial-scm.org/repo/hg-stable/file/tip/mercurial/pycompat.py#l113

Mercurial has a plan to move filesystem names from the ANSI code page to UTF-8, 
but I don't know about its progress.
https://www.mercurial-scm.org/wiki/WindowsUTF8Plan

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue29241>
___



[issue46376] PyMapping_Check returns 1 for list

2022-01-14 Thread Inada Naoki


Inada Naoki  added the comment:

collections.abc.Mapping was fixed by https://bugs.python.org/issue43977
We can do the same thing if backward compatibility allows it.
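
The difference between the C-level check and the ABC can be observed from Python. This sketch assumes CPython, since it calls the C API through `ctypes.pythonapi`; it only observes the behaviour this issue reports and does not change it.

```python
import ctypes
from collections.abc import Mapping

# Bind the C API function PyMapping_Check(PyObject *) -> int.
PyMapping_Check = ctypes.pythonapi.PyMapping_Check
PyMapping_Check.argtypes = [ctypes.py_object]
PyMapping_Check.restype = ctypes.c_int

for obj in ([], {}, 42):
    print(type(obj).__name__, PyMapping_Check(obj), isinstance(obj, Mapping))
```

For a list, the C check reports 1 while `isinstance([], Mapping)` is False, which is the mismatch under discussion.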

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46376>
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23882] unittest discovery doesn't detect namespace packages when given no parameters

2022-01-09 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 0b2b9d251374c5ed94265e28039f82b37d039e3e by Inada Naoki in branch 
'main':
bpo-23882: unittest: Drop PEP 420 support from discovery. (GH-29745)
https://github.com/python/cpython/commit/0b2b9d251374c5ed94265e28039f82b37d039e3e


--

___
Python tracker 
<https://bugs.python.org/issue23882>
___



[issue45661] [meta] Freeze commonly used stdlib modules.

2022-01-06 Thread Inada Naoki


Inada Naoki  added the comment:

I am not against deep freezing functools and contextlib.
But I think we should optimize and utilize zipimport or something similar, 
because we cannot deep-freeze all of the stdlib or 3rd party libraries.

See also: 
https://github.com/faster-cpython/ideas/discussions/158#discussioncomment-1857198

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue45661>
___



[issue46236] PyFunction_GetAnnotations returning Tuple vs Dict

2022-01-04 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +28615
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30409

___
Python tracker 
<https://bugs.python.org/issue46236>
___



[issue46143] [docs] IO > Text Encoding info outdated

2021-12-20 Thread Inada Naoki


Inada Naoki  added the comment:

UTF-8 mode is not enabled by default. So locale encoding is still the default 
encoding.
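
A short way to check both settings on a given interpreter. The output depends on the Python version, the platform, and the environment, so no expected values are shown.

```python
import locale
import sys

# 1 when UTF-8 mode is enabled (for example via -X utf8 or PYTHONUTF8=1).
print(sys.flags.utf8_mode)

# The encoding that the locale machinery reports for text I/O.
print(locale.getpreferredencoding(False))
```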

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46143>
___



[issue46085] OrderedDict iterator allocates di_result unnecessarily

2021-12-15 Thread Inada Naoki


Inada Naoki  added the comment:

Nice catch.

> if ((kind & _odict_ITER_KEYS) && (kind &_odict_ITER_VALUES))

You can eliminate one branch with:

```
#define _odict_ITER_ITEMS (_odict_ITER_KEYS|_odict_ITER_VALUES)
...
 if ((kind & _odict_ITER_ITEMS) == _odict_ITER_ITEMS)
```
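
A small Python sketch of why the single masked comparison is equivalent to testing both flags. The flag values below are assumptions chosen for illustration, not necessarily the values used in `odictobject.c`. Note that in C the masked expression needs explicit parentheses, because `==` binds tighter than `&` there.

```python
# Illustrative flag values (assumed); each flag is a distinct bit.
ITER_REVERSED = 2
ITER_KEYS = 4
ITER_VALUES = 8
ITER_ITEMS = ITER_KEYS | ITER_VALUES


def is_items_two_tests(kind):
    # Original form: two separate bit tests joined by a logical AND.
    return bool(kind & ITER_KEYS) and bool(kind & ITER_VALUES)


def is_items_one_test(kind):
    # Suggested form: mask both bits, then compare once.
    return (kind & ITER_ITEMS) == ITER_ITEMS


# Both forms agree for every combination of the low four bits.
for kind in range(16):
    assert is_items_two_tests(kind) == is_items_one_test(kind)
```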

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46085>
___



[issue46006] [subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters

2021-12-08 Thread Inada Naoki


Inada Naoki  added the comment:

That's too bad.
We can no longer compare two Unicode objects by pointer, even if both are 
interned. It was a nice optimization.
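
For context, a small Python-level sketch of the property this optimization relied on within a single interpreter: interning returns one canonical object per string value, so identity implies equality. The variable names are illustrative only, and this does not exercise subinterpreters.

```python
import sys

# Build two equal strings at runtime so they are separate results.
a = "".join(["sub", "interpreter"])
b = "".join(["sub", "interpreter"])

# sys.intern() returns the canonical interned object for each value.
ia = sys.intern(a)
ib = sys.intern(b)

print(a == b)    # equal by value
print(ia is ib)  # same object after interning, within one interpreter
```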

--

___
Python tracker 
<https://bugs.python.org/issue46006>
___



[issue46006] [subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters

2021-12-08 Thread Inada Naoki


Inada Naoki  added the comment:

Should `_PyUnicode_EqualToASCIIId()` support comparing two Unicode objects 
from different interpreters?

--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue46006>
___



[issue23882] unittest discovery doesn't detect namespace packages when given no parameters

2021-11-25 Thread Inada Naoki


Change by Inada Naoki :


--
versions: +Python 3.11 -Python 3.10, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue23882>
___



[issue23882] unittest discovery doesn't detect namespace packages when given no parameters

2021-11-24 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +27982
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/29745

___
Python tracker 
<https://bugs.python.org/issue23882>
___



[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over

2021-11-16 Thread Inada Naoki


Inada Naoki  added the comment:

The other error I found has already been reported as #42868.

--

___
Python tracker 
<https://bugs.python.org/issue38625>
___



[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over

2021-11-16 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed
versions: +Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue38625>
___



[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over

2021-11-16 Thread Inada Naoki


Inada Naoki  added the comment:

I confirmed that this bug is fixed, but I found another error.

--

___
Python tracker 
<https://bugs.python.org/issue38625>
___



[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over

2021-11-16 Thread Inada Naoki


Inada Naoki  added the comment:

Is this bug fixed by #26730?

--

___
Python tracker 
<https://bugs.python.org/issue38625>
___



[issue45521] obmalloc radix tree typo in code

2021-10-18 Thread Inada Naoki


Inada Naoki  added the comment:

While trying to understand this issue, I ran into this segfault:

https://gist.github.com/methane/1b83e2abc6739017e0490c5f70a27b52

I am not sure whether this segfault is caused by this issue. If it is 
unrelated, I will create another issue.

--

___
Python tracker 
<https://bugs.python.org/issue45521>
___



[issue45475] gzip fails to read a gzipped file (ValueError: readline of closed file)

2021-10-18 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue45475>
___



[issue45475] gzip fails to read a gzipped file (ValueError: readline of closed file)

2021-10-18 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 0a4c82ddd34a3578684b45b76f49cd289a08740b by Inada Naoki in branch 
'main':
bpo-45475: Revert `__iter__` optimization for GzipFile, BZ2File, and LZMAFile. 
(GH-29016)
https://github.com/python/cpython/commit/0a4c82ddd34a3578684b45b76f49cd289a08740b


--

___
Python tracker 
<https://bugs.python.org/issue45475>
___


