[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-14 Thread Tim Peters
[Neil Schemenauer]
> ...
> BTW, the current radix tree doesn't even require that pools are
> aligned to POOL_SIZE.  We probably want to keep pools aligned
> because other parts of obmalloc rely on that.

obmalloc relies on it heavily.  Another radix tree could map block
addresses to all the necessary info in a current pool header, but that
could become gigantic.  For example, even a current "teensy" 256 KiB
arena can be carved into on the order of 16K 16-byte blocks (the smallest
size class).  And finding a pool header now is unbeatably cheap:  just
clear the last 12 address bits, and you're done.
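
Both numbers above are easy to check.  A minimal Python sketch, treating
addresses as plain integers and assuming the current 4 KiB pool size (hence
12 low bits); `pool_header_addr` and the sample address are made-up names
and values for illustration, not obmalloc identifiers:

```python
ARENA_SIZE = 256 * 1024   # the current "teensy" arena
POOL_SIZE = 4 * 1024      # 2**12 bytes, so 12 low address bits
SMALLEST_BLOCK = 16       # smallest size class

# Upper bound on blocks per arena (ignores the space taken by pool headers).
blocks_per_arena = ARENA_SIZE // SMALLEST_BLOCK


def pool_header_addr(addr):
    """Clear the low 12 bits to get the address of the enclosing pool."""
    return addr & ~(POOL_SIZE - 1)


addr = 0x7F3A5C0127B8     # hypothetical block address
print(blocks_per_arena, hex(pool_header_addr(addr)))
```

A per-block map would need on the order of `blocks_per_arena` entries for
every arena, while the mask costs one AND instruction.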


> Here is the matchup of the radix tree vs the current
> address_in_range() approach.
>
> - nearly the same in terms of performance.  It might depend on OS
>   and workload but based on my testing on Linux, they are very
>   close.  Would be good to do more testing but I think the radix
>   tree is not going to be faster, only slower.

People should understand that the only point to these things is to
determine whether a pointer passed to a free() or realloc() spelling
was obtained from an obmalloc pool or from the system malloc family.
So they're invoked once near the very starts of those two functions,
and that's all.

Both ways are much more expensive than finding a pool header (which is
just clearing some trailing address bits).

The current way reads an "arena index" out of the pool header, uses
that to index the file static `arenas` vector to get a pointer to an
arena descriptor, then reads the arena base address out of the
descriptor.  That's used to determine whether the original address is
contained in the arena.  The primary change my PR makes is to read the
arena index from the start of the _page_ the address belongs to instead
(the pool header address is irrelevant to this, apart from that a pool
header is aligned to the first page in a pool).
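
The steps in that paragraph can be modeled in a few lines of Python.  This
is only a sketch under stated assumptions, not the C code: addresses are
integers, the `memory` dict stands in for RAM, the `garbage` parameter
stands in for whatever an uninitialized read returns, and every name below
is hypothetical.

```python
ARENA_SIZE = 256 * 1024
PAGE_SIZE = 4 * 1024

# Toy stand-in for the `arenas` vector: arenas[i] is the base address of
# arena i, or 0 for an unused slot.
arenas = [0x10000000, 0, 0x30000000]

# memory[a] models the integer stored at address a.  Only slots that the
# allocator initialized are present; anything else reads back as garbage.
memory = {0x30002000: 2, 0x30003000: 2}


def address_in_range(p, index_addr, garbage=987654321):
    """True if p lies inside the arena named by the index at index_addr."""
    idx = memory.get(index_addr, garbage)
    if not 0 <= idx < len(arenas):
        return False
    base = arenas[idx]
    return base != 0 and 0 <= p - base < ARENA_SIZE


def check_via_pool_header(p, pool_size=4 * 1024):
    # Current scheme: the arena index is read from the pool header.
    return address_in_range(p, p & ~(pool_size - 1))


def check_via_page_start(p, garbage=987654321):
    # Scheme described for the PR: the index is read from the start of the
    # page containing p, whatever the pool size is.
    return address_in_range(p, p & ~(PAGE_SIZE - 1), garbage)
```

Note that a garbage index cannot produce a false positive in this model:
either it is out of bounds, or it names an arena whose address range does
not contain `p`.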

The radix tree picks bits out of the address three times to index into
a 3-level (but potentially broad) tree, ending with a node containing
info about the only two possible arenas the original address may
belong to.  Then that info is used to check.
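
A rough Python model of that lookup, not the code in Neil's branch.  The
bit widths (`ARENA_BITS`, `LEVEL_BITS`), the nested-dict tree, and the
`lo`/`hi` leaf fields are all assumptions made for illustration.  The
sketch also assumes arenas need not be aligned to the arena size, so that
one aligned range can hold the tail end of one arena and the start of
another:

```python
ARENA_BITS = 20                    # assumed: 1 MiB arenas
ARENA_SIZE = 1 << ARENA_BITS
ARENA_MASK = ARENA_SIZE - 1
LEVEL_BITS = 8                     # assumed width of the two lower indices
LEVEL_MASK = (1 << LEVEL_BITS) - 1

root = {}                          # level 1 -> level 2 -> level 3 -> leaf


def indices(addr):
    """Pick three bit fields out of the address, above the arena offset."""
    n = addr >> ARENA_BITS
    return n >> (2 * LEVEL_BITS), (n >> LEVEL_BITS) & LEVEL_MASK, n & LEVEL_MASK


def leaf(addr, create=False):
    i1, i2, i3 = indices(addr)
    if create:
        mid = root.setdefault(i1, {}).setdefault(i2, {})
        return mid.setdefault(i3, {"lo": 0, "hi": 0})
    return root.get(i1, {}).get(i2, {}).get(i3)


def mark_arena(base):
    """Record an arena occupying [base, base + ARENA_SIZE)."""
    tail = base & ARENA_MASK
    if tail == 0:
        leaf(base, create=True)["hi"] = -1       # arena fills the whole range
    else:
        leaf(base, create=True)["hi"] = tail     # arena starts in this range
        leaf(base + ARENA_SIZE, create=True)["lo"] = tail   # and ends in the next


def in_arena(addr):
    node = leaf(addr)
    if node is None:
        return False
    tail = addr & ARENA_MASK
    return tail < node["lo"] or (node["hi"] != 0 and tail >= node["hi"])


base = 0x7F0000012000              # hypothetical, deliberately unaligned
mark_arena(base)
print(in_arena(base), in_arena(base - 1), in_arena(base + ARENA_SIZE))
```

The final check is two comparisons against values stored in one leaf, which
is the sort of cost the next paragraph compares with the current scheme.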

The number of loads is essentially the same, but the multiple levels
of indexing in the tree are a teensy bit more expensive because they
require more bit-fiddling.  I spent hours, in all, dreaming up a way
to make the _seemingly_ more complex final "so is the address in one
of those two arenas or not?" check about as cheap as the current way.
But Neil didn't see any significant timing difference after making
that change, which was mildly disappointing but not really surprising:
arithmetic is just plain cheap compared to reading up memory.

> - radix tree uses a bit more memory overhead.  Maybe 1 or 2 MiB on a
>   64-bit OS.  The radix tree uses more as memory use goes up but it
>   is a small fraction of total used memory.  The extra memory use is
>   the main downside at this point, I think.

I'd call it the only downside.  But nobody yet has quantified how bad
it can get.


> - the radix tree doesn't read uninitialized memory.  The current
>   address_in_range() approach has worked very well but is relying on
>   some assumptions about the OS (how it maps pages into the program
>   address space).  This is the only aspect where the radix tree is
>   clearly better.  I'm not sure this matters enough to offset the
>   extra memory use.

I'm not worried about that.  The only real assumption here is that if
an OS supports some notion of "pages" at all, then for any address for
which the program has read/write access (which are the only kinds of
addresses that can be sanely passed to free/realloc), the OS allows
the same access to the entire page containing that address.

In two decades we haven't seen an exception to that yet, right?  It's
hard to imagine a HW designer thinking "I know!  Let's piss away more
transistors on finer-grained control nobody has asked for, and slow
down memory operations even more checking for that." ;-)


> - IMHO, the radix tree code is a bit simpler than Tim's
>   obmalloc-big-pool code.

Absolutely so.  There's another way to look at this:  if Vladimir
Marangozov (obmalloc's original author) had used an arena radix tree
from the start, would someone now get anywhere proposing a patch to
change it to the current scheme?  I'd expect a blizzard of -1 votes,
starting with mine ;-)

> ...
> My feeling right now is that Tim's obmalloc-big-pool is the best
> design at this point.  Using 8 KB or 16 KB pools seems to be better
> than 4 KB.  The extra complexity added by Tim's change is not so
> nice.  obmalloc is already extremely subtle and obmalloc-big-pool
> makes it more so.

Moving to bigger pools and bigger arenas are pretty much no-brainers
for us, but unless pool size is increased there's no particular reason
to pursue either approach - "ain't broke, don't fix".

Larry Hastings started a "The untuned tunable parameter ARENA_SIZE"
thread here about two years ago, where he got a blizzard of 

[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-14 Thread Neil Schemenauer
Here are benchmark results for 64 MB arenas and 16 kB pools.  I ran
without the --fast option and on a Linux machine in single user
mode.  The "base" column is the obmalloc-big-pools branch with
ARENA_SIZE = 64 MB and POOL_SIZE = 16 kB.  The "radix" column is
obmalloc_radix_tree (commit 5e00f6041) with the same arena and pool
sizes.

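+-------------------------+------------------+-----------------------------+
| Benchmark               | base (16kB/64MB) | radix (16KB/64MB)           |
+=========================+==================+=============================+
| 2to3                    | 290 ms           | 292 ms: 1.00x slower (+0%)  |
+-------------------------+------------------+-----------------------------+
| crypto_pyaes            | 114 ms           | 116 ms: 1.02x slower (+2%)  |
+-------------------------+------------------+-----------------------------+
| django_template         | 109 ms           | 106 ms: 1.03x faster (-3%)  |
+-------------------------+------------------+-----------------------------+
| dulwich_log             | 75.2 ms          | 74.5 ms: 1.01x faster (-1%) |
+-------------------------+------------------+-----------------------------+
| fannkuch                | 454 ms           | 449 ms: 1.01x faster (-1%)  |
+-------------------------+------------------+-----------------------------+
| float                   | 113 ms           | 111 ms: 1.01x faster (-1%)  |
+-------------------------+------------------+-----------------------------+
| hexiom                  | 9.45 ms          | 9.47 ms: 1.00x slower (+0%) |
+-------------------------+------------------+-----------------------------+
| json_dumps              | 10.6 ms          | 11.1 ms: 1.04x slower (+4%) |
+-------------------------+------------------+-----------------------------+
| json_loads              | 24.4 us          | 25.2 us: 1.03x slower (+3%) |
+-------------------------+------------------+-----------------------------+
| logging_simple          | 8.19 us          | 8.37 us: 1.02x slower (+2%) |
+-------------------------+------------------+-----------------------------+
| mako                    | 15.1 ms          | 15.1 ms: 1.01x slower (+1%) |
+-------------------------+------------------+-----------------------------+
| meteor_contest          | 98.3 ms          | 97.1 ms: 1.01x faster (-1%) |
+-------------------------+------------------+-----------------------------+
| nbody                   | 142 ms           | 140 ms: 1.02x faster (-2%)  |
+-------------------------+------------------+-----------------------------+
| nqueens                 | 93.8 ms          | 93.0 ms: 1.01x faster (-1%) |
+-------------------------+------------------+-----------------------------+
| pickle                  | 8.89 us          | 8.85 us: 1.01x faster (-0%) |
+-------------------------+------------------+-----------------------------+
| pickle_dict             | 17.9 us          | 18.2 us: 1.01x slower (+1%) |
+-------------------------+------------------+-----------------------------+
| pickle_list             | 2.68 us          | 2.64 us: 1.01x faster (-1%) |
+-------------------------+------------------+-----------------------------+
| pidigits                | 182 ms           | 184 ms: 1.01x slower (+1%)  |
+-------------------------+------------------+-----------------------------+
| python_startup_no_site  | 5.31 ms          | 5.33 ms: 1.00x slower (+0%) |
+-------------------------+------------------+-----------------------------+
| raytrace                | 483 ms           | 476 ms: 1.02x faster (-1%)  |
+-------------------------+------------------+-----------------------------+
| regex_compile           | 167 ms           | 169 ms: 1.01x slower (+1%)  |
+-------------------------+------------------+-----------------------------+
| regex_dna               | 170 ms           | 171 ms: 1.01x slower (+1%)  |
+-------------------------+------------------+-----------------------------+
| regex_effbot            | 2.70 ms          | 2.75 ms: 1.02x slower (+2%) |
+-------------------------+------------------+-----------------------------+
| regex_v8                | 21.1 ms          | 21.3 ms: 1.01x slower (+1%) |
+-------------------------+------------------+-----------------------------+
| scimark_fft             | 368 ms           | 371 ms: 1.01x slower (+1%)  |
+-------------------------+------------------+-----------------------------+
| scimark_monte_carlo     | 103 ms           | 101 ms: 1.02x faster (-2%)  |
+-------------------------+------------------+-----------------------------+
| scimark_sparse_mat_mult | 4.31 ms          | 4.27 ms: 1.01x faster (-1%) |
+-------------------------+------------------+-----------------------------+
| spectral_norm           | 131 ms           | 135 ms: 1.03x slower (+3%)  |
+-------------------------+------------------+-----------------------------+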

[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-14 Thread Neil Schemenauer
On 2019-06-14, Tim Peters wrote:
> However, last I looked there Neil was still using 4 KiB obmalloc
> pools, all page-aligned.  But using much larger arenas (16 MiB, 16
> times bigger than my branch, and 64 times bigger than Python currently
> uses).

I was testing it versus your obmalloc-big-pool branch and trying to
make it a fair comparison.  You are correct: 4 KiB pools and 16 MiB
arenas.  Maybe I should test with 16 KiB pools and 16 MiB arenas.
That seems a more optimized setting for current machines and
workloads.
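
For scale, the sizes mentioned in this exchange work out as follows.  This
is a small Python sketch; `pools_per_arena` is a made-up helper name, and
the 1 MiB and 256 KiB figures are derived from the "16 times" and "64
times" ratios quoted above rather than read from any source file:

```python
KiB = 1024
MiB = 1024 * KiB


def pools_per_arena(pool_size, arena_size):
    """How many pools of pool_size fit in one arena of arena_size."""
    return arena_size // pool_size


radix_arena = 16 * MiB
tim_arena = radix_arena // 16      # "16 times bigger than my branch"
current_arena = radix_arena // 64  # "64 times bigger than Python currently uses"

print(tim_arena // MiB, "MiB;", current_arena // KiB, "KiB")
print(pools_per_arena(4 * KiB, current_arena))   # current CPython
print(pools_per_arena(4 * KiB, radix_arena))     # setting tested so far
print(pools_per_arena(16 * KiB, radix_arena))    # setting proposed above
```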
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SUN6QZQKRPI5WQZKSBZFSLBNG4MMV3YH/


[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-14 Thread Neil Schemenauer
On 2019-06-15, Inada Naoki wrote:
> Oh, do you mean your branch doesn't have headers in each page?

That's right.  Each pool still has a header but pools can be larger
than the page size.  Tim's obmalloc-big-pool idea writes something
to the head of each page within a pool.  The radix tree doesn't need
that and actually doesn't care about OS page size.

BTW, the current radix tree doesn't even require that pools are
aligned to POOL_SIZE.  We probably want to keep pools aligned
because other parts of obmalloc rely on that.

Here is the matchup of the radix tree vs the current
address_in_range() approach.

- nearly the same in terms of performance.  It might depend on OS
  and workload but based on my testing on Linux, they are very
  close.  Would be good to do more testing but I think the radix
  tree is not going to be faster, only slower.

- radix tree uses a bit more memory overhead.  Maybe 1 or 2 MiB on a
  64-bit OS.  The radix tree uses more as memory use goes up but it
  is a small fraction of total used memory.  The extra memory use is
  the main downside at this point, I think.

- the radix tree doesn't read uninitialized memory.  The current
  address_in_range() approach has worked very well but is relying on
  some assumptions about the OS (how it maps pages into the program
  address space).  This is the only aspect where the radix tree is
  clearly better.  I'm not sure this matters enough to offset the
  extra memory use.

- IMHO, the radix tree code is a bit simpler than Tim's
  obmalloc-big-pool code.  That's not a big deal though as long as
  the code works and is well commented (which Tim's code is).

My feeling right now is that Tim's obmalloc-big-pool is the best
design at this point.  Using 8 KB or 16 KB pools seems to be better
than 4 KB.  The extra complexity added by Tim's change is not so
nice.  obmalloc is already extremely subtle and obmalloc-big-pool
makes it more so.

Regards,

Neil
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZAPSJB6TOODRBRCF3T3CXMYSX3FLWDDI/


[Python-Dev] Re: Who uses libpython38.a on Windows?

2019-06-14 Thread MRAB

On 2019-06-14 21:53, Steve Dower wrote:

> One of the most annoying steps in building the Windows installers is
> generating the libpython38.a file. It's annoying, because it requires
> having "generic enough" MinGW tools to ensure that the file is
> compatible with whatever version of MinGW might be trying to build
> against the regular Windows distribution.
>
> I would like to stop shipping this file in 3.8 and instead put the steps
> into the docs to show people how to generate them themselves (with the
> correct version of their tools):
>
> gendef python38.dll > tmp.def
> dlltool --dllname python38.dll --def tmp.def --output-lib libpython38.a -m i386:x86-64
>
> (Obviously the commands themselves are not complicated if you already
> have gendef and dlltool, but currently a normal CPython build system
> does not have these.)
>
> Before just doing this, I wanted to put out a request for information:
>
> * Do you rely (or know anyone who relies) on libpython38.a on Windows?
> * Are you able to add the two commands above to your build? If not, why not?

I'm able to build the regex module without it; in fact, I believe I've 
been able to do so since Python 3.5!

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/A7IXOTARTYJUNSCFAU3YY2VOVILC4EBY/


[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-14 Thread Tim Peters
[Inada Naoki, to Neil S]
> Oh, do you mean your branch doesn't have headers in each page?

That's probably right ;-)  Neil is using a new data structure, a radix
tree implementing a sparse set of arena addresses.  Within obmalloc
pools, which can be of any multiple-of-4KiB (on a 64-bit box) size,
every byte beyond the pool header is usable for user data.  In my
patch, there is no new data structure, but it needs to store an "arena
index" at the start of every page (every 4K bytes) within a pool.

I certainly _like_ Neil's code better.  It's clean rather than
excruciatingly tricky.  The question is whether that's enough to
justify the memory burden of an additional data structure (which can
potentially grow very large).  So I've been working with Neil to see
whether it's possible to make it faster than my branch, to give it
another selling point people actually care about ;-)

Should also note that Neil's approach never needs to read
uninitialized memory, so we could throw away decades of (e.g.)
valgrind pain caused by the current approach (which my patch builds
on).


> https://bugs.python.org/issue32846
>
> As far as I remember, this bug was caused by cache thrashing (page
> header is aligned by 4K, so cache line can conflict often.)
> Or this bug can be caused by O(N) free() which is fixed already.

I doubt that report is relevant here, but anyone is free to try it
with Neil's branch.

https://github.com/nascheme/cpython/tree/obmalloc_radix_tree

However, last I looked there Neil was still using 4 KiB obmalloc
pools, all page-aligned.  But using much larger arenas (16 MiB, 16
times bigger than my branch, and 64 times bigger than Python currently
uses).

But the `O(N) free()` fix may very well be relevant.  To my eyes,
while there was plenty of speculation in that bug report, nobody
actually dug in deep to nail a specific cause.  A quick try just now
on my branch (which includes the `O(N) free()` fix) on Terry Reedy's
simple code in that report shows much improved behavior, until I run
out of RAM.

For example, roughly 4.3 seconds to delete 40 million strings in a
set, and 9.1 to delete 80 million in a set.  Not really linear, but
very far from quadratic.  In contrast, Terry saw nearly a quadrupling
of delete time when moving from 32M to 64M strings.

So more than one thing was going on there, but looks likely that the
major pain was caused by quadratic-time arena list sorting.
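
To put "not really linear, but very far from quadratic" into a number, one
can estimate the exponent k in time ~ n**k from two measurements.  A small
Python sketch using the timings quoted above; `scaling_exponent` is a
made-up helper, and the factor of 4 for Terry's case is an upper bound
taken from "nearly a quadrupling", not a measured value:

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Exponent k such that t2 / t1 == (n2 / n1) ** k."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# 4.3 s for 40 million strings, 9.1 s for 80 million.
k_branch = scaling_exponent(40e6, 4.3, 80e6, 9.1)

# Doubling the size with (at most) a 4x time increase.
k_report = scaling_exponent(32e6, 1.0, 64e6, 4.0)

print(round(k_branch, 2), round(k_report, 2))
```

An exponent of 1 is linear and 2 is quadratic; the first figure comes out
only slightly above 1.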
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GXNDFA7YO6NP3IWIW4IIYX5XEIOW2FJH/


[Python-Dev] Re: radix tree arena map for obmalloc

2019-06-14 Thread Inada Naoki
Oh, do you mean your branch doesn't have headers in each page?

https://bugs.python.org/issue32846

As far as I remember, this bug was caused by cache thrashing (page
header is aligned by 4K, so cache line can conflict often.)
Or this bug can be caused by O(N) free() which is fixed already.

I'll look at it next week.

On Sat, Jun 15, 2019 at 3:54 AM Neil Schemenauer wrote:
>
> I've been working on this idea for a couple of days.  Tim Peters has
> been helping me out and I think it has come far enough to get some
> more feedback.  It is not yet a good replacement for the current
> address_in_range() test.  However, performance wise, it is very
> close.  Tim figures we are not done optimizing it yet so maybe it
> will get better.
>
> Code is available on my github branch:
>
> https://github.com/nascheme/cpython/tree/obmalloc_radix_tree
>
> Tim's "obmalloc-big-pools" is what I have been comparing it to.  It
> seems 8 KB pools are faster than 4 KB.  I applied Tim's arena
> thrashing fix (bpo-37257) to both branches.  Some rough (--fast)
> pyperformance benchmark results are below.
>
>
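> +-------------------------+--------------------+-----------------------------+
> | Benchmark               | obmalloc-big-pools | obmalloc_radix              |
> +=========================+====================+=============================+
> | crypto_pyaes            | 168 ms             | 170 ms: 1.01x slower (+1%)  |
> +-------------------------+--------------------+-----------------------------+
> | hexiom                  | 13.7 ms            | 13.6 ms: 1.01x faster (-1%) |
> +-------------------------+--------------------+-----------------------------+
> | json_dumps              | 15.9 ms            | 15.6 ms: 1.02x faster (-2%) |
> +-------------------------+--------------------+-----------------------------+
> | json_loads              | 36.9 us            | 37.1 us: 1.01x slower (+1%) |
> +-------------------------+--------------------+-----------------------------+
> | meteor_contest          | 141 ms             | 139 ms: 1.02x faster (-2%)  |
> +-------------------------+--------------------+-----------------------------+
> | nqueens                 | 137 ms             | 140 ms: 1.02x slower (+2%)  |
> +-------------------------+--------------------+-----------------------------+
> | pickle_dict             | 26.2 us            | 25.9 us: 1.01x faster (-1%) |
> +-------------------------+--------------------+-----------------------------+
> | pickle_list             | 3.91 us            | 3.94 us: 1.01x slower (+1%) |
> +-------------------------+--------------------+-----------------------------+
> | python_startup_no_site  | 8.00 ms            | 7.78 ms: 1.03x faster (-3%) |
> +-------------------------+--------------------+-----------------------------+
> | regex_dna               | 246 ms             | 241 ms: 1.02x faster (-2%)  |
> +-------------------------+--------------------+-----------------------------+
> | regex_v8                | 29.6 ms            | 30.0 ms: 1.01x slower (+1%) |
> +-------------------------+--------------------+-----------------------------+
> | richards                | 93.9 ms            | 92.7 ms: 1.01x faster (-1%) |
> +-------------------------+--------------------+-----------------------------+
> | scimark_fft             | 525 ms             | 531 ms: 1.01x slower (+1%)  |
> +-------------------------+--------------------+-----------------------------+
> | scimark_sparse_mat_mult | 6.32 ms            | 6.24 ms: 1.01x faster (-1%) |
> +-------------------------+--------------------+-----------------------------+
> | spectral_norm           | 195 ms             | 198 ms: 1.02x slower (+2%)  |
> +-------------------------+--------------------+-----------------------------+
> | sqlalchemy_imperative   | 49.5 ms            | 50.5 ms: 1.02x slower (+2%) |
> +-------------------------+--------------------+-----------------------------+
> | sympy_expand            | 691 ms             | 695 ms: 1.01x slower (+1%)  |
> +-------------------------+--------------------+-----------------------------+
> | unpickle_list           | 5.09 us            | 5.32 us: 1.04x slower (+4%) |
> +-------------------------+--------------------+-----------------------------+
> | xml_etree_parse         | 213 ms             | 215 ms: 1.01x slower (+1%)  |
> +-------------------------+--------------------+-----------------------------+
> | xml_etree_generate      | 134 ms             | 136 ms: 1.01x slower (+1%)  |
> +-------------------------+--------------------+-----------------------------+
> | xml_etree_process       | 103 ms             | 104 ms: 1.01x slower (+1%)  |
> +-------------------------+--------------------+-----------------------------+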
>
> Not significant (34): 2to3; chameleon; chaos; deltablue;
> django_template; dulwich_log; fannkuch; float; go; html5lib;
> logging_format

[Python-Dev] Who uses libpython38.a on Windows?

2019-06-14 Thread Steve Dower
One of the most annoying steps in building the Windows installers is 
generating the libpython38.a file. It's annoying, because it requires 
having "generic enough" MinGW tools to ensure that the file is 
compatible with whatever version of MinGW might be trying to build 
against the regular Windows distribution.


I would like to stop shipping this file in 3.8 and instead put the steps 
into the docs to show people how to generate them themselves (with the 
correct version of their tools):


gendef python38.dll > tmp.def
dlltool --dllname python38.dll --def tmp.def --output-lib libpython38.a -m i386:x86-64


(Obviously the commands themselves are not complicated if you already 
have gendef and dlltool, but currently a normal CPython build system 
does not have these.)


Before just doing this, I wanted to put out a request for information:

* Do you rely (or know anyone who relies) on libpython38.a on Windows?
* Are you able to add the two commands above to your build? If not, why not?

Thanks,
Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BYU35PWDNJ54COLNCFCSY3MCFYPF4KUK/


[Python-Dev] radix tree arena map for obmalloc

2019-06-14 Thread Neil Schemenauer
I've been working on this idea for a couple of days.  Tim Peters has
been helping me out and I think it has come far enough to get some
more feedback.  It is not yet a good replacement for the current
address_in_range() test.  However, performance wise, it is very
close.  Tim figures we are not done optimizing it yet so maybe it
will get better.

Code is available on my github branch:

https://github.com/nascheme/cpython/tree/obmalloc_radix_tree

Tim's "obmalloc-big-pools" is what I have been comparing it to.  It
seems 8 KB pools are faster than 4 KB.  I applied Tim's arena
thrashing fix (bpo-37257) to both branches.  Some rough (--fast)
pyperformance benchmark results are below.


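+-------------------------+--------------------+-----------------------------+
| Benchmark               | obmalloc-big-pools | obmalloc_radix              |
+=========================+====================+=============================+
| crypto_pyaes            | 168 ms             | 170 ms: 1.01x slower (+1%)  |
+-------------------------+--------------------+-----------------------------+
| hexiom                  | 13.7 ms            | 13.6 ms: 1.01x faster (-1%) |
+-------------------------+--------------------+-----------------------------+
| json_dumps              | 15.9 ms            | 15.6 ms: 1.02x faster (-2%) |
+-------------------------+--------------------+-----------------------------+
| json_loads              | 36.9 us            | 37.1 us: 1.01x slower (+1%) |
+-------------------------+--------------------+-----------------------------+
| meteor_contest          | 141 ms             | 139 ms: 1.02x faster (-2%)  |
+-------------------------+--------------------+-----------------------------+
| nqueens                 | 137 ms             | 140 ms: 1.02x slower (+2%)  |
+-------------------------+--------------------+-----------------------------+
| pickle_dict             | 26.2 us            | 25.9 us: 1.01x faster (-1%) |
+-------------------------+--------------------+-----------------------------+
| pickle_list             | 3.91 us            | 3.94 us: 1.01x slower (+1%) |
+-------------------------+--------------------+-----------------------------+
| python_startup_no_site  | 8.00 ms            | 7.78 ms: 1.03x faster (-3%) |
+-------------------------+--------------------+-----------------------------+
| regex_dna               | 246 ms             | 241 ms: 1.02x faster (-2%)  |
+-------------------------+--------------------+-----------------------------+
| regex_v8                | 29.6 ms            | 30.0 ms: 1.01x slower (+1%) |
+-------------------------+--------------------+-----------------------------+
| richards                | 93.9 ms            | 92.7 ms: 1.01x faster (-1%) |
+-------------------------+--------------------+-----------------------------+
| scimark_fft             | 525 ms             | 531 ms: 1.01x slower (+1%)  |
+-------------------------+--------------------+-----------------------------+
| scimark_sparse_mat_mult | 6.32 ms            | 6.24 ms: 1.01x faster (-1%) |
+-------------------------+--------------------+-----------------------------+
| spectral_norm           | 195 ms             | 198 ms: 1.02x slower (+2%)  |
+-------------------------+--------------------+-----------------------------+
| sqlalchemy_imperative   | 49.5 ms            | 50.5 ms: 1.02x slower (+2%) |
+-------------------------+--------------------+-----------------------------+
| sympy_expand            | 691 ms             | 695 ms: 1.01x slower (+1%)  |
+-------------------------+--------------------+-----------------------------+
| unpickle_list           | 5.09 us            | 5.32 us: 1.04x slower (+4%) |
+-------------------------+--------------------+-----------------------------+
| xml_etree_parse         | 213 ms             | 215 ms: 1.01x slower (+1%)  |
+-------------------------+--------------------+-----------------------------+
| xml_etree_generate      | 134 ms             | 136 ms: 1.01x slower (+1%)  |
+-------------------------+--------------------+-----------------------------+
| xml_etree_process       | 103 ms             | 104 ms: 1.01x slower (+1%)  |
+-------------------------+--------------------+-----------------------------+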

Not significant (34): 2to3; chameleon; chaos; deltablue;
django_template; dulwich_log; fannkuch; float; go; html5lib;
logging_format; logging_silent; logging_simple; mako; nbody;
pathlib; pickle; pidigits; python_startup; raytrace; regex_compile;
regex_effbot; scimark_lu; scimark_monte_carlo; scimark_sor;
sqlalchemy_declarative; sqlite_synth; sympy_integrate; sympy_sum;
sympy_str; telco; unpack_sequence; unpickle; xml_etree_iterparse
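
For readers unfamiliar with the right-hand column, the sketch below shows
how an "N.NNx slower (+P%)" entry relates to the two timings in a row.  It
is a reconstruction in Python, not pyperf's own code, and `compare` is a
made-up name.  The table prints rounded means, so recomputing from the
displayed values can differ from the table in the last digit.

```python
def compare(base, new):
    """Describe `new` relative to `base`; both are times in the same unit."""
    change = (new - base) / base
    if new >= base:
        return f"{new / base:.2f}x slower ({change:+.0%})"
    return f"{base / new:.2f}x faster ({change:+.0%})"


print(compare(168, 170))     # crypto_pyaes row
print(compare(8.00, 7.78))   # python_startup_no_site row
```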
___
Message archived at 
https://mail.python.org/a

[Python-Dev] Summary of Python tracker Issues

2019-06-14 Thread Python tracker

ACTIVITY SUMMARY (2019-06-07 - 2019-06-14)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    7022 (+15)
  closed 42013 (+70)
  total  49035 (+85)

Open issues with patches: 2826 


Issues opened (54)
==================

#36607: asyncio.all_tasks() crashes if asyncio is used in multiple thr
https://bugs.python.org/issue36607  reopened by asvetlov

#36888: Create a way to check that the parent process is alive for dea
https://bugs.python.org/issue36888  reopened by vstinner

#37136: Travis CI: Documentation tests fails with Sphinx 2.1
https://bugs.python.org/issue37136  reopened by njs

#37200: PyType_GenericAlloc might over-allocate memory
https://bugs.python.org/issue37200  opened by nascheme

#37201: fix test_distutils failures for Windows ARM64
https://bugs.python.org/issue37201  opened by Paul Monson

#37205: time.perf_counter() is not system-wide on Windows, in disagree
https://bugs.python.org/issue37205  opened by kh90909

#37206: Incorrect application of Argument Clinic to dict.pop()
https://bugs.python.org/issue37206  opened by rhettinger

#37207: Use PEP 590 vectorcall to speed up calls to range(), list() an
https://bugs.python.org/issue37207  opened by Mark.Shannon

#37208: Weird exception behaviour in ProcessPoolExecutor
https://bugs.python.org/issue37208  opened by Iceflower

#37209: Add what's new entries for pickle enhancements
https://bugs.python.org/issue37209  opened by pitrou

#37211: obmalloc:  eliminate limit on pool size
https://bugs.python.org/issue37211  opened by tim.peters

#37212: ordered keyword arguments in unittest.mock.call repr and error
https://bugs.python.org/issue37212  opened by xtreak

#37214: Add new EncodingWarning warning category: emitted when the loc
https://bugs.python.org/issue37214  opened by vstinner

#37218: Default hmac.new() digestmod has not been removed from documen
https://bugs.python.org/issue37218  opened by Alex.Willmer

#37220: test_idle crash on Windows 2.7 when run with -R:
https://bugs.python.org/issue37220  opened by zach.ware

#37221: PyCode_New API change breaks backwards compatibility policy
https://bugs.python.org/issue37221  opened by ncoghlan

#37222: urllib missing voidresp breaks CacheFTPHandler
https://bugs.python.org/issue37222  opened by danh

#37224: test__xxsubinterpreters failed on AMD64 Windows8.1 Refleaks 3.
https://bugs.python.org/issue37224  opened by vstinner

#37225: Document BaseException constructor
https://bugs.python.org/issue37225  opened by Hong Xu

#37226: Asyncio Fatal Error on SSL Transport - IndexError Deque Index 
https://bugs.python.org/issue37226  opened by ben.brown

#37228: UDP sockets created by create_datagram_endpoint() allow by def
https://bugs.python.org/issue37228  opened by Jukka Väisänen

#37231: Optimize calling special methods
https://bugs.python.org/issue37231  opened by jdemeyer

#37232: Parallel compilation fails because of low ulimit.
https://bugs.python.org/issue37232  opened by kulikjak

#37233: Use _PY_FASTCALL_SMALL_STACK for method_vectorcall
https://bugs.python.org/issue37233  opened by jdemeyer

#37235: urljoin behavior unclear/not following RFC 3986
https://bugs.python.org/issue37235  opened by Matthew Kenigsberg

#37236: fix test_complex for Windows arm64
https://bugs.python.org/issue37236  opened by Paul Monson

#37237: python 2.16 from source on Ubuntu 18.04
https://bugs.python.org/issue37237  opened by Jilguero ostras

#37242: sub-process  would be terminated when registered finalizers ar
https://bugs.python.org/issue37242  opened by mrqianjinsi

#37243: test_sendfile in asyncio crashes when os.sendfile() is not sup
https://bugs.python.org/issue37243  opened by Michael.Felt

#37244: test_multiprocessing_forkserver: test_resource_tracker() faile
https://bugs.python.org/issue37244  opened by vstinner

#37245: Azure Pipeline 3.8 CI: multiple tests hung and timed out on ma
https://bugs.python.org/issue37245  opened by vstinner

#37246: http.cookiejar.DefaultCookiePolicy should use current timestam
https://bugs.python.org/issue37246  opened by xtreak

#37247: swap distutils build_ext and build_py commands to allow proper
https://bugs.python.org/issue37247  opened by jlvandenhout

#37248: support conversion of `func(**{} if a else b)`
https://bugs.python.org/issue37248  opened by Shen Han

#37250: C files generated by Cython set tp_print to NULL: PyTypeObject
https://bugs.python.org/issue37250  opened by vstinner

#37251: Mocking a MagicMock with a function spec results in an AsyncMo
https://bugs.python.org/issue37251  opened by jcline

#37252: devpoll test failures on Solaris
https://bugs.python.org/issue37252  opened by kulikjak

#37254: POST large file to server (using http.server.CGIHTTPRequestHan
https://bugs.python.org/issue37254  opened by shajianrui

#37256: urllib.request.Request documentation erroneously refers to the
https://bugs.python.org/issue3

[Python-Dev] Re: PyAPI_FUNC() is needed to private APIs?

2019-06-14 Thread Jeroen Demeyer

On 2019-06-13 18:03, Inada Naoki wrote:

> We don't provide a method-calling API that uses the same optimization as
> LOAD_METHOD.  It might look like this:
>
> /* methname is Unicode, nargs > 0, and args[0] is self. */
> PyObject_VectorCallMethod(PyObject *methname, PyObject **args,
>                           Py_ssize_t nargs, PyObject *kwds)


I agree that this would be useful. Minor nitpick: we spell "Vectorcall" 
with a lower-case "c".


There should also be a _Py_Identifier variant _PyObject_VectorcallMethodId

The implementation should be like vectorcall_method from 
Objects/typeobject.c except that _PyObject_GetMethod should be used 
instead of lookup_method() (the difference is that the code for special 
methods like __add__ only looks at the attributes of the type, not the 
instance).
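
The type-versus-instance distinction is visible from pure Python.  The
snippet below is only an analogy at the Python level (the class and names
are made up), not the C implementation: the `+` operator resolves `__add__`
on the type alone, while an ordinary method call also consults the
instance.

```python
class Number:
    def __add__(self, other):
        return "type-level __add__"


n = Number()
# Shadow the special method on the instance only.
n.__add__ = lambda other: "instance-level __add__"

via_operator = n + 1             # special-method lookup: type only
via_attribute = n.__add__(1)     # ordinary lookup: finds the instance attribute

print(via_operator)
print(via_attribute)
```

A general-purpose method-calling API has to behave like the second case,
which is why an ordinary attribute lookup is the appropriate basis for it.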



> (Would you try adding this?  Or may I?)


Of course you may. Just let me know if you're working on it.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FLF74RH3XO4BYOTW2CRRD2GO23P2YUOO/