[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-05 Thread Mark Shannon




On 03/06/2020 8:49 pm, Pablo Galindo Salgado wrote:

> Just some comments on the GC stuff as I added them myself.
>
>> Shouldn't GC track *all* objects?
>
> No, extension types need to opt-in to the garbage collector and, if so,
> implement the interface.


When you say "GC" I think you mean the backup cycle breaker.
But that's not what it means to me, or in general. GC includes reference 
counting and applies to all objects.
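To make the distinction concrete, a quick sketch (CPython): reference
counting reclaims most objects the moment the last reference disappears,
while the cycle collector exists only to break reference cycles that
refcounting alone can never free.

import gc

a = []
b = [a]
a.append(b)          # a and b now form a reference cycle
del a, b             # refcounts never reach zero, so refcounting can't free them
print(gc.collect())  # the cycle collector frees them; prints the number of
                     # unreachable objects found (at least the two lists)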


Naming is hard, which is why proper review is important.




>> Even if it were named PyObject_Cycle_GC_IsTracked() it would be exposing
>> internal implementation details for no good reason.
>
> In Python, gc.is_tracked() has existed since Python 3.1, and the gc module
> has exposed a lot of GC functionality for many versions now. This just
> makes the same calls you can do in Python available from the C API.
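For illustration, a quick sketch of the Python-level behaviour; the
expected values follow the gc.is_tracked() docs and reflect
CPython-specific optimizations:

import gc

print(gc.is_tracked(0))          # False: atomic objects can't form cycles
print(gc.is_tracked("a"))        # False
print(gc.is_tracked([]))         # True: lists are tracked from creation
print(gc.is_tracked({}))         # False: empty dicts start untracked in CPython
print(gc.is_tracked({"a": []}))  # True: tracked once it may hold a cycle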


Just because something is exposed in Python doesn't mean an extra 
function has to be added to the C API. More general functions like 
`PyObject_Call()` exist already to call any Python object.





>> What is the purpose of PyObject_GC_IsFinalized()?
>
> Because some objects may have been resurrected, and this allows you to
> know whether a given object has already been finalized. This can help to
> gather advanced GC stats, to control some tricky situations with
> finalizers and the gc in C extensions, or just to know which objects are
> being resurrected. Note that an equivalent gc.is_finalized() was added in
> 3.9 as well to query this information from Python in the gc module, and
> this call just allows you to do the same from the C API.
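A minimal sketch of the resurrection case on Python 3.9+, along the
lines of the example in the gc.is_finalized() docs:

import gc

resurrected = None

class Lazarus:
    def __del__(self):
        global resurrected
        resurrected = self   # resurrect: make a fresh reference from the finalizer

lazarus = Lazarus()
print(gc.is_finalized(lazarus))      # False: __del__ has not run yet
del lazarus                          # last reference dropped; __del__ runs once
print(gc.is_finalized(resurrected))  # True: already finalized, yet still alive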

> Cheers,
> Pablo
>
> On Wed, 3 Jun 2020 at 18:26, Mark Shannon wrote:


Hi Victor,

On 03/06/2020 2:42 pm, Victor Stinner wrote:
 > Hi,
 >
 > In Python 3.9, I *removed* dozens of functions from the *public* C
 > API, or moved them to the "internal" C API:
 > https://docs.python.org/dev/whatsnew/3.9.html#id3
 >
 > For a few internal C API, I replaced PyAPI_FUNC() with extern to
 > ensure that they cannot be used outside CPython code base: Python 3.9
 > is now built with -fvisibility=hidden on compilers supporting it (like
 > GCC and clang).
 >
 > I also *added* a bunch of *new* "getter" or "setter" functions to the
 > public C API for my project of hiding implementation details, like
 > making structures opaque:
 > https://docs.python.org/dev/whatsnew/3.9.html#id1

Adding "setters" is generally a bad idea.
"getters" can be computed if the underlying field disappears, but the
same may not be true for setters if the relation is not one-to-one.
I don't think there are any new setters in 3.9, so it's not an immediate
problem.

 >
 > For example, I added PyThreadState_GetInterpreter() which replaces
 > "tstate->interp", to prepare C extensions for an opaque PyThreadState
 > structure.

`PyThreadState_GetInterpreter()` can't replace `tstate->interp` for two
reasons.
1. There is no way to stop third-party C code accessing the internals of
data structures. We can warn them not to, but that's all.
2. The internal layout of C structures has never been part of the API,
with arguably two exceptions: the PyTypeObject struct and the
`ob_refcnt` field of PyObject.

 >
 > The other 4 new Python 3.9 functions:
 >
 > * PyObject_CallNoArgs(): "most efficient way to call a callable Python
 > object without any argument"
 > * PyModule_AddType(): "adding a type to a module". I hate the
 > PyModule_AddObject() function which steals a reference on success.
 > * PyObject_GC_IsTracked() and PyObject_GC_IsFinalized(): "query if
 > Python objects are being currently tracked or have been already
 > finalized by the garbage collector respectively": functions requested
 > in bpo-40241.
 >
 > Would you mind elaborating on why you consider that these functions
 > must not be added to Python 3.9?

I'm not saying that no C functions should be added to the API. I am
saying that none should be added without a PEP or proper review.

Addressing the four functions you list.

PyObject_CallNoArgs() seems harmless.
Rationalizing the call API has merit, but PyObject_CallNoArgs()
leads to PyObject_CallOneArg(), PyObject_CallTwoArgs(), etc. and an
even larger API.

PyModule_AddType(). This seems perfectly reasonable, although if it is a
straight replacement for another function, that other function should be
deprecated.

PyObject_GC_IsTracked(). I don't like this.
Shouldn't GC track *all* objects?
Even if it were named PyObject_Cycle_GC_IsTracked() it would be exposing
internal implementation details for no good reason. A cycle GC that
doesn't "track" individual objects, but treats all objects the same
could be more efficient. In which case, what would this mean?

What is the purpose of PyObject_GC_IsFinalized()?

[Python-Dev] Should we be making so many changes in pursuit of PEP 554?

2020-06-05 Thread Mark Shannon

Hi,

There have been a lot of changes both to the C API and to internal
implementations to allow multiple interpreters in a single O/S process.

These changes break backwards compatibility, have a negative
performance impact, and cause a lot of churn.


While I'm in favour of PEP 554, or some similar model for parallelism in 
Python, I am opposed to the changes we are currently making to support it.



What are sub-interpreters?
--------------------------

A sub-interpreter is a logically independent Python process which 
supports inter-interpreter communication built on shared memory and 
channels. Passing of Python objects is supported, but only by copying, 
not by reference. Data can be shared via buffers.
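For concreteness, a sketch using the low-level _xxsubinterpreters module
that backs the PEP 554 work in CPython 3.8/3.9. This is a private,
unstable module, and the API that PEP 554 ultimately proposes may differ:

import _xxsubinterpreters as interpreters

# Create a logically independent interpreter in the same O/S process.
interp_id = interpreters.create()

# Run source code in it; no objects are shared, only the script text.
interpreters.run_string(interp_id, "print('hello from a sub-interpreter')")

interpreters.destroy(interp_id)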



How can they be implemented to support parallelism?
---------------------------------------------------

There are two obvious options.
a) Many sub-interpreters in a single O/S process. I will call this the 
many-to-one model (many interpreters in one O/S process).
b) One sub-interpreter per O/S process. This is what we currently have 
for multiprocessing. I will call this the one-to-one model (one 
interpreter in one O/S process).


There seems to be an assumption amongst those working on PEP 554 that 
the many-to-one model is the only way to support sub-interpreters that 
can execute in parallel.

This isn't true. The one-to-one model has many advantages.


Advantages of the one-to-one model
----------------------------------

1. It's less bug prone. It is much easier to reason about code working 
in a single address space. Most code assumes


2. It's more secure. Separate O/S processes provide a much stronger 
boundary between interpreters. This is why some browsers use separate 
processes for browser tabs.


3. It can be implemented on top of the multiprocessing module, for 
testing. A more efficient implementation can be developed once 
sub-interpreters prove useful.


4. The required changes should have no negative performance impact.

5. Third party modules should continue to work as they do now.

6. It takes much less work :)


Performance
-----------

Creating O/S processes is usually considered to be slow. Whilst
processes are undoubtedly slower to create than threads, the absolute
time to create a process is small: well under 1ms on Linux.


Creating a new sub-interpreter typically requires importing quite a few 
modules before any useful work can be done.
The time spent doing these imports will dominate the time to create an 
O/S process or thread.


If sub-interpreters are to be used for parallelism, there is no need to 
have many more sub-interpreters than CPU cores, so the overhead should 
be small. For additional concurrency, threads or coroutines can be used.
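As a sketch of that sizing rule under the one-to-one model, using the
multiprocessing module (the work() function here is illustrative):

import os
from multiprocessing import Pool

def work(n):
    return sum(i * i for i in range(n))   # stand-in for a CPU-bound task

if __name__ == '__main__':
    # One worker process per core; more would add overhead, not parallelism.
    with Pool(processes=os.cpu_count()) as pool:
        print(pool.map(work, [10_000] * os.cpu_count()))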


The one-to-one model is faster as it uses the hardware for interpreter
separation, whereas the many-to-one model must use software.
Process separation by the hardware virtual memory system has zero cost.
Separation done in software needs extra memory reads when doing 
allocation or deallocation.


Overall, for any interpreter that runs for a second or more, it is 
likely that the one-to-one model would be faster.



Timings of multiprocessing & threads on my machine (6-core 2019 laptop)
-----------------------------------------------------------------------

# Threads

from threading import Thread

def foo():
    pass

def spawn_and_join(count):
    threads = [Thread(target=foo, args=()) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

spawn_and_join(1000)

# Processes

from multiprocessing import Process

def spawn_and_join(count):
    processes = [Process(target=foo, args=()) for _ in range(count)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

# On spawn-based platforms (Windows, macOS) this call needs an
# `if __name__ == '__main__':` guard.
spawn_and_join(1000)
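The wall-clock numbers below can be reproduced by putting a simple timer
around the calls above; a minimal sketch:

import time

start = time.perf_counter()
spawn_and_join(1000)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{elapsed_ms:.0f}ms total, {elapsed_ms / 1000:.3f}ms each")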

Wall clock time for threads:
86ms. Less than 0.1ms per thread.

Wall clock time for processes:
370ms. Less than 0.4ms per process.

Processes are slower, but plenty fast enough.


Cheers,
Mark.






[Python-Dev] Should we be making so many changes in pursuit of PEP 554?

2020-06-05 Thread Edwin Zimmerman
> [...]
>
> Advantages of the one-to-one model
> ----------------------------------
>
> 1. It's less bug prone. It is much easier to reason about code working
> in a single address space. Most code assumes

I'm curious where reasoning about address spaces comes into writing Python 
code?  I can't say that address space has ever been a
concern to me when coding in Python.


[Python-Dev] Re: Summary of Python tracker Issues

2020-06-05 Thread Eric Snow
On Fri, May 29, 2020 at 12:16 PM Python tracker wrote:
> ACTIVITY SUMMARY (2020-05-22 - 2020-05-29)
> Python tracker at https://bugs.python.org/
>
> To view or respond to any of the issues listed below, click on the issue.
> Do NOT respond to this message.
>
> Issues counts and deltas:
>   open    7487 ( +9)
>   closed 45080 (+80)
>   total  52567 (+89)
>
> ...

How hard would it be to add PRs (in the same way) to this weekly report?

Also, where is the script for this hosted and where is the source repo
(if any)?  It might be helpful to have a link back to that info,
perhaps somewhere in the devguide.

-eric


[Python-Dev] Re: Should we be making so many changes in pursuit of PEP 554?

2020-06-05 Thread Ethan Furman

On 06/05/2020 07:32 AM, Mark Shannon wrote:


> 3. It can be implemented on top of the multiprocessing module, for testing.
> A more efficient implementation can be developed once sub-interpreters
> prove useful.


Isn't part of the impetus for in-process sub-interpreters the 
Python-embedded-in-language-X use-case?  Isn't multiprocessing a poor solution 
then?

--
~Ethan~


[Python-Dev] Re: Should we be making so many changes in pursuit of PEP 554?

2020-06-05 Thread Eric V. Smith

On 6/5/2020 11:11 AM, Edwin Zimmerman wrote:

>> Advantages of the one-to-one model
>> ----------------------------------
>>
>> 1. It's less bug prone. It is much easier to reason about code working
>> in a single address space. Most code assumes
>
> I'm curious where reasoning about address spaces comes into writing Python
> code?  I can't say that address space has ever been a concern to me when
> coding in Python.


I don't know enough about Python code with subinterpreters to comment 
there. But for the C code that makes up much of CPython: it's very 
difficult to inspect code and know you aren't accidentally sharing 
objects between interpreters.


Eric


[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-05 Thread Eric Snow
On Wed, Jun 3, 2020 at 7:12 AM Mark Shannon wrote:
> The size of the C API, as measured by `git grep PyAPI_FUNC | wc -l` has
> been steadily increasing over the last few releases.
>
> 3.5 1237
> 3.6 1304
> 3.7 1408
> 3.8 1478
> 3.9 1518
>
>
> For reference the 2.7 branch has "only" 973 functions

It isn't as bad as that.  Here I'm only looking at PyAPI_FUNC under
Include/.  From 3.5 to master the *public* C-API has increased by 71
functions (and the "private"/internal C-API by 189).  "Private" is
functions starting with "_", and "internal" is those under
Include/internal/.

VER  TOT   (PUB + "_")
2.7   932  (752 + 178)
3.5  1181  (846 + 320)
3.6  1247  (851 + 380)
3.7  1350  (875 + 460 + 13 internal)
3.8  1424  (908 + 422 + 79 internal)
3.9  1447  (917 + 403 + 110 internal)
m    1443  (917 + 401 + 108 internal)

(This does not count changes in the number of macros, which may have
gone down...or not.)

FWIW, relative to the "cpython" API split that happened in 3.8 (and
"internal" in 3.7):

VER  total  Include/*.h        Include/cpython/*.h   Include/internal/*.h
2.7   932    932 (752 + 178)    -                     -
3.5  1181   1181 (846 + 320)    -                     -
3.6  1247   1247 (851 + 380)    -                     -
3.7  1350   1350 (875 + 460)    -                     13 (0 + 13)
3.8  1424   1050 (800 + 249)    295 (108 + 173)       79 (0 + 79)
3.9  1447    944 (789 + 153)    393 (128 + 250)       110 (105 + 5)
m    1443    941 (789 + 150)    394 (128 + 251)       108 (103 + 5)

Here's the "command" I ran:

for pat in 'Include/' 'Include/*.h' 'Include/cpython/*.h' 'Include/internal/*.h'; do
  echo " -- $pat --"
  echo $(git grep 'PyAPI_FUNC(' -- $pat | wc -l) \
    '('$(git grep 'PyAPI_FUNC(.*) [^_]' -- $pat | wc -l) \
    '+' $(git grep 'PyAPI_FUNC(.*) [_]' -- $pat | wc -l)')'
done


> Every one of these functions represents a maintenance burden.
> Removing them is painful and takes a lot of effort, but adding them is
> done casually, without a PEP or, in many cases, even a review.

I agree with regard to the public C-API, particularly the stable API.

> We need to address what to do about the C API in the long term, but for
> now can we just stop making it larger? Please.
>
> Also, can we remove all the new API functions added in 3.9 before the
> release and it is too late?

In 3.9 we have added 9 functions to the public C-API and removed 19
from the "private" C-API.  The "internal" C-API grew by 31, but I
don't see the point in changing any of those.

-eric


[Python-Dev] Summary of Python tracker Issues

2020-06-05 Thread Python tracker

ACTIVITY SUMMARY (2020-05-29 - 2020-06-05)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    7444 (-43)
  closed 45180 (+100)
  total  52624 (+57)

Open issues with patches: 3008 


Issues opened (49)
==

#25782: CPython hangs on error __context__ set to the error itself
https://bugs.python.org/issue25782  reopened by serhiy.storchaka

#37483: Add PyObject_CallOneArg()
https://bugs.python.org/issue37483  reopened by vstinner

#40679: show class name in method invocation TypeError
https://bugs.python.org/issue40679  reopened by vstinner

#40805: Can no longer patch flask.g
https://bugs.python.org/issue40805  reopened by cjw296

#40821: os.getlogin() not working
https://bugs.python.org/issue40821  opened by Manickaraja Kumarappan

#40822: Drop support for SQLite pre 3.7.15
https://bugs.python.org/issue40822  opened by erlendaasland

#40823: Don't use obsolete unittest.makeSuite() in sqlite3 tests
https://bugs.python.org/issue40823  opened by erlendaasland

#40824: Unexpected errors in __iter__ are masked in "in" and the opera
https://bugs.python.org/issue40824  opened by serhiy.storchaka

#40825: Add a "strict" parameter to csv.writer and csv.DictWriter
https://bugs.python.org/issue40825  opened by eric.smith

#40826: PyOS_InterruptOccurred() now requires to hold the GIL: PyOS_Re
https://bugs.python.org/issue40826  opened by Jelle Zijlstra

#40827: os.readlink should support getting the target's printname in W
https://bugs.python.org/issue40827  opened by eryksun

#40828: shared memory problems with multiprocessing.Pool
https://bugs.python.org/issue40828  opened by trapezoid677

#40830: Certain uses of dictionary unpacking raise TypeError
https://bugs.python.org/issue40830  opened by Kodiologist

#40832: hi param in bisect module should not accept negative values
https://bugs.python.org/issue40832  opened by samuel72

#40833: Clarify docstring of Path.rename
https://bugs.python.org/issue40833  opened by cool-RR

#40834: sending str via channel caused truncate on last character
https://bugs.python.org/issue40834  opened by asaka

#40835: Incorrect handling for msgctxt in msgfmt.py
https://bugs.python.org/issue40835  opened by da1910

#40836: logging.fatal() and logging.Logger.fatal() should raise a Depr
https://bugs.python.org/issue40836  opened by remi.lapeyre

#40837: email.utils.encode_rfc2231(string, None, None) returns broken 
https://bugs.python.org/issue40837  opened by spaceone

#40838: inspect.getsourcefile documentation doesn't mention it can ret
https://bugs.python.org/issue40838  opened by pekka.klarck

#40840: lzma.h file not found building on macOS
https://bugs.python.org/issue40840  opened by jaraco

#40841: Provide mimetypes.sniff API as stdlib
https://bugs.python.org/issue40841  opened by corona10

#40842: _Pickler_CommitFrame() always returns 0 and its return code is
https://bugs.python.org/issue40842  opened by remi.lapeyre

#40843: tarfile: ignore_zeros = True exceedingly slow on a sparse tar 
https://bugs.python.org/issue40843  opened by mxmlnkn

#40846: Misleading line in documentation
https://bugs.python.org/issue40846  opened by J Arun Mani

#40847: New parser considers empty line following a backslash to be a 
https://bugs.python.org/issue40847  opened by adamwill

#40849: Expose X509_V_FLAG_PARTIAL_CHAIN ssl flag
https://bugs.python.org/issue40849  opened by l0x

#40851: subprocess.Popen: impossible to show console window when shell
https://bugs.python.org/issue40851  opened by akdor1154

#40854: [Patch] Allow overriding sys.platlibdir
https://bugs.python.org/issue40854  opened by smani

#40855: statistics.stdev ignore xbar argument
https://bugs.python.org/issue40855  opened by Folket

#40856: IDLE line numbering should be light gray
https://bugs.python.org/issue40856  opened by rhettinger

#40857: tempfile.TemporaryDirectory() context manager can fail to prop
https://bugs.python.org/issue40857  opened by granchester

#40858: ntpath.realpath fails for broken symlinks with rooted target p
https://bugs.python.org/issue40858  opened by eryksun

#40859: Update Windows build to use xz-5.2.5
https://bugs.python.org/issue40859  opened by Ma Lin

#40860: Exception in multiprocessing/context.py under load
https://bugs.python.org/issue40860  opened by Arkady M

#40861: On Windows, liblzma is always built without optimization
https://bugs.python.org/issue40861  opened by nnemkin

#40862: argparse.BooleanOptionalAction accept and silently discard its
https://bugs.python.org/issue40862  opened by remi.lapeyre

#40864: spec_set/autospec/spec seems to not be reading attributes defi
https://bugs.python.org/issue40864  opened by efagerberg

#40866: Use PyModule_AddType() in posix module initialisation
https://bugs.python.org/issue40866  opened by erlendaasland

#40867: Remove unused include in Module/_randommodule.c
https://bugs.python.org/issue4086