> Hi,
> 
> There have been a lot of changes both to the C API and to internal
> implementations to allow multiple interpreters in a single O/S process.
> 
> These changes break backwards compatibility, have a negative
> performance impact, and cause a lot of churn.
> 
> While I'm in favour of PEP 554, or some similar model for parallelism in
> Python, I am opposed to the changes we are currently making to support it.
> 
> 
> What are sub-interpreters?
> --------------------------
> 
> A sub-interpreter is a logically independent Python process which
> supports inter-interpreter communication built on shared memory and
> channels. Passing of Python objects is supported, but only by copying,
> not by reference. Data can be shared via buffers.
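
For concreteness, this is roughly the API the draft PEP 554 proposes.
The "interpreters" module is not in any released CPython, and the names
below (create, run, create_channel, send, recv) are taken from the
draft and may well change:

# Sketch based on the draft PEP 554 API -- not yet available anywhere.
import interpreters

interp = interpreters.create()
recv, send = interpreters.create_channel()

send.send(b"hello")   # objects cross a channel by copy, never by reference
interp.run("print('running in the sub-interpreter')")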
> 
> 
> How can they be implemented to support parallelism?
> ---------------------------------------------------
> 
> There are two obvious options.
> a) Many sub-interpreters in a single O/S process. I will call this the
> many-to-one model (many interpreters in one O/S process).
> b) One sub-interpreter per O/S process. This is what we currently have
> for multiprocessing. I will call this the one-to-one model (one
> interpreter in one O/S process).
> 
> There seems to be an assumption amongst those working on PEP 554 that
> the many-to-one model is the only way to support sub-interpreters that
> can execute in parallel.
> This isn't true. The one-to-one model has many advantages.
> 
> 
> Advantages of the one-to-one model
> ----------------------------------
> 
> 1. It's less bug-prone. It is much easier to reason about code working
> in a single address space. Most code assumes

I'm curious where reasoning about address spaces comes into writing
Python code. I can't say that address space has ever been a concern
to me when coding in Python.

> 2. It's more secure. Separate O/S processes provide a much stronger
> boundary between interpreters. This is why some browsers use separate
> processes for browser tabs.
> 
> 3. It can be implemented on top of the multiprocessing module, for
> testing (a sketch follows this list). A more efficient implementation
> can be developed once sub-interpreters prove useful.
> 
> 4. The required changes should have no negative performance impact.
> 
> 5. Third party modules should continue to work as they do now.
> 
> 6. It takes much less work :)
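
To illustrate point 3, here is a minimal sketch of what a one-to-one
"interpreter" could look like on top of multiprocessing. The
Interpreter class and its methods are hypothetical names for
illustration, not part of any existing module:

# Hypothetical one-to-one "sub-interpreter" built on multiprocessing.
# Each interpreter is a separate O/S process; source code is sent to
# it as a string and executed there, so nothing is shared by reference.
from multiprocessing import Process, Pipe

def _interpreter_main(conn):
    # Execute source strings as they arrive; None means shut down.
    while True:
        source = conn.recv()
        if source is None:
            break
        exec(source, {"__name__": "__main__"})

class Interpreter:
    def __init__(self):
        self._conn, child_conn = Pipe()
        self._process = Process(target=_interpreter_main, args=(child_conn,))
        self._process.start()

    def run(self, source):
        self._conn.send(source)

    def close(self):
        self._conn.send(None)
        self._process.join()

if __name__ == "__main__":
    interp = Interpreter()
    interp.run("print('hello from a separate interpreter')")
    interp.close()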
> 
> 
> Performance
> -----------
> 
> Creating O/S processes is usually considered to be slow. Whilst
> processes are undoubtedly slower to create than threads, the absolute
> time to create a process is small: well under 1ms on Linux.
> 
> Creating a new sub-interpreter typically requires importing quite a few
> modules before any useful work can be done.
> The time spent doing these imports will dominate the time to create an
> O/S process or thread.
> If sub-interpreters are to be used for parallelism, there is no need to
> have many more sub-interpreters than CPU cores, so the overhead should
> be small. For additional concurrency, threads or coroutines can be used.
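
That is easy to check. A rough way to measure it, where the modules
imported are just examples of what a worker might need (note that
"python -c pass" includes interpreter start-up as well as process
creation, so it overstates the baseline if anything):

# Rough comparison of bare process start-up versus start-up plus imports.
import subprocess, sys, time

def timed(argv):
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start

bare = timed([sys.executable, "-c", "pass"])
with_imports = timed([sys.executable, "-c",
                      "import json, logging, socket, ssl"])

print(f"bare process: {bare * 1000:.1f} ms")
print(f"with imports: {with_imports * 1000:.1f} ms")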
> 
> The one-to-one model is faster as it uses the hardware for interpreter
> separation, whereas the many-to-one model must use software.
> Process separation by the hardware virtual memory system has zero cost.
> Separation done in software needs extra memory reads when doing
> allocation or deallocation.
> 
> Overall, for any interpreter that runs for a second or more, it is
> likely that the one-to-one model would be faster.
> 
> 
> Timings of multiprocessing & threads on my machine (6-core 2019 laptop)
> -----------------------------------------------------------------------
> 
> # Threads
> 
> from threading import Thread
> 
> def foo():
>     pass
> 
> def spawn_and_join(count):
>     threads = [Thread(target=foo, args=()) for _ in range(count)]
>     for t in threads:
>         t.start()
>     for t in threads:
>         t.join()
> 
> spawn_and_join(1000)
> 
> # Processes
> 
> from multiprocessing import Process
> 
> # On platforms that spawn rather than fork (e.g. Windows, macOS),
> # this needs an `if __name__ == "__main__":` guard.
> def spawn_and_join(count):
>     processes = [Process(target=foo, args=()) for _ in range(count)]
>     for p in processes:
>         p.start()
>     for p in processes:
>         p.join()
> 
> spawn_and_join(1000)
> 
> Wall clock time for threads:
> 86ms. Less than 0.1ms per thread.
> 
> Wall clock time for processes:
> 370ms. Less than 0.4ms per process.
> 
> Processes are slower, but plenty fast enough.
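
For anyone wanting to reproduce these numbers, a simple harness around
either snippet above (results will of course vary by machine and
platform):

# Wall-clock timing for spawn_and_join as defined above.
import time

N = 1000
start = time.perf_counter()
spawn_and_join(N)
elapsed = time.perf_counter() - start
print(f"{elapsed * 1000:.0f} ms total, {elapsed * 1000 / N:.2f} ms each")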
> 
> 
> Cheers,
> Mark.
> 