[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-08 Thread Nathaniel Smith
On Fri, May 8, 2020 at 12:30 AM Sebastian Krause wrote: > > Guido van Rossum wrote: > > Is there some kind of optimized communication possible yet between > > subinterpreters? (Otherwise I still worry that it's no better than > > subprocesses -- and it could be worse because when one > >

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-08 Thread Sebastian Krause
Guido van Rossum wrote: > Is there some kind of optimized communication possible yet between > subinterpreters? (Otherwise I still worry that it's no better than > subprocesses -- and it could be worse because when one > subinterpreter experiences a hard crash or runs out of memory, all > others

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-07 Thread Victor Stinner
On Wed, May 6, 2020 at 22:10, Serhiy Storchaka wrote: > I am wondering how much 3.9 will be slower than 3.8 in single-thread > single-interpreter mode after getting rid of all process-wide singletons > and caches (Py_None, Py_True, Py_NotImplemented, small integers, > strings, tuples,

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-07 Thread Gregory P. Smith
On Wed, May 6, 2020 at 1:14 PM Serhiy Storchaka wrote: > 06.05.20 00:46, Victor Stinner wrote: > > Subinterpreters and multiprocessing have basically the same speed on > > this benchmark. > > It does not look like there are some advantages of subinterpreters > against multiprocessing. > There is

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-07 Thread Cody Piersall
On Tue, May 5, 2020 at 6:44 PM Joseph Jenne via Python-Dev wrote: > > I'm seeing a drop in performance of both multiprocess and subinterpreter > based runs in the 8-CPU case, where performance drops by about half > despite having enough logical CPUs, while the other cases scale quite > well. Is

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-07 Thread Eric Snow
On Thu, May 7, 2020 at 2:50 AM Emily Bowman wrote: > While large object copies are fairly fast -- I wouldn't say trivial, a > gigabyte copy will introduce noticeable lag when processing enough of them -- > the flip side of having large objects is that you want to avoid having so > many copies

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-07 Thread Emily Bowman
On Wed, May 6, 2020 at 12:36 PM Nathaniel Smith wrote: > > Sure, zero cost is always better than some cost, I'm not denying that > :-). What I'm trying to understand is whether the difference is > meaningful enough to justify subinterpreters' increased complexity, > fragility, and ecosystem

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-07 Thread Paul Moore
On Thu, 7 May 2020 at 01:34, Cameron Simpson wrote: > Maybe I'm missing something, but the example that comes to my mind is > embedding a Python interpreter in an existing nonPython programme. > > My pet one-day-in-the-future example is mutt, whose macro language is... > crude. And mutt is

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Cameron Simpson
On 06May2020 23:05, Serhiy Storchaka wrote: 06.05.20 00:46, Victor Stinner wrote: Subinterpreters and multiprocessing have basically the same speed on this benchmark. It does not look like there are some advantages of subinterpreters against multiprocessing. Maybe I'm missing something,

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Serhiy Storchaka
06.05.20 00:46, Victor Stinner wrote: Subinterpreters and multiprocessing have basically the same speed on this benchmark. It does not look like there are some advantages of subinterpreters against multiprocessing. I am wondering how much 3.9 will be slower than 3.8 in single-thread

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Nathaniel Smith
On Wed, May 6, 2020 at 10:03 AM Antoine Pitrou wrote: > > On Tue, 5 May 2020 18:59:34 -0700 > Nathaniel Smith wrote: > > On Tue, May 5, 2020 at 3:47 PM Guido van Rossum wrote: > > > > > > This sounds like a significant milestone! > > > > > > Is there some kind of optimized communication

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Nathaniel Smith
On Wed, May 6, 2020 at 5:41 AM Victor Stinner wrote: > > Hi Nathaniel, > > On Wed, May 6, 2020 at 04:00, Nathaniel Smith wrote: > > As far as I understand it, the subinterpreter folks have given up on > > optimized passing of objects, and are only hoping to do optimized > > (zero-copy) passing

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Barry Scott
> On 5 May 2020, at 23:40, Guido van Rossum wrote: > > Is there some kind of optimized communication possible yet between > subinterpreters? (Otherwise I still worry that it's no better than > subprocesses -- and it could be worse because when one subinterpreter > experiences a hard crash

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Antoine Pitrou
On Tue, 5 May 2020 18:59:34 -0700 Nathaniel Smith wrote: > On Tue, May 5, 2020 at 3:47 PM Guido van Rossum wrote: > > > > This sounds like a significant milestone! > > > > Is there some kind of optimized communication possible yet between > > subinterpreters? (Otherwise I still worry that it's

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Guido van Rossum
Okay, an image is appearing. It sounds like GIL-free subinterpreters may one day shine because IPC is faster and simpler within one process than between multiple processes. This is not exactly what I got from PEP 554 but it is sufficient for me to have confidence in the project. On Wed, May 6,

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Victor Stinner
Hi Nathaniel, On Wed, May 6, 2020 at 04:00, Nathaniel Smith wrote: > As far as I understand it, the subinterpreter folks have given up on > optimized passing of objects, and are only hoping to do optimized > (zero-copy) passing of raw memory buffers. I think that you misunderstood the PEP 554.
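
For readers unfamiliar with the term, the difference between copying a buffer and sharing it "zero-copy" can be sketched inside a single interpreter. This is only an illustration of the general idea using the standard `memoryview` type; it does not use the PEP 554 API or any subinterpreter channel, and the helper names `share_without_copy` and `share_with_copy` are hypothetical.

```python
def share_without_copy(buffer):
    """Return a view onto the same memory; no bytes are duplicated."""
    return memoryview(buffer)


def share_with_copy(buffer):
    """Return an independent copy of the bytes, as serialization would."""
    return bytes(buffer)


# A small mutable buffer standing in for a large raw memory block.
payload = bytearray(b"x" * 1024)

view = share_without_copy(payload)
snapshot = share_with_copy(payload)

# Mutate the original buffer after both "transfers".
payload[0] = ord("y")

# The view observes the change because it shares memory with payload.
print(view[0] == ord("y"))
# The copy does not, because it owns separate memory.
print(snapshot[0] == ord("x"))
```

The cost of the copy grows with the size of the buffer, while creating the view does not depend on it, which is why the distinction matters for large objects.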

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-06 Thread Emily Bowman
Main memory bus or cache contention? Integer execution ports full? Throttling? VTune is useful to find out where the bottleneck is, things like that tend to happen when you start loading every logical core. On Tue, May 5, 2020 at 4:45 PM Joseph Jenne via Python-Dev < python-dev@python.org>

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-05 Thread Nathaniel Smith
On Tue, May 5, 2020 at 3:47 PM Guido van Rossum wrote: > > This sounds like a significant milestone! > > Is there some kind of optimized communication possible yet between > subinterpreters? (Otherwise I still worry that it's no better than > subprocesses -- and it could be worse because when

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-05 Thread Joseph Jenne via Python-Dev
I'm seeing a drop in performance of both multiprocess and subinterpreter based runs in the 8-CPU case, where performance drops by about half despite having enough logical CPUs, while the other cases scale quite well. Is there some issue with python multiprocessing/subinterpreters on the same

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-05 Thread Guido van Rossum
This sounds like a significant milestone! Is there some kind of optimized communication possible yet between subinterpreters? (Otherwise I still worry that it's no better than subprocesses -- and it could be worse because when one subinterpreter experiences a hard crash or runs out of memory, all

[Python-Dev] Re: PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround

2020-05-05 Thread Brett Cannon
Just to be clear, this is executing the **same** workload in parallel, **not** trying to parallelize factorial. E.g. the 8 CPU calculation is calculating 50,000! 8 separate times and not calculating 50,000! once by spreading the work across 8 CPUs. This measurement is still showing parallel
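
The shape of the workload described in this message can be sketched roughly as follows. This is a minimal sketch, not the benchmark script from the thread: the pure-Python `factorial` loop, the helper names, and the use of `concurrent.futures` are all assumptions made for illustration. It runs the same computation several times, either one after another or with one worker per copy.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def factorial(n):
    """Pure-Python factorial, so the work is CPU-bound in the interpreter."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def run_sequential(n, copies):
    """Compute n! 'copies' separate times, one after another."""
    return [factorial(n) for _ in range(copies)]


def run_parallel(n, copies, executor_cls=ThreadPoolExecutor):
    """Compute n! 'copies' separate times, one worker per copy.

    Each worker repeats the full computation; a single factorial is
    never split across workers.
    """
    with executor_cls(max_workers=copies) as pool:
        return list(pool.map(factorial, [n] * copies))


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling `timed(run_sequential, 50_000, 8)` and `timed(run_parallel, 50_000, 8)` would correspond to the 8-CPU case mentioned above. With the default thread pool, the GIL is expected to keep the parallel run close to the sequential time; passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` (from a script guarded by `if __name__ == "__main__":`) gives a multiprocessing variant for comparison.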