Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 14 September 2017 at 08:25, Nick Coghlan wrote:
> On 13 September 2017 at 20:45, Koos Zevenhoven wrote:
>> It's still just *an* interpreter that happens to run __main__. And who says
>> it even needs to be the only one?
>
> Koos, I've asked multiple times now for you to describe the practical
> user benefits you believe will come from dispensing with the existing
> notion of a main interpreter (which is *not* something PEP 554 has
> created - the main interpreter already exists at the implementation
> level, PEP 554 just makes that fact visible at the Python level).

Eric addressed this in the latest update, and took the view that since it's a question that can be deferred, it's one that should be deferred, in line with the overall "minimal enabling infrastructure" philosophy of the PEP.

On thinking about it further, I believe this may also intersect with some open questions I have around the visibility of *thread* objects across interpreters - the real runtime constraint at the implementation level is that we need a main *thread* in order to sensibly manage the way signal handling works across different platforms, and that's where we may get into trouble if we allow arbitrary subinterpreters to run in the main thread and accept and process signals directly.

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia

___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
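[A minimal illustration of the main-thread constraint Nick refers to (not from the original message): in CPython, signal.signal() may only be called from the main thread - and, in newer versions, only from the main interpreter - so attempting to register a handler from a worker thread raises ValueError.]

```python
import signal
import threading

def try_register():
    # signal.signal() is only allowed in the main thread (and, in newer
    # CPython versions, only in the main interpreter); anywhere else it
    # raises ValueError.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        print("registered")
    except ValueError as exc:
        print("failed:", exc)

try_register()  # main thread: succeeds

worker = threading.Thread(target=try_register)
worker.start()  # worker thread: registration fails with ValueError
worker.join()
```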
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 13 September 2017 at 14:10, Nathaniel Smith wrote:
> Subinterpreters are basically an attempt to reimplement the OS's
> process isolation in user-space, right?

Not really, they're more an attempt to make something resembling Rust's memory model available to Python programs - having the default behaviour be "memory is not shared", but making the choice to share, when you want to, entirely an application level decision, without getting into the kind of complexity needed to deliberately break operating system level process isolation.

The difference is that where Rust was able to do that on a per-thread basis and rely on its borrow checker for enforcement of memory ownership, for PEP 554 we're proposing to do it on a per-interpreter basis, and rely on runtime object space partitioning (where Python objects and the memory allocators are *not* shared between interpreters) to keep things separated from each other.

That's why memoryview is such a key part of making the proposal interesting: it's what lets us relatively easily poke holes in the object level partitioning between interpreters and provide zero-copy message passing without having to share any regular reference counts between interpreters (which in turn is what makes it plausible that we may eventually be able to switch to a true GIL-per-interpreter model, with only a few cross-interpreter locks for operations like accessing the list of interpreters itself).

Right now, the closest equivalent to this programming model that Python offers is to combine threads with queue.Queue, and it requires a lot of programming discipline to ensure that you don't access an object again once you've submitted it to a queue.

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
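[The threads-plus-queue.Queue pattern Nick mentions can be sketched as follows (an illustrative example, not from the original post). The key point is the last comment: once the producer puts an object on the queue, it must stop touching it by convention - nothing in the runtime enforces the hand-off.]

```python
import queue
import threading

q = queue.Queue()
received = []

def producer():
    for i in range(3):
        msg = {"value": i}
        q.put(msg)
        # Discipline required: after q.put(msg), this thread must not
        # mutate msg again -- the runtime does not enforce the hand-off,
        # so we drop our reference purely by convention.
        msg = None

def consumer():
    for _ in range(3):
        item = q.get()
        received.append(item["value"])
        q.task_done()

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(received)  # prints [0, 1, 2]
```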
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 13 September 2017 at 20:45, Koos Zevenhoven wrote:
> On Wed, Sep 13, 2017 at 6:14 AM, Nick Coghlan wrote:
>> On 13 September 2017 at 00:35, Koos Zevenhoven wrote:
>>> I don't see how the situation benefits from calling something the "main
>>> interpreter". Subinterpreters can be a way to take something
>>> non-thread-safe and make it thread-safe, because in an
>>> interpreter-per-thread scheme, most of the state, like module globals,
>>> is thread-local. (Well, this doesn't help for async concurrency, but
>>> anyway.)
>>
>> "The interpreter that runs __main__" is never going to go away as a
>> concept for the regular CPython CLI.
>
> It's still just *an* interpreter that happens to run __main__. And who says
> it even needs to be the only one?

Koos, I've asked multiple times now for you to describe the practical user benefits you believe will come from dispensing with the existing notion of a main interpreter (which is *not* something PEP 554 has created - the main interpreter already exists at the implementation level, PEP 554 just makes that fact visible at the Python level).

If you can't come up with a meaningful user benefit that would arise from removing it, then please just let the matter drop.

Regards, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Wed, Sep 13, 2017 at 6:14 AM, Nick Coghlan wrote:
> On 13 September 2017 at 00:35, Koos Zevenhoven wrote:
>> I don't see how the situation benefits from calling something the "main
>> interpreter". Subinterpreters can be a way to take something
>> non-thread-safe and make it thread-safe, because in an
>> interpreter-per-thread scheme, most of the state, like module globals,
>> is thread-local. (Well, this doesn't help for async concurrency, but
>> anyway.)
>
> "The interpreter that runs __main__" is never going to go away as a
> concept for the regular CPython CLI.

It's still just *an* interpreter that happens to run __main__. And who says it even needs to be the only one?

> Right now, it's also a restriction even for applications like mod_wsgi,
> since the GIL state APIs always register C created threads with the
> main interpreter.
>
>>> That's OK - it just means we'll aim to make as many
>>> things as possible implicitly subinterpreter-friendly, and for
>>> everything else, we'll aim to minimise the adjustments needed to
>>> *make* things subinterpreter friendly.
>>
>> And that's exactly what I'm after here!
>
> No, you're after deliberately making the proposed API
> non-representative of how the reference implementation actually works
> because of a personal aesthetic preference rather than asking yourself
> what the practical benefit of hiding the existence of the main
> interpreter would be.
>
> The fact is that the main interpreter *is* special (just as the main
> thread is special), and your wishing that things were otherwise won't
> magically make it so.

I'm not questioning whether the main interpreter is special, or whether the interpreters may differ from each other. I'm questioning the whole concept of "main interpreter". People should not care about which interpreter is "the main ONE". They should care about what properties an interpreter has. That's not aesthetics. Just look at, e.g. the _decimal/_pydecimal examples in this thread.

>> I'm mostly just worried about the `get_main()` function. Maybe it should
>> be called `asdfjaosjnoijb()`, so people wouldn't use it. Can't the first
>> running interpreter just introduce itself to its children? And if that's
>> too much to ask, maybe there could be a `get_parent()` function, which
>> would give you the interpreter that spawned the current subinterpreter.
>
> If the embedding application never calls
> "_Py_ConfigureMainInterpreter", then get_main() could conceivably
> return None. However, we don't expose that as a public API yet, so for
> the time being, Py_Initialize() will always call it, and hence there
> will always be a main interpreter (even in things like mod_wsgi).

You don't need to remove _Py_ConfigureMainInterpreter. Just make sure you don't try to smuggle it into the status quo of the possibly upcoming new stdlib module. Who knows what the function does anyway, let alone what it might or might not do in the future. Of course that doesn't mean that there couldn't be ways to configure an interpreter, but coupling that with a concept of a "main interpreter", as you suggest, doesn't seem to make any sense.

And surely the code that creates a new interpreter should know if it wants the new interpreter to start with `__name__ == "__main__"` or `__name__ == "__just_any__"`, if there is a choice.

––Koos

-- + Koos Zevenhoven + http://twitter.com/k7hoven +
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Tue, Sep 12, 2017 at 1:46 PM, Eric Snow wrote:
> On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith wrote:
>> On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow wrote:
>>> My concern is that this is a chicken-and-egg problem. The situation
>>> won't improve until subinterpreters are more readily available.
>>
>> Okay, but you're assuming that "more libraries work well with
>> subinterpreters" is in fact an improvement. I'm asking you to convince
>> me of that :-). Are there people saying "oh, if only subinterpreters
>> had a Python API and less weird interactions with C extensions, I
>> could do "? So far they haven't exactly taken the
>> world by storm...
>
> The problem is that most people don't know about the feature. And
> even if they do, using it requires writing a C-extension, which most
> people aren't comfortable doing.
>
>>> Other than C globals, is there some other issue?
>>
>> That's the main one I'm aware of, yeah, though I haven't looked into it
>> closely.
>
> Oh, good. I haven't missed something. :) Do you know how often
> subinterpreter support is a problem for users? I was under the
> impression from your earlier statements that this is a recurring issue
> but my understanding from mod_wsgi is that it isn't that common.

It looks like we've been averaging one bug report every ~6 months for the last 3 years:

https://github.com/numpy/numpy/issues?utf8=%E2%9C%93=is%3Aissue%20subinterpreter%20OR%20subinterpreters

They mostly come from Jep, not mod_wsgi. (Possibly because Jep has some built-in numpy integration.) I don't know how many people file bugs versus just living with it or finding some workaround. I suspect for mod_wsgi in particular they probably switch to something else -- it's not like there's any shortage of WSGI servers that avoid these problems.
And for Jep there are prominent warnings to expect problems and suggestions for workarounds:

https://github.com/ninia/jep/wiki/Workarounds-for-CPython-Extensions

>> I guess I would be much more confident in the possibilities here if
>> you could give:
>>
>> - some hand-wavy sketch for how subinterpreter A could call a function
>> that was originally defined in subinterpreter B without the GIL, which
>> seems like a precondition for sharing user-defined classes
>
> (Before I respond, note that this is way outside the scope of the PEP.
> The merit of subinterpreters extends beyond any benefits of running
> sans-GIL, though that is my main goal. I've been updating the PEP to
> (hopefully) better communicate the utility of subinterpreters.)

Subinterpreters are basically an attempt to reimplement the OS's process isolation in user-space, right? Classic trade-off where we accept added complexity and fragility in the hopes of gaining some speed? I just looked at the PEP again, and I'm afraid I still don't understand what the benefits are unless we can remove the GIL and somehow get a speedup over processes.

Implementing CSP is a neat idea, but you could do it with subprocesses too. AFAICT you could implement the whole subinterpreters module API with subprocesses on 3.6, and it'd be multi-core and have perfect extension module support.

> Code objects are immutable so that part should be relatively
> straight-forward. There's the question of closures and default
> arguments that would have to be resolved. However, those are things
> that would need to be supported anyway in a world where we want to
> pass functions and user-defined types between interpreters. Doing so
> will be a gradual process of starting with immutable non-container
> builtin types and expanding out from there to other immutable types,
> including user-defined ones.

I tried arguing that code objects were immutable to the PyPy devs too :-).
The problem is that to call a function you need both its __code__, which is immutable, and its __globals__, which is emphatically not. The __globals__ thing means that if you start from an average function you can often follow pointers to reach every other global object (e.g. if the function uses regular expressions, you can probably reach any module by doing func.__globals__["re"].sys.modules[...]). You might hope that you could somehow restrict this, but I can't think of any way that's really useful :-(.

> Note that sharing mutable objects between interpreters would be a
> pretty advanced usage (i.e. opt-in shared state vs. threading's
> share-everything). If it proves desirable then we'd sort that out
> then. However, I don't see that as more than an esoteric feature
> relative to subinterpreters.
>
> In my mind, the key advantage of being able to share more (immutable)
> objects, including user-defined types, between interpreters is in the
> optimization opportunities.

But even if we can add new language features for "freezing" user-defined objects, then their .__class__ will still be mutable, their methods will still have mutable
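[Nathaniel's __globals__ reachability point is easy to demonstrate (an illustrative adaptation, using os rather than re, since os does "import sys" at module level):]

```python
import os
import sys

def join_tmp(name):
    # An ordinary function that happens to use the os module.
    return os.path.join("/tmp", name)

# A function object carries a reference to its defining module's
# global namespace...
g = join_tmp.__globals__

# ...and since os itself does "import sys", two attribute hops take us
# from one innocuous function to the interpreter's entire module table:
assert g["os"].sys.modules is sys.modules
```

So even a "frozen" function whose __code__ is immutable still drags its whole (mutable) module graph along with it.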
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 13 September 2017 at 00:35, Koos Zevenhoven wrote:
> On Tue, Sep 12, 2017 at 1:40 PM, Nick Coghlan wrote:
>> On 11 September 2017 at 18:02, Koos Zevenhoven wrote:
>>> On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan wrote:
>>>> The line between it and the "CPython Runtime" is fuzzy for both
>>>> practical and historical reasons, but the regular Python CLI will
>>>> always have a "first created, last destroyed" main interpreter, simply
>>>> because we don't really gain anything significant from eliminating it
>>>> as a concept.
>>>
>>> I fear that emphasizing the main interpreter will lead to all kinds of
>>> libraries/programs that somehow unnecessarily rely on some or all tasks
>>> being performed in the main interpreter. Then you'll have a hard time
>>> running two of them in parallel in the same process, because you don't
>>> have two main interpreters.
>>
>> You don't need to fear this scenario, since it's a description of the
>> status quo (and it's the primary source of overstated claims about
>> subinterpreters being "fundamentally broken").
>
> Well, if that's true, it's hardly a counter-argument to what I said.
> Anyway, there is no status quo about what is proposed in the PEP.

Yes, there is, since subinterpreters are an existing feature of the CPython implementation. What's new in the PEP is the idea of giving that feature a Python level API so that it's available to regular Python programs, rather than only being available to embedding applications that choose to use it (e.g. mod_wsgi).

> And as long as the existing APIs are preserved, why not make the new one
> less susceptible to overstated fundamental brokenness?

Having a privileged main interpreter isn't fundamentally broken, since you aren't going to run __main__ in more than one interpreter, just as you don't run __main__ in more than one thread (and multiprocessing deliberately avoids running the "if __name__ == '__main__'" sections of it in more than one process).
>> So no, not everything will be subinterpreter-friendly, just as not
>> everything in Python is thread-safe, and not everything is portable
>> across platforms.
>
> I don't see how the situation benefits from calling something the "main
> interpreter". Subinterpreters can be a way to take something
> non-thread-safe and make it thread-safe, because in an
> interpreter-per-thread scheme, most of the state, like module globals,
> is thread-local. (Well, this doesn't help for async concurrency, but
> anyway.)

"The interpreter that runs __main__" is never going to go away as a concept for the regular CPython CLI. Right now, it's also a restriction even for applications like mod_wsgi, since the GIL state APIs always register C created threads with the main interpreter.

>> That's OK - it just means we'll aim to make as many
>> things as possible implicitly subinterpreter-friendly, and for
>> everything else, we'll aim to minimise the adjustments needed to
>> *make* things subinterpreter friendly.
>
> And that's exactly what I'm after here!

No, you're after deliberately making the proposed API non-representative of how the reference implementation actually works because of a personal aesthetic preference rather than asking yourself what the practical benefit of hiding the existence of the main interpreter would be.

The fact is that the main interpreter *is* special (just as the main thread is special), and your wishing that things were otherwise won't magically make it so.

> I'm mostly just worried about the `get_main()` function. Maybe it should
> be called `asdfjaosjnoijb()`, so people wouldn't use it. Can't the first
> running interpreter just introduce itself to its children? And if that's
> too much to ask, maybe there could be a `get_parent()` function, which
> would give you the interpreter that spawned the current subinterpreter.

If the embedding application never calls "_Py_ConfigureMainInterpreter", then get_main() could conceivably return None.
However, we don't expose that as a public API yet, so for the time being, Py_Initialize() will always call it, and hence there will always be a main interpreter (even in things like mod_wsgi).

Whether we invest significant effort in making configuring the main interpreter genuinely optional is still an open question - since most applications are free to just not use the main interpreter for code execution if they don't want to, we haven't found a real world use case that would benefit meaningfully from its non-existence (just as the vast majority of applications don't care about the various ways in which the main thread that runs Py_Initialize() and Py_Finalize() is given special treatment, and for those that do, they're free to avoid using it).

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
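[The multiprocessing behaviour Nick alludes to above - child processes re-import the parent module but skip its guarded section - is the standard pattern (illustrative sketch):]

```python
import multiprocessing as mp

def work(n):
    return n * n

if __name__ == "__main__":
    # Only the parent process executes this block: under the "spawn"
    # start method, children re-import the module but skip the guarded
    # section, which is what prevents runaway recursive process creation.
    with mp.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # prints [1, 4, 9]
```

This mirrors Nick's point: just as only one process runs the `__main__` block, only one interpreter would run `__main__` under PEP 554.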
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Sun, Sep 10, 2017 at 12:14 PM, Antoine Pitrou wrote:
> What could improve performance significantly would be to share objects
> without any form of marshalling; but it's not obvious it's possible in
> the subinterpreters model *if* it also tries to remove the GIL.

Yep. This is one of the main challenges relative to the goal of fully utilizing multiple cores.

-eric
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Sun, Sep 10, 2017 at 7:52 AM, Koos Zevenhoven wrote:
> I assume the concept of a main interpreter is inherited from the previous
> levels of support in the C API, but what exactly is the significance of
> being "the main interpreter"? Instead, could they just all be
> subinterpreters of the same Python process (or whatever the right wording
> would be)?
>
> It might also be helpful if the PEP had a short description of what are
> considered subinterpreters and how they differ from threads of the same
> interpreter [*]. Currently, the PEP seems to rely heavily on knowledge of
> the previously available concepts. However, as this would be a new module,
> I don't think there's any need to blindly copy the previous design,
> regardless of how well the design may have served its purpose at the time.

I've updated the PEP to be more instructive. I've also dropped the "get_main()" function from the PEP.

-eric
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
Yep. See http://bugs.python.org/issue10915 and http://bugs.python.org/issue15751.

The issue of C-extension support for subinterpreters is, of course, a critical one here. At the very least, incompatible modules should be able to opt out of subinterpreter support. I've updated the PEP to discuss this.

-eric

On Sun, Sep 10, 2017 at 3:18 AM, Ronald Oussoren wrote:
>> On 8 Sep 2017, at 05:11, Eric Snow wrote:
>> On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith wrote:
>>> Numpy is the one I'm
>>> most familiar with: when we get subinterpreter bugs we close them
>>> wontfix, because supporting subinterpreters properly would require
>>> non-trivial auditing, add overhead for non-subinterpreter use cases,
>>> and benefit a tiny tiny fraction of our users.
>>
>> The main problem of which I'm aware is C globals in libraries and
>> extension modules. PEPs 489 and 3121 are meant to help but I know
>> that there is at least one major situation which is still a blocker
>> for multi-interpreter-safe module state. Other than C globals, is
>> there some other issue?
>
> There's also the PyGilState_* API that doesn't support multiple
> interpreters.
>
> The issue there is that callbacks from external libraries back into
> python need to use the correct subinterpreter.
>
> Ronald
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith wrote:
> On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow wrote:
>> My concern is that this is a chicken-and-egg problem. The situation
>> won't improve until subinterpreters are more readily available.
>
> Okay, but you're assuming that "more libraries work well with
> subinterpreters" is in fact an improvement. I'm asking you to convince
> me of that :-). Are there people saying "oh, if only subinterpreters
> had a Python API and less weird interactions with C extensions, I
> could do "? So far they haven't exactly taken the
> world by storm...

The problem is that most people don't know about the feature. And even if they do, using it requires writing a C-extension, which most people aren't comfortable doing.

>> Other than C globals, is there some other issue?
>
> That's the main one I'm aware of, yeah, though I haven't looked into it
> closely.

Oh, good. I haven't missed something. :) Do you know how often subinterpreter support is a problem for users? I was under the impression from your earlier statements that this is a recurring issue but my understanding from mod_wsgi is that it isn't that common.

>> I'm fine with Nick's idea about making this a "provisional" module.
>> Would that be enough to ease your concern here?
>
> Potentially, yeah -- basically I'm fine with anything that doesn't end
> up looking like python-dev telling everyone "subinterpreters are the
> future! go forth and yell at any devs who don't support them!".

Great! I'm also looking at the possibility of adding a mechanism for extension modules to opt out of subinterpreter support (using PEP 489 ModuleDef slots). However, I'd rather wait on that if making the PEP provisional is sufficient.

> What do you think the criteria for graduating to non-provisional
> status should be, in this case?

Consensus among the (Dutch?) core devs that subinterpreters are worth keeping in the stdlib and that we've smoothed out any rough parts in the module.
> I guess I would be much more confident in the possibilities here if
> you could give:
>
> - some hand-wavy sketch for how subinterpreter A could call a function
> that was originally defined in subinterpreter B without the GIL, which
> seems like a precondition for sharing user-defined classes

(Before I respond, note that this is way outside the scope of the PEP. The merit of subinterpreters extends beyond any benefits of running sans-GIL, though that is my main goal. I've been updating the PEP to (hopefully) better communicate the utility of subinterpreters.)

Code objects are immutable so that part should be relatively straight-forward. There's the question of closures and default arguments that would have to be resolved. However, those are things that would need to be supported anyway in a world where we want to pass functions and user-defined types between interpreters. Doing so will be a gradual process of starting with immutable non-container builtin types and expanding out from there to other immutable types, including user-defined ones.

Note that sharing mutable objects between interpreters would be a pretty advanced usage (i.e. opt-in shared state vs. threading's share-everything). If it proves desirable then we'd sort that out then. However, I don't see that as more than an esoteric feature relative to subinterpreters.

In my mind, the key advantage of being able to share more (immutable) objects, including user-defined types, between interpreters is in the optimization opportunities. It would allow us to avoid instantiating the same object in each interpreter. That said, the way I imagine it, I wouldn't consider such an optimization to be very user-facing, so it doesn't impact the PEP. The user-facing part would be the expanded set of immutable objects interpreters could pass back and forth, and expanding that set won't require any changes to the API in the PEP.
> - some hand-wavy sketch for how refcounting will work for objects
> shared between multiple subinterpreters without the GIL, without
> majorly impacting single-thread performance (I actually forgot about
> this problem in my last email, because PyPy has already solved this
> part!)

(same caveat as above)

There are a number of approaches that may work. One is to give each interpreter its own allocator and GC. Another is to mark shared objects such that they never get GC'ed. Another is to allow objects to exist only in one interpreter at a time. Similarly, object ownership (per interpreter) could help. Asynchronous refcounting could be an option. That's only some of the possible approaches. I expect that at least one of them will be suitable.

However, the first step is to get the multi-interpreter support out there. Then we can tackle the problem of optimization and multi-core utilization.

FWIW, the biggest complexity is actually in synchronizing the sharing strategy across the inter-interpreter boundary
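[One of the strategies Eric lists - "allow objects to exist only in one interpreter at a time" - can be sketched in pure Python as a single-ownership cell that invalidates the sender's handle on transfer. This is purely illustrative: real subinterpreter channels would enforce the hand-off at the C level, not with a wrapper class like this hypothetical Handoff.]

```python
class Handoff:
    """A single-ownership cell: taking the value consumes it."""
    _EMPTY = object()

    def __init__(self, value):
        self._value = value

    def take(self):
        # The first take() transfers ownership; any later take() through
        # a stale handle fails loudly instead of silently sharing state.
        if self._value is Handoff._EMPTY:
            raise RuntimeError("value already transferred")
        value, self._value = self._value, Handoff._EMPTY
        return value

cell = Handoff([1, 2, 3])
data = cell.take()      # the new owner receives the object
try:
    cell.take()         # the previous handle is now invalid
except RuntimeError as exc:
    print("as expected:", exc)
```

With per-interpreter ownership like this, no refcount ever needs to be shared across interpreters, which is the point of the strategy.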
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Tue, Sep 12, 2017 at 6:30 PM, Koos Zevenhoven wrote:
> On Tue, Sep 12, 2017 at 5:53 PM, Stefan Krah wrote:
>> On Tue, Sep 12, 2017 at 05:35:34PM +0300, Koos Zevenhoven wrote:
>>> I don't see how the situation benefits from calling something the "main
>>> interpreter". Subinterpreters can be a way to take something
>>> non-thread-safe and make it thread-safe, because in an
>>> interpreter-per-thread scheme, most of the state, like module globals,
>>> is thread-local. (Well, this doesn't help for async concurrency, but
>>> anyway.)
>>
>> You could have a privileged C extension that is only imported in the main
>> interpreter:
>>
>>     if get_current_interp() is main_interp():
>>         from _decimal import *
>>     else:
>>         from _pydecimal import *

Oops.. it should of course be "by_this_process", not "by_other_process" (fixed below).

> Or it could be first-come first-served:
>
>     if is_imported_by_this_process("_decimal"):
>         from _pydecimal import *
>     else:
>         from _decimal import *
>
> ––Koos

-- + Koos Zevenhoven + http://twitter.com/k7hoven +
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 11 September 2017 at 18:02, Koos Zevenhoven wrote:
> On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan wrote:
>> The line between it and the "CPython Runtime" is fuzzy for both
>> practical and historical reasons, but the regular Python CLI will
>> always have a "first created, last destroyed" main interpreter, simply
>> because we don't really gain anything significant from eliminating it
>> as a concept.
>
> I fear that emphasizing the main interpreter will lead to all kinds of
> libraries/programs that somehow unnecessarily rely on some or all tasks
> being performed in the main interpreter. Then you'll have a hard time
> running two of them in parallel in the same process, because you don't
> have two main interpreters.

You don't need to fear this scenario, since it's a description of the status quo (and it's the primary source of overstated claims about subinterpreters being "fundamentally broken").

So no, not everything will be subinterpreter-friendly, just as not everything in Python is thread-safe, and not everything is portable across platforms. That's OK - it just means we'll aim to make as many things as possible implicitly subinterpreter-friendly, and for everything else, we'll aim to minimise the adjustments needed to *make* things subinterpreter friendly.

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan wrote:
> On 11 September 2017 at 00:52, Koos Zevenhoven wrote:
>> [...]
>>> get_main():
>>>
>>>    Return the main interpreter.
>>
>> I assume the concept of a main interpreter is inherited from the previous
>> levels of support in the C API, but what exactly is the significance of
>> being "the main interpreter"? Instead, could they just all be
>> subinterpreters of the same Python process (or whatever the right wording
>> would be)?
>
> The main interpreter is ultimately responsible for the actual process
> global state: standard streams, signal handlers, dynamically linked
> libraries, __main__ module, etc.

Hmm. It is not clear, for instance, why a signal handler could not be owned by an interpreter that wasn't the first one started. Or, if a non-main interpreter imports a module from a dynamically linked library, does it delegate that to the main interpreter? And do sys.stdout et al. not exist in the other interpreters?

> The line between it and the "CPython Runtime" is fuzzy for both
> practical and historical reasons, but the regular Python CLI will
> always have a "first created, last destroyed" main interpreter, simply
> because we don't really gain anything significant from eliminating it
> as a concept.

I fear that emphasizing the main interpreter will lead to all kinds of libraries/programs that somehow unnecessarily rely on some or all tasks being performed in the main interpreter. Then you'll have a hard time running two of them in parallel in the same process, because you don't have two main interpreters.

-- Koos

PS. There's a saying...
something like "always say never" ;)

> By contrast, embedding applications that *don't* have a __main__
> module, and already manage most process global state themselves
> without the assistance of the CPython Runtime, can already get pretty
> close to just having a pool of peer subinterpreters, and will
> presumably be able to get closer over time as the subinterpreter
> support becomes more robust.
>
> Cheers,
> Nick.
>
> -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia

-- + Koos Zevenhoven + http://twitter.com/k7hoven +
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 11 September 2017 at 00:52, Koos Zevenhoven wrote: > On Thu, Sep 7, 2017 at 9:26 PM, Eric Snow > wrote: > [...] > >> >> get_main(): >> >> Return the main interpreter. >> > > I assume the concept of a main interpreter is inherited from the previous > levels of support in the C API, but what exactly is the significance of > being "the main interpreter"? Instead, could they just all be > subinterpreters of the same Python process (or whatever the right wording > would be)? The main interpreter is ultimately responsible for the actual process global state: standard streams, signal handlers, dynamically linked libraries, __main__ module, etc. The line between it and the "CPython Runtime" is fuzzy for both practical and historical reasons, but the regular Python CLI will always have a "first created, last destroyed" main interpreter, simply because we don't really gain anything significant from eliminating it as a concept. By contrast, embedding applications that *don't* have a __main__ module, and already manage most process global state themselves without the assistance of the CPython Runtime can already get pretty close to just having a pool of peer subinterpreters, and will presumably be able to get closer over time as the subinterpreter support becomes more robust. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, 7 Sep 2017 21:08:48 -0700 Nathaniel Smith wrote: > > Awesome, thanks for bringing numbers into my wooly-headed theorizing :-). > > On my laptop I actually get a worse result from your benchmark: 531 ms > for 100 MB == ~200 MB/s round-trip, or 400 MB/s one-way. So yeah, > transferring data between processes with multiprocessing is slow. > > This is odd, though, because on the same machine, using socat to send > 1 GiB between processes using a unix domain socket runs at 2 GB/s: When using local communication, the raw IPC cost is often minor compared to whatever Python does with the data (parse it, dispatch tasks around, etc.) except when the data is really huge. Local communications on Linux can easily reach several GB/s (even using TCP to localhost). Here is a Python script with reduced overhead to measure it -- as opposed to e.g. a full-fledged event loop: https://gist.github.com/pitrou/d809618359915967ffc44b1ecfc2d2ad > I don't know why multiprocessing is so slow -- maybe there's a good > reason, maybe not. Be careful to measure actual bandwidth, not round-trip latency, however. > But the reason isn't that IPC is intrinsically > slow, and subinterpreters aren't going to automatically be 5x faster > because they can use memcpy. What could improve performance significantly would be to share objects without any form of marshalling; but it's not obvious it's possible in the subinterpreters model *if* it also tries to remove the GIL. You can see it readily with concurrent.futures, when comparing ThreadPoolExecutor and ProcessPoolExecutor: >>> import concurrent.futures as cf ...: tp = cf.ThreadPoolExecutor(4) ...: pp = cf.ProcessPoolExecutor(4) ...: x = b"x" * (100 * 1024**2) ...: def identity(x): return x ...: >>> y = list(tp.map(identity, [x] * 10)) # warm up >>> len(y) 10 >>> y = list(pp.map(identity, [x] * 10)) # warm up >>> len(y) 10 >>> %timeit y = list(tp.map(identity, [x] * 10)) 638 µs ± 71.3 µs per loop (mean ± std. dev. 
of 7 runs, 1000 loops each) >>> %timeit y = list(pp.map(identity, [x] * 10)) 1.99 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) On this trivial case you're really gaining a lot using a thread pool... Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 9:26 PM, Eric Snow wrote: [...] > get_main(): > > Return the main interpreter. > > I assume the concept of a main interpreter is inherited from the previous levels of support in the C API, but what exactly is the significance of being "the main interpreter"? Instead, could they just all be subinterpreters of the same Python process (or whatever the right wording would be)? It might also be helpful if the PEP had a short description of what are considered subinterpreters and how they differ from threads of the same interpreter [*]. Currently, the PEP seems to rely heavily on knowledge of the previously available concepts. However, as this would be a new module, I don't think there's any need to blindly copy the previous design, regardless of how well the design may have served its purpose at the time. -- Koos [*] For instance regarding the role of the glo... local interpreter locks (LILs) ;) -- + Koos Zevenhoven + http://twitter.com/k7hoven + ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
> On 8 Sep 2017, at 05:11, Eric Snow wrote: > On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith wrote: > >> Numpy is the one I'm >> most familiar with: when we get subinterpreter bugs we close them >> wontfix, because supporting subinterpreters properly would require >> non-trivial auditing, add overhead for non-subinterpreter use cases, >> and benefit a tiny tiny fraction of our users. > > The main problem of which I'm aware is C globals in libraries and > extension modules. PEPs 489 and 3121 are meant to help but I know > that there is at least one major situation which is still a blocker > for multi-interpreter-safe module state. Other than C globals, is > there some other issue? There's also the PyGILState_* API that doesn't support multiple interpreters. The issue there is that callbacks from external libraries back into Python need to use the correct subinterpreter. Ronald ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow wrote: > First of all, thanks for the feedback and encouragement! Responses > in-line below. I hope it's helpful! More responses in-line as well. > On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith wrote: >> My concern about this is the same as it was last time -- the work >> looks neat, but right now, almost no-one uses subinterpreters >> (basically it's Jep and mod_wsgi and that's it?), and therefore many >> packages get away with ignoring subinterpreters. > > My concern is that this is a chicken-and-egg problem. The situation > won't improve until subinterpreters are more readily available. Okay, but you're assuming that "more libraries work well with subinterpreters" is in fact an improvement. I'm asking you to convince me of that :-). Are there people saying "oh, if only subinterpreters had a Python API and less weird interactions with C extensions, I could do "? So far they haven't exactly taken the world by storm... >> Numpy is the one I'm >> most familiar with: when we get subinterpreter bugs we close them >> wontfix, because supporting subinterpreters properly would require >> non-trivial auditing, add overhead for non-subinterpreter use cases, >> and benefit a tiny tiny fraction of our users. > > The main problem of which I'm aware is C globals in libraries and > extension modules. PEPs 489 and 3121 are meant to help but I know > that there is at least one major situation which is still a blocker > for multi-interpreter-safe module state. Other than C globals, is > there some other issue? That's the main one I'm aware of, yeah, though I haven't looked into it closely. >> If we add a friendly python-level API like this, then we're committing >> to this being a part of Python for the long term and encouraging >> people to use it, which puts pressure on downstream packages to do >> that work... but it's still not clear whether any benefits will >> actually materialize. 
> > I'm fine with Nick's idea about making this a "provisional" module. > Would that be enough to ease your concern here? Potentially, yeah -- basically I'm fine with anything that doesn't end up looking like python-dev telling everyone "subinterpreters are the future! go forth and yell at any devs who don't support them!". What do you think the criteria for graduating to non-provisional status should be, in this case? [snip] >> So the only case I can see where I'd expect subinterpreters to make >> communication dramatically more efficient is if you have a "deeply >> immutable" type >> [snip] >> However, it seems impossible to support user-defined deeply-immutable >> types in Python: >> [snip] > > I agree that it is currently not an option. That is part of the > exercise. There are a number of possible solutions to explore once we > get to that point. However, this PEP isn't about that. I'm confident > enough about the possibilities that I'm comfortable with moving > forward here. I guess I would be much more confident in the possibilities here if you could give: - some hand-wavy sketch for how subinterpreter A could call a function that was originally defined in subinterpreter B without the GIL, which seems like a precondition for sharing user-defined classes - some hand-wavy sketch for how refcounting will work for objects shared between multiple subinterpreters without the GIL, without majorly impacting single-thread performance (I actually forgot about this problem in my last email, because PyPy has already solved this part!) These are the two problems where I find it most difficult to have faith. [snip] >> I hope so -- a really useful subinterpreter multi-core stor[y] would be >> awesome. > > Agreed! Thanks for the encouragement. :) Thanks for attempting such an ambitious project :-). -n -- Nathaniel J. 
Smith -- https://vorpus.org ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 6:14 PM, Matthew Rocklin wrote: > Those numbers were for common use in Python tools and reflected my anecdotal > experience at the time with normal Python tools. I'm sure that there are > mechanisms to achieve faster speeds than what I experienced. That being > said, here is a small example. > > > In [1]: import multiprocessing > In [2]: data = b'0' * 100000000 # 100 MB > In [3]: from toolz import identity > In [4]: pool = multiprocessing.Pool() > In [5]: %time _ = pool.apply_async(identity, (data,)).get() > CPU times: user 76 ms, sys: 64 ms, total: 140 ms > Wall time: 252 ms > > This is about 400MB/s for a roundtrip Awesome, thanks for bringing numbers into my wooly-headed theorizing :-). On my laptop I actually get a worse result from your benchmark: 531 ms for 100 MB == ~200 MB/s round-trip, or 400 MB/s one-way. So yeah, transferring data between processes with multiprocessing is slow. This is odd, though, because on the same machine, using socat to send 1 GiB between processes using a unix domain socket runs at 2 GB/s: # terminal 1 ~$ rm -f /tmp/unix.sock && socat -u -b32768 UNIX-LISTEN:/tmp/unix.sock "SYSTEM:pv -W > /dev/null" 1.00GiB 0:00:00 [1.89GiB/s] [<=> ] # terminal 2 ~$ socat -u -b32768 "SYSTEM:dd if=/dev/zero bs=1M count=1024" UNIX:/tmp/unix.sock 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.529814 s, 2.0 GB/s (Notice that the pv output is in GiB/s and the dd output is in GB/s. 1.89 GiB/s = 2.03 GB/s, so they actually agree.) On my system, Python allocates + copies memory at 2.2 GB/s, so bulk byte-level IPC is within 10% of within-process bulk copying: # same 100 MB bytestring as above In [7]: bytearray_data = bytearray(data) In [8]: %timeit bytearray_data.copy() 45.3 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) In [9]: 0.100 / 0.0453 # GB / seconds Out[9]: 2.207505518763797 I don't know why multiprocessing is so slow -- maybe there's a good reason, maybe not. 
But the reason isn't that IPC is intrinsically slow, and subinterpreters aren't going to automatically be 5x faster because they can use memcpy. -n -- Nathaniel J. Smith -- https://vorpus.org ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 12:44 PM, Paul Moore wrote: > On 7 September 2017 at 20:14, Eric Snow wrote: >> I didn't include such a queue in this proposal because I wanted to >> keep it as focused as possible. I'll add a note to the PEP about >> this. > > This all sounds very reasonable. Thanks for the clarification. Hmm. Now I'm starting to think some form of basic queue would be important enough to include in the PEP. I'll see if that feeling holds tomorrow. -eric ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
First of all, thanks for the feedback and encouragement! Responses in-line below. -eric On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith wrote: > My concern about this is the same as it was last time -- the work > looks neat, but right now, almost no-one uses subinterpreters > (basically it's Jep and mod_wsgi and that's it?), and therefore many > packages get away with ignoring subinterpreters. My concern is that this is a chicken-and-egg problem. The situation won't improve until subinterpreters are more readily available. > Numpy is the one I'm > most familiar with: when we get subinterpreter bugs we close them > wontfix, because supporting subinterpreters properly would require > non-trivial auditing, add overhead for non-subinterpreter use cases, > and benefit a tiny tiny fraction of our users. The main problem of which I'm aware is C globals in libraries and extension modules. PEPs 489 and 3121 are meant to help but I know that there is at least one major situation which is still a blocker for multi-interpreter-safe module state. Other than C globals, is there some other issue? > If we add a friendly python-level API like this, then we're committing > to this being a part of Python for the long term and encouraging > people to use it, which puts pressure on downstream packages to do > that work... but it's still not clear whether any benefits will > actually materialize. I'm fine with Nick's idea about making this a "provisional" module. Would that be enough to ease your concern here? > I've actually argued with the PyPy devs to try to convince them to add > subinterpreter support as part of their experiments with GIL-removal, > because I think the semantics would genuinely be nicer to work with > than raw threads, but they're convinced that it's impossible to make > this work. Or more precisely, they think you could make it work in > theory, but that it would be impossible to make it meaningfully more > efficient than using multiple processes. 
> I want them to be wrong, but > I have to admit I can't see a way to make it work either... Yikes! Given the people involved I don't find that to be a good sign. Nevertheless, I still consider my ultimate goals to be tractable and will press forward. At each step thus far, the effort has led to improvements that extend beyond subinterpreters and multi-core. I see that trend continuing for the entirety of the project. Even if my final goal is not realized, the result will still be significantly net positive...and I still think it will still work out. :) > If this is being justified by the multicore use case, and specifically > by the theory that having two interpreters in the same process will > allow for more efficient communication than two interpreters in two > different processes, then... why should we believe that that's > actually possible? I want your project to succeed, but if it's going > to fail then it seems better if it fails before we commit to exposing > new APIs. The project is partly about performance. However, it's also particularly about offering an alternative concurrency model with an implementation that can run in multiple threads simultaneously in the same process. On Thu, Sep 7, 2017 at 5:15 PM, Nathaniel Smith wrote: > The slow case is passing > complicated objects between processes, and it's slow because pickle > has to walk the object graph to serialize it, and walking the object > graph is slow. Copying object graphs between subinterpreters has the > same problem. The initial goal is to support passing only strings between interpreters. Later efforts will involve investigating approaches to efficiently and safely passing other objects. > So the only case I can see where I'd expect subinterpreters to make > communication dramatically more efficient is if you have a "deeply > immutable" type > [snip] > However, it seems impossible to support user-defined deeply-immutable > types in Python: > [snip] I agree that it is currently not an option. 
That is part of the exercise. There are a number of possible solutions to explore once we get to that point. However, this PEP isn't about that. I'm confident enough about the possibilities that I'm comfortable with moving forward here. > I guess the other case where subprocesses lose to "real" threads is > startup time on Windows. But starting a subinterpreter is also much > more expensive than starting a thread, once you take into account the > cost of loading the application's modules into the new interpreter. In > both cases you end up needing some kind of process/subinterpreter pool > or cache to amortize that cost. Interpreter startup costs (and optimization strategies) are another aspect of the project which deserve attention. However, we'll worry about that after the core functionality has been achieved. > Obviously I'm committing the cardinal sin of trying to guess about > performance based on theory instead of measurement, so maybe I'm wrong.
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
Those numbers were for common use in Python tools and reflected my anecdotal experience at the time with normal Python tools. I'm sure that there are mechanisms to achieve faster speeds than what I experienced. That being said, here is a small example. In [1]: import multiprocessing In [2]: data = b'0' * 100000000 # 100 MB In [3]: from toolz import identity In [4]: pool = multiprocessing.Pool() In [5]: %time _ = pool.apply_async(identity, (data,)).get() CPU times: user 76 ms, sys: 64 ms, total: 140 ms Wall time: 252 ms This is about 400MB/s for a roundtrip On Thu, Sep 7, 2017 at 9:00 PM, Stephan Hoyer wrote: > On Thu, Sep 7, 2017 at 5:15 PM Nathaniel Smith wrote: > >> On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan wrote: >> > The gist of the idea is that with subinterpreters, your starting point >> > is multiprocessing-style isolation (i.e. you have to use pickle to >> > transfer data between subinterpreters), but you're actually running in >> > a shared-memory threading context from the operating system's >> > perspective, so you don't need to rely on mmap to share memory over a >> > non-streaming interface. >> >> The challenge is that streaming bytes between processes is actually >> really fast -- you don't really need mmap for that. (Maybe this was >> important for X11 back in the 1980s, but a lot has changed since then >> :-).) And if you want to use pickle and multiprocessing to send, say, >> a single big numpy array between processes, that's also really fast, >> because it's basically just a few memcpy's. The slow case is passing >> complicated objects between processes, and it's slow because pickle >> has to walk the object graph to serialize it, and walking the object >> graph is slow. Copying object graphs between subinterpreters has the >> same problem. >> > > This doesn't match up with my (somewhat limited) experience. 
For example, > in this table of bandwidth estimates from Matthew Rocklin (CCed), IPC is > about 10x slower than a memory copy: > http://matthewrocklin.com/blog/work/2015/12/29/data-bandwidth > > This makes a considerable difference when building a system to do parallel > data analytics in Python (e.g., on NumPy arrays), which is exactly what > Matthew has been working on for the past few years. > > I'm sure there are other ways to avoid this expensive IPC without using > sub-interpreters, e.g., by using a tool like Plasma ( > http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/). > But I'm skeptical of your assessment that the current multiprocessing > approach is fast enough. > ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 5:15 PM Nathaniel Smith wrote: > On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan wrote: > > The gist of the idea is that with subinterpreters, your starting point > > is multiprocessing-style isolation (i.e. you have to use pickle to > > transfer data between subinterpreters), but you're actually running in > > a shared-memory threading context from the operating system's > > perspective, so you don't need to rely on mmap to share memory over a > > non-streaming interface. > > The challenge is that streaming bytes between processes is actually > really fast -- you don't really need mmap for that. (Maybe this was > important for X11 back in the 1980s, but a lot has changed since then > :-).) And if you want to use pickle and multiprocessing to send, say, > a single big numpy array between processes, that's also really fast, > because it's basically just a few memcpy's. The slow case is passing > complicated objects between processes, and it's slow because pickle > has to walk the object graph to serialize it, and walking the object > graph is slow. Copying object graphs between subinterpreters has the > same problem. > This doesn't match up with my (somewhat limited) experience. For example, in this table of bandwidth estimates from Matthew Rocklin (CCed), IPC is about 10x slower than a memory copy: http://matthewrocklin.com/blog/work/2015/12/29/data-bandwidth This makes a considerable difference when building a system to do parallel data analytics in Python (e.g., on NumPy arrays), which is exactly what Matthew has been working on for the past few years. I'm sure there are other ways to avoid this expensive IPC without using sub-interpreters, e.g., by using a tool like Plasma ( http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/). But I'm skeptical of your assessment that the current multiprocessing approach is fast enough. 
___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan wrote: > On 7 September 2017 at 15:48, Nathaniel Smith wrote: >> I've actually argued with the PyPy devs to try to convince them to add >> subinterpreter support as part of their experiments with GIL-removal, >> because I think the semantics would genuinely be nicer to work with >> than raw threads, but they're convinced that it's impossible to make >> this work. Or more precisely, they think you could make it work in >> theory, but that it would be impossible to make it meaningfully more >> efficient than using multiple processes. I want them to be wrong, but >> I have to admit I can't see a way to make it work either... > > The gist of the idea is that with subinterpreters, your starting point > is multiprocessing-style isolation (i.e. you have to use pickle to > transfer data between subinterpreters), but you're actually running in > a shared-memory threading context from the operating system's > perspective, so you don't need to rely on mmap to share memory over a > non-streaming interface. The challenge is that streaming bytes between processes is actually really fast -- you don't really need mmap for that. (Maybe this was important for X11 back in the 1980s, but a lot has changed since then :-).) And if you want to use pickle and multiprocessing to send, say, a single big numpy array between processes, that's also really fast, because it's basically just a few memcpy's. The slow case is passing complicated objects between processes, and it's slow because pickle has to walk the object graph to serialize it, and walking the object graph is slow. Copying object graphs between subinterpreters has the same problem. So the only case I can see where I'd expect subinterpreters to make communication dramatically more efficient is if you have a "deeply immutable" type: one where not only are its instances immutable, but all objects reachable from those instances are also guaranteed to be immutable. 
So like, a tuple except that when you instantiate it, it validates that all of its elements are also marked as deeply immutable, and errors out if not. Then when you go to send this between subinterpreters, you can tell by checking the type of the root object that the whole graph is immutable, so you don't need to walk it yourself. However, it seems impossible to support user-defined deeply-immutable types in Python: types and functions are themselves mutable and hold tons of references to other potentially mutable objects via __mro__, __globals__, __weakref__, etc. etc., so even if a user-defined instance can be made logically immutable it's still going to hold references to mutable things. So the one case where subinterpreters win is if you have a really big and complicated set of nested pseudo-tuples of ints and strings and you're bottlenecked on passing it between interpreters. Maybe frozendicts too. Is that enough to justify the whole endeavor? It seems dubious to me. I guess the other case where subprocesses lose to "real" threads is startup time on Windows. But starting a subinterpreter is also much more expensive than starting a thread, once you take into account the cost of loading the application's modules into the new interpreter. In both cases you end up needing some kind of process/subinterpreter pool or cache to amortize that cost. Obviously I'm committing the cardinal sin of trying to guess about performance based on theory instead of measurement, so maybe I'm wrong. Or maybe there's some deviously clever trick I'm missing. I hope so -- a really useful subinterpreter multi-core story would be awesome. -n -- Nathaniel J. Smith -- https://vorpus.org ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
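The mutability trap Nathaniel describes is easy to demonstrate: even an instance that resists mutation still reaches shared mutable state through its type (a small illustrative script, not from the thread; the `Point` class is invented for the example):

```python
# Even a class whose instances are "logically immutable" still reaches
# mutable state: the type object itself, its __mro__, and any method's
# __globals__ dict are all mutable and shared.
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        # Bypass our own __setattr__ to populate the slots once.
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)

    def __setattr__(self, name, value):
        raise AttributeError("immutable")

p = Point(1, 2)
try:
    p.x = 99
except AttributeError:
    pass  # the instance itself resists mutation...

# ...but the object graph hanging off its type does not:
assert isinstance(Point.__init__.__globals__, dict)  # mutable module namespace
assert type(p).__mro__[-1] is object                 # shared type graph
type(p).label = "mutated!"                           # the class is mutable
assert Point.label == "mutated!"
```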
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 7 September 2017 at 15:48, Nathaniel Smith wrote: > I've actually argued with the PyPy devs to try to convince them to add > subinterpreter support as part of their experiments with GIL-removal, > because I think the semantics would genuinely be nicer to work with > than raw threads, but they're convinced that it's impossible to make > this work. Or more precisely, they think you could make it work in > theory, but that it would be impossible to make it meaningfully more > efficient than using multiple processes. I want them to be wrong, but > I have to admit I can't see a way to make it work either... The gist of the idea is that with subinterpreters, your starting point is multiprocessing-style isolation (i.e. you have to use pickle to transfer data between subinterpreters), but you're actually running in a shared-memory threading context from the operating system's perspective, so you don't need to rely on mmap to share memory over a non-streaming interface. It's also definitely the case that to make this viable, we'd need to provide fast subinterpreter friendly alternatives to C globals for use by extension modules (otherwise adding subinterpreter compatibility will be excessively painful), and PEP 550 is likely to be helpful there. Personally, I think it would make sense to add the module under PEP 411 provisional status, and make its continued existence as a public API contingent on actually delivering on the "lower overhead multi-core support than multiprocessing" goal (even if it only delivers on that front on Windows, where process creation is more expensive and there's no fork() equivalent). However, I'd also be entirely happy with our adding it as a private "_subinterpreters" API for testing & experimentation purposes (see https://bugs.python.org/issue30439 ), and reconsidering introducing it as a public API after there's more concrete evidence as to what can actually be achieved based on it. Cheers, Nick. 
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 11:26 AM, Eric Snow wrote: > Hi all, > > As part of the multi-core work I'm proposing the addition of the > "interpreters" module to the stdlib. This will expose the existing > subinterpreters C-API to Python code. I've purposefully kept the API > simple. Please let me know what you think. My concern about this is the same as it was last time -- the work looks neat, but right now, almost no-one uses subinterpreters (basically it's Jep and mod_wsgi and that's it?), and therefore many packages get away with ignoring subinterpreters. Numpy is the one I'm most familiar with: when we get subinterpreter bugs we close them wontfix, because supporting subinterpreters properly would require non-trivial auditing, add overhead for non-subinterpreter use cases, and benefit a tiny tiny fraction of our users. If we add a friendly python-level API like this, then we're committing to this being a part of Python for the long term and encouraging people to use it, which puts pressure on downstream packages to do that work... but it's still not clear whether any benefits will actually materialize. I've actually argued with the PyPy devs to try to convince them to add subinterpreter support as part of their experiments with GIL-removal, because I think the semantics would genuinely be nicer to work with than raw threads, but they're convinced that it's impossible to make this work. Or more precisely, they think you could make it work in theory, but that it would be impossible to make it meaningfully more efficient than using multiple processes. I want them to be wrong, but I have to admit I can't see a way to make it work either... If this is being justified by the multicore use case, and specifically by the theory that having two interpreters in the same process will allow for more efficient communication than two interpreters in two different processes, then... why should we believe that that's actually possible? 
I want your project to succeed, but if it's going to fail then it seems better if it fails before we commit to exposing new APIs. -n -- Nathaniel J. Smith -- https://vorpus.org ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 1:14 PM, Sebastian Krause wrote: > How is the GIL situation with subinterpreters these days, is the > long-term goal still "solving multi-core Python", i.e. using > multiple CPU cores from within the same process? Or is it mainly > used for isolation? The GIL is still process-global. The goal is indeed to change this to support actual multi-core parallelism. However, the benefits of interpreter isolation are certainly a win otherwise. :) -eric
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 12:44 PM, Paul Moore wrote: > Ah, OK, so if I create a new interpreter, none of the classes, > functions, or objects defined in my calling code will exist within the > target interpreter? That makes sense, but I'd missed that nuance from > the description. Again, this is probably worth noting in the PEP. I'll make sure the PEP is more clear about this. > > And for the record, based on that one fact, I'm perfectly OK with the > initial API being string-only. Great! :) -eric
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
Eric Snow wrote: > 1. add a basic queue class for passing objects between interpreters > * only support strings at first (though Nick pointed out we could > fall back to pickle or marshal for unsupported objects) > 2. implement CSP on top of subinterpreters > 3. expand the queue's supported types > 4. add something like Interpreter.call() How is the GIL situation with subinterpreters these days, is the long-term goal still "solving multi-core Python", i.e. using multiple CPU cores from within the same process? Or is it mainly used for isolation? Sebastian
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 7 September 2017 at 20:14, Eric Snow wrote: > On Thu, Sep 7, 2017 at 11:52 AM, Paul Moore wrote: >> Is there any reason why passing a callable and args is unsafe, and/or >> difficult? Naively, I'd assume that >> >> interp.call('f(a)') >> >> would be precisely as safe as >> >> interp.call(f, a) > > The problem for now is with sharing objects between interpreters. The > simplest safe approach currently is to restrict execution to source > strings. Then there are no complications. Interpreter.call() makes > sense but I'd like to wait until we get a feel for how subinterpreters > get used and until we address some of the issues with object passing. Ah, OK, so if I create a new interpreter, none of the classes, functions, or objects defined in my calling code will exist within the target interpreter? That makes sense, but I'd missed that nuance from the description. Again, this is probably worth noting in the PEP. And for the record, based on that one fact, I'm perfectly OK with the initial API being string-only. > FWIW, here are what I see as the next steps for subinterpreters in the stdlib: > > 1. add a basic queue class for passing objects between interpreters > * only support strings at first (though Nick pointed out we could > fall back to pickle or marshal for unsupported objects) > 2. implement CSP on top of subinterpreters > 3. expand the queue's supported types > 4. add something like Interpreter.call() > > I didn't include such a queue in this proposal because I wanted to > keep it as focused as possible. I'll add a note to the PEP about > this. This all sounds very reasonable. Thanks for the clarification. Paul
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On Thu, Sep 7, 2017 at 11:52 AM, Paul Moore wrote: > The only quibble I have is that I'd prefer it if we had a > run(callable, *args, **kwargs) method. Either instead of, or as well > as, the run(string) one here. > > Is there any reason why passing a callable and args is unsafe, and/or > difficult? Naively, I'd assume that > > interp.call('f(a)') > > would be precisely as safe as > > interp.call(f, a) The problem for now is with sharing objects between interpreters. The simplest safe approach currently is to restrict execution to source strings. Then there are no complications. Interpreter.call() makes sense but I'd like to wait until we get a feel for how subinterpreters get used and until we address some of the issues with object passing. FWIW, here are what I see as the next steps for subinterpreters in the stdlib: 1. add a basic queue class for passing objects between interpreters * only support strings at first (though Nick pointed out we could fall back to pickle or marshal for unsupported objects) 2. implement CSP on top of subinterpreters 3. expand the queue's supported types 4. add something like Interpreter.call() I didn't include such a queue in this proposal because I wanted to keep it as focused as possible. I'll add a note to the PEP about this. > > Am I missing something? Name visibility or scoping issues come to mind > as possible complications I'm not seeing. At the least, if we don't > want a callable-and-args form yet, a note in the PEP explaining why > it's been omitted would be worthwhile. I'll add a note to the PEP. Thanks for pointing this out. :) -eric
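Step 1 of that roadmap (a queue that only carries strings) can be approximated today between threads. This is a hedged sketch of the idea only; `StringQueue` is an illustrative name, not anything from the PEP:

```python
# Sketch of the "strings only at first" queue from the roadmap above,
# demonstrated between threads (real subinterpreters would add isolation).
import queue
import threading

class StringQueue:
    """A queue that enforces the str-only restriction at put() time."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, item):
        if not isinstance(item, str):
            raise TypeError("only str is supported for now")
        self._q.put(item)

    def get(self):
        return self._q.get()

q = StringQueue()

def worker():
    # Compute in the worker, then pass the result across as text.
    q.put("done: " + str(2 + 2))

t = threading.Thread(target=worker)
t.start()
t.join()
print(q.get())  # done: 4
```

Later roadmap steps (pickle/marshal fallbacks, wider type support) would widen the check in put() rather than change the interface, which is presumably part of the appeal of starting minimal.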
Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
On 7 September 2017 at 19:26, Eric Snow wrote: > As part of the multi-core work I'm proposing the addition of the > "interpreters" module to the stdlib. This will expose the existing > subinterpreters C-API to Python code. I've purposefully kept the API > simple. Please let me know what you think. Looks good. I agree with the idea of keeping the interface simple in the first instance - we can easily add extra functionality later, but removing stuff (or worse still, finding that stuff we thought was OK was actually broken because we'd missed corner cases) is much harder. >run(code): > > Run the provided Python code in the interpreter, in the current > OS thread. Supported code: source text. The only quibble I have is that I'd prefer it if we had a run(callable, *args, **kwargs) method. Either instead of, or as well as, the run(string) one here. Is there any reason why passing a callable and args is unsafe, and/or difficult? Naively, I'd assume that interp.call('f(a)') would be precisely as safe as interp.call(f, a) Am I missing something? Name visibility or scoping issues come to mind as possible complications I'm not seeing. At the least, if we don't want a callable-and-args form yet, a note in the PEP explaining why it's been omitted would be worthwhile. Paul
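One concrete complication with the callable form, as an illustration of my own rather than an argument made in the thread: a callable can carry hidden references to objects living in the sending interpreter, and often cannot even be serialized, whereas a source string is self-contained:

```python
# A callable may close over objects that exist only in the sending
# interpreter; serializing such a callable fails outright with pickle.
import pickle

def make_adder(n):
    return lambda x: x + n   # closure over n, an object in *this* interpreter

adder = make_adder(10)
try:
    pickle.dumps(adder)
    portable = True
except Exception:
    # CPython refuses to pickle local functions and closures by value.
    portable = False

print(portable)  # False

# By contrast, a source string carries no object references at all:
source = "result = 10 + 5"
print(isinstance(source, str))  # True
```

This is exactly the "sharing objects between interpreters" problem Eric points to below: the string form sidesteps it entirely, at the cost of convenience.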
[Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code
Hi all, As part of the multi-core work I'm proposing the addition of the "interpreters" module to the stdlib. This will expose the existing subinterpreters C-API to Python code. I've purposefully kept the API simple. Please let me know what you think. -eric https://www.python.org/dev/peps/pep-0554/ https://github.com/python/peps/blob/master/pep-0554.rst https://github.com/python/cpython/pull/1748 https://github.com/python/cpython/pull/1802 https://github.com/ericsnowcurrently/cpython/tree/high-level-interpreters-module PEP: 554 Title: Multiple Interpreters in the Stdlib Author: Eric Snow Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2017-09-05 Python-Version: 3.7 Post-History: Abstract This proposal introduces the stdlib "interpreters" module. It exposes the basic functionality of subinterpreters that exists in the C-API. Rationale Running code in multiple interpreters provides a useful level of isolation within the same process. This can be leveraged in a number of ways. Furthermore, subinterpreters provide a well-defined framework in which such isolation may be extended. CPython has supported subinterpreters, with increasing levels of support, since version 1.5. While the feature has the potential to be a powerful tool, subinterpreters have suffered from neglect because they are not available directly from Python. Exposing the existing functionality in the stdlib will help reverse the situation. Proposal The "interpreters" module will be added to the stdlib. It will provide a high-level interface to subinterpreters and wrap the low-level "_interpreters" module. The proposed API is inspired by the threading module. The module provides the following functions: enumerate(): Return a list of all existing interpreters. get_current(): Return the currently running interpreter. get_main(): Return the main interpreter. create(): Initialize a new Python interpreter and return it. 
The interpreter will be created in the current thread and will remain idle until something is run in it. The module also provides the following class: Interpreter(id): id: The interpreter's ID (read-only). is_running(): Return whether or not the interpreter is currently running. destroy(): Finalize and destroy the interpreter. run(code): Run the provided Python code in the interpreter, in the current OS thread. Supported code: source text. Copyright This document has been placed in the public domain.
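The isolation model the PEP describes -- code is passed in as source text and runs against a fully separate namespace -- can be loosely mimicked today with exec() and a fresh globals dict. This is only an analogy for illustration (a real subinterpreter also gets its own sys.modules and other runtime state), and run_isolated is a made-up helper name, not the PEP's API:

```python
# Rough analogy for run(code): execute source text against a fresh
# namespace so nothing from the caller leaks in. A real subinterpreter
# isolates far more state than this.

def run_isolated(source):
    namespace = {}            # fresh globals: the caller's names are invisible
    exec(source, namespace)
    return namespace

caller_secret = 42            # never visible inside the executed source

ns = run_isolated("result = sum(range(10))")
print(ns["result"])             # 45
print("caller_secret" in ns)    # False
```

This mirrors Paul Moore's observation earlier in the thread: none of the classes, functions, or objects defined in the calling code exist within the target interpreter, so source strings are the natural initial currency.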