Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-14 Thread Nick Coghlan
On 14 September 2017 at 08:25, Nick Coghlan  wrote:
> On 13 September 2017 at 20:45, Koos Zevenhoven  wrote:
>> It's still just *an* interpreter that happens to run __main__. And who says
>> it even needs to be the only one?
>
> Koos, I've asked multiple times now for you to describe the practical
> user benefits you believe will come from dispensing with the existing
> notion of a main interpreter (which is *not* something PEP 554 has
> created - the main interpreter already exists at the implementation
> level, PEP 554 just makes that fact visible at the Python level).

Eric addressed this in the latest update, and took the view that since
it's a question that can be deferred, it's one that should be deferred,
in line with the overall "minimal enabling infrastructure" philosophy
of the PEP.

On thinking about it further, I believe this may also intersect with
some open questions I have around the visibility of *thread* objects
across interpreters - the real runtime constraint at the
implementation level is the fact that we need a main *thread* in order
to sensibly manage the way signal handling works across different
platforms, and that's where we may get into trouble if we allow
arbitrary subinterpreters to run in the main thread, and accept and
process signals directly.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-13 Thread Nick Coghlan
On 13 September 2017 at 14:10, Nathaniel Smith  wrote:
> Subinterpreters are basically an attempt to reimplement the OS's
> process isolation in user-space, right?

Not really, they're more an attempt to make something resembling
Rust's memory model available to Python programs - having the default
behaviour be "memory is not shared", but making the choice to share
when you want to an entirely application-level decision, without
getting into the kind of complexity needed to deliberately break
operating system level process isolation.

The difference is that where Rust was able to do that on a per-thread
basis and rely on their borrow checker for enforcement of memory
ownership, for PEP 554, we're proposing to do it on a per-interpreter
basis, and rely on runtime object space partitioning (where Python
objects and the memory allocators are *not* shared between
interpreters) to keep things separated from each other.

That's why memoryview is such a key part of making the proposal
interesting: it's what lets us relatively easily poke holes in the
object level partitioning between interpreters and provide zero-copy
message passing without having to share any regular reference counts
between interpreters (which in turn is what makes it plausible that we
may eventually be able to switch to a true GIL-per-interpreter model,
with only a few cross-interpreter locks for operations like accessing
the list of interpreters itself).
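The zero-copy behaviour that makes memoryview attractive here can already be seen within a single interpreter: a view exposes an underlying buffer without copying it, so every holder of the buffer observes writes made through the view. (A minimal single-interpreter sketch, not PEP 554's cross-interpreter API.)

```python
# A memoryview exposes the underlying buffer without copying it.
buf = bytearray(b"hello world")
view = memoryview(buf)

# Writing through the view mutates the original buffer in place.
view[0:5] = b"HELLO"
print(bytes(buf))  # b'HELLO world'

# Slicing a memoryview also copies nothing: `sub` is a new view of the
# same memory, not a new buffer.
sub = view[6:]
print(bytes(sub))  # b'world'
```

The idea in the PEP is to hand such views across the interpreter boundary, so the payload bytes never have to be copied into (or reference-counted by) the receiving interpreter's object space.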

Right now, the closest equivalent to this programming model that
Python offers is to combine threads with queue.Queue, and it requires
a lot of programming discipline to ensure that you don't access an
object again once you've submitted it to a queue.
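That existing pattern can be sketched as follows (illustrative names only; nothing here is PEP 554 API). Note that nothing stops the producer from mutating the object after handing it off, which is exactly the discipline problem described above:

```python
import queue
import threading

q = queue.Queue()
results = []

def producer():
    data = bytearray(b"payload")
    q.put(data)
    # Discipline required: after put(), this thread must never touch
    # `data` again, or it races with the consumer. Nothing enforces this.

def consumer():
    item = q.get()  # same object, same memory: shared by default
    item[0:0] = b"got "
    results.append(bytes(item))

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [b'got payload']
```

Per-interpreter object spaces would make the unsafe post-`put()` access structurally impossible rather than merely inadvisable.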

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-13 Thread Nick Coghlan
On 13 September 2017 at 20:45, Koos Zevenhoven  wrote:
> On Wed, Sep 13, 2017 at 6:14 AM, Nick Coghlan  wrote:
>>
>> On 13 September 2017 at 00:35, Koos Zevenhoven  wrote:
>>
>> > I don't see how the situation benefits from calling something the "main
>> > interpreter". Subinterpreters can be a way to take something
>> > non-thread-safe
>> > and make it thread-safe, because in an interpreter-per-thread scheme,
>> > most
>> > of the state, like module globals, is thread-local. (Well, this doesn't
>> > help for async concurrency, but anyway.)
>>
>> "The interpreter that runs __main__" is never going to go away as a
>> concept for the regular CPython CLI.
>
>
> It's still just *an* interpreter that happens to run __main__. And who says
> it even needs to be the only one?

Koos, I've asked multiple times now for you to describe the practical
user benefits you believe will come from dispensing with the existing
notion of a main interpreter (which is *not* something PEP 554 has
created - the main interpreter already exists at the implementation
level, PEP 554 just makes that fact visible at the Python level).

If you can't come up with a meaningful user benefit that would arise
from removing it, then please just let the matter drop.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-13 Thread Koos Zevenhoven
On Wed, Sep 13, 2017 at 6:14 AM, Nick Coghlan  wrote:

> On 13 September 2017 at 00:35, Koos Zevenhoven  wrote:
>
> > I don't see how the situation benefits from calling something the "main
> > interpreter". Subinterpreters can be a way to take something
> > non-thread-safe and make it thread-safe, because in an
> > interpreter-per-thread scheme, most of the state, like module globals,
> > is thread-local. (Well, this doesn't help for async concurrency, but
> > anyway.)
>
> "The interpreter that runs __main__" is never going to go away as a
> concept for the regular CPython CLI.
>

It's still just *an* interpreter that happens to run __main__. And who says
it even needs to be the only one?


>
> Right now, it's also a restriction even for applications like mod_wsgi,
> since the GIL state APIs always register C created threads with the
> main interpreter.
>
> >> That's OK - it just means we'll aim to make as many
> >> things as possible implicitly subinterpreter-friendly, and for
> >> everything else, we'll aim to minimise the adjustments needed to
> >> *make* things subinterpreter friendly.
> >>
> >
> > And that's exactly what I'm after here!
>
> No, you're after deliberately making the proposed API
> non-representative of how the reference implementation actually works
> because of a personal aesthetic preference rather than asking yourself
> what the practical benefit of hiding the existence of the main
> interpreter would be.
>
> The fact is that the main interpreter *is* special (just as the main
> thread is special), and your wishing that things were otherwise won't
> magically make it so.
>

I'm not questioning whether the main interpreter is special, or whether
the interpreters may differ from each other. I'm questioning the whole
concept of a "main interpreter". People should not care about which
interpreter is "the main ONE". They should care about what properties an
interpreter has. That's not aesthetics. Just look at, e.g., the
_decimal/_pydecimal examples in this thread.


> > I'm mostly just worried about the `get_main()` function. Maybe it should
> > be called `asdfjaosjnoijb()`, so people wouldn't use it. Can't the first
> > running interpreter just introduce itself to its children? And if that's
> > too much to ask, maybe there could be a `get_parent()` function, which
> > would give you the interpreter that spawned the current subinterpreter.
>
> If the embedding application never calls
> "_Py_ConfigureMainInterpreter", then get_main() could conceivably
> return None. However, we don't expose that as a public API yet, so for
> the time being, Py_Initialize() will always call it, and hence there
> will always be a main interpreter (even in things like mod_wsgi).
>
>
You don't need to remove _Py_ConfigureMainInterpreter. Just make sure you
don't try to smuggle it into the status quo of the possibly upcoming new
stdlib module. Who knows what the function does anyway, let alone what it
might or might not do in the future.

Of course that doesn't mean that there couldn't be ways to configure an
interpreter, but coupling that with a concept of a "main interpreter", as
you suggest, doesn't seem to make any sense. And surely the code that
creates a new interpreter should know if it wants the new interpreter to
start with `__name__ == "__main__"` or `__name__ == "__just_any__"`, if
there is a choice.

––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Nathaniel Smith
On Tue, Sep 12, 2017 at 1:46 PM, Eric Snow  wrote:
> On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith  wrote:
>> On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow  
>> wrote:
>>> My concern is that this is a chicken-and-egg problem.  The situation
>>> won't improve until subinterpreters are more readily available.
>>
>> Okay, but you're assuming that "more libraries work well with
>> subinterpreters" is in fact an improvement. I'm asking you to convince
>> me of that :-). Are there people saying "oh, if only subinterpreters
>> had a Python API and less weird interactions with C extensions, I
>> could do "? So far they haven't exactly taken the
>> world by storm...
>
> The problem is that most people don't know about the feature.  And
> even if they do, using it requires writing a C-extension, which most
> people aren't comfortable doing.
>
>>> Other than C globals, is there some other issue?
>>
>> That's the main one I'm aware of, yeah, though I haven't looked into it 
>> closely.
>
> Oh, good.  I haven't missed something. :)  Do you know how often
> subinterpreter support is a problem for users?  I was under the
> impression from your earlier statements that this is a recurring issue
> but my understanding from mod_wsgi is that it isn't that common.

It looks like we've been averaging one bug report every ~6 months for
the last 3 years:

https://github.com/numpy/numpy/issues?utf8=%E2%9C%93=is%3Aissue%20subinterpreter%20OR%20subinterpreters

They mostly come from Jep, not mod_wsgi. (Possibly because Jep has
some built-in numpy integration.) I don't know how many people file
bugs versus just living with it or finding some workaround. I suspect
for mod_wsgi in particular they probably switch to something else --
it's not like there's any shortage of WSGI servers that avoid these
problems. And for Jep there are prominent warnings to expect problems
and suggesting workarounds:
  https://github.com/ninia/jep/wiki/Workarounds-for-CPython-Extensions

>> I guess I would be much more confident in the possibilities here if
>> you could give:
>>
>> - some hand-wavy sketch for how subinterpreter A could call a function
>> that was originally defined in subinterpreter B without the GIL, which
>> seems like a precondition for sharing user-defined classes
>
> (Before I respond, note that this is way outside the scope of the PEP.
> The merit of subinterpreters extends beyond any benefits of running
> sans-GIL, though that is my main goal.  I've been updating the PEP to
> (hopefully) better communicate the utility of subinterpreters.)

Subinterpreters are basically an attempt to reimplement the OS's
process isolation in user-space, right? Classic trade-off where we
accept added complexity and fragility in the hopes of gaining some
speed? I just looked at the PEP again, and I'm afraid I still don't
understand what the benefits are unless we can remove the GIL and
somehow get a speedup over processes. Implementing CSP is a neat idea,
but you could do it with subprocesses too. AFAICT you could implement
the whole subinterpreters module API with subprocesses on 3.6, and
it'd be multi-core and have perfect extension module support.
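Nathaniel's point can be made concrete with a toy sketch: a hypothetical stand-in for an interpreter object, backed by a subprocess (the class and method names below are illustrative, not the PEP's actual API):

```python
import subprocess
import sys

class SubprocessInterpreter:
    """Toy stand-in for an interpreter object, backed by an OS process.

    Hypothetical illustration only: it gets isolation, a private "GIL",
    and full C-extension compatibility for free from the OS -- the bar
    that in-process subinterpreters have to beat.
    """

    def run(self, src):
        # Each call executes in a fresh process with fully isolated state.
        proc = subprocess.run(
            [sys.executable, "-c", src],
            capture_output=True, text=True, check=True,
        )
        return proc.stdout

interp = SubprocessInterpreter()
print(interp.run("print(6 * 7)"))  # 42
```

The trade-off, of course, is that every `run()` pays full process start-up and serialization costs, which is part of what PEP 554 hopes to avoid.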

> Code objects are immutable so that part should be relatively
> straight-forward.  There's the question of closures and default
> arguments that would have to be resolved.  However, those are things
> that would need to be supported anyway in a world where we want to
> pass functions and user-defined types between interpreters.  Doing so
> will be a gradual process of starting with immutable non-container
> builtin types and expanding out from there to other immutable types,
> including user-defined ones.

I tried arguing that code objects were immutable to the PyPy devs too
:-). The problem is that to call a function you need both its
__code__, which is immutable, and its __globals__, which is
emphatically not. The __globals__ thing means that if you start from
an average function you can often follow pointers to reach every other
global object (e.g. if the function uses regular expressions, you can
probably reach any module by doing
func.__globals__["re"].sys.modules[...]). You might hope that you
could somehow restrict this, but I can't think of any way that's
really useful :-(.
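The reachability problem Nathaniel describes is easy to demonstrate (using `os`, which happens to import `sys` at module level in CPython, in place of the `re` example):

```python
import os

def current_dir():
    return os.getcwd()

# A function's __globals__ is its defining module's namespace, so any
# function that uses a module lets you walk to the module object itself...
mod = current_dir.__globals__["os"]

# ...and from there to sys.modules -- i.e. to every module loaded in the
# interpreter. (CPython detail: os.py does `import sys` at top level, so
# the sys module is reachable as the attribute `os.sys`.)
loaded = mod.sys.modules
print("builtins" in loaded)  # True
```

So handing even one ordinary function to another interpreter would, via `__globals__`, transitively hand over essentially the whole object graph.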

>
> Note that sharing mutable objects between interpreters would be a
> pretty advanced usage (i.e. opt-in shared state vs. threading's
> share-everything).  If it proves desirable then we'd sort that out
> then.  However, I don't see that as more than an esoteric feature
> relative to subinterpreters.
>
> In my mind, the key advantage of being able to share more (immutable)
> objects, including user-defined types, between interpreters is in the
> optimization opportunities.

But even if we can add new language features for "freezing"
user-defined objects, then their .__class__ will still be mutable,
their methods will still have mutable 

Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Nick Coghlan
On 13 September 2017 at 00:35, Koos Zevenhoven  wrote:
> On Tue, Sep 12, 2017 at 1:40 PM, Nick Coghlan  wrote:
>>
>> On 11 September 2017 at 18:02, Koos Zevenhoven  wrote:
>> > On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan 
>> > wrote:
>> >> The line between it and the "CPython Runtime" is fuzzy for both
>> >> practical and historical reasons, but the regular Python CLI will
>> >> always have a "first created, last destroyed" main interpreter, simply
>> >> because we don't really gain anything significant from eliminating it
>> >> as a concept.
>> >
>> > I fear that emphasizing the main interpreter will lead to all kinds of
>> > libraries/programs that somehow unnecessarily rely on some or all tasks
>> > being performed in the main interpreter. Then you'll have a hard time
>> > running two of them in parallel in the same process, because you don't
>> > have
>> > two main interpreters.
>>
>> You don't need to fear this scenario, since it's a description of the
>> status quo (and it's the primary source of overstated claims about
>> subinterpreters being "fundamentally broken").
>>
>
> Well, if that's true, it's hardly a counter-argument to what I said. Anyway,
> there is no status quo about what is proposed in the PEP.

Yes, there is, since subinterpreters are an existing feature of the
CPython implementation.

What's new in the PEP is the idea of giving that feature a Python
level API so that it's available to regular Python programs, rather
than only being available to embedding applications that choose to use
it (e.g. mod_wsgi).

> And as long as the existing APIs are preserved, why not make the new one
> less susceptible to overstated fundamental brokenness?

Having a privileged main interpreter isn't fundamentally broken, since
you aren't going to run __main__ in more than one interpreter, just as
you don't run __main__ in more than one thread (and multiprocessing
deliberately avoids running the "if __name__ == '__main__'" sections
of it in more than one process).
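The multiprocessing behaviour referred to here is the reason for the standard `if __name__ == "__main__"` guard: with the spawn start method, worker processes re-import the parent's module, and only the guard keeps them from re-running the launching code.

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # Only the launching process runs this block; spawned workers
    # re-import the module, see __name__ != "__main__", and skip it.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

In other words, "__main__ runs exactly once" is already a load-bearing convention in Python's existing concurrency story, which is the argument for keeping a distinguished main interpreter.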

>> So no, not everything will be subinterpreter-friendly, just as not
>> everything in Python is thread-safe, and not everything is portable
>> across platforms.
>
> I don't see how the situation benefits from calling something the "main
> interpreter". Subinterpreters can be a way to take something non-thread-safe
> and make it thread-safe, because in an interpreter-per-thread scheme, most
> of the state, like module globals, is thread-local. (Well, this doesn't
> help for async concurrency, but anyway.)

"The interpreter that runs __main__" is never going to go away as a
concept for the regular CPython CLI.

Right now, it's also a restriction even for applications like mod_wsgi,
since the GIL state APIs always register C created threads with the
main interpreter.

>> That's OK - it just means we'll aim to make as many
>> things as possible implicitly subinterpreter-friendly, and for
>> everything else, we'll aim to minimise the adjustments needed to
>> *make* things subinterpreter friendly.
>>
>
> And that's exactly what I'm after here!

No, you're after deliberately making the proposed API
non-representative of how the reference implementation actually works
because of a personal aesthetic preference rather than asking yourself
what the practical benefit of hiding the existence of the main
interpreter would be.

The fact is that the main interpreter *is* special (just as the main
thread is special), and your wishing that things were otherwise won't
magically make it so.

> I'm mostly just worried about the `get_main()` function. Maybe it should be
> called `asdfjaosjnoijb()`, so people wouldn't use it. Can't the first
> running interpreter just introduce itself to its children? And if that's too
> much to ask, maybe there could be a `get_parent()` function, which would
> give you the interpreter that spawned the current subinterpreter.

If the embedding application never calls
"_Py_ConfigureMainInterpreter", then get_main() could conceivably
return None. However, we don't expose that as a public API yet, so for
the time being, Py_Initialize() will always call it, and hence there
will always be a main interpreter (even in things like mod_wsgi).

Whether we invest significant effort in making configuring the main
interpreter genuinely optional is still an open question - since most
applications are free to just not use the main interpreter for code
execution if they don't want to, we haven't found a real world use
case that would benefit meaningfully from its non-existence (just as
the vast majority of applications don't care about the various ways in
which the main thread that runs Py_Initialize() and Py_Finalize() is
given special treatment, and for those that do, they're free to avoid
using it).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Eric Snow
On Sun, Sep 10, 2017 at 12:14 PM, Antoine Pitrou  wrote:
> What could improve performance significantly would be to share objects
> without any form of marshalling; but it's not obvious it's possible in
> the subinterpreters model *if* it also tries to remove the GIL.

Yep.  This is one of the main challenges relative to the goal of fully
utilizing multiple cores.

-eric


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Eric Snow
On Sun, Sep 10, 2017 at 7:52 AM, Koos Zevenhoven  wrote:
> I assume the concept of a main interpreter is inherited from the previous
> levels of support in the C API, but what exactly is the significance of
> being "the main interpreter"? Instead, could they just all be
> subinterpreters of the same Python process (or whatever the right wording
> would be)?
>
> It might also be helpful if the PEP had a short description of what are
> considered subinterpreters and how they differ from threads of the same
> interpreter [*]. Currently, the PEP seems to rely heavily on knowledge of
> the previously available concepts. However, as this would be a new module, I
> don't think there's any need to blindly copy the previous design, regardless
> of how well the design may have served its purpose at the time.

I've updated the PEP to be more instructive.  I've also dropped the
"get_main()" function from the PEP.

-eric


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Eric Snow
Yep.  See http://bugs.python.org/issue10915 and
http://bugs.python.org/issue15751.  The issue of C-extension support
for subinterpreters is, of course, a critical one here.  At the very
least, incompatible modules should be able to opt out of
subinterpreter support.  I've updated the PEP to discuss this.

-eric

On Sun, Sep 10, 2017 at 3:18 AM, Ronald Oussoren  wrote:
>
>> On 8 Sep 2017, at 05:11, Eric Snow  wrote:
>
>> On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith  wrote:
>>
>>> Numpy is the one I'm
>>> most familiar with: when we get subinterpreter bugs we close them
>>> wontfix, because supporting subinterpreters properly would require
>>> non-trivial auditing, add overhead for non-subinterpreter use cases,
>>> and benefit a tiny tiny fraction of our users.
>>
>> The main problem of which I'm aware is C globals in libraries and
>> extension modules.  PEPs 489 and 3121 are meant to help but I know
>> that there is at least one major situation which is still a blocker
>> for multi-interpreter-safe module state.  Other than C globals, is
>> there some other issue?
>
> There’s also the PyGilState_* API that doesn't support multiple interpreters.
>
> The issue there is that callbacks from external libraries back into python
> need to use the correct subinterpreter.
>
> Ronald


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Eric Snow
On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith  wrote:
> On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow  wrote:
>> My concern is that this is a chicken-and-egg problem.  The situation
>> won't improve until subinterpreters are more readily available.
>
> Okay, but you're assuming that "more libraries work well with
> subinterpreters" is in fact an improvement. I'm asking you to convince
> me of that :-). Are there people saying "oh, if only subinterpreters
> had a Python API and less weird interactions with C extensions, I
> could do "? So far they haven't exactly taken the
> world by storm...

The problem is that most people don't know about the feature.  And
even if they do, using it requires writing a C-extension, which most
people aren't comfortable doing.

>> Other than C globals, is there some other issue?
>
> That's the main one I'm aware of, yeah, though I haven't looked into it 
> closely.

Oh, good.  I haven't missed something. :)  Do you know how often
subinterpreter support is a problem for users?  I was under the
impression from your earlier statements that this is a recurring issue
but my understanding from mod_wsgi is that it isn't that common.

>> I'm fine with Nick's idea about making this a "provisional" module.
>> Would that be enough to ease your concern here?
>
> Potentially, yeah -- basically I'm fine with anything that doesn't end
> up looking like python-dev telling everyone "subinterpreters are the
> future! go forth and yell at any devs who don't support them!".

Great!  I'm also looking at the possibility of adding a mechanism for
extension modules to opt out of subinterpreter support (using PEP 489
ModuleDef slots).  However, I'd rather wait on that if making the PEP
provisional is sufficient.

> What do you think the criteria for graduating to non-provisional
> status should be, in this case?

Consensus among the (Dutch?) core devs that subinterpreters are worth
keeping in the stdlib and that we've smoothed out any rough parts in
the module.

> I guess I would be much more confident in the possibilities here if
> you could give:
>
> - some hand-wavy sketch for how subinterpreter A could call a function
> that was originally defined in subinterpreter B without the GIL, which
> seems like a precondition for sharing user-defined classes

(Before I respond, note that this is way outside the scope of the PEP.
The merit of subinterpreters extends beyond any benefits of running
sans-GIL, though that is my main goal.  I've been updating the PEP to
(hopefully) better communicate the utility of subinterpreters.)

Code objects are immutable so that part should be relatively
straight-forward.  There's the question of closures and default
arguments that would have to be resolved.  However, those are things
that would need to be supported anyway in a world where we want to
pass functions and user-defined types between interpreters.  Doing so
will be a gradual process of starting with immutable non-container
builtin types and expanding out from there to other immutable types,
including user-defined ones.

Note that sharing mutable objects between interpreters would be a
pretty advanced usage (i.e. opt-in shared state vs. threading's
share-everything).  If it proves desirable then we'd sort that out
then.  However, I don't see that as more than an esoteric feature
relative to subinterpreters.

In my mind, the key advantage of being able to share more (immutable)
objects, including user-defined types, between interpreters is in the
optimization opportunities.  It would allow us to avoid instantiating
the same object in each interpreter.  That said, the way I imagine it
I wouldn't consider such an optimization to be very user-facing so it
doesn't impact the PEP.  The user-facing part would be the expanded
set of immutable objects interpreters could pass back and forth, and
expanding that set won't require any changes to the API in the PEP.

> - some hand-wavy sketch for how refcounting will work for objects
> shared between multiple subinterpreters without the GIL, without
> majorly impacting single-thread performance (I actually forgot about
> this problem in my last email, because PyPy has already solved this
> part!)

(same caveat as above)

There are a number of approaches that may work.  One is to give each
interpreter its own allocator and GC.  Another is to mark shared
objects such that they never get GC'ed.  Another is to allow objects
to exist only in one interpreter at a time.  Similarly, object
ownership (per interpreter) could help.  Asynchronous refcounting
could be an option.  That's only some of the possible approaches.  I
expect that at least one of them will be suitable.  However, the first
step is to get the multi-interpreter support out there.  Then we can
tackle the problem of optimization and multi-core utilization.

FWIW, the biggest complexity is actually in synchronizing the sharing
strategy across the inter-interpreter boundary 

Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Koos Zevenhoven
On Tue, Sep 12, 2017 at 6:30 PM, Koos Zevenhoven  wrote:

> On Tue, Sep 12, 2017 at 5:53 PM, Stefan Krah  wrote:
>
>> On Tue, Sep 12, 2017 at 05:35:34PM +0300, Koos Zevenhoven wrote:
>> > I don't see how the situation benefits from calling something the "main
>> > interpreter". Subinterpreters can be a way to take something
>> > non-thread-safe and make it thread-safe, because in an
>> > interpreter-per-thread scheme, most of the state, like module globals,
>> > is thread-local. (Well, this doesn't help for async concurrency, but
>> > anyway.)
>>
>> You could have a privileged C extension that is only imported in the main
>> interpreter:
>>
>>
>> if get_current_interp() is main_interp():
>>     from _decimal import *
>> else:
>>     from _pydecimal import *
>>
>>
>>
>
Oops... it should of course be "by_this_process", not "by_other_process"
(fixed below).



> Or it could be first-come first-served:
>
>     if is_imported_by_this_process("_decimal"):
>         from _pydecimal import *
>     else:
>         from _decimal import *
>
> ––Koos
>
>
>
> --
> + Koos Zevenhoven + http://twitter.com/k7hoven +
>



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Nick Coghlan
On 11 September 2017 at 18:02, Koos Zevenhoven  wrote:
> On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan  wrote:
>> The line between it and the "CPython Runtime" is fuzzy for both
>> practical and historical reasons, but the regular Python CLI will
>> always have a "first created, last destroyed" main interpreter, simply
>> because we don't really gain anything significant from eliminating it
>> as a concept.
>
> I fear that emphasizing the main interpreter will lead to all kinds of
> libraries/programs that somehow unnecessarily rely on some or all tasks
> being performed in the main interpreter. Then you'll have a hard time
> running two of them in parallel in the same process, because you don't have
> two main interpreters.

You don't need to fear this scenario, since it's a description of the
status quo (and it's the primary source of overstated claims about
subinterpreters being "fundamentally broken").

So no, not everything will be subinterpreter-friendly, just as not
everything in Python is thread-safe, and not everything is portable
across platforms. That's OK - it just means we'll aim to make as many
things as possible implicitly subinterpreter-friendly, and for
everything else, we'll aim to minimise the adjustments needed to
*make* things subinterpreter friendly.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-11 Thread Koos Zevenhoven
On Mon, Sep 11, 2017 at 8:32 AM, Nick Coghlan  wrote:

> On 11 September 2017 at 00:52, Koos Zevenhoven  wrote:
> > On Thu, Sep 7, 2017 at 9:26 PM, Eric Snow 
> > wrote:
> > [...]
> >
> >>
> >> get_main():
> >>
> >>Return the main interpreter.
> >>
> >
> > I assume the concept of a main interpreter is inherited from the previous
> > levels of support in the C API, but what exactly is the significance of
> > being "the main interpreter"? Instead, could they just all be
> > subinterpreters of the same Python process (or whatever the right wording
> > would be)?
>
> The main interpreter is ultimately responsible for the actual process
> global state: standard streams, signal handlers, dynamically linked
> libraries, __main__ module, etc.
>
>
Hmm. It is not clear, for instance, why a signal handler could not be
owned by an interpreter that wasn't the first one started. Or, if a
non-main interpreter imports a module from a dynamically linked library,
does it delegate that to the main interpreter? And do sys.stdout et al.
not exist in the other interpreters?

The line between it and the "CPython Runtime" is fuzzy for both
> practical and historical reasons, but the regular Python CLI will
> always have a "first created, last destroyed" main interpreter, simply
> because we don't really gain anything significant from eliminating it
> as a concept.
>

I fear that emphasizing the main interpreter will lead to all kinds of
libraries/programs that somehow unnecessarily rely on some or all tasks
being performed in the main interpreter. Then you'll have a hard time
running two of them in parallel in the same process, because you don't have
two main interpreters.

-- Koos

PS. There's a saying... something like "always say never" ;)


> By contrast, embedding applications that *don't* have a __main__
> module, and already manage most process global state themselves
> without the assistance of the CPython Runtime can already get pretty
> close to just having a pool of peer subinterpreters, and will
> presumably be able to get closer over time as the subinterpreter
> support becomes more robust.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-10 Thread Nick Coghlan
On 11 September 2017 at 00:52, Koos Zevenhoven  wrote:
> On Thu, Sep 7, 2017 at 9:26 PM, Eric Snow 
> wrote:
> [...]
>
>>
>> get_main():
>>
>>Return the main interpreter.
>>
>
> I assume the concept of a main interpreter is inherited from the previous
> levels of support in the C API, but what exactly is the significance of
> being "the main interpreter"? Instead, could they just all be
> subinterpreters of the same Python process (or whatever the right wording
> would be)?

The main interpreter is ultimately responsible for the actual process
global state: standard streams, signal handlers, dynamically linked
libraries, __main__ module, etc.

The line between it and the "CPython Runtime" is fuzzy for both
practical and historical reasons, but the regular Python CLI will
always have a "first created, last destroyed" main interpreter, simply
because we don't really gain anything significant from eliminating it
as a concept.

By contrast, embedding applications that *don't* have a __main__
module, and already manage most process global state themselves
without the assistance of the CPython Runtime can already get pretty
close to just having a pool of peer subinterpreters, and will
presumably be able to get closer over time as the subinterpreter
support becomes more robust.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-10 Thread Antoine Pitrou
On Thu, 7 Sep 2017 21:08:48 -0700
Nathaniel Smith  wrote:
> 
> Awesome, thanks for bringing numbers into my wooly-headed theorizing :-).
> 
> On my laptop I actually get a worse result from your benchmark: 531 ms
> for 100 MB == ~200 MB/s round-trip, or 400 MB/s one-way. So yeah,
> transferring data between processes with multiprocessing is slow.
> 
> This is odd, though, because on the same machine, using socat to send
> 1 GiB between processes using a unix domain socket runs at 2 GB/s:

When using local communication, the raw IPC cost is often minor
compared to whatever Python does with the data (parse it, dispatch
tasks around, etc.) except when the data is really huge.

Local communications on Linux can easily reach several GB/s (even using
TCP to localhost).  Here is a Python script with reduced overhead to
measure it -- as opposed to e.g. a full-fledged event loop:
https://gist.github.com/pitrou/d809618359915967ffc44b1ecfc2d2ad
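
For reference, here is a reduced probe in the same spirit (a simplified
stand-in written for this discussion, not Antoine's gist): it streams
100 MiB through a connected local socket pair from a helper thread, so it
captures the kernel's stream-copy cost, though not a full cross-process
context switch.

```python
import socket
import threading
import time

CHUNK = b"x" * (1 << 20)    # 1 MiB per send
TOTAL = 100 * (1 << 20)     # 100 MiB overall

def _sender(sock):
    sent = 0
    while sent < TOTAL:
        sock.sendall(CHUNK)
        sent += len(CHUNK)
    sock.close()

def measure():
    # socketpair() gives a connected local stream, comparable to TCP
    # over localhost; the sender runs in a thread rather than a process.
    rx, tx = socket.socketpair()
    t = threading.Thread(target=_sender, args=(tx,))
    start = time.perf_counter()
    t.start()
    received = 0
    while True:
        data = rx.recv(1 << 16)
        if not data:            # sender closed its end
            break
        received += len(data)
    elapsed = time.perf_counter() - start
    t.join()
    rx.close()
    return received / elapsed / 1e9   # GB/s

if __name__ == "__main__":
    print(f"{measure():.2f} GB/s")
```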

> I don't know why multiprocessing is so slow -- maybe there's a good
> reason, maybe not.

Be careful to measure actual bandwidth, not round-trip latency, however.

> But the reason isn't that IPC is intrinsically
> slow, and subinterpreters aren't going to automatically be 5x faster
> because they can use memcpy.

What could improve performance significantly would be to share objects
without any form of marshalling; but it's not obvious it's possible in
the subinterpreters model *if* it also tries to remove the GIL.

You can see it readily with concurrent.futures, when comparing
ThreadPoolExecutor and ProcessPoolExecutor:

>>> import concurrent.futures as cf
...:tp = cf.ThreadPoolExecutor(4)
...:pp = cf.ProcessPoolExecutor(4)
...:x = b"x" * (100 * 1024**2)
...:def identity(x): return x
...:
>>> y = list(tp.map(identity, [x] * 10))  # warm up
>>> len(y)
10
>>> y = list(pp.map(identity, [x] * 10))  # warm up
>>> len(y)
10
>>> %timeit y = list(tp.map(identity, [x] * 10))
638 µs ± 71.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit y = list(pp.map(identity, [x] * 10))
1.99 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

On this trivial case you're really gaining a lot using a thread pool...
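
For anyone wanting to reproduce the comparison outside IPython, the
session above can be restated as a plain script (absolute timings will of
course vary by machine):

```python
import concurrent.futures as cf
import time

def identity(x):
    return x

def bench(executor, x, n=10):
    # Round-trip x through the executor n times and return elapsed time.
    start = time.perf_counter()
    y = list(executor.map(identity, [x] * n))
    assert len(y) == n
    return time.perf_counter() - start

if __name__ == "__main__":
    x = b"x" * (100 * 1024**2)  # 100 MB payload
    with cf.ThreadPoolExecutor(4) as tp, cf.ProcessPoolExecutor(4) as pp:
        bench(tp, x)            # warm up
        bench(pp, x)            # warm up
        print(f"threads:   {bench(tp, x):.3f} s")
        print(f"processes: {bench(pp, x):.3f} s")
```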

Regards

Antoine.




Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-10 Thread Koos Zevenhoven
On Thu, Sep 7, 2017 at 9:26 PM, Eric Snow 
wrote:
[...]


> get_main():
>
>Return the main interpreter.
>
>
I assume the concept of a main interpreter is inherited from the previous
levels of support in the C API, but what exactly is the significance of
being "the main interpreter"? Instead, could they just all be
subinterpreters of the same Python process (or whatever the right wording
would be)?

It might also be helpful if the PEP had a short description of what are
considered subinterpreters and how they differ from threads of the same
interpreter [*]. Currently, the PEP seems to rely heavily on knowledge of
the previously available concepts. However, as this would be a new module,
I don't think there's any need to blindly copy the previous design,
regardless of how well the design may have served its purpose at the time.

-- Koos

[*] For instance regarding the role of the glo... local interpreter locks
(LILs) ;)


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-10 Thread Ronald Oussoren

> On 8 Sep 2017, at 05:11, Eric Snow  wrote:

> On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith  wrote:
> 
>> Numpy is the one I'm
>> most familiar with: when we get subinterpreter bugs we close them
>> wontfix, because supporting subinterpreters properly would require
>> non-trivial auditing, add overhead for non-subinterpreter use cases,
>> and benefit a tiny tiny fraction of our users.
> 
> The main problem of which I'm aware is C globals in libraries and
> extension modules.  PEPs 489 and 3121 are meant to help but I know
> that there is at least one major situation which is still a blocker
> for multi-interpreter-safe module state.  Other than C globals, is
> there some other issue?

There’s also the PyGilState_* API that doesn't support multiple interpreters.

The issue there is that callbacks from external libraries back into python
need to use the correct subinterpreter. 

Ronald


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-08 Thread Nathaniel Smith
On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow  wrote:
> First of all, thanks for the feedback and encouragement!  Responses
> in-line below.

I hope it's helpful! More responses in-line as well.

> On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith  wrote:
>> My concern about this is the same as it was last time -- the work
>> looks neat, but right now, almost no-one uses subinterpreters
>> (basically it's Jep and mod_wsgi and that's it?), and therefore many
>> packages get away with ignoring subinterpreters.
>
> My concern is that this is a chicken-and-egg problem.  The situation
> won't improve until subinterpreters are more readily available.

Okay, but you're assuming that "more libraries work well with
subinterpreters" is in fact an improvement. I'm asking you to convince
me of that :-). Are there people saying "oh, if only subinterpreters
had a Python API and less weird interactions with C extensions, I
could do "? So far they haven't exactly taken the
world by storm...

>> Numpy is the one I'm
>> most familiar with: when we get subinterpreter bugs we close them
>> wontfix, because supporting subinterpreters properly would require
>> non-trivial auditing, add overhead for non-subinterpreter use cases,
>> and benefit a tiny tiny fraction of our users.
>
> The main problem of which I'm aware is C globals in libraries and
> extension modules.  PEPs 489 and 3121 are meant to help but I know
> that there is at least one major situation which is still a blocker
> for multi-interpreter-safe module state.  Other than C globals, is
> there some other issue?

That's the main one I'm aware of, yeah, though I haven't looked into it closely.

>> If we add a friendly python-level API like this, then we're committing
>> to this being a part of Python for the long term and encouraging
>> people to use it, which puts pressure on downstream packages to do
>> that work... but it's still not clear whether any benefits will
>> actually materialize.
>
> I'm fine with Nick's idea about making this a "provisional" module.
> Would that be enough to ease your concern here?

Potentially, yeah -- basically I'm fine with anything that doesn't end
up looking like python-dev telling everyone "subinterpreters are the
future! go forth and yell at any devs who don't support them!".

What do you think the criteria for graduating to non-provisional
status should be, in this case?

[snip]
>> So the only case I can see where I'd expect subinterpreters to make
>> communication dramatically more efficient is if you have a "deeply
>> immutable" type
>> [snip]
>> However, it seems impossible to support user-defined deeply-immutable
>> types in Python:
>> [snip]
>
> I agree that it is currently not an option.  That is part of the
> exercise.  There are a number of possible solutions to explore once we
> get to that point.  However, this PEP isn't about that.  I'm confident
> enough about the possibilities that I'm comfortable with moving
> forward here.

I guess I would be much more confident in the possibilities here if
you could give:

- some hand-wavy sketch for how subinterpreter A could call a function
that as originally defined in subinterpreter B without the GIL, which
seems like a precondition for sharing user-defined classes

- some hand-wavy sketch for how refcounting will work for objects
shared between multiple subinterpreters without the GIL, without
majorly impacting single-thread performance (I actually forgot about
this problem in my last email, because PyPy has already solved this
part!)

These are the two problems where I find it most difficult to have faith.

[snip]
>> I hope so -- a really useful subinterpreter multi-core stor[y] would be
>> awesome.
>
> Agreed!  Thanks for the encouragement. :)

Thanks for attempting such an ambitious project :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Nathaniel Smith
On Thu, Sep 7, 2017 at 6:14 PM, Matthew Rocklin  wrote:
> Those numbers were for common use in Python tools and reflected my anecdotal
> experience at the time with normal Python tools.  I'm sure that there are
> mechanisms to achieve faster speeds than what I experienced.  That being
> said, here is a small example.
>
>
> In [1]: import multiprocessing
> In [2]: data = b'0' * 100000000  # 100 MB
> In [3]: from toolz import identity
> In [4]: pool = multiprocessing.Pool()
> In [5]: %time _ = pool.apply_async(identity, (data,)).get()
> CPU times: user 76 ms, sys: 64 ms, total: 140 ms
> Wall time: 252 ms
>
> This is about 400MB/s for a roundtrip

Awesome, thanks for bringing numbers into my wooly-headed theorizing :-).

On my laptop I actually get a worse result from your benchmark: 531 ms
for 100 MB == ~200 MB/s round-trip, or 400 MB/s one-way. So yeah,
transferring data between processes with multiprocessing is slow.

This is odd, though, because on the same machine, using socat to send
1 GiB between processes using a unix domain socket runs at 2 GB/s:

# terminal 1
~$ rm -f /tmp/unix.sock && socat -u -b32768 UNIX-LISTEN:/tmp/unix.sock
"SYSTEM:pv -W > /dev/null"
1.00GiB 0:00:00 [1.89GiB/s] [<=>   ]

# terminal 2
~$ socat -u -b32768 "SYSTEM:dd if=/dev/zero bs=1M count=1024"
UNIX:/tmp/unix.sock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.529814 s, 2.0 GB/s

(Notice that the pv output is in GiB/s and the dd output is in GB/s.
1.89 GiB/s = 2.03 GB/s, so they actually agree.)
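
The unit conversion is easy to verify:

```python
# pv reports binary GiB/s (2**30 bytes); dd reports decimal GB/s (10**9).
gib_per_s = 1.89
gb_per_s = gib_per_s * 2**30 / 10**9
print(f"{gb_per_s:.2f} GB/s")  # ~2.03
```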

On my system, Python allocates + copies memory at 2.2 GB/s, so bulk
byte-level IPC is within 10% of within-process bulk copying:

# same 100 MB bytestring as above
In [7]: bytearray_data = bytearray(data)

In [8]: %timeit bytearray_data.copy()
45.3 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: 0.100 / 0.0453  # GB / seconds
Out[9]: 2.207505518763797

I don't know why multiprocessing is so slow -- maybe there's a good
reason, maybe not. But the reason isn't that IPC is intrinsically
slow, and subinterpreters aren't going to automatically be 5x faster
because they can use memcpy.
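
One way to narrow down where multiprocessing loses time is to bypass the
Pool machinery and push the same raw bytes through a bare
multiprocessing.Pipe (a rough probe for this discussion, not a rigorous
benchmark):

```python
import multiprocessing as mp
import time

def _drain(conn):
    conn.recv_bytes()   # one big read of everything the parent sends
    conn.close()

if __name__ == "__main__":
    data = b"x" * (100 * 1024**2)   # the same 100 MB payload
    parent, child = mp.Pipe()
    p = mp.Process(target=_drain, args=(child,))
    p.start()
    child.close()                   # parent keeps only its own end
    start = time.perf_counter()
    parent.send_bytes(data)         # raw bytes: no pickle, no Pool
    p.join()
    elapsed = time.perf_counter() - start
    print(f"{len(data) / elapsed / 1e6:.0f} MB/s one-way")
```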

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Eric Snow
On Thu, Sep 7, 2017 at 12:44 PM, Paul Moore  wrote:
> On 7 September 2017 at 20:14, Eric Snow  wrote:
>> I didn't include such a queue in this proposal because I wanted to
>> keep it as focused as possible.  I'll add a note to the PEP about
>> this.
>
> This all sounds very reasonable. Thanks for the clarification.

Hmm.  Now I'm starting to think some form of basic queue would be
important enough to include in the PEP.  I'll see if that feeling
holds tomorrow.

-eric


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Eric Snow
First of all, thanks for the feedback and encouragement!  Responses
in-line below.

-eric


On Thu, Sep 7, 2017 at 3:48 PM, Nathaniel Smith  wrote:
> My concern about this is the same as it was last time -- the work
> looks neat, but right now, almost no-one uses subinterpreters
> (basically it's Jep and mod_wsgi and that's it?), and therefore many
> packages get away with ignoring subinterpreters.

My concern is that this is a chicken-and-egg problem.  The situation
won't improve until subinterpreters are more readily available.

> Numpy is the one I'm
> most familiar with: when we get subinterpreter bugs we close them
> wontfix, because supporting subinterpreters properly would require
> non-trivial auditing, add overhead for non-subinterpreter use cases,
> and benefit a tiny tiny fraction of our users.

The main problem of which I'm aware is C globals in libraries and
extension modules.  PEPs 489 and 3121 are meant to help but I know
that there is at least one major situation which is still a blocker
for multi-interpreter-safe module state.  Other than C globals, is
there some other issue?

> If we add a friendly python-level API like this, then we're committing
> to this being a part of Python for the long term and encouraging
> people to use it, which puts pressure on downstream packages to do
> that work... but it's still not clear whether any benefits will
> actually materialize.

I'm fine with Nick's idea about making this a "provisional" module.
Would that be enough to ease your concern here?

> I've actually argued with the PyPy devs to try to convince them to add
> subinterpreter support as part of their experiments with GIL-removal,
> because I think the semantics would genuinely be nicer to work with
> than raw threads, but they're convinced that it's impossible to make
> this work. Or more precisely, they think you could make it work in
> theory, but that it would be impossible to make it meaningfully more
> efficient than using multiple processes. I want them to be wrong, but
> I have to admit I can't see a way to make it work either...

Yikes!  Given the people involved I don't find that to be a good sign.
Nevertheless, I still consider my ultimate goals to be tractable and
will press forward.  At each step thus far, the effort has led to
improvements that extend beyond subinterpreters and multi-core.  I see
that trend continuing for the entirety of the project.  Even if my
final goal is not realized, the result will still be significantly net
positive...and I still think it will still work out. :)

> If this is being justified by the multicore use case, and specifically
> by the theory that having two interpreters in the same process will
> allow for more efficient communication than two interpreters in two
> different processes, then... why should we believe that that's
> actually possible? I want your project to succeed, but if it's going
> to fail then it seems better if it fails before we commit to exposing
> new APIs.

The project is partly about performance.  However, it's also
particularly about offering an alternative concurrency model with an
implementation that can run in multiple threads simultaneously in the
same process.

On Thu, Sep 7, 2017 at 5:15 PM, Nathaniel Smith  wrote:
> The slow case is passing
> complicated objects between processes, and it's slow because pickle
> has to walk the object graph to serialize it, and walking the object
> graph is slow. Copying object graphs between subinterpreters has the
> same problem.

The initial goal is to support passing only strings between
interpreters.  Later efforts will involve investigating approaches to
efficiently and safely passing other objects.
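
Since the channel API is still in flux, here is only a toy stand-in (not
the PEP's actual API) showing what a strings-only boundary feels like,
mimicked today with ordinary threads and a type-checking queue:

```python
import queue
import threading

class StringChannel:
    """One-directional toy channel: only str may cross, echoing the
    PEP's initial string-only restriction (NOT the PEP's real API)."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, obj):
        if not isinstance(obj, str):
            raise TypeError(
                f"only str may be sent, got {type(obj).__name__}")
        self._q.put(obj)

    def recv(self):
        return self._q.get()

def echo_worker(inbox, outbox):
    # Stands in for code running in another interpreter.
    outbox.send("pong: " + inbox.recv())

if __name__ == "__main__":
    to_worker, from_worker = StringChannel(), StringChannel()
    t = threading.Thread(target=echo_worker,
                         args=(to_worker, from_worker))
    t.start()
    to_worker.send("ping")
    print(from_worker.recv())           # pong: ping
    t.join()
    try:
        to_worker.send(b"raw bytes")    # rejected: not a str
    except TypeError as exc:
        print(exc)
```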

> So the only case I can see where I'd expect subinterpreters to make
> communication dramatically more efficient is if you have a "deeply
> immutable" type
> [snip]
> However, it seems impossible to support user-defined deeply-immutable
> types in Python:
> [snip]

I agree that it is currently not an option.  That is part of the
exercise.  There are a number of possible solutions to explore once we
get to that point.  However, this PEP isn't about that.  I'm confident
enough about the possibilities that I'm comfortable with moving
forward here.

> I guess the other case where subprocesses lose to "real" threads is
> startup time on Windows. But starting a subinterpreter is also much
> more expensive than starting a thread, once you take into account the
> cost of loading the application's modules into the new interpreter. In
> both cases you end up needing some kind of process/subinterpreter pool
> or cache to amortize that cost.

Interpreter startup costs (and optimization strategies) are another
aspect of the project which deserve attention.  However, we'll worry
about that after the core functionality has been achieved.

> Obviously I'm committing the cardinal sin of trying to guess about
> performance based on theory instead of measurement, so maybe I'm wrong.

Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Matthew Rocklin
Those numbers were for common use in Python tools and reflected my
anecdotal experience at the time with normal Python tools.  I'm sure that
there are mechanisms to achieve faster speeds than what I experienced.
That being said, here is a small example.


In [1]: import multiprocessing
In [2]: data = b'0' * 100000000  # 100 MB
In [3]: from toolz import identity
In [4]: pool = multiprocessing.Pool()
In [5]: %time _ = pool.apply_async(identity, (data,)).get()
CPU times: user 76 ms, sys: 64 ms, total: 140 ms
Wall time: 252 ms

This is about 400MB/s for a roundtrip


On Thu, Sep 7, 2017 at 9:00 PM, Stephan Hoyer  wrote:

> On Thu, Sep 7, 2017 at 5:15 PM Nathaniel Smith  wrote:
>
>> On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan  wrote:
>> > The gist of the idea is that with subinterpreters, your starting point
>> > is multiprocessing-style isolation (i.e. you have to use pickle to
>> > transfer data between subinterpreters), but you're actually running in
>> > a shared-memory threading context from the operating system's
>> > perspective, so you don't need to rely on mmap to share memory over a
>> > non-streaming interface.
>>
>> The challenge is that streaming bytes between processes is actually
>> really fast -- you don't really need mmap for that. (Maybe this was
>> important for X11 back in the 1980s, but a lot has changed since then
>> :-).) And if you want to use pickle and multiprocessing to send, say,
>> a single big numpy array between processes, that's also really fast,
>> because it's basically just a few memcpy's. The slow case is passing
>> complicated objects between processes, and it's slow because pickle
>> has to walk the object graph to serialize it, and walking the object
>> graph is slow. Copying object graphs between subinterpreters has the
>> same problem.
>>
>
> This doesn't match up with my (somewhat limited) experience. For example,
> in this table of bandwidth estimates from Matthew Rocklin (CCed), IPC is
> about 10x slower than a memory copy:
> http://matthewrocklin.com/blog/work/2015/12/29/data-bandwidth
>
> This makes a considerable difference when building a system to do parallel
> data analytics in Python (e.g., on NumPy arrays), which is exactly what
> Matthew has been working on for the past few years.
>
> I'm sure there are other ways to avoid this expensive IPC without using
> sub-interpreters, e.g., by using a tool like Plasma (
> http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/).
> But I'm skeptical of your assessment that the current multiprocessing
> approach is fast enough.
>


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Stephan Hoyer
On Thu, Sep 7, 2017 at 5:15 PM Nathaniel Smith  wrote:

> On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan  wrote:
> > The gist of the idea is that with subinterpreters, your starting point
> > is multiprocessing-style isolation (i.e. you have to use pickle to
> > transfer data between subinterpreters), but you're actually running in
> > a shared-memory threading context from the operating system's
> > perspective, so you don't need to rely on mmap to share memory over a
> > non-streaming interface.
>
> The challenge is that streaming bytes between processes is actually
> really fast -- you don't really need mmap for that. (Maybe this was
> important for X11 back in the 1980s, but a lot has changed since then
> :-).) And if you want to use pickle and multiprocessing to send, say,
> a single big numpy array between processes, that's also really fast,
> because it's basically just a few memcpy's. The slow case is passing
> complicated objects between processes, and it's slow because pickle
> has to walk the object graph to serialize it, and walking the object
> graph is slow. Copying object graphs between subinterpreters has the
> same problem.
>

This doesn't match up with my (somewhat limited) experience. For example,
in this table of bandwidth estimates from Matthew Rocklin (CCed), IPC is
about 10x slower than a memory copy:
http://matthewrocklin.com/blog/work/2015/12/29/data-bandwidth

This makes a considerable difference when building a system to do parallel
data analytics in Python (e.g., on NumPy arrays), which is exactly what
Matthew has been working on for the past few years.

I'm sure there are other ways to avoid this expensive IPC without using
sub-interpreters, e.g., by using a tool like Plasma (
http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/).
But I'm skeptical of your assessment that the current multiprocessing
approach is fast enough.


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Nathaniel Smith
On Thu, Sep 7, 2017 at 4:23 PM, Nick Coghlan  wrote:
> On 7 September 2017 at 15:48, Nathaniel Smith  wrote:
>> I've actually argued with the PyPy devs to try to convince them to add
>> subinterpreter support as part of their experiments with GIL-removal,
>> because I think the semantics would genuinely be nicer to work with
>> than raw threads, but they're convinced that it's impossible to make
>> this work. Or more precisely, they think you could make it work in
>> theory, but that it would be impossible to make it meaningfully more
>> efficient than using multiple processes. I want them to be wrong, but
>> I have to admit I can't see a way to make it work either...
>
> The gist of the idea is that with subinterpreters, your starting point
> is multiprocessing-style isolation (i.e. you have to use pickle to
> transfer data between subinterpreters), but you're actually running in
> a shared-memory threading context from the operating system's
> perspective, so you don't need to rely on mmap to share memory over a
> non-streaming interface.

The challenge is that streaming bytes between processes is actually
really fast -- you don't really need mmap for that. (Maybe this was
important for X11 back in the 1980s, but a lot has changed since then
:-).) And if you want to use pickle and multiprocessing to send, say,
a single big numpy array between processes, that's also really fast,
because it's basically just a few memcpy's. The slow case is passing
complicated objects between processes, and it's slow because pickle
has to walk the object graph to serialize it, and walking the object
graph is slow. Copying object graphs between subinterpreters has the
same problem.
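
A rough stdlib-only illustration of why graph-walking dominates: pickling
one flat 100 MB bytes object is close to a memory copy, while pickling a
comparable volume of data spread across a million small tuples is far
slower per byte.

```python
import pickle
import time

flat = b"x" * (100 * 1024**2)                     # one big buffer
nested = [(i, str(i)) for i in range(1000000)]    # many small objects

for name, obj in [("flat bytes", flat), ("nested tuples", nested)]:
    start = time.perf_counter()
    blob = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(blob) / 1e6:.0f} MB in {elapsed:.3f} s")
```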

So the only case I can see where I'd expect subinterpreters to make
communication dramatically more efficient is if you have a "deeply
immutable" type: one where not only are its instances immutable, but
all objects reachable from those instances are also guaranteed to be
immutable. So like, a tuple except that when you instantiate it it
validates that all of its elements are also marked as deeply
immutable, and errors out if not. Then when you go to send this
between subinterpreters, you can tell by checking the type of the root
object that the whole graph is immutable, so you don't need to walk it
yourself.
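
The validating container described here can be sketched in a few lines --
purely illustrative, with a fixed allowlist of builtin types, and it
deliberately sidesteps the user-defined-type problem raised in the next
paragraph:

```python
# Types we treat as deeply immutable; a real version would also need to
# handle frozenset (whose elements must be checked too) and the like.
DEEPLY_IMMUTABLE = (int, float, str, bytes, bool, type(None))

class FrozenTuple(tuple):
    """Tuple whose elements are validated as deeply immutable at
    construction time, so the whole graph is known-safe to share."""

    def __new__(cls, iterable=()):
        self = super().__new__(cls, iterable)
        for item in self:
            if not isinstance(item, DEEPLY_IMMUTABLE + (FrozenTuple,)):
                raise TypeError(
                    f"{type(item).__name__} is not deeply immutable")
        return self

ok = FrozenTuple([1, "two", FrozenTuple([3.0, b"four"])])   # accepted
try:
    FrozenTuple([1, [2, 3]])        # list is mutable -> rejected
except TypeError as exc:
    print(exc)                      # list is not deeply immutable
```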

However, it seems impossible to support user-defined deeply-immutable
types in Python: types and functions are themselves mutable and hold
tons of references to other potentially mutable objects via __mro__,
__globals__, __weakrefs__, etc. etc., so even if a user-defined
instance can be made logically immutable it's still going to hold
references to mutable things. So the one case where subinterpreters
win is if you have a really big and complicated set of nested
pseudo-tuples of ints and strings and you're bottlenecked on passing
it between interpreters. Maybe frozendicts too. Is that enough to
justify the whole endeavor? It seems dubious to me.

I guess the other case where subprocesses lose to "real" threads is
startup time on Windows. But starting a subinterpreter is also much
more expensive than starting a thread, once you take into account the
cost of loading the application's modules into the new interpreter. In
both cases you end up needing some kind of process/subinterpreter pool
or cache to amortize that cost.

Obviously I'm committing the cardinal sin of trying to guess about
performance based on theory instead of measurement, so maybe I'm
wrong. Or maybe there's some deviously clever trick I'm missing. I
hope so -- a really useful subinterpreter multi-core story would be
awesome.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Nick Coghlan
On 7 September 2017 at 15:48, Nathaniel Smith  wrote:
> I've actually argued with the PyPy devs to try to convince them to add
> subinterpreter support as part of their experiments with GIL-removal,
> because I think the semantics would genuinely be nicer to work with
> than raw threads, but they're convinced that it's impossible to make
> this work. Or more precisely, they think you could make it work in
> theory, but that it would be impossible to make it meaningfully more
> efficient than using multiple processes. I want them to be wrong, but
> I have to admit I can't see a way to make it work either...

The gist of the idea is that with subinterpreters, your starting point
is multiprocessing-style isolation (i.e. you have to use pickle to
transfer data between subinterpreters), but you're actually running in
a shared-memory threading context from the operating system's
perspective, so you don't need to rely on mmap to share memory over a
non-streaming interface.

It's also definitely the case that to make this viable, we'd need to
provide fast subinterpreter friendly alternatives to C globals for use
by extension modules (otherwise adding subinterpreter compatibility
will be excessively painful), and PEP 550 is likely to be helpful
there.

Personally, I think it would make sense to add the module under PEP
411 provisional status, and make it's continued existence as a public
API contingent on actually delivering on the "lower overhead
multi-core support than multiprocessing" goal (even if it only
delivers on that front on Windows, where process creation is more
expensive and there's no fork() equivalent).

However, I'd also be entirely happy with our adding it as a private
"_subinterpreters" API for testing & experimentation purposes (see
https://bugs.python.org/issue30439 ), and reconsidering introducing it
as a public API after there's more concrete evidence as to what can
actually be achieved based on it.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Nathaniel Smith
On Thu, Sep 7, 2017 at 11:26 AM, Eric Snow  wrote:
> Hi all,
>
> As part of the multi-core work I'm proposing the addition of the
> "interpreters" module to the stdlib.  This will expose the existing
> subinterpreters C-API to Python code.  I've purposefully kept the API
> simple.  Please let me know what you think.

My concern about this is the same as it was last time -- the work
looks neat, but right now, almost no-one uses subinterpreters
(basically it's Jep and mod_wsgi and that's it?), and therefore many
packages get away with ignoring subinterpreters. Numpy is the one I'm
most familiar with: when we get subinterpreter bugs we close them
wontfix, because supporting subinterpreters properly would require
non-trivial auditing, add overhead for non-subinterpreter use cases,
and benefit a tiny tiny fraction of our users.

If we add a friendly python-level API like this, then we're committing
to this being a part of Python for the long term and encouraging
people to use it, which puts pressure on downstream packages to do
that work... but it's still not clear whether any benefits will
actually materialize.

I've actually argued with the PyPy devs to try to convince them to add
subinterpreter support as part of their experiments with GIL-removal,
because I think the semantics would genuinely be nicer to work with
than raw threads, but they're convinced that it's impossible to make
this work. Or more precisely, they think you could make it work in
theory, but that it would be impossible to make it meaningfully more
efficient than using multiple processes. I want them to be wrong, but
I have to admit I can't see a way to make it work either...

If this is being justified by the multicore use case, and specifically
by the theory that having two interpreters in the same process will
allow for more efficient communication than two interpreters in two
different processes, then... why should we believe that that's
actually possible? I want your project to succeed, but if it's going
to fail then it seems better if it fails before we commit to exposing
new APIs.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Eric Snow
On Thu, Sep 7, 2017 at 1:14 PM, Sebastian Krause  wrote:
> How is the GIL situation with subinterpreters these days, is the
> long-term goal still "solving multi-core Python", i.e. using
> multiple CPU cores from within the same process? Or is it mainly
> used for isolation?

The GIL is still process-global.  The goal is indeed to change this to
support actual multi-core parallelism.  However, the benefits of
interpreter isolation are certainly a win otherwise. :)

-eric


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Eric Snow
On Thu, Sep 7, 2017 at 12:44 PM, Paul Moore  wrote:
> Ah, OK. so if I create a new interpreter, none of the classes,
> functions, or objects defined in my calling code will exist within the
> target interpreter? That makes sense, but I'd missed that nuance from
> the description. Again, this is probably worth noting in the PEP.

I'll make sure the PEP is more clear about this.

>
> And for the record, based on that one fact, I'm perfectly OK with the
> initial API being string-only.

Great! :)

-eric


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Sebastian Krause
Eric Snow  wrote:
> 1. add a basic queue class for passing objects between interpreters
> * only support strings at first (though Nick pointed out we could
> fall back to pickle or marshal for unsupported objects)
> 2. implement CSP on top of subinterpreters
> 3. expand the queue's supported types
> 4. add something like Interpreter.call()

How is the GIL situation with subinterpreters these days, is the
long-term goal still "solving multi-core Python", i.e. using
multiple CPU cores from within the same process? Or is it mainly
used for isolation?

Sebastian


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Paul Moore
On 7 September 2017 at 20:14, Eric Snow  wrote:
> On Thu, Sep 7, 2017 at 11:52 AM, Paul Moore  wrote:
>> Is there any reason why passing a callable and args is unsafe, and/or
>> difficult? Naively, I'd assume that
>>
>> interp.call('f(a)')
>>
>> would be precisely as safe as
>>
>> interp.call(f, a)
>
> The problem for now is with sharing objects between interpreters.  The
> simplest safe approach currently is to restrict execution to source
> strings.  Then there are no complications.  Interpreter.call() makes
> sense but I'd like to wait until we get a feel for how subinterpreters
> get used and until we address some of the issues with object passing.

Ah, OK. so if I create a new interpreter, none of the classes,
functions, or objects defined in my calling code will exist within the
target interpreter? That makes sense, but I'd missed that nuance from
the description. Again, this is probably worth noting in the PEP.

And for the record, based on that one fact, I'm perfectly OK with the
initial API being string-only.
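That isolation behaviour can be approximated today with exec() into a fresh globals dict (a rough analogy only, not the proposed API): source text run in a pristine namespace sees nothing from the calling code, just as interp.run(<source>) is described as behaving.

```python
# A helper defined in the "calling" code.
def helper():
    return "defined in the calling code"

fresh_namespace = {}  # stands in for the new interpreter's empty __main__
try:
    # The source text cannot see `helper` from the caller's namespace.
    exec("helper()", fresh_namespace)
    outcome = "visible"
except NameError:
    outcome = "isolated"
print(outcome)  # isolated
```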

> FWIW, here are what I see as the next steps for subinterpreters in the stdlib:
>
> 1. add a basic queue class for passing objects between interpreters
> * only support strings at first (though Nick pointed out we could
> fall back to pickle or marshal for unsupported objects)
> 2. implement CSP on top of subinterpreters
> 3. expand the queue's supported types
> 4. add something like Interpreter.call()
>
> I didn't include such a queue in this proposal because I wanted to
> keep it as focused as possible.  I'll add a note to the PEP about
> this.

This all sounds very reasonable. Thanks for the clarification.
Paul


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Eric Snow
On Thu, Sep 7, 2017 at 11:52 AM, Paul Moore  wrote:
> The only quibble I have is that I'd prefer it if we had a
> run(callable, *args, **kwargs) method. Either instead of, or as well
> as, the run(string) one here.
>
> Is there any reason why passing a callable and args is unsafe, and/or
> difficult? Naively, I'd assume that
>
> interp.call('f(a)')
>
> would be precisely as safe as
>
> interp.call(f, a)

The problem for now is with sharing objects between interpreters.  The
simplest safe approach currently is to restrict execution to source
strings.  Then there are no complications.  Interpreter.call() makes
sense but I'd like to wait until we get a feel for how subinterpreters
get used and until we address some of the issues with object passing.

FWIW, here are what I see as the next steps for subinterpreters in the stdlib:

1. add a basic queue class for passing objects between interpreters
* only support strings at first (though Nick pointed out we could
fall back to pickle or marshal for unsupported objects)
2. implement CSP on top of subinterpreters
3. expand the queue's supported types
4. add something like Interpreter.call()

I didn't include such a queue in this proposal because I wanted to
keep it as focused as possible.  I'll add a note to the PEP about
this.
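Step 1 of that roadmap could be sketched as follows (a toy, with invented names; a real cross-interpreter queue would live at the C level rather than wrap an ordinary in-process queue): strings pass through as-is, and other objects fall back to pickle as Nick suggested.

```python
import pickle
import queue

class InterpQueue:
    """Toy queue: strings first-class, pickle as the fallback."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, obj):
        if isinstance(obj, str):
            self._q.put(("str", obj))          # supported type: pass through
        else:
            self._q.put(("pickle", pickle.dumps(obj)))  # fallback

    def get(self):
        kind, data = self._q.get()
        return data if kind == "str" else pickle.loads(data)

q = InterpQueue()
q.put("hello")
q.put({"answer": 42})
print(q.get())  # hello
print(q.get())  # {'answer': 42}
```

Step 3 (expanding the supported types) would then amount to growing the set of objects that take the pass-through path instead of the pickle path.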

>
> Am I missing something? Name visibility or scoping issues come to mind
> as possible complications I'm not seeing. At the least, if we don't
> want a callable-and-args form yet, a note in the PEP explaining why
> it's been omitted would be worthwhile.

I'll add a note to the PEP.  Thanks for pointing this out. :)

-eric


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Paul Moore
On 7 September 2017 at 19:26, Eric Snow  wrote:
> As part of the multi-core work I'm proposing the addition of the
> "interpreters" module to the stdlib.  This will expose the existing
> subinterpreters C-API to Python code.  I've purposefully kept the API
> simple.  Please let me know what you think.

Looks good. I agree with the idea of keeping the interface simple in
the first instance - we can easily add extra functionality later, but
removing stuff (or worse still, finding that stuff we thought was OK
but had missed corner cases of was broken) is much harder.

>run(code):
>
>   Run the provided Python code in the interpreter, in the current
>   OS thread.  Supported code: source text.

The only quibble I have is that I'd prefer it if we had a
run(callable, *args, **kwargs) method. Either instead of, or as well
as, the run(string) one here.

Is there any reason why passing a callable and args is unsafe, and/or
difficult? Naively, I'd assume that

interp.call('f(a)')

would be precisely as safe as

interp.call(f, a)

Am I missing something? Name visibility or scoping issues come to mind
as possible complications I'm not seeing. At the least, if we don't
want a callable-and-args form yet, a note in the PEP explaining why
it's been omitted would be worthwhile.

Paul


[Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-07 Thread Eric Snow
Hi all,

As part of the multi-core work I'm proposing the addition of the
"interpreters" module to the stdlib.  This will expose the existing
subinterpreters C-API to Python code.  I've purposefully kept the API
simple.  Please let me know what you think.

-eric

https://www.python.org/dev/peps/pep-0554/
https://github.com/python/peps/blob/master/pep-0554.rst
https://github.com/python/cpython/pull/1748
https://github.com/python/cpython/pull/1802
https://github.com/ericsnowcurrently/cpython/tree/high-level-interpreters-module

**

PEP: 554
Title: Multiple Interpreters in the Stdlib
Author: Eric Snow 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2017-09-05
Python-Version: 3.7
Post-History:

Abstract


This proposal introduces the stdlib "interpreters" module.  It exposes
the basic functionality of subinterpreters that exists in the C-API.

Rationale
=

Running code in multiple interpreters provides a useful level of
isolation within the same process.  This can be leveraged in a number
of ways.  Furthermore, subinterpreters provide a well-defined framework
in which such isolation may be extended.

CPython has supported subinterpreters, with increasing levels of
support, since version 1.5.  While the feature has the potential
to be a powerful tool, subinterpreters have suffered from neglect
because they are not available directly from Python.  Exposing the
existing functionality in the stdlib will help reverse the situation.

Proposal


The "interpreters" module will be added to the stdlib.  It will
provide a high-level interface to subinterpreters and wrap the low-level
"_interpreters" module.  The proposed API is inspired by the
threading module.

The module provides the following functions:

enumerate():

   Return a list of all existing interpreters.

get_current():

   Return the currently running interpreter.

get_main():

   Return the main interpreter.

create():

   Initialize a new Python interpreter and return it.  The
   interpreter will be created in the current thread and will remain
   idle until something is run in it.

The module also provides the following class:

Interpreter(id):

   id:

  The interpreter's ID (read-only).

   is_running():

  Return whether or not the interpreter is currently running.

   destroy():

  Finalize and destroy the interpreter.

   run(code):

  Run the provided Python code in the interpreter, in the current
  OS thread.  Supported code: source text.

Copyright
=

This document has been placed in the public domain.
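A short usage sketch of the draft API above (illustrative only: the
"interpreters" module does not exist yet, so this cannot be run today,
and the names may change before the PEP is finalized):

```python
import interpreters

interp = interpreters.create()
print(interp in interpreters.enumerate())  # the new interpreter is listed

# Only source text is supported; it runs in the current OS thread,
# in the new interpreter's own (initially empty) __main__ namespace.
interp.run("print('hello from a subinterpreter')")

print(interp.is_running())  # presumably False once run() has returned
interp.destroy()
```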