Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-21 Thread Barry Scott
Eric,

> On 17 Jul 2018, at 20:35, Eric Snow  wrote:
> 
> With this in mind, here's how I'm approaching the problem:
> 
> 1. interp A "shares" an object with interp B (e.g. through a channel)
>* the object is incref'ed under A before it is sent to B
> 2. the object is wrapped in a proxy owned by B
>* the proxy may not make C-API calls that would mutate the object
> or even cause an incref/decref
> 3. when the proxy is GC'd, the original object is decref'ed
>* the decref must happen in a thread in which A is running

How does the proxy make the object accessible while at the same time preventing 
mutation?

Would it help if there was an explicit owning thread for each object?

I'm thinking that you could do a fast check that the object belongs to the 
current thread and use that knowledge to avoid locking.
If the object is owned by another thread, acquire the GIL in the traditional way, 
and mutating the state will be safe.

The "sharing" process can ensure that, until an explicit "unsharing", the object 
remains safe to test in all threads that share it,
avoiding the need for special processor instructions.
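A pure-Python sketch of this owning-thread idea (all names here are illustrative, not the proposed C-level API): reads are allowed from any thread, while the release that would trigger the decref is restricted to the owner.

```python
import threading

class OwnedProxy:
    """Illustrative sketch only: a read-only proxy whose release is
    restricted to the owning thread, mirroring "the decref must happen
    in a thread in which A is running"."""

    def __init__(self, obj):
        self._obj = obj
        self._owner = threading.get_ident()  # thread that "owns" the object

    def read(self):
        # Fast path: any thread may read an immutable payload without a lock.
        return self._obj

    def release(self):
        # The fast ownership check: only the owner may drop the reference.
        if threading.get_ident() != self._owner:
            raise RuntimeError("release must run in the owning thread")
        self._obj = None
```

A real implementation would forward a cross-thread release back to the owner (e.g. via a queue) instead of raising.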

Barry

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-20 Thread Nick Coghlan
On 18 July 2018 at 05:35, Eric Snow  wrote:
> In order to make all this work the missing piece is a mechanism by
> which the decref (#3) happens under the original interpreter.  At the
> moment Emily Morehouse and I are pursuing an approach that extends the
> existing ceval "pending call" machinery currently used for handling
> signals (see Py_AddPendingCall).  The new [*private*] API would work
> the same way but on a per-interpreter basis rather than just the main
> interpreter.  This would allow one interpreter to queue up a decref to
> happen later under another interpreter.
>
> FWIW, this ability to decref an object under a different interpreter
> is a blocker right now for a number of things, including supporting
> buffers in PEP 554 channels.

Aw, I guess the original idea of just doing an active interpreter
context switch in the current thread around the shared object decref
operation didn't work out? That's a shame.

I'd be curious as to the technical details of what actually failed in
that approach, as I would have expected it to at least work, even if
the performance might not have been wonderful. (Although thinking
about it further now given a per-interpreter locking model, I suspect
there could be some wonderful opportunities for cross-interpreter
deadlocks that we didn't consider in our initial design sketch...)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-19 Thread Stephan Houben
Hi Nathaniel,

2018-07-19 1:33 GMT+02:00 Nathaniel Smith :

> Note that everything you said here also exactly describes the
> programming model for the existing 'multiprocessing' module:
> "structured clone" is equivalent to how multiprocessing uses pickle to
> transfer arbitrary objects, or you can use multiprocessing.Array to
> get a shared view on raw "C"-style data.
>


This is true. In fact, I am a big fan of multiprocessing and I think it is often
overlooked/underrated. Experience with multiprocessing is also
what has me convinced that a share-nothing or share-explicit approach
to concurrency is a useful programming model.

The main limitation of multiprocessing shows up when you need to go outside
Python and interact with C/C++ libraries or operating-system services
from multiple processes.
The support for this generally varies from "extremely weak" to "none at
all".

For example, things I would like to do in parallel with a main thread/process:

* Upload data to the GPU using OpenGL or OpenCL
* Generate a picture in a PyQt QImage, then hand it over zero-copy to the main thread
* Interact with a complex scenegraph in C++ (shared with the main thread)

This is impossible right now but would be possible if the interpreters were
all in-process.

In addition, there are things which are now hard with "multiprocessing" but
could be fixed.
For example, sharing a NumPy array is possible but very inconvenient.
You need to first allocate the raw data segment, communicate that, and then
create in each process an array which uses this data segment.
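Concretely, the dance today looks roughly like this (a sketch using multiprocessing.Array as the raw data segment; the NumPy view via numpy.frombuffer is omitted here to keep the example dependency-free):

```python
from multiprocessing import Array, Process

def fill(seg):
    # The child writes through its view on the communicated segment;
    # both processes see the same raw "C"-style memory.
    for i in range(len(seg)):
        seg[i] = float(i)

if __name__ == "__main__":
    # Step 1: allocate the raw data segment up front ('d' = C double).
    seg = Array('d', 6)
    # Step 2: communicate it to the other process explicitly at spawn time.
    p = Process(target=fill, args=(seg,))
    p.start()
    p.join()
    # Step 3: read back through the parent's view of the same segment.
    print(seg[:])  # -> [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```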

Ideally, this would rather work like this:

   ar = numpy.zeros((30, 30), shared=True)

and then "ar" would automatically be shared.

This is fixable, but given the other limitations above, the question is
whether it is worthwhile to fix it now. It would be a lot simpler to fix
if we had the in-process model.

But yeah, I am actually also very open to ideas on how multiprocessing
could be
made more convenient and powerful. Perhaps there are ways, and I am just
not seeing them.

Stephan


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-19 Thread Pau Freixes
Hi,

>
> Note that everything you said here also exactly describes the
> programming model for the existing 'multiprocessing' module:
> "structured clone" is equivalent to how multiprocessing uses pickle to
> transfer arbitrary objects, or you can use multiprocessing.Array to
> get a shared view on raw "C"-style data.
>

That's good. If CPython can finally provide a pattern plus an API that
resembles the existing ones, it will make adoption less of a point of
friction.


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Nathaniel Smith
On Wed, Jul 18, 2018 at 11:49 AM, Stephan Houben  wrote:
> Basically, what I am suggesting is a direct translation of Javascript's
> Web Worker API
> (https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API)
> to Python.
>
> The Web Worker API is generally considered a "share-nothing" approach,
> although
> as we will see some state can be shared.
>
> The basic principle is that any object lives in a single Worker (Worker =
> subinterpreter).
> If a message is sent from Worker A to Worker B, the message is not shared;
> rather, the so-called "structured clone" algorithm is used to recursively
> create a NEW message object in Worker B. This is roughly equivalent to
> pickling in A and then unpickling in B.
>
> Of course, this may become a bottleneck if large amounts of data need to be
> communicated.
> Therefore, there is a special object type designed to provide a view upon a
> piece
> of shared memory:  SharedArrayBuffer. Notably, this only provides a view
> upon raw "C"-style data (ints or floats or whatever), not on Javascript
> objects.

Note that everything you said here also exactly describes the
programming model for the existing 'multiprocessing' module:
"structured clone" is equivalent to how multiprocessing uses pickle to
transfer arbitrary objects, or you can use multiprocessing.Array to
get a shared view on raw "C"-style data.
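As a concrete illustration (names here are just for the example), a message put on a multiprocessing queue arrives as a pickled-and-rebuilt copy, so mutations in the worker never touch the sender's object — exactly the structured-clone behavior:

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    msg = inbox.get()    # unpickled here: a NEW object, like structured clone
    msg["seen"] = True   # mutate the worker's copy only
    outbox.put(msg)      # pickled again on the way back

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    original = {"payload": [1, 2, 3]}
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put(original)  # pickled on send; 'original' itself never leaves
    reply = outbox.get()
    p.join()
    print("seen" in original, "seen" in reply)  # -> False True
```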

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Eric Snow
On Wed, Jul 18, 2018 at 2:38 PM MRAB  wrote:
> What if an object is not going to be shared, but instead "moved" from
> one subinterpreter to another? The first subinterpreter would no longer
> have a reference to the object.
>
> If the object's refcount is 1 and the object doesn't refer to any other
> object, then copying would not be necessary.

Yeah, that's something that I'm sure we'll investigate at some point,
but it's not part of the short-term plans.  This belongs to a whole
class of possibilities that we'll explore once we have the basic
functionality established. :)  FWIW, I don't think that "moving" an
object like this would be too hard to implement.
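In-process, a "move" amounts to handing over the only reference rather than copying. A toy single-interpreter analogy with a plain queue (the real cross-interpreter version is what's under discussion, not this):

```python
import queue

channel = queue.Queue()

payload = {"big": list(range(1000))}
channel.put(payload)
payload = None            # sender drops its reference: this is the "move"

received = channel.get()  # receiver now holds the only reference; no copy made
print(received["big"][:3])  # -> [0, 1, 2]
```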

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread MRAB

On 2018-07-18 20:35, Eric Snow wrote:
> On Wed, Jul 18, 2018 at 12:49 PM Stephan Houben  wrote:
>> Antoine said that what I proposed earlier was very similar to what Eric
>> is trying to do, but from the direction the discussion has taken so far
>> that appears not to be the case.
>
> It looks like we are after the same thing actually. :)  Sorry for any confusion.
>
> There are currently no provisions for actually sharing objects between
> interpreters.  In fact, initially the plan is basically to support
> sharing copies of basic builtin immutable types.  The question of
> refcounts comes in when we actually do share underlying data of
> immutable objects (e.g. the buffer protocol).

What if an object is not going to be shared, but instead "moved" from 
one subinterpreter to another? The first subinterpreter would no longer 
have a reference to the object.

If the object's refcount is 1 and the object doesn't refer to any other 
object, then copying would not be necessary.

[snip]


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Eric Snow
On Wed, Jul 18, 2018 at 12:49 PM Stephan Houben  wrote:
> Antoine said that what I proposed earlier was very similar to what Eric
> is trying to do, but from the direction the discussion has taken so far
> that appears not to be the case.

It looks like we are after the same thing actually. :)  Sorry for any confusion.


There are currently no provisions for actually sharing objects between
interpreters.  In fact, initially the plan is basically to support
sharing copies of basic builtin immutable types.  The question of
refcounts comes in when we actually do share underlying data of
immutable objects (e.g. the buffer protocol).

> I will therefore try to clarify my proposal.
>
> Basically, what I am suggesting is a direct translation of Javascript's
> Web Worker API 
> (https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API)
> to Python.
>
> The Web Worker API is generally considered a "share-nothing" approach, 
> although
> as we will see some state can be shared.

Yes, there's a strong parallel to that model here.  In fact, I
mentioned web workers in my language summit talk at PyCon 2018.

> The basic principle is that any object lives in a single Worker (Worker = 
> subinterpreter).
> If a message is sent from Worker A to Worker B, the message is not shared;
> rather, the so-called "structured clone" algorithm is used to recursively
> create a NEW message object in Worker B. This is roughly equivalent to
> pickling in A and then unpickling in B.

That is exactly what the channels in the PEP 554 implementation do,
though much more efficiently than pickling.  Initial support will be
for basic builtin immutable types.  We can later consider support for
other (even arbitrary?) types, but anything beyond copying (e.g.
pickle) is way off my radar.  Python's C-API is so closely tied to
refcounting that we simply cannot support safely sharing actual Python
objects between interpreters once we no longer share the GIL between
them.

> Of course, this may become a bottleneck if large amounts of data need to be 
> communicated.
> Therefore, there is a special object type designed to provide a view upon a 
> piece
> of shared memory:  SharedArrayBuffer. Notably, this only provides a view upon
> raw "C"-style data (ints or floats or whatever), not on Javascript objects.

Yep, that translates to buffers in Python, which is covered by PEP 554
(see SendChannel.send_buffer).

In this case, where some underlying data is actually shared, the
implementation has to deal with keeping a reference to the original
object and releasing it when done, which is what all the talk of
refcounts has been about.  However, the PEP does not talk about it
because it is an implementation detail that is not exposed in Python.
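Within a single interpreter, the buffer protocol already shows the shape of this: a view keeps the underlying object alive and must be released when done. A rough analogy (not the PEP 554 mechanism itself):

```python
data = bytearray(8)      # the "original object" owning the memory
view = memoryview(data)  # zero-copy view; holds a reference to data

view[0] = 255            # writes go straight to the shared memory
assert data[0] == 255    # same underlying buffer, no copy made

view.release()           # explicit release: the analogue of the final
                         # decref happening back under the owner
```

While the view is alive, the bytearray cannot even be resized (it raises BufferError) — a small-scale version of the mutation restrictions discussed above.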

> To translate this to the Python situation: each Python object is owned by a 
> single
> subinterpreter, and may only be manipulated by a thread which holds the GIL
> of that particular subinterpreter. Message sending between subinterpreters 
> will
> require the message objects to be "structured cloned".

Correct.  That is what PEP 554 does.

As an aside, your phrasing "may only be manipulated by a thread which
holds the GIL of that particular subinterpreter" did spark something
I'll consider later:  perhaps interpreters can acquire each other's
GIL when (infrequently) necessary.  That could simplify a few things.

> Certain C extension types may override what structured cloning means for them.
> In particular, some C extension types may have a two-layer structure where
> the Py_Object contains a refcounted pointer to the actual data.
> The structured cloning on such an object may create a second Py_Object which
> references the same underlying object.
> This secondary refcount will need to be properly atomic, since it may be 
> manipulated
> from multiple subinterpreters.

My implementation of PEP 554 supports this, though I have not made the
C-API for it public.  It's also not part of the PEP.  I was
considering adding it.

> In this way, interpreter-shared data structures can be implemented.
> However, all the "normal" Python objects are not shared and can continue
> to use the current, non-atomic refcounting implementation.

That is correct.  That entirely matches what I'm doing with PEP 554.
In fact, the isolation between interpreters is critical to my
multi-core Python project, of which PEP 554 is a part.  It's necessary
in order to stop sharing the GIL between interpreters.  So actual
objects will never be shared between interpreters.  They can't be.

> Hope this clarifies my proposal.

Yep.  Thanks!

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Jonathan Fine
Hi

Python in the age of the multi-core processor is an important question. And
garbage collection is one of the many issues involved.

I've been thinking about the garbage collection problem, and lurking on
this list, for a while. I think it's now about time I showed myself, and
shared my thoughts. I intend to do this in a new thread, dealing only with
the problem of multi-core reference counting garbage collection. I hope you
don't mind my doing this. Expect the first instalment tomorrow.

with best regards

Jonathan


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Eric Snow
On Wed, Jul 18, 2018 at 1:37 AM Barry Scott  wrote:
> Let me try a longer answer. The inc+test and dec+test do not require a
> lock if coded correctly. All OSes and runtimes have solved this to provide
> locks. All processors provide the instructions that are the building blocks
> for lock primitives.
>
> You cannot mutate a mutable python object that is not protected with the GIL 
> as
> the change of state involves multiple parts of the object changing.
>
> If you know that an object is immutable, then you need only do a check on the
> ref count, as you will never change the state of the object beyond its ref
> count.
> To access the object you only have to ensure it will not be deleted, which the
> ref count guarantees. The delete of the immutable object is then the only job
> that the original interpreter must do.

Perhaps we're agreeing?  Other than the single decref at when
"releasing" the object, it won't ever be directly modified (even the
refcount) in the other interpreter.  In effect that interpreter holds
a reference to the object which prevents GC in the "owning"
interpreter (the corresponding incref happened in that original
interpreter before the object was "shared").  The only issue is how to
"release" the object in the other interpreter so that the decref
happens in the "owning" interpreter.  As earlier noted, I'm planning
on taking advantage of the existing ceval "pending calls" machinery.
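A pure-Python sketch of that pending-call pattern (illustrative only — the real mechanism is C-level, per-interpreter ceval state): any thread may enqueue work, but only the owning loop executes it.

```python
import queue
import threading

class PendingCalls:
    """Sketch of the Py_AddPendingCall idea: callbacks queued from any
    thread, executed only by the owning interpreter's eval loop."""

    def __init__(self):
        self._calls = queue.Queue()

    def add(self, fn):
        # Safe from any thread, e.g. to schedule a deferred decref.
        self._calls.put(fn)

    def run_pending(self):
        # Called periodically by the owning loop; drains and runs callbacks.
        while True:
            try:
                fn = self._calls.get_nowait()
            except queue.Empty:
                return
            fn()

released = []
pending = PendingCalls()

# Another "interpreter" (here just a thread) schedules the release...
t = threading.Thread(target=pending.add,
                     args=(lambda: released.append("obj"),))
t.start()
t.join()

# ...but the owner is the one that actually performs it.
pending.run_pending()
print(released)  # -> ['obj']
```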

So I'm not sure where an atomic int would factor in.  If you mean
switching the existing refcount to an atomic int for the sake of the
cross-interpreter decref then that's not going to happen, as Ronald
suggested.  Larry could tell you about his Gilectomy experience. :)

Are you suggesting something like a second "cross-interpreter
refcount", which would be atomic, and add a check in Py_DECREF?  That
would imply an extra cross-interpreter-oriented C-API to parallel
Py_DECREF.  It would also mean either adding another field to PyObject
(yikes!) or keeping a separate table for tracking cross-interpreter
references.  I'm not sure any of that would be better than the
alternative I'm pursuing.  Then again, I've considered tracking which
interpreters hold a "reference" to an object, which isn't that
different.

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Eric Snow
On Wed, Jul 18, 2018 at 3:31 AM Antoine Pitrou  wrote:
> Please read in context: we are not talking about making all refcounts
> atomic, only a couple refcounts on shared objects (which probably
> won't be Python objects, actually).

I have no plans to use refcounts for shared data (outside of Python objects).

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Antoine Pitrou
On Wed, 18 Jul 2018 08:21:31 +0100
Ronald Oussoren via Python-ideas
 wrote:
> Some past attempts at getting rid of the GIL used atomic inc/dec, and that
> resulted in bad performance because these instructions aren't cheap.

Please read in context: we are not talking about making all refcounts
atomic, only a couple refcounts on shared objects (which probably
won't be Python objects, actually).

Regards

Antoine.




Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Barry Scott


> On 18 Jul 2018, at 08:21, Ronald Oussoren  wrote:
> 
> On 18 Jul 2018, at 08:02, Barry wrote:
> 
>> 
>> 
 On 17 Jul 2018, at 21:00, Eric Snow  wrote:
 
 On Tue, Jul 17, 2018 at 1:44 PM Barry  wrote:
 The decrement itself is not the problem, that can be made thread safe.
>>> 
>>> Yeah, by using the GIL.   Otherwise, please elaborate.  My
>>> understanding is that if the decrement itself were not the problem
>>> then we'd have gotten rid of the GIL already.
>> 
>> All processors have thread-safe ways to inc, dec, and test integers
>> without holding a lock.
>> 
>> That is the mechanism that locks themselves are built out of. You can use
>> that to avoid holding the GIL until the ref count reaches 0.
>> 
>> In C++ this is built into the language as std::atomic_int; you would have
>> to find the equivalent way to do this in C. I don't have an answer at my
>> fingertips for C.
>> 
> Some past attempts at getting rid of the GIL used atomic inc/dec, and that 
> resulted in bad performance because these instructions  aren’t cheap. 

Isn't this the class of problem that led to the per-processor caches and other 
optimisations in the Linux kernel?
I wonder if those kernel optimisations could be applied to this problem?

> 
> My gut feeling is that you’d have to get rid of refcounts to get high 
> performance when getting rid of the GIL in a single interpreter, which would 
> almost certainly result in breaking the C API.

Working on the ref count costs might be the enabling tech.

We already have the problem of unchanging objects being copied after a fork 
because of the ref counts being inside the object.
It was suggested that the ref count would have to move out of the object to 
help with this problem.

If there is a desirable solution to the parallel problem we can think about the 
C API migration problem.

Barry

>  
> 
> Ronald
>> Barry
>> 
>>> 
 Do you mean that once the ref reaches 0 you have to make the delete happen 
 on the original interpreter?
>>> 
>>> Yep.  For one thing, GC can trigger __del__, which can do anything,
>>> including modifying other objects from the original interpreter (incl.
>>> decref'ing them).  __del__ should be run under the original
>>> interpreter.  For another thing, during GC containers often decref
>>> their items.  Also, separating the GIL between interpreters may mean
>>> we'll need an allocator per interpreter.  In that case the
>>> deallocation must happen relative to the interpreter where the object
>>> was allocated.
>>> 
>>> -eric
>>> 
>> 


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Barry Scott



> On 17 Jul 2018, at 21:00, Eric Snow  wrote:
> 
> On Tue, Jul 17, 2018 at 1:44 PM Barry  wrote:
>> The decrement itself is not the problem, that can be made thread safe.
> 
> Yeah, by using the GIL.   Otherwise, please elaborate.  My
> understanding is that if the decrement itself were not the problem
> then we'd have gotten rid of the GIL already.

Let me try a longer answer. The inc+test and dec+test do not require a
lock if coded correctly. All OSes and runtimes have solved this to provide
locks. All processors provide the instructions that are the building blocks
for lock primitives.

You cannot mutate a mutable python object that is not protected with the GIL as
the change of state involves multiple parts of the object changing.

If you know that an object is immutable, then you need only do a check on the
ref count, as you will never change the state of the object beyond its ref count.
To access the object you only have to ensure it will not be deleted, which the
ref count guarantees. The delete of the immutable object is then the only job
that the original interpreter must do.

> 
>> Do you mean that once the ref reaches 0 you have to make the delete happen 
>> on the original interpreter?
> 
> Yep.  For one thing, GC can trigger __del__, which can do anything,
> including modifying other objects from the original interpreter (incl.
> decref'ing them).  __del__ should be run under the original
> interpreter.  For another thing, during GC containers often decref
> their items.  Also, separating the GIL between interpreters may mean
> we'll need an allocator per interpreter.  In that case the
> deallocation must happen relative to the interpreter where the object
> was allocated.

Yep that I understand.

Barry

> 
> -eric
> 



Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Ronald Oussoren via Python-ideas
On 18 Jul 2018, at 08:02, Barry wrote:

> 
> 
>>> On 17 Jul 2018, at 21:00, Eric Snow  wrote:
>>> 
>>> On Tue, Jul 17, 2018 at 1:44 PM Barry  wrote:
>>> The decrement itself is not the problem, that can be made thread safe.
>> 
>> Yeah, by using the GIL.   Otherwise, please elaborate.  My
>> understanding is that if the decrement itself were not the problem
>> then we'd have gotten rid of the GIL already.
> 
> All processors have thread-safe ways to inc, dec, and test integers
> without holding a lock.
> 
> That is the mechanism that locks themselves are built out of. You can use
> that to avoid holding the GIL until the ref count reaches 0.
> 
> In C++ this is built into the language as std::atomic_int; you would have
> to find the equivalent way to do this in C. I don't have an answer at my
> fingertips for C.
> 
Some past attempts at getting rid of the GIL used atomic inc/dec, and that
resulted in bad performance because these instructions aren't cheap.

My gut feeling is that you’d have to get rid of refcounts to get high 
performance when getting rid of the GIL in a single interpreter, which would 
almost certainly result in breaking the C API.  

Ronald
> Barry
> 
>> 
>>> Do you mean that once the ref reaches 0 you have to make the delete happen 
>>> on the original interpreter?
>> 
>> Yep.  For one thing, GC can trigger __del__, which can do anything,
>> including modifying other objects from the original interpreter (incl.
>> decref'ing them).  __del__ should be run under the original
>> interpreter.  For another thing, during GC containers often decref
>> their items.  Also, separating the GIL between interpreters may mean
>> we'll need an allocator per interpreter.  In that case the
>> deallocation must happen relative to the interpreter where the object
>> was allocated.
>> 
>> -eric
>> 
> 


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Barry


>> On 17 Jul 2018, at 21:00, Eric Snow  wrote:
>> 
>> On Tue, Jul 17, 2018 at 1:44 PM Barry  wrote:
>> The decrement itself is not the problem, that can be made thread safe.
> 
> Yeah, by using the GIL.   Otherwise, please elaborate.  My
> understanding is that if the decrement itself were not the problem
> then we'd have gotten rid of the GIL already.

All processors have thread-safe ways to inc, dec, and test integers without 
holding a lock.

That is the mechanism that locks themselves are built out of. You can use that 
to avoid holding the GIL until the ref count reaches 0.

In C++ this is built into the language as std::atomic_int; you would have to 
find the equivalent way to do this in C. I don't have an answer at my 
fingertips for C.

Barry

> 
>> Do you mean that once the ref reaches 0 you have to make the delete happen 
>> on the original interpreter?
> 
> Yep.  For one thing, GC can trigger __del__, which can do anything,
> including modifying other objects from the original interpreter (incl.
> decref'ing them).  __del__ should be run under the original
> interpreter.  For another thing, during GC containers often decref
> their items.  Also, separating the GIL between interpreters may mean
> we'll need an allocator per interpreter.  In that case the
> deallocation must happen relative to the interpreter where the object
> was allocated.
> 
> -eric
> 



Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-17 Thread Trent Nelson


(Apologies for the slow reply, I'm in the middle of a relocation
 at the moment so e-mail access isn't consistent, and will be a
 lot worse over the next few weeks.)

On Tue, Jul 10, 2018 at 07:31:49AM -0700, David Foster wrote:
> I was not aware of PyParallel. The PyParallel "parallel thread"
> line-of-execution implementation is pretty interesting. Trent, big
> kudos to you on that effort.
> 
> Since you're speaking in the past tense and said "but we're not
> doing it like that", I infer that the notion of a parallel thread
> was turned down for integration into CPython, as that appears to
> have been the original goal.
> 
> However I am unable to locate a rationale for why that integration
> was turned down. Was it deemed to be too complex to execute, perhaps
> in the context of providing C extension compatibility? Was there a
> desire to see a similar implementation on Linux as well as Windows?
> Some other reason? Since I presume you were directly involved in the
> discussions, perhaps you have a link to the relevant thread handy?
> 
> The last update I see from you RE PyParallel on this list is:
> https://mail.python.org/pipermail/python-ideas/2015-September/035725.html


PyParallel was... ambitious to say the least.  When I started it,
I sort of *hand wavy* envisioned it would lead to something that
I could formally pitch to python-dev@.  But there was a lot of
blissful ignorance of the ensuing complexity in that initial
sentiment, though.

So, nothing was formally turned down by core developers, as I
never really ended up pitching something formal that could be
assessed for inclusion.  By the time I'd developed something
that was at least an alpha-level proof-of-concept, I had to
make 50+ pretty sizable implementation decisions that would
have warranted their own PEP if the work ever made it into
the mainline Python.

I definitely think a PyParallel-esque approach (where we play
it fast and loose with what's considered the GIL, how and when
reference counting is done, etc.) is the only viable *performant*
option we have for solving the problem -- i.e. I can't see how
a "remove the GIL, introduce fine grained locking, use interlocked
ops for ref counts"-type conventional approach will ever yield
acceptable performance.

But, yeah, I'm not optimistic we'll see a solution actually in
the mainline Python any time soon.  I logged about 2500 hours
of development time hacking PyParallel into its initial alpha
proof-of-concept state.  It only worked on one operating system,
required intimate knowledge of Python innards (which I lacked at
the start), and exposed a very brittle socket-server oriented
interface to leverage the parallelism (there was no parallel
compute/free-threading type support provided, really).

I can't think of how we'll arrive at something production quality
without it being a multi-year, many-developer (full time, ideally
located in proximity to each other) project.  I think you'd really
need a BDFL Guido/Linus/Cutler-type lead driving the whole effort
too, as there will be a lot of tough, dividing decisions that need
to be made.

How would that be funded?!  It's almost a bit of a moon-shot type
project.  Definitely high-risk.  There's no precedent for the PSF
funding such projects, nor for large corporate entities (e.g. Google,
Amazon, Microsoft) to do so.  What's the ROI for those companies to take on
so much cost and risk?  Perhaps if the end solution only ran on
their cloud infrastructure (Azure, AWS, GCS) -- maybe at least
initially.  That... that would be an interesting turn of events.

Maybe we just wait 20 years 'til a NumPy/SciPy/Z3-stack does some
cloud AI stuff to "solve" which parts of an existing program can
be executed in parallel without any user/developer assistance :-)

> David Foster | Seattle, WA, USA

Regards,

Trent.
--
https://trent.me
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-17 Thread Greg Ewing

MRAB wrote:
The shared object's refcount would be incremented and the sharing 
function would return a proxy to the shared object.


Refcounting in the thread/process would be done on the proxy.

When the proxy is closed or garbage-collected, the shared object's 
refcount would be decremented.


What about other objects accessed through the shared object?
They would need to get wrapped in proxies too.

Also, if the shared object is mutable, changes to it would
need to be protected by a lock of some kind.

Maybe all this could be taken care of by the proxy objects,
but it seems like it would be quite tricky to get right.

--
Greg


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-17 Thread Barry



> On 17 Jul 2018, at 20:35, Eric Snow  wrote:
> 
>> On Mon, Jul 16, 2018 at 11:08 AM Antoine Pitrou  wrote:
>> On Mon, 16 Jul 2018 18:00:37 +0100
>> MRAB  wrote:
>>> Could you explicitly share an object in a similar way to how you
>>> explicitly open a file?
>>> 
>>> The shared object's refcount would be incremented and the sharing
>>> function would return a proxy to the shared object.
>>> 
>>> Refcounting in the thread/process would be done on the proxy.
>>> 
>>> When the proxy is closed or garbage-collected, the shared object's
>>> refcount would be decremented.
>>> 
>>> The shared object could be garbage-collected when its refcount drops to
>>> zero.
>> 
>> Yes, I'm assuming that would be how shareable buffers could be
>> implemented: a per-interpreter proxy (with a regular Python refcount)
>> mediating access to a shared object (which could have an atomic /
>> thread-safe refcount).
> 
> Nice! That's exactly how I'm doing it. :)  The buffer protocol makes
> it easier, but the idea could apply to arbitrary objects generally.
> That's something I'll look into in a later phase of the project.
> 
> In both cases the tricky part is ensuring that the proxy does not
> directly mutate the object (especially the refcount).  In fact, the
> decref part above is the trickiest.  The trickiness is a consequence
> of our goals.  In my multi-core project we're aiming for not sharing
> the GIL between interpreters.  That means reaching and keeping proper
> separation between interpreters.  Notably, without a GIL shared by
> interpreters, refcount operations are not thread-safe.  Also, in the
> decref case GC would happen under the wrong interpreter (which is
> problematic for several reasons).
> 
> With this in mind, here's how I'm approaching the problem:
> 
> 1. interp A "shares" an object with interp B (e.g. through a channel)
>* the object is incref'ed under A before it is sent to B
> 2. the object is wrapped in a proxy owned by B
>* the proxy may not make C-API calls that would mutate the object
> or even cause an incref/decref
> 3. when the proxy is GC'd, the original object is decref'ed
>* the decref must happen in a thread in which A is running
> 
> In order to make all this work the missing piece is a mechanism by
> which the decref (#3) happens under the original interpreter.  At the
> moment Emily Morehouse and I are pursuing an approach that extends the
> existing ceval "pending call" machinery currently used for handling
> signals (see Py_AddPendingCall).  The new [*private*] API would work
> the same way but on a per-interpreter basis rather than just the main
> interpreter.  This would allow one interpreter to queue up a decref to
> happen later under another interpreter.

The decrement itself is not the problem, that can be made thread safe.

Do you mean that once the ref reaches 0 you have to make the delete happen on 
the original interpreter?

Barry

> 
> FWIW, this ability to decref an object under a different interpreter
> is a blocker right now for a number of things, including supporting
> buffers in PEP 554 channels.
> 
> -eric



Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-17 Thread Eric Snow
On Tue, Jul 17, 2018 at 1:44 PM Barry  wrote:
> The decrement itself is not the problem, that can be made thread safe.

Yeah, by using the GIL.   Otherwise, please elaborate.  My
understanding is that if the decrement itself were not the problem
then we'd have gotten rid of the GIL already.

> Do you mean that once the ref reaches 0 you have to make the delete happen on 
> the original interpreter?

Yep.  For one thing, GC can trigger __del__, which can do anything,
including modifying other objects from the original interpreter (incl.
decref'ing them).  __del__ should be run under the original
interpreter.  For another thing, during GC containers often decref
their items.  Also, separating the GIL between interpreters may mean
we'll need an allocator per interpreter.  In that case the
deallocation must happen relative to the interpreter where the object
was allocated.
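A toy, single-interpreter illustration of the first point -- a finalizer can run arbitrary code and mutate any objects it can reach, which is why it must execute under the interpreter those objects belong to (the class and names here are invented purely for the demo):

```python
log = []

class Tracked:
    """Invented demo class: its finalizer mutates other objects."""
    def __init__(self, name, friends):
        self.name = name
        self.friends = friends   # references into the "owning" interpreter's state

    def __del__(self):
        # A finalizer may run arbitrary code and touch anything reachable,
        # which is why it must execute under the right interpreter.
        log.append(self.name)
        self.friends.clear()

shared_state = ["a", "b"]
t = Tracked("t1", shared_state)
del t                            # refcount hits zero; __del__ runs immediately
print(log, shared_state)         # → ['t1'] []
```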

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-17 Thread Eric Snow
On Mon, Jul 16, 2018 at 11:08 AM Antoine Pitrou  wrote:
> On Mon, 16 Jul 2018 18:00:37 +0100
> MRAB  wrote:
> > Could you explicitly share an object in a similar way to how you
> > explicitly open a file?
> >
> > The shared object's refcount would be incremented and the sharing
> > function would return a proxy to the shared object.
> >
> > Refcounting in the thread/process would be done on the proxy.
> >
> > When the proxy is closed or garbage-collected, the shared object's
> > refcount would be decremented.
> >
> > The shared object could be garbage-collected when its refcount drops to
> > zero.
>
> Yes, I'm assuming that would be how shareable buffers could be
> implemented: a per-interpreter proxy (with a regular Python refcount)
> mediating access to a shared object (which could have an atomic /
> thread-safe refcount).

Nice! That's exactly how I'm doing it. :)  The buffer protocol makes
it easier, but the idea could apply to arbitrary objects generally.
That's something I'll look into in a later phase of the project.

In both cases the tricky part is ensuring that the proxy does not
directly mutate the object (especially the refcount).  In fact, the
decref part above is the trickiest.  The trickiness is a consequence
of our goals.  In my multi-core project we're aiming for not sharing
the GIL between interpreters.  That means reaching and keeping proper
separation between interpreters.  Notably, without a GIL shared by
interpreters, refcount operations are not thread-safe.  Also, in the
decref case GC would happen under the wrong interpreter (which is
problematic for several reasons).

With this in mind, here's how I'm approaching the problem:

1. interp A "shares" an object with interp B (e.g. through a channel)
* the object is incref'ed under A before it is sent to B
2. the object is wrapped in a proxy owned by B
* the proxy may not make C-API calls that would mutate the object
or even cause an incref/decref
3. when the proxy is GC'd, the original object is decref'ed
* the decref must happen in a thread in which A is running

In order to make all this work the missing piece is a mechanism by
which the decref (#3) happens under the original interpreter.  At the
moment Emily Morehouse and I are pursuing an approach that extends the
existing ceval "pending call" machinery currently used for handling
signals (see Py_AddPendingCall).  The new [*private*] API would work
the same way but on a per-interpreter basis rather than just the main
interpreter.  This would allow one interpreter to queue up a decref to
happen later under another interpreter.
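A rough sketch of this scheme, with an ordinary thread standing in for the owning interpreter and a plain queue standing in for the per-interpreter pending-call machinery (all names here are invented for illustration; this is not the actual API being pursued):

```python
import queue
import threading
import time

class OwnerLoop(threading.Thread):
    """Stand-in for interpreter A's eval loop: it periodically drains a
    queue of pending calls, loosely mimicking Py_AddPendingCall."""
    def __init__(self):
        super().__init__(daemon=True)
        self.pending = queue.Queue()
        self._stopped = threading.Event()

    def add_pending_call(self, fn):
        self.pending.put(fn)

    def run(self):
        while not self._stopped.is_set():
            try:
                fn = self.pending.get(timeout=0.05)
            except queue.Empty:
                continue
            fn()  # runs "under" the owning interpreter's thread

    def stop(self):
        self._stopped.set()

releases = []  # records which objects A has "decref'ed"

class Proxy:
    """Handle owned by interpreter B: it exposes the object read-only and,
    when collected, schedules the decref back on A's loop instead of
    touching the refcount from B's thread."""
    def __init__(self, obj, owner):
        self._obj = obj
        self._owner = owner

    def read(self):
        return self._obj

    def __del__(self):
        obj = self._obj
        self._owner.add_pending_call(lambda: releases.append(obj))

owner = OwnerLoop()
owner.start()
p = Proxy({"answer": 42}, owner)
assert p.read()["answer"] == 42
del p                      # queues the "decref" for the owner loop
time.sleep(0.2)            # let the loop drain its pending calls
owner.stop()
print(releases)            # → [{'answer': 42}]
```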

FWIW, this ability to decref an object under a different interpreter
is a blocker right now for a number of things, including supporting
buffers in PEP 554 channels.

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-16 Thread Antoine Pitrou
On Mon, 16 Jul 2018 18:00:37 +0100
MRAB  wrote:
> Could you explicitly share an object in a similar way to how you 
> explicitly open a file?
> 
> The shared object's refcount would be incremented and the sharing 
> function would return a proxy to the shared object.
> 
> Refcounting in the thread/process would be done on the proxy.
> 
> When the proxy is closed or garbage-collected, the shared object's 
> refcount would be decremented.
> 
> The shared object could be garbage-collected when its refcount drops to 
> zero.

Yes, I'm assuming that would be how shareable buffers could be
implemented: a per-interpreter proxy (with a regular Python refcount)
mediating access to a shared object (which could have an atomic /
thread-safe refcount).

As for how shareable buffers could be useful, see my work on PEP 574:
https://www.python.org/dev/peps/pep-0574/

Regards

Antoine.




Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-16 Thread MRAB

On 2018-07-16 05:24, Chris Angelico wrote:

On Mon, Jul 16, 2018 at 1:21 PM, Nathaniel Smith  wrote:

On Sun, Jul 15, 2018 at 6:00 PM, Chris Angelico  wrote:

On Mon, Jul 16, 2018 at 10:31 AM, Nathaniel Smith  wrote:

On Sun, Jul 8, 2018 at 11:27 AM, David Foster  wrote:

* The Actor model can be used with some effort via the “multiprocessing”
module, but it doesn’t seem that streamlined and forces there to be a
separate OS process per line of execution, which is relatively expensive.


What do you mean by "the Actor model"? Just shared-nothing
concurrency? (My understanding is that in academia it means
shared-nothing + every thread/process/whatever gets an associated
queue + queues are globally addressable + queues have unbounded
buffering + every thread/process/whatever is implemented as a loop
that reads messages from its queue and responds to them, with no
internal concurrency. I don't know why this particular bundle of
features is considered special. Lots of people seem to use it in
looser sense though.)
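A minimal version of that bundle in Python (one unbounded mailbox per actor, a loop that reads messages and responds to them, no internal concurrency; the sentinel-based shutdown is an arbitrary choice for the demo):

```python
import threading
import queue

class Actor(threading.Thread):
    """Minimal actor in the academic sense described above: one
    addressable unbounded queue, one message loop, no internal concurrency."""
    def __init__(self, handler):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()     # unbounded buffering
        self.handler = handler

    def send(self, msg):
        self.mailbox.put(msg)

    def run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:              # poison pill to stop the loop
                break
            self.handler(msg)

results = []
doubler = Actor(lambda n: results.append(n * 2))
doubler.start()
for n in (1, 2, 3):
    doubler.send(n)
doubler.send(None)
doubler.join()
print(results)                           # → [2, 4, 6]
```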


Shared-nothing concurrency is, of course, the very easiest way to
parallelize. But let's suppose you're trying to create an online
multiplayer game. Since it's a popular genre at the moment, I'll go
for a battle royale game (think PUBG, H1Z1, Fortnite, etc). A hundred
people enter; one leaves. The game has to let those hundred people
interact, which means that all hundred people have to be connected to
the same server. And you have to process everyone's movements,
gunshots, projectiles, etc, etc, etc, fast enough to be able to run a
server "tick" enough times per second - I would say 32 ticks per
second is an absolute minimum, 64 is definitely better. So what
happens when the processing required takes more than one CPU core for
1/32 seconds? A shared-nothing model is either fundamentally
impossible, or a meaningless abstraction (if you interpret it to mean
"explicit queues/pipes for everything"). What would the "Actor" model
do here?


"Shared-nothing" is a bit of jargon that means there's no *implicit*
sharing; your threads can still communicate, the communication just
has to be explicit. I don't know exactly what algorithms your
hypothetical game needs, but they might be totally fine in a
shared-nothing approach. It's not just for embarrassingly parallel
problems.


Right, so basically it's the exact model that Python *already* has for
multiprocessing - once you go to separate processes, nothing is
implicitly shared, and everything has to be done with queues.
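That explicit-queues model in miniature, with multiprocessing (the sentinel value and the squaring work are arbitrary choices for the demo):

```python
import multiprocessing as mp

def worker(inbox, outbox):
    # Shared-nothing: the worker sees only what arrives on its queue.
    for item in iter(inbox.get, None):     # None is the explicit sentinel
        outbox.put(item * item)

if __name__ == "__main__":
    inbox, outbox = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(inbox, outbox))
    p.start()
    for n in (2, 3, 4):
        inbox.put(n)
    inbox.put(None)
    squares = [outbox.get() for _ in range(3)]
    p.join()
    print(squares)                         # → [4, 9, 16]
```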


Ideally, I would like to be able to write my code as a set of
functions, then easily spin them off as separate threads, and have
them able to magically run across separate CPUs. Unicorns not being a
thing, I'm okay with warping my code a bit around the need for
parallelism, but I'm not sure how best to do that. Assume here that we
can't cheat by getting most of the processing work done with the GIL
released (eg in Numpy), and it actually does require Python-level
parallelism of CPU-heavy work.


If you need shared-memory threads, on multiple cores, for CPU-bound
logic, where the logic is implemented in Python, then yeah, you
basically need a free-threaded implementation of Python. Jython is
such an implementation. PyPy could be if anyone were interested in
funding it [1], but apparently no-one is. Probably removing the GIL
from CPython is impossible. (I'd be happy to be proven wrong.) Sorry I
don't have anything better to report.


(This was a purely hypothetical example.)

There could be some interesting results from using the GIL only for
truly global objects, and then having other objects guarded by arena
locks. The trouble is that, in CPython, as soon as you reference any
read-only object from the globals, you need to raise its refcount.
ISTR someone mentioned something along the lines of
sys.eternalize(obj) to flag something as "never GC this thing, it no
longer has a refcount", which would then allow global objects to be
referenced in a truly read-only way (eg to call a function). Sadly,
I'm not expert enough to actually look into implementing it, but it
does seem like a very cool concept. It also fits into the "warping my
code a bit" category (eg eternalizing a small handful of key objects,
and paying the price of "well, now they can never be garbage
collected"), with the potential to then parallelize more easily.

Could you explicitly share an object in a similar way to how you 
explicitly open a file?


The shared object's refcount would be incremented and the sharing 
function would return a proxy to the shared object.


Refcounting in the thread/process would be done on the proxy.

When the proxy is closed or garbage-collected, the shared object's 
refcount would be decremented.


The shared object could be garbage-collected when its refcount drops to 
zero.
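A sketch of that proposal's bookkeeping in plain Python (the registry and proxy classes here are invented stand-ins; real shared objects would live in storage visible to all interpreters):

```python
class SharedTable:
    """Toy registry modelling the proposal: share() bumps a per-object
    count and hands back a proxy; closing the proxy drops the count, and
    the object is released when the count reaches zero."""
    def __init__(self):
        self.refs = {}       # id -> [count, object]

    def share(self, obj):
        entry = self.refs.setdefault(id(obj), [0, obj])
        entry[0] += 1
        return SharedProxy(self, id(obj))

    def release(self, key):
        entry = self.refs[key]
        entry[0] -= 1
        if entry[0] == 0:
            del self.refs[key]          # "garbage-collect" the shared object

class SharedProxy:
    """Per-sharer handle; supports explicit close(), like a file."""
    def __init__(self, table, key):
        self._table, self._key, self._closed = table, key, False

    def get(self):
        return self._table.refs[self._key][1]

    def close(self):
        if not self._closed:
            self._closed = True
            self._table.release(self._key)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

table = SharedTable()
data = {"x": 1}
p1 = table.share(data)
p2 = table.share(data)          # second sharer: count is now 2
p1.close()
assert id(data) in table.refs   # still alive: p2 holds a reference
p2.close()
print(id(data) in table.refs)   # → False
```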



The good news is that there are many, many situations where you don't
actually need "shared-memory threads, on multiple cores, for CPU-bound
logic, where the logic is implemented in Python".

Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-16 Thread Antoine Pitrou
On Mon, 16 Jul 2018 07:00:34 +0200
Stephan Houben 
wrote:
> What about the following model: you have N Python interpreters, each with
> their own GIL. Each *Python* object belongs to precisely one interpreter.

This is roughly what Eric's subinterpreters approach tries to do.

Regards

Antoine.




Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-16 Thread Antoine Pitrou
On Sun, 15 Jul 2018 20:21:56 -0700
Nathaniel Smith  wrote:
> 
> If you need shared-memory threads, on multiple cores, for CPU-bound
> logic, where the logic is implemented in Python, then yeah, you
> basically need a free-threaded implementation of Python. Jython is
> such an implementation. PyPy could be if anyone were interested in
> funding it [1], but apparently no-one is. Probably removing the GIL
> from CPython is impossible. (I'd be happy to be proven wrong.)

It's not that it's impossible, it's that everyone trying to remove it
ended up with a 30-40% slowdown in single-threaded mode (*).  Perhaps
Larry manages to do better, though ;-)

(*) a figure which I assume is highly workload-dependent

Regards

Antoine.




Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-16 Thread Sebastian Krause
Nick Coghlan  wrote:
> It was never extended beyond Windows, and a Windows-only solution
> doesn't meet the needs of a lot of folks interested in more efficient
> exploitation of multiple local CPU cores.

On the other hand Windows has a higher need for a better multi-core
story. A reasonable Unix-only solution already exists with fork() if
you don't need a lot of shared memory, multi-processing on Windows
is just not on the same level.


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Chris Angelico
On Mon, Jul 16, 2018 at 3:00 PM, Stephan Houben  wrote:
> What about the following model: you have N Python interpreters, each with
> their own GIL. Each *Python* object belongs to precisely one interpreter.
>
> However, the interpreters share some common data storage: perhaps a shared
> Numpy array, or a shared Sqlite in-memory db. Or some key-value store where
> the key and values are binary data. The interpreters communicate through
> that.

Interesting. The actual concrete idea that I had in mind was an image
comparison job, downloading umpteen billion separate images and trying
to find the one most similar to a template. Due to lack of easy
parallelism I was unable to then compare the top thousand against each
other quadratically, but it would be interesting to see if I could
have done something like that to share image comparison information.

ChrisA


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Stephan Houben
What about the following model: you have N Python interpreters, each with
their own GIL. Each *Python* object belongs to precisely one interpreter.

However, the interpreters share some common data storage: perhaps a shared
Numpy array, or a shared Sqlite in-memory db. Or some key-value store where
the key and values are binary data. The interpreters communicate through
that.
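A rough sketch of this model with two OS processes standing in for the interpreters and a shared memory block as the common binary storage (this uses the stdlib multiprocessing.shared_memory module; each side keeps its own Python objects and only raw bytes cross the boundary):

```python
import multiprocessing as mp
from multiprocessing import shared_memory

def writer(name):
    # "Interpreter" 2: attaches to the same block by name and writes raw
    # bytes into it.  No Python objects cross the boundary.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:5] = b"hello"
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=16)
    p = mp.Process(target=writer, args=(shm.name,))
    p.start()
    p.join()
    result = bytes(shm.buf[:5])   # "interpreter" 1 reads what 2 wrote
    print(result)                 # → b'hello'
    shm.close()
    shm.unlink()
```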

Stephan

On Mon, 16 Jul 2018 at 06:25, Chris Angelico  wrote:

> On Mon, Jul 16, 2018 at 1:21 PM, Nathaniel Smith  wrote:
> > On Sun, Jul 15, 2018 at 6:00 PM, Chris Angelico 
> wrote:
> >> On Mon, Jul 16, 2018 at 10:31 AM, Nathaniel Smith 
> wrote:
> >>> On Sun, Jul 8, 2018 at 11:27 AM, David Foster 
> wrote:
>  * The Actor model can be used with some effort via the
> “multiprocessing”
>  module, but it doesn’t seem that streamlined and forces there to be a
>  separate OS process per line of execution, which is relatively
> expensive.
> >>>
> >>> What do you mean by "the Actor model"? Just shared-nothing
> >>> concurrency? (My understanding is that in academia it means
> >>> shared-nothing + every thread/process/whatever gets an associated
> >>> queue + queues are globally addressable + queues have unbounded
> >>> buffering + every thread/process/whatever is implemented as a loop
> >>> that reads messages from its queue and responds to them, with no
> >>> internal concurrency. I don't know why this particular bundle of
> >>> features is considered special. Lots of people seem to use it in
> >>> looser sense though.)
> >>
> >> Shared-nothing concurrency is, of course, the very easiest way to
> >> parallelize. But let's suppose you're trying to create an online
> >> multiplayer game. Since it's a popular genre at the moment, I'll go
> >> for a battle royale game (think PUBG, H1Z1, Fortnite, etc). A hundred
> >> people enter; one leaves. The game has to let those hundred people
> >> interact, which means that all hundred people have to be connected to
> >> the same server. And you have to process everyone's movements,
> >> gunshots, projectiles, etc, etc, etc, fast enough to be able to run a
> >> server "tick" enough times per second - I would say 32 ticks per
> >> second is an absolute minimum, 64 is definitely better. So what
> >> happens when the processing required takes more than one CPU core for
> >> 1/32 seconds? A shared-nothing model is either fundamentally
> >> impossible, or a meaningless abstraction (if you interpret it to mean
> >> "explicit queues/pipes for everything"). What would the "Actor" model
> >> do here?
> >
> > "Shared-nothing" is a bit of jargon that means there's no *implicit*
> > sharing; your threads can still communicate, the communication just
> > has to be explicit. I don't know exactly what algorithms your
> > hypothetical game needs, but they might be totally fine in a
> > shared-nothing approach. It's not just for embarrassingly parallel
> > problems.
>
> Right, so basically it's the exact model that Python *already* has for
> multiprocessing - once you go to separate processes, nothing is
> implicitly shared, and everything has to be done with queues.
>
> >> Ideally, I would like to be able to write my code as a set of
> >> functions, then easily spin them off as separate threads, and have
> >> them able to magically run across separate CPUs. Unicorns not being a
> >> thing, I'm okay with warping my code a bit around the need for
> >> parallelism, but I'm not sure how best to do that. Assume here that we
> >> can't cheat by getting most of the processing work done with the GIL
> >> released (eg in Numpy), and it actually does require Python-level
> >> parallelism of CPU-heavy work.
> >
> > If you need shared-memory threads, on multiple cores, for CPU-bound
> > logic, where the logic is implemented in Python, then yeah, you
> > basically need a free-threaded implementation of Python. Jython is
> > such an implementation. PyPy could be if anyone were interested in
> > funding it [1], but apparently no-one is. Probably removing the GIL
> > from CPython is impossible. (I'd be happy to be proven wrong.) Sorry I
> > don't have anything better to report.
>
> (This was a purely hypothetical example.)
>
> There could be some interesting results from using the GIL only for
> truly global objects, and then having other objects guarded by arena
> locks. The trouble is that, in CPython, as soon as you reference any
> read-only object from the globals, you need to raise its refcount.
> ISTR someone mentioned something along the lines of
> sys.eternalize(obj) to flag something as "never GC this thing, it no
> longer has a refcount", which would then allow global objects to be
> referenced in a truly read-only way (eg to call a function). Sadly,
> I'm not expert enough to actually look into implementing it, but it
> does seem like a very cool concept. It also fits into the "warping my
> code a bit" category (eg eternalizing a small handful of key objects,
> and paying the price of "well, now they can never be garbage
> collected"), with the potential to then parallelize more easily.

Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Chris Angelico
On Mon, Jul 16, 2018 at 1:21 PM, Nathaniel Smith  wrote:
> On Sun, Jul 15, 2018 at 6:00 PM, Chris Angelico  wrote:
>> On Mon, Jul 16, 2018 at 10:31 AM, Nathaniel Smith  wrote:
>>> On Sun, Jul 8, 2018 at 11:27 AM, David Foster  wrote:
 * The Actor model can be used with some effort via the “multiprocessing”
 module, but it doesn’t seem that streamlined and forces there to be a
 separate OS process per line of execution, which is relatively expensive.
>>>
>>> What do you mean by "the Actor model"? Just shared-nothing
>>> concurrency? (My understanding is that in academia it means
>>> shared-nothing + every thread/process/whatever gets an associated
>>> queue + queues are globally addressable + queues have unbounded
>>> buffering + every thread/process/whatever is implemented as a loop
>>> that reads messages from its queue and responds to them, with no
>>> internal concurrency. I don't know why this particular bundle of
>>> features is considered special. Lots of people seem to use it in
>>> looser sense though.)
>>
>> Shared-nothing concurrency is, of course, the very easiest way to
>> parallelize. But let's suppose you're trying to create an online
>> multiplayer game. Since it's a popular genre at the moment, I'll go
>> for a battle royale game (think PUBG, H1Z1, Fortnite, etc). A hundred
>> people enter; one leaves. The game has to let those hundred people
>> interact, which means that all hundred people have to be connected to
>> the same server. And you have to process everyone's movements,
>> gunshots, projectiles, etc, etc, etc, fast enough to be able to run a
>> server "tick" enough times per second - I would say 32 ticks per
>> second is an absolute minimum, 64 is definitely better. So what
>> happens when the processing required takes more than one CPU core for
>> 1/32 seconds? A shared-nothing model is either fundamentally
>> impossible, or a meaningless abstraction (if you interpret it to mean
>> "explicit queues/pipes for everything"). What would the "Actor" model
>> do here?
>
> "Shared-nothing" is a bit of jargon that means there's no *implicit*
> sharing; your threads can still communicate, the communication just
> has to be explicit. I don't know exactly what algorithms your
> hypothetical game needs, but they might be totally fine in a
> shared-nothing approach. It's not just for embarrassingly parallel
> problems.

Right, so basically it's the exact model that Python *already* has for
multiprocessing - once you go to separate processes, nothing is
implicitly shared, and everything has to be done with queues.

>> Ideally, I would like to be able to write my code as a set of
>> functions, then easily spin them off as separate threads, and have
>> them able to magically run across separate CPUs. Unicorns not being a
>> thing, I'm okay with warping my code a bit around the need for
>> parallelism, but I'm not sure how best to do that. Assume here that we
>> can't cheat by getting most of the processing work done with the GIL
>> released (eg in Numpy), and it actually does require Python-level
>> parallelism of CPU-heavy work.
>
> If you need shared-memory threads, on multiple cores, for CPU-bound
> logic, where the logic is implemented in Python, then yeah, you
> basically need a free-threaded implementation of Python. Jython is
> such an implementation. PyPy could be if anyone were interested in
> funding it [1], but apparently no-one is. Probably removing the GIL
> from CPython is impossible. (I'd be happy to be proven wrong.) Sorry I
> don't have anything better to report.

(This was a purely hypothetical example.)

There could be some interesting results from using the GIL only for
truly global objects, and then having other objects guarded by arena
locks. The trouble is that, in CPython, as soon as you reference any
read-only object from the globals, you need to raise its refcount.
ISTR someone mentioned something along the lines of
sys.eternalize(obj) to flag something as "never GC this thing, it no
longer has a refcount", which would then allow global objects to be
referenced in a truly read-only way (eg to call a function). Sadly,
I'm not expert enough to actually look into implementing it, but it
does seem like a very cool concept. It also fits into the "warping my
code a bit" category (eg eternalizing a small handful of key objects,
and paying the price of "well, now they can never be garbage
collected"), with the potential to then parallelize more easily.

> The good news is that there are many, many situations where you don't
> actually need "shared-memory threads, on multiple cores, for CPU-bound
> logic, where the logic is implemented in Python".

Oh absolutely. MOST of my parallelism requirements involve regular
Python threads, because they spend most of their time blocked on
something. That one is easy. The hassle comes when something MIGHT
need parallelism and might not, based on (say) how much data it has to
work with; for those kinds of programs, I would like 

Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Nathaniel Smith
On Sun, Jul 15, 2018 at 6:00 PM, Chris Angelico  wrote:
> On Mon, Jul 16, 2018 at 10:31 AM, Nathaniel Smith  wrote:
>> On Sun, Jul 8, 2018 at 11:27 AM, David Foster  wrote:
>>> * The Actor model can be used with some effort via the “multiprocessing”
>>> module, but it doesn’t seem that streamlined and forces there to be a
>>> separate OS process per line of execution, which is relatively expensive.
>>
>> What do you mean by "the Actor model"? Just shared-nothing
>> concurrency? (My understanding is that in academia it means
>> shared-nothing + every thread/process/whatever gets an associated
>> queue + queues are globally addressable + queues have unbounded
>> buffering + every thread/process/whatever is implemented as a loop
>> that reads messages from its queue and responds to them, with no
>> internal concurrency. I don't know why this particular bundle of
>> features is considered special. Lots of people seem to use it in
>> looser sense though.)
>
> Shared-nothing concurrency is, of course, the very easiest way to
> parallelize. But let's suppose you're trying to create an online
> multiplayer game. Since it's a popular genre at the moment, I'll go
> for a battle royale game (think PUBG, H1Z1, Fortnite, etc). A hundred
> people enter; one leaves. The game has to let those hundred people
> interact, which means that all hundred people have to be connected to
> the same server. And you have to process everyone's movements,
> gunshots, projectiles, etc, etc, etc, fast enough to be able to run a
> server "tick" enough times per second - I would say 32 ticks per
> second is an absolute minimum, 64 is definitely better. So what
> happens when the processing required takes more than one CPU core for
> 1/32 seconds? A shared-nothing model is either fundamentally
> impossible, or a meaningless abstraction (if you interpret it to mean
> "explicit queues/pipes for everything"). What would the "Actor" model
> do here?

"Shared-nothing" is a bit of jargon that means there's no *implicit*
sharing; your threads can still communicate, the communication just
has to be explicit. I don't know exactly what algorithms your
hypothetical game needs, but they might be totally fine in a
shared-nothing approach. It's not just for embarrassingly parallel
problems.

> Ideally, I would like to be able to write my code as a set of
> functions, then easily spin them off as separate threads, and have
> them able to magically run across separate CPUs. Unicorns not being a
> thing, I'm okay with warping my code a bit around the need for
> parallelism, but I'm not sure how best to do that. Assume here that we
> can't cheat by getting most of the processing work done with the GIL
> released (eg in Numpy), and it actually does require Python-level
> parallelism of CPU-heavy work.

If you need shared-memory threads, on multiple cores, for CPU-bound
logic, where the logic is implemented in Python, then yeah, you
basically need a free-threaded implementation of Python. Jython is
such an implementation. PyPy could be if anyone were interested in
funding it [1], but apparently no-one is. Probably removing the GIL
from CPython is impossible. (I'd be happy to be proven wrong.) Sorry I
don't have anything better to report.

The good news is that there are many, many situations where you don't
actually need "shared-memory threads, on multiple cores, for CPU-bound
logic, where the logic is implemented in Python". If you're in that
specific niche and don't have $100k to throw at PyPy, then I dunno, I
hear Rust is good at that sort of thing? It's frustrating for sure,
but there will always be niches where Python isn't the best choice.

-n

[1] 
https://morepypy.blogspot.com/2017/08/lets-remove-global-interpreter-lock.html

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Chris Angelico
On Mon, Jul 16, 2018 at 10:31 AM, Nathaniel Smith  wrote:
> On Sun, Jul 8, 2018 at 11:27 AM, David Foster  wrote:
>> * The Actor model can be used with some effort via the “multiprocessing”
>> module, but it doesn’t seem that streamlined and forces there to be a
>> separate OS process per line of execution, which is relatively expensive.
>
> What do you mean by "the Actor model"? Just shared-nothing
> concurrency? (My understanding is that in academia it means
> shared-nothing + every thread/process/whatever gets an associated
> queue + queues are globally addressable + queues have unbounded
> buffering + every thread/process/whatever is implemented as a loop
> that reads messages from its queue and responds to them, with no
> internal concurrency. I don't know why this particular bundle of
> features is considered special. Lots of people seem to use it in a
> looser sense, though.)

Shared-nothing concurrency is, of course, the very easiest way to
parallelize. But let's suppose you're trying to create an online
multiplayer game. Since it's a popular genre at the moment, I'll go
for a battle royale game (think PUBG, H1Z1, Fortnite, etc). A hundred
people enter; one leaves. The game has to let those hundred people
interact, which means that all hundred people have to be connected to
the same server. And you have to process everyone's movements,
gunshots, projectiles, etc, etc, etc, fast enough to be able to run a
server "tick" enough times per second - I would say 32 ticks per
second is an absolute minimum, 64 is definitely better. So what
happens when the processing required takes more than one CPU core for
1/32 seconds? A shared-nothing model is either fundamentally
impossible, or a meaningless abstraction (if you interpret it to mean
"explicit queues/pipes for everything"). What would the "Actor" model
do here?

Ideally, I would like to be able to write my code as a set of
functions, then easily spin them off as separate threads, and have
them able to magically run across separate CPUs. Unicorns not being a
thing, I'm okay with warping my code a bit around the need for
parallelism, but I'm not sure how best to do that. Assume here that we
can't cheat by getting most of the processing work done with the GIL
released (eg in Numpy), and it actually does require Python-level
parallelism of CPU-heavy work.

ChrisA


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Nathaniel Smith
On Sun, Jul 8, 2018 at 11:27 AM, David Foster  wrote:
> * The Actor model can be used with some effort via the “multiprocessing”
> module, but it doesn’t seem that streamlined and forces there to be a
> separate OS process per line of execution, which is relatively expensive.

What do you mean by "the Actor model"? Just shared-nothing
concurrency? (My understanding is that in academia it means
shared-nothing + every thread/process/whatever gets an associated
queue + queues are globally addressable + queues have unbounded
buffering + every thread/process/whatever is implemented as a loop
that reads messages from its queue and responds to them, with no
internal concurrency. I don't know why this particular bundle of
features is considered special. Lots of people seem to use it in a
looser sense, though.)
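Spelled out as code, that bundle is small. The Actor class below is a hypothetical toy (not the API of Pykka, Thespian, or any library mentioned in this thread): a globally addressable, unboundedly buffered mailbox plus a loop that reads one message at a time, with no internal concurrency:

```python
import queue
import threading

class Actor:
    """Hypothetical toy actor: a mailbox plus a single-threaded loop."""

    _STOP = object()  # private sentinel to end the loop

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()  # unbounded buffering
        self._thread = threading.Thread(target=self._loop)
        self._thread.start()

    def tell(self, msg):
        # Anyone holding a reference to the actor can address it.
        self._mailbox.put(msg)

    def stop(self):
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _loop(self):
        # One message at a time; no internal concurrency.
        while True:
            msg = self._mailbox.get()
            if msg is self._STOP:
                break
            self._handler(msg)

seen = []
actor = Actor(seen.append)
for i in range(3):
    actor.tell(i)
actor.stop()
print(seen)  # [0, 1, 2]
```

Under CPython's GIL this only demonstrates the model, not parallelism; the whole debate in this thread is about getting this same shape onto multiple cores (processes today, subinterpreters perhaps tomorrow).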

> I'd like to solicit some feedback on what might be the most efficient way to
> make forward progress on efficient parallelization in Python inside the same
> OS process. The most promising areas appear to be:
>
> 1. Make the current subinterpreter implementation in Python have more
> complete isolation, sharing almost no state between subinterpreters. In
> particular not sharing the GIL. The "Interpreter Isolation" section of PEP
> 554 enumerates areas that are currently shared, some of which probably
> shouldn't be.
>
> 2. Give up on making things work inside the same OS process and rather focus
> on implementing better abstractions on top of the existing multiprocessing
> API so that the actor model is easier to program against. For example,
> providing some notion of Channels to communicate between lines of execution,
> a way to monitor the number of Messages waiting in each channel for
> throughput profiling and diagnostics, Supervision, etc. In particular I
> could do this by using an existing library like Pykka or Thespian and
> extending it where necessary.

I guess I would distinguish though between "multiple processes" and
"the multiprocessing module". The module might be at the point in its
lifecycle where starting over is at least worth considering, and one
thing I'm hoping to do with Trio is experiment with making worker
process patterns easier to work with.

But the nice thing about these two options is that subinterpreters are
basically a way to emulate multiple Python processes within a single
OS process, which means they're largely interchangeable. There are
trade-offs in terms of compatibility, how much work needs to be done,
probably speed, but if you come up with a great API based around one
model then you should be able to switch out the backend later without
affecting users. So if you want to start experimenting now, I'd use
multiple processes and plan to switch to subinterpreters later if it
turns out to make sense.
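One way to keep that backend seam open today, sketched with nothing beyond the stdlib: write application code against the abstract concurrent.futures.Executor interface, so the pool behind it (threads now, processes or a future subinterpreter-backed executor later) can be swapped without touching callers:

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor

def cpu_heavy(n):
    # Stand-in for real CPU-bound Python-level work.
    return sum(i * i for i in range(n))

def run_jobs(executor: Executor, jobs):
    # Callers depend only on the abstract Executor interface, so the
    # backend can change later without touching this code.
    return list(executor.map(cpu_heavy, jobs))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as ex:   # GIL-bound today
        print(run_jobs(ex, [10, 100]))              # [285, 328350]
    with ProcessPoolExecutor(max_workers=2) as ex:  # true parallelism
        print(run_jobs(ex, [10, 100]))              # [285, 328350]
```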

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Leonardo Santagada
No one has talked about this yet, but modern CPUs with multiple NUMA
nodes are atrocious for any kind of shared memory (maybe Threadripper
is better, but multi-socket Xeon is slow), and more and more CPUs will
move to NUMA even on single chips, so a shared-nothing approach can
really make Python a good contender on modern hardware.

On Sun, Jul 15, 2018 at 6:20 AM, Nick Coghlan  wrote:
> On 11 July 2018 at 00:31, David Foster  wrote:
>> I was not aware of PyParallel. The PyParallel "parallel thread"
>> line-of-execution implementation is pretty interesting. Trent, big kudos to
>> you on that effort.
>>
>> Since you're speaking in the past tense and said "but we're not doing it
>> like that", I infer that the notion of a parallel thread was turned down for
>> integration into CPython, as that appears to have been the original goal.
>>
>> However I am unable to locate a rationale for why that integration was
>> turned down. Was it deemed to be too complex to execute, perhaps in the
>> context of providing C extension compatibility? Was there a desire to see a
>> similar implementation on Linux as well as Windows? Some other reason?
>
> It was never extended beyond Windows, and a Windows-only solution
> doesn't meet the needs of a lot of folks interested in more efficient
> exploitation of multiple local CPU cores.
>
> It's still an interesting design concept though, especially for
> problems that can be deconstructed into a setup phase (read/write main
> thread), and a parallel operation phase (ephemeral worker threads that
> store all persistent state in memory mapped files, or otherwise
> outside the current process).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 

Leonardo Santagada


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-14 Thread Nick Coghlan
On 11 July 2018 at 00:31, David Foster  wrote:
> I was not aware of PyParallel. The PyParallel "parallel thread"
> line-of-execution implementation is pretty interesting. Trent, big kudos to
> you on that effort.
>
> Since you're speaking in the past tense and said "but we're not doing it
> like that", I infer that the notion of a parallel thread was turned down for
> integration into CPython, as that appears to have been the original goal.
>
> However I am unable to locate a rationale for why that integration was
> turned down. Was it deemed to be too complex to execute, perhaps in the
> context of providing C extension compatibility? Was there a desire to see a
> similar implementation on Linux as well as Windows? Some other reason?

It was never extended beyond Windows, and a Windows-only solution
doesn't meet the needs of a lot of folks interested in more efficient
exploitation of multiple local CPU cores.

It's still an interesting design concept though, especially for
problems that can be deconstructed into a setup phase (read/write main
thread), and a parallel operation phase (ephemeral worker threads that
store all persistent state in memory mapped files, or otherwise
outside the current process).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-14 Thread Terry Reedy

On 7/14/2018 5:40 AM, Antoine Pitrou wrote:
> On Fri, 13 Jul 2018 18:22:24 -0600
> Eric Snow wrote:
>>> 2. Give up on making things work inside the same OS process and rather
>>> focus on implementing better abstractions on top of the existing
>>> multiprocessing API so that the actor model is easier to program
>>> against. For example, providing some notion of Channels to communicate
>>> between lines of execution, a way to monitor the number of Messages
>>> waiting in each channel for throughput profiling and diagnostics,
>>> Supervision, etc. In particular I could do this by using an existing
>>> library like Pykka or Thespian and extending it where necessary.
>>
>> It may be worth a shot.  You should ask Davin Potts (CC'ed) about this.
>> We discussed this a little at PyCon.  I'm sure he'd welcome help in
>> improving the multiprocessing module.
>
> Davin has been mostly inactive.  I'm the de facto maintainer for
> multiprocessing.


It's good to know that there is an active coredev who can be added as
nosy on multiprocessing issues.  The multiprocessing line in the
Expert's Index, https://devguide.python.org/experts/ has Davin *ed
(assign issues to him) and you not (nosy only).  Perhaps Davin should
be un-starred.


Some time ago, on pydev list, you suggested that the solution to IDLE's 
problems with subprocess and sockets might be to use multiprocessing and 
pipes.  I noticed then that there were numerous bug reports and little
activity, and wondered how usable multiprocessing was in practice.


Checking again, there are 52 open behavior and 6 open crash issues with 
'multiprocessing' in the title.  The most severe one for IDLE that I 
noticed is #33111: importing tkinter and running multiprocessing on 
MacOS does not seem to work.


This week I (re)read the main multiprocessing doc chapter.  The main 
issue I saw is 'Beware of replacing sys.stdin with a “file like 
object”'.  I don't *think* that this is a showstopper, but a minimal 
failing example would help to be sure.


--
Terry Jan Reedy




Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-14 Thread Eric Snow
On Sat, Jul 14, 2018 at 3:41 AM Antoine Pitrou  wrote:
> Davin has been mostly inactive.  I'm the de facto maintainer for
> multiprocessing.

Ah, that's great to know.  Sorry about the confusion.

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-14 Thread Antoine Pitrou
On Fri, 13 Jul 2018 18:22:24 -0600
Eric Snow 
wrote:
> > 2. Give up on making things work inside the same OS process and rather
> > focus on implementing better abstractions on top of the existing
> > multiprocessing API so that the actor model is easier to program
> > against. For example, providing some notion of Channels to communicate
> > between lines of execution, a way to monitor the number of Messages
> > waiting in each channel for throughput profiling and diagnostics,
> > Supervision, etc. In particular I could do this by using an existing
> > library like Pykka or Thespian and extending it where necessary.  
> 
> It may be worth a shot.  You should ask Davin Potts (CC'ed) about this.
> We discussed this a little at PyCon.  I'm sure he'd welcome help in
> improving the multiprocessing module.

Davin has been mostly inactive.  I'm the de facto maintainer for
multiprocessing.

Regards

Antoine.




Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-13 Thread Eric Snow
On Tue, Jul 10, 2018 at 8:32 AM David Foster  wrote:
>
> I was not aware of PyParallel. The PyParallel "parallel thread"
> line-of-execution implementation is pretty interesting. Trent, big kudos
> to you on that effort.

+1  It's a neat project.  Trent's pretty smart. :)

> Since you're speaking in the past tense and said "but we're not doing it
> like that", I infer that the notion of a parallel thread was turned down
> for integration into CPython, as that appears to have been the original
> goal.
>
> However I am unable to locate a rationale for why that integration was
> turned down. Was it deemed to be too complex to execute, perhaps in the
> context of providing C extension compatibility? Was there a desire to
> see a similar implementation on Linux as well as Windows? Some other
> reason?

Trent can correct me if I'm wrong, but I believe it boiled down to
challenges with the POSIX implementation (that email thread implies
this as well), likely coupled with limited time for Trent to work on
it.

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-13 Thread Eric Snow
On Sun, Jul 8, 2018 at 12:30 PM David Foster  wrote:
> In the past I have personally viewed Python as difficult to use for
> parallel applications, which need to do multiple things simultaneously
> for increased performance:
>
> * The old Threads, Locks, & Shared State model is inefficient in Python
> due to the GIL, which limits CPU usage to only one thread at a time
> (ignoring certain functions implemented in C, such as I/O).
>
> * The Actor model can be used with some effort via the “multiprocessing”
> module, but it doesn’t seem that streamlined and forces there to be a
> separate OS process per line of execution, which is relatively expensive.

Yep, Python's multi-core story is a bit rough (Jython/IronPython
aside).  It's especially hard for folks used to
concurrency/parallelism in other languages.  I'm hopeful that we can
improve the situation.

> I was thinking it would be nice if there was a better way to implement
> the Actor model, with multiple lines of execution in the same process,

FWIW, at this point I'm a big fan of this concurrency model.  I find
it hurts my brain least. :)

> yet avoiding contention from the GIL. This implies a separate GIL for
> each line of execution (to eliminate contention) and a controlled way to
> exchange data between different lines of execution.
>
> So I was thinking of proposing a design for implementing such a system.
> Or at least get interested parties thinking about such a system.
>
> With some additional research I notice that [PEP 554] (“Multiple
subinterpreters in the stdlib") appears to be putting forward a design
> similar to the one I described. I notice however it mentions that
> subinterpreters currently share the GIL, which would seem to make them
> unusable for parallel scenarios due to GIL contention.

I'm glad you found PEP 554.  I wanted to keep the PEP focused on
exposing the existing subinterpreter support (and the basic,
CSP-inspired concurrency model), which is why it doesn't go into much
detail about changes to the CPython runtime that will allow GIL-free
multi-core parallelism.   As Nick mentioned, my talk at the language
summit covers my plans.

Improving Python's multi-core story has been the major focus of my
(sadly relatively small) contributions to CPython for several years
now.  I've made slow progress due to limited time, but things are
picking up, especially since I got a job in December at Microsoft that
allows me to work on CPython for part of each week.  On top of that,
several other people are directly helping now (including Emily
Morehouse) and I got a lot of positive feedback for the project at
PyCon this year.

> I'd like to solicit some feedback on what might be the most efficient
> way to make forward progress on efficient parallelization in Python
> inside the same OS process. The most promising areas appear to be:
>
> 1. Make the current subinterpreter implementation in Python have more
> complete isolation, sharing almost no state between subinterpreters. In
> particular not sharing the GIL. The "Interpreter Isolation" section of
> PEP 554 enumerates areas that are currently shared, some of which
> probably shouldn't be.

Right, this is the approach I'm driving.  At this point I have the
project broken down pretty well into manageable chunks.  You're
welcome to join in. :)  Regardless,  I'd be glad to discuss it with
you in more depth if you're interested.

> 2. Give up on making things work inside the same OS process and rather
> focus on implementing better abstractions on top of the existing
> multiprocessing API so that the actor model is easier to program
> against. For example, providing some notion of Channels to communicate
> between lines of execution, a way to monitor the number of Messages
> waiting in each channel for throughput profiling and diagnostics,
> Supervision, etc. In particular I could do this by using an existing
> library like Pykka or Thespian and extending it where necessary.

It may be worth a shot.  You should ask Davin Potts (CC'ed) about this.
We discussed this a little at PyCon.  I'm sure he'd welcome help in
improving the multiprocessing module.

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-10 Thread Terry Reedy

On 7/10/2018 10:31 AM, David Foster wrote:
> Since you're speaking in the past tense and said "but we're not doing it
> like that", I infer that the notion of a parallel thread was turned down
> for integration into CPython, as that appears to have been the original
> goal.

As far as I remember, there was never a formal proposal (PEP).  And I
just searched PEP 0 for 'parallel'.  Hence, no formal rejection,
rationale, or thread.

> However I am unable to locate a rationale for why that integration was
> turned down. Was it deemed to be too complex to execute, perhaps in the
> context of providing C extension compatibility? Was there a desire to
> see a similar implementation on Linux as well as Windows? Some other
> reason? Since I presume you were directly involved in the discussions,
> perhaps you have a link to the relevant thread handy?

As always, there may have been private, off-the-record, informal
discussions.


--
Terry Jan Reedy



Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-10 Thread David Foster
I was not aware of PyParallel. The PyParallel "parallel thread" 
line-of-execution implementation is pretty interesting. Trent, big kudos 
to you on that effort.


Since you're speaking in the past tense and said "but we're not doing it 
like that", I infer that the notion of a parallel thread was turned down 
for integration into CPython, as that appears to have been the original 
goal.


However I am unable to locate a rationale for why that integration was 
turned down. Was it deemed to be too complex to execute, perhaps in the 
context of providing C extension compatibility? Was there a desire to 
see a similar implementation on Linux as well as Windows? Some other 
reason? Since I presume you were directly involved in the discussions, 
perhaps you have a link to the relevant thread handy?


The last update I see from you RE PyParallel on this list is:
https://mail.python.org/pipermail/python-ideas/2015-September/035725.html

David Foster | Seattle, WA, USA

On 7/9/18 9:17 AM, Trent Nelson wrote:
> On Sun, Jul 08, 2018 at 11:27:08AM -0700, David Foster wrote:
>> I'd like to solicit some feedback on what might be the most
>> efficient way to make forward progress on efficient parallelization
>> in Python inside the same OS process. The most promising areas
>> appear to be:
>
> You might find PyParallel interesting, at least from a "here's what was
> tried, it worked, but we're not doing it like that" perspective.
>
> http://pyparallel.org
> https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores
>
> I still think it was a pretty successful proof-of-concept regarding
> removing the GIL without having to actually remove it.  Performance was
> pretty good too, as you can see in those graphs.
>
> Regards,
>
> Trent.
>
> --
> https://trent.me




Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-09 Thread Trent Nelson
On Sun, Jul 08, 2018 at 11:27:08AM -0700, David Foster wrote:

> I'd like to solicit some feedback on what might be the most
> efficient way to make forward progress on efficient parallelization
> in Python inside the same OS process. The most promising areas
> appear to be:

You might find PyParallel interesting, at least from a "here's what was
tried, it worked, but we're not doing it like that" perspective.

http://pyparallel.org

https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores

I still think it was a pretty successful proof-of-concept regarding
removing the GIL without having to actually remove it.  Performance was
pretty good too, as you can see in those graphs.

> -- 
> David Foster | Seattle, WA, USA

Regards,

Trent.

--
https://trent.me


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-09 Thread Nick Coghlan
On 9 July 2018 at 04:27, David Foster  wrote:
> I'd like to solicit some feedback on what might be the most efficient way to
> make forward progress on efficient parallelization in Python inside the same
> OS process. The most promising areas appear to be:
>
> 1. Make the current subinterpreter implementation in Python have more
> complete isolation, sharing almost no state between subinterpreters. In
> particular not sharing the GIL. The "Interpreter Isolation" section of PEP
> 554 enumerates areas that are currently shared, some of which probably
> shouldn't be.
>
> 2. Give up on making things work inside the same OS process and rather focus
> on implementing better abstractions on top of the existing multiprocessing
> API so that the actor model is easier to program against. For example,
> providing some notion of Channels to communicate between lines of execution,
> a way to monitor the number of Messages waiting in each channel for
> throughput profiling and diagnostics, Supervision, etc. In particular I
> could do this by using an existing library like Pykka or Thespian and
> extending it where necessary.

Yep, that's basically the way Eric and I and a few others have been
thinking. Eric started off this year's language summit with a
presentation on the topic: https://lwn.net/Articles/754162/

The intent behind PEP 554 is to eventually get to a point where each
subinterpreter has its own dedicated eval loop lock, and the GIL
either disappears entirely (replaced by smaller purpose specific
locks) or becomes a read/write lock (where write access is only needed
to adjust certain state that is shared across subinterpreters).

On the multiprocessing front, it could be quite interesting to attempt
to adapt the channel API from PEP 554 to the
https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.sharedctypes
data sharing capabilities in the modern multiprocessing module.
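For reference, a minimal sketch of what that sharedctypes-style data sharing looks like today (illustrative names; multiprocessing.Value and multiprocessing.Array give the child process a view of the same memory, so results come back without being pickled):

```python
from multiprocessing import Array, Process, Value

def fill(arr, counter):
    # The child writes straight into shared memory; nothing is
    # pickled back to the parent.
    for i in range(len(arr)):
        arr[i] = i * i
    with counter.get_lock():
        counter.value += 1

def main():
    counter = Value('i', 0)  # shared C int, with an associated lock
    squares = Array('d', 5)  # shared array of 5 C doubles, zero-initialized
    p = Process(target=fill, args=(squares, counter))
    p.start()
    p.join()
    return counter.value, list(squares)

if __name__ == "__main__":
    print(main())  # (1, [0.0, 1.0, 4.0, 9.0, 16.0])
```

A channel API layered over this would have to restrict itself to flat, fixed-size data, which is exactly where the out-of-band pickling work mentioned below PEP 574 becomes relevant for everything else.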

Also of relevance is Antoine Pitrou's work on a new version of the
pickle protocol that allows for out-of-band data sharing to avoid
redundant memory copies: https://www.python.org/dev/peps/pep-0574/

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia