Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-24 Thread Sturla Molden
Antoine Pitrou  wrote:

> When writing C code to interact with buffer-providing objects, you
> usually don't bother with memoryviews at all.  You just use a Py_buffer
> structure.

I was talking about "typed memoryviews", which are a Cython abstraction over a
Py_buffer struct. I was not talking about Python memoryviews, which are
something else. When writing Cython code that interacts with a buffer we
usually use typed memoryviews, not Py_buffer structs directly.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-24 Thread Antoine Pitrou
On Thu, 12 May 2016 23:14:36 + (UTC)
Sturla Molden  wrote:
> Antoine Pitrou  wrote:
> 
> > Can you define "expensive"?
> 
> Slow enough to cause complaints on the Cython mailing list.

What kind of complaints? Please be specific.

> > Buffer acquisition itself only calls a single C callback and uses a
> > stack-allocated C structure. It shouldn't be "expensive".
> 
> I don't know the reason, only that buffer acquisition from NumPy arrays
> with typed memoryviews

Again, what have memoryviews to do with it?  "Acquiring a buffer" is
just asking the buffer provider (the Numpy array) to fill a Py_buffer
structure's contents.  That has nothing to do with memoryviews.

When writing C code to interact with buffer-providing objects, you
usually don't bother with memoryviews at all.  You just use a Py_buffer
structure.

Regards

Antoine.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-17 Thread Sturla Molden
Matěj Týč  wrote:

> Does it mean
> that if you pass the numpy array to the child process using Queue, no
> significant amount of data will flow through it? 

This is what my shared memory arrays do.

> Or I shouldn't pass it
> using Queue at all and just rely on inheritance? 

This is what David Baddeley's shared memory arrays do.

> Finally, I assume that
> passing it as an argument to the Process class is the worst option,
> because it will be pickled and unpickled.
 
My shared memory arrays only pickle the metadata, and can be used in this
way.
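(Roughly, the idea looks like the sketch below. ShmArray and shm_empty are
made-up names for illustration, and Python 3.8's multiprocessing.shared_memory
stands in for a named Sys V / posix segment:)

```python
import numpy as np
from multiprocessing import shared_memory  # stand-in for a named Sys V/posix segment

class ShmArray(np.ndarray):
    """ndarray backed by a named shared memory segment; pickles only metadata."""
    def __reduce__(self):
        # only the segment name, shape and dtype cross the process boundary
        return (_rebuild, (self._shm.name, self.shape, self.dtype.str))

def _rebuild(name, shape, dtype):
    shm = shared_memory.SharedMemory(name=name)
    a = np.ndarray(shape, dtype=dtype, buffer=shm.buf).view(ShmArray)
    a._shm = shm   # keep the segment alive for as long as the array lives
    return a

def shm_empty(shape, dtype='d'):
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    a = np.ndarray(shape, dtype=dtype, buffer=shm.buf).view(ShmArray)
    a._shm = shm
    return a
```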

> Or maybe you refer to modules s.a. joblib that use this functionality
> and expose only a nice interface?

Joblib creates "share memory" by memory mapping a temporary file, which is
back by RAM on Libux (tempfs). It is backed by a physical file on disk on
Mac and Windows. In this resepect, joblib is much better on Linux than Mac
or Windows.
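(For illustration, the effect is roughly that of the following sketch; the
path under /dev/shm, which is a tmpfs mount on most Linux systems, is just an
example:)

```python
import numpy as np

# /dev/shm is tmpfs on most Linux systems, so this "file" lives in RAM;
# on Mac and Windows the equivalent temporary file sits on disk.
shared = np.memmap('/dev/shm/example_block.dat', dtype='float64',
                   mode='w+', shape=(1000, 1000))
shared[:] = 0.0   # worker processes forked after this point see the same pages
shared.flush()
```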


> And finally, cow means that returning large arrays still involves data
> moving between processes, whereas the shm approach has the workaround
> that you can preallocate the result array by the parent process, where
> the worker process can write to.

My shared memory arrays need no workaround for this. They also allow shared
memory arrays to be returned to the parent process. No preallocation is
needed.


Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-17 Thread Matěj Týč
On 17.5.2016 14:13, Sturla Molden wrote:

> Matěj Týč  wrote:
>
>>  - Parallel processing of HUGE data, and
> This is mainly a Windows problem, as copy-on-write fork() will solve this
> on any other platform. ...
That sounds interesting, could you elaborate on it a bit? Does it mean
that if you pass the numpy array to the child process using Queue, no
significant amount of data will flow through it? Or I shouldn't pass it
using Queue at all and just rely on inheritance? Finally, I assume that
passing it as an argument to the Process class is the worst option,
because it will be pickled and unpickled.

Or maybe you refer to modules s.a. joblib that use this functionality
and expose only a nice interface?
And finally, cow means that returning large arrays still involves data
moving between processes, whereas the shm approach has the workaround
that you can preallocate the result array by the parent process, where
the worker process can write to.
> What this means is that shared memory is seldom useful for sharing huge
> data, even on Windows. It is only useful for this on Unix/Linux, where base
> addresses can stay they same. But on non-Windows platforms, the COW will in
> 99.99% of the cases be sufficient, thus make shared memory superfluous
> anyway. We don't need shared memory to scatter large data on Linux, only
> fork.
I am actually quite comfortable with sharing numpy arrays only. It is a
nice format for sharing large amounts of numbers, which is what I want
and what many modules accept as input (e.g. the "shapely" module).

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-17 Thread Sturla Molden
Matěj Týč  wrote:

>  - Parallel processing of HUGE data, and

This is mainly a Windows problem, as copy-on-write fork() will solve this
on any other platform. I am more in favor of asking  Microsoft to fix their
broken OS. 

Also observe that the usefulness of shared memory is very limited on
Windows, as we in practice never get the same base address in a spawned
process. This prevents sharing data structures with pointers and Python
objects. Anything more complex than an array cannot be shared.

What this means is that shared memory is seldom useful for sharing huge
data, even on Windows. It is only useful for this on Unix/Linux, where base
addresses can stay the same. But on non-Windows platforms, the COW will in
99.99% of the cases be sufficient, thus making shared memory superfluous
anyway. We don't need shared memory to scatter large data on Linux, only
fork.
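(A minimal sketch of the copy-on-write pattern on Unix, assuming the fork
start method:)

```python
import numpy as np
import multiprocessing as mp

big = np.random.rand(10_000_000)   # allocated once in the parent

def worker():
    # with fork, the child sees `big` through copy-on-write;
    # reading it copies nothing
    print(big.sum())

if __name__ == '__main__':
    p = mp.Process(target=worker)   # no data is pickled or sent through a pipe
    p.start()
    p.join()
```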

As I see it, shared memory is mostly useful as a means to construct an
inter-process communication (IPC) protocol. 

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-17 Thread Matěj Týč
On 11.5.2016 10:29, Sturla Molden wrote:
> I did some work on this some years ago. ...
>
I am sorry, I have missed this discussion when it started.

There are two cases when I had feeling that I had to use this functionality:

 - Parallel processing of HUGE data, and

 - using parallel processing in an application that had plug-ins which
operated on one shared array (that was updated every now and then - it
was a producer-consumer pattern thing). Once everything was set up, it
worked like a charm.

The thing I especially like about the proposed module is the lack of
external dependencies + it works if one knows how to use it.

The bad thing about it is its fragility - I admit that using it as it is
is not particularly intuitive. Unlike Sturla, I think that this is not a
dead end, but it indeed feels clumsy. However, I dislike the necessity
of writing Cython or C to get true multithreading for reasons I have
mentioned - what if you want to run high-level Python functions in parallel?

So, what I would really like to see is some kind of numpy documentation
on how to approach parallel computing with numpy arrays (depending on
what kind of task one wants to achieve). Maybe just using the queue is
good enough, or there are those third-party modules with known
limitations? Plenty of people start off with numpy, so some kind of
overview should be part of numpy docs.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-13 Thread Sturla Molden
Feng Yu  wrote:

> Also, did you checkout http://zeromq.org/blog:zero-copy ?
> ZeroMQ is a dependency of Jupyter, so it is quite available.

ZeroMQ is great, but it lacks some crucial features. In particular it does
not support IPC on Windows. Ideally one should e.g. use Unix domain sockets
on Linux and named pipes on Windows. Most MPI implementations seem to
prefer shared memory over these mechanisms, though. Also I am not sure
about ZeroMQ and async I/O. I would e.g. like to use IOCP on Windows, GCD
on Mac, and a threadpool plus epoll on Linux. 

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-13 Thread Feng Yu
>
> Personally I prefer a parallel programming style with queues – either to
> scatter arrays to workers and collecting arrays from workers, or to chain
> workers together in a pipeline (without using coroutines). But exactly how
> you program is a matter of taste. I want to make it as inexpensive as
> possible to pass a NumPy array through a queue. If anyone else wants to
> help improve parallel programming with NumPy using a different paradigm,
> that is fine too. I just wanted to clarify why I stopped working on shared
> memory arrays.

Even though I am not very obsessed with functional style and queues, I still
have to agree with you that queues tend to produce more readable and less
verbose code -- if there is the right tool.

>
> (As for the implementation, I am also experimenting with platform dependent
> asynchronous I/O (IOCP, GCD or kqueue, epoll) to pass NumPy arrays though a
> queue as inexpensively and scalably as possible. And no, there is no public
> repo, as I like to experiment with my pet project undisturbed before I let
> it out in the wild.)

It would be wonderful if there were a way to pass numpy arrays around
without a huge dependency list.

After all, we know the address of the array and, in principle, we are
able to find the physical pages and map them on the receiver side.

Also, did you check out http://zeromq.org/blog:zero-copy ?
ZeroMQ is a dependency of Jupyter, so it is quite available.
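(Something like the usual pyzmq recipe, sketched below: a small JSON header
followed by the raw buffer as a second frame:)

```python
import numpy as np
import zmq

def send_array(socket, A, flags=0, copy=True, track=False):
    A = np.ascontiguousarray(A)
    md = dict(dtype=str(A.dtype), shape=A.shape)
    socket.send_json(md, flags | zmq.SNDMORE)              # tiny metadata message
    return socket.send(A, flags, copy=copy, track=track)   # raw buffer, optionally zero-copy

def recv_array(socket, flags=0, copy=True, track=False):
    md = socket.recv_json(flags=flags)
    msg = socket.recv(flags=flags, copy=copy, track=track)
    A = np.frombuffer(memoryview(msg), dtype=md['dtype'])
    return A.reshape(md['shape'])
```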

- Yu

>

>
> Sturla
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-13 Thread Sturla Molden
Niki Spahiev  wrote:

> Apparently next Win10 will have fork as part of bash integration.

It is Interix/SUA rebranded as the "Subsystem for Linux". It remains to be seen
how long it will stay this time. Also, a Python built for this subsystem
will not run on the Win32 subsystem, so there are no graphics. And it will
not be installed by default, just like SUA.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Feng Yu  wrote:
 
> In most (half?) situations the result can be directly write back via
> preallocated shared array before works are spawned. Then there is no
> need to pass data back with named segments.

You can work around it in various ways, this being one of them.

Personally I prefer a parallel programming style with queues – either to
scatter arrays to workers and collecting arrays from workers, or to chain
workers together in a pipeline (without using coroutines). But exactly how
you program is a matter of taste. I want to make it as inexpensive as
possible to pass a NumPy array through a queue. If anyone else wants to
help improve parallel programming with NumPy using a different paradigm,
that is fine too. I just wanted to clarify why I stopped working on shared
memory arrays.

(As for the implementation, I am also experimenting with platform dependent
asynchronous I/O (IOCP, GCD or kqueue, epoll) to pass NumPy arrays though a
queue as inexpensively and scalably as possible. And no, there is no public
repo, as I like to experiment with my pet project undisturbed before I let
it out in the wild.)


Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Dave
Antoine Pitrou writes:

> 
> On Thu, 12 May 2016 06:27:43 + (UTC)
> Sturla Molden wrote:
> 
> > Allan Haldane wrote:
> > 
> > > You probably already know this, but I just wanted to note that the
> > > mpi4py module has worked around pickle too. They discuss how they
> > > efficiently transfer numpy arrays in mpi messages here:
> > > http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data
> > 
> > Unless I am mistaken, they use the PEP 3118 buffer interface to support
> > NumPy as well as a number of other Python objects. However, this protocol
> > makes buffer aquisition an expensive operation.
> 
> Can you define "expensive"?
> 
> > You can see this in Cython
> > if you use typed memory views. Assigning a NumPy array to a typed
> > memoryview (i,e, buffer acqisition) is slow.
> 
> You're assuming this is the cost of "buffer acquisition", while most
> likely it's the cost of creating the memoryview object itself.
> 
> Buffer acquisition itself only calls a single C callback and uses a
> stack-allocated C structure. It shouldn't be "expensive".
> 
> Regards
> 
> Antoine.
> 


When I looked at it, using a typed memoryview was between 7-50 times 
slower than using numpy directly:

http://thread.gmane.org/gmane.comp.python.cython.devel/14626


It looks like there was some improvement since then:

https://github.com/numpy/numpy/pull/3779


...and repeating my experiment shows the deficit is down to 3-11 times 
slower.


In [5]: x = randn(1)

In [6]: %timeit echo_memview(x)
The slowest run took 14.98 times longer than the fastest. This could 
mean that an intermediate result is being cached.
10 loops, best of 3: 5.31 µs per loop

In [7]: %timeit echo_memview_nocast(x)
The slowest run took 10.80 times longer than the fastest. This could 
mean that an intermediate result is being cached.
100 loops, best of 3: 1.58 µs per loop

In [8]: %timeit echo_numpy(x)
The slowest run took 58.81 times longer than the fastest. This could 
mean that an intermediate result is being cached.
100 loops, best of 3: 474 ns per loop



-Dave
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Niki Spahiev  wrote:

> Apparently next Win10 will have fork as part of bash integration.

That would be great. The lack of fork on Windows is very annoying.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Antoine Pitrou  wrote:

> Can you define "expensive"?

Slow enough to cause complaints on the Cython mailing list.

> You're assuming this is the cost of "buffer acquisition", while most
> likely it's the cost of creating the memoryview object itself.

Constructing a typed memoryview from a typed memoryview or a slice is fast.
Numerical code doing this intensively is still within 80-90% of the speed
of plain C code using pointer arithmetic.

 
> Buffer acquisition itself only calls a single C callback and uses a
> stack-allocated C structure. It shouldn't be "expensive".

I don't know the reason, only that buffer acquisition from NumPy arrays
with typed memoryviews is very expensive compared to assigning a typed
memoryview to another or slicing a typed memoryview.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Feng Yu
> Again, not everyone uses Unix.
>
> And on Unix it is not trival to pass data back from the child process. I
> solved that problem with Sys V IPC (pickling the name of the segment).
>

I wonder if it is necessary to insist on being able to pass large amounts of
data back from the child to the parent process.

In most (half?) situations the result can be directly written back via a
preallocated shared array before workers are spawned. Then there is no
need to pass data back with named segments.

Here I am just doodling some possible use cases along the OpenMP line.
The sample would just copy the data from s to r, in two different
ways. On systems that do not support multiprocessing + fork, the
semantics are still well preserved if threading is used.

```
import .. as mp

# the access attribute of inherited variables is at least 'privatecopy'
# but with threading backend it becomes 'shared'
s = numpy.arange(1)

with mp.parallel(num_threads=8) as section:
    # variables defined via section.empty will always be 'shared'
    r = section.empty(1)

    def work():
        # variables defined in the body are 'private'
        tid = section.get_thread_num()
        size = section.get_num_threads()
        sl = slice(tid * r.size // size, (tid + 1) * r.size // size)
        r[sl] = s[sl]

    status = section.run(work)
    assert not any(status.errors)

    # support for the following could be implemented with section.run

    chunksize = 1000

    def work(i):
        sl = slice(i, i + chunksize)
        r[sl] = s[sl]
        return s[sl].sum()

    status = section.loop(work, range(0, r.size, chunksize), schedule='static')
    assert not any(status.errors)
    total = sum(status.results)
```

>> 6. If we are to define a set of operations I would recommend take a
>> look at OpenMP as a reference -- It has been out there for decades and
>> used widely. An equiavlant to the 'omp parallel for' construct in
>> Python will be a very good starting point and immediately useful.
>
> If you are on Unix, you can just use a context manager. Call os.fork in
> __enter__ and os.waitpid in __exit__.
>
> Sturla
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Antoine Pitrou
On Thu, 12 May 2016 06:27:43 + (UTC)
Sturla Molden  wrote:

> Allan Haldane  wrote:
> 
> > You probably already know this, but I just wanted to note that the
> > mpi4py module has worked around pickle too. They discuss how they
> > efficiently transfer numpy arrays in mpi messages here:
> > http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data
> 
> Unless I am mistaken, they use the PEP 3118 buffer interface to support
> NumPy as well as a number of other Python objects. However, this protocol
> makes buffer aquisition an expensive operation.

Can you define "expensive"?

> You can see this in Cython
> if you use typed memory views. Assigning a NumPy array to a typed
> memoryview (i,e, buffer acqisition) is slow.

You're assuming this is the cost of "buffer acquisition", while most
likely it's the cost of creating the memoryview object itself.

Buffer acquisition itself only calls a single C callback and uses a
stack-allocated C structure. It shouldn't be "expensive".

Regards

Antoine.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Niki Spahiev

On 12.05.2016 02:02, Sturla Molden wrote:

> Feng Yu  wrote:
>
>> 1. If we are talking about shared memory and copy-on-write
>> inheritance, then we are using 'fork'.
>
> Not available on Windows. On Unix it only allows one-way communication,
> from parent to child.


Apparently next Win10 will have fork as part of bash integration.

Niki

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Allan Haldane  wrote:

> You probably already know this, but I just wanted to note that the
> mpi4py module has worked around pickle too. They discuss how they
> efficiently transfer numpy arrays in mpi messages here:
> http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data

Unless I am mistaken, they use the PEP 3118 buffer interface to support
NumPy as well as a number of other Python objects. However, this protocol
makes buffer acquisition an expensive operation. You can see this in Cython
if you use typed memory views. Assigning a NumPy array to a typed
memoryview (i.e. buffer acquisition) is slow. They are correct that avoiding
pickle means we save some memory. It also avoids creating and destroying
temporary Python objects, and associated reference counting. However,
because of the expensive buffer acquisition, I am not sure how much faster
their approach will be. I prefer to use the NumPy C API, and bypass any
unnecessary overhead. The idea is to make IPC of NumPy arrays fast, and
then we cannot have an expensive buffer acquisition in there.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 06:48 PM, Sturla Molden wrote:
> Elliot Hallmark  wrote:
>> Strula, this sounds brilliant!  To be clear, you're talking about
>> serializing the numpy array and reconstructing it in a way that's faster
>> than pickle?
> 
> Yes. We know the binary format of NumPy arrays. We don't need to invoke the
> machinery of pickle to serialize an array and write the bytes to some IPC
> mechanism (pipe, tcp socket, unix socket, shared memory). The choise of IPC
> mechanism might not even be relevant, and could even be deferred to a
> library like ZeroMQ. The point is that if multiple peocesses are to
> cooperate efficiently, we need a way to let them communicate NumPy arrays
> quickly. That is where using multiprocessing hurts today, and shared memory
> does not help here.
> 
> Sturla

You probably already know this, but I just wanted to note that the
mpi4py module has worked around pickle too. They discuss how they
efficiently transfer numpy arrays in mpi messages here:
http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data
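(Their buffer-based API looks roughly like this; the uppercase Send/Recv
methods hand MPI the array buffer directly instead of pickling it:)

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = np.arange(100, dtype='d')
    comm.Send([data, MPI.DOUBLE], dest=1, tag=77)   # raw buffer, no pickle
elif rank == 1:
    data = np.empty(100, dtype='d')
    comm.Recv([data, MPI.DOUBLE], source=0, tag=77)
```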

Of course not everyone is able to install mpi easily.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 06:39 PM, Joe Kington wrote:
> 
> 
> In python2 it appears that multiprocessing uses pickle protocol 0 which
> must cause a big slowdown (a factor of 100) relative to protocol 2, and
> uses pickle instead of cPickle.
> 
> 
> Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0. 
> The default for the `pickle` module changed, but multiprocessing has
> always used a binary pickle protocol to communicate between processes. 
> Have a look at multiprocessing's forking.py  in
> Python 2.7.

Are you sure? As far as I understood the code, it uses the default
protocol 0. Also, the file forking.py no longer exists.

https://github.com/python/cpython/tree/master/Lib/multiprocessing
(see reduction.py and queue.py)
http://bugs.python.org/issue23403
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Feng Yu  wrote:

> 1. If we are talking about shared memory and copy-on-write
> inheritance, then we are using 'fork'. 

Not available on Windows. On Unix it only allows one-way communication,
from parent to child.

> 2. Picking of inherited shared memory array can be done minimally by
> just picking the array_interface and the pointer address. It is
> because the child process and the parent share the same address space
> layout, guarenteed by the fork call.

Again, not everyone uses Unix. 

And on Unix it is not trivial to pass data back from the child process. I
solved that problem with Sys V IPC (pickling the name of the segment).

> 6. If we are to define a set of operations I would recommend take a
> look at OpenMP as a reference -- It has been out there for decades and
> used widely. An equiavlant to the 'omp parallel for' construct in
> Python will be a very good starting point and immediately useful.

If you are on Unix, you can just use a context manager. Call os.fork in
__enter__ and os.waitpid in __exit__. 
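(A bare-bones sketch of such a context manager, with no error handling;
do_work below is a hypothetical worker function:)

```python
import os

class forked:
    """Run the body of the with-block in a child process as well (Unix only)."""
    def __enter__(self):
        self.pid = os.fork()
        self.is_child = (self.pid == 0)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.is_child:
            os._exit(0)          # the child never runs code past the with-block
        os.waitpid(self.pid, 0)  # the parent waits for the child to finish
        return False

# both parent and child execute the block; branch on ctx.is_child:
# with forked() as ctx:
#     if ctx.is_child:
#         do_work()
```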

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Joe Kington  wrote:

> You're far better off just
> communicating between processes as opposed to using shared memory.

Yes.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Benjamin Root  wrote:

> Oftentimes, if one needs to share numpy arrays for multiprocessing, I would
> imagine that it is because the array is huge, right? 

That is a case for shared memory, but what I was talking about is more
common than this. In order for processes to cooperate, they must
communicate. So we need a way to pass around NumPy arrays quickly.
Sometimes we want to use shared memory because of the size of the data, but
more often it is just used as a form of inexpensive IPC.

> So, the pickling
> approach would copy that array for each process, which defeats the purpose,
> right?

I am not sure what you mean. When I made shared memory arrays I used named
segments, and made sure only the names of the segments were pickled, not the
contents of the buffers.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Allan Haldane  wrote:

> That's interesting. I've also used multiprocessing with numpy and didn't
> realize that. Is this true in python3 too?

I am not sure. As you have noticed, pickle is faster by two orders of
magnitude on Python 3. But several microseconds is also a lot, particularly
if we are going to do this often during a computation.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Elliot Hallmark  wrote:
> Strula, this sounds brilliant!  To be clear, you're talking about
> serializing the numpy array and reconstructing it in a way that's faster
> than pickle?

Yes. We know the binary format of NumPy arrays. We don't need to invoke the
machinery of pickle to serialize an array and write the bytes to some IPC
mechanism (pipe, tcp socket, unix socket, shared memory). The choice of IPC
mechanism might not even be relevant, and could even be deferred to a
library like ZeroMQ. The point is that if multiple processes are to
cooperate efficiently, we need a way to let them communicate NumPy arrays
quickly. That is where using multiprocessing hurts today, and shared memory
does not help here.
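(For example, a sketch over a multiprocessing Pipe: send a tiny header and
then the raw bytes; only the header goes through pickle:)

```python
import numpy as np
from multiprocessing import Pipe

def send_array(conn, a):
    a = np.ascontiguousarray(a)
    conn.send((a.dtype.str, a.shape))   # tiny header; this part is still pickled
    conn.send_bytes(a)                  # raw buffer, bypasses pickle

def recv_array(conn):
    dtype, shape = conn.recv()
    data = conn.recv_bytes()
    return np.frombuffer(data, dtype=dtype).reshape(shape)

# e.g.: left, right = Pipe(); send_array(left, np.arange(10.)); print(recv_array(right))
```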

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Joe Kington
> In python2 it appears that multiprocessing uses pickle protocol 0 which
> must cause a big slowdown (a factor of 100) relative to protocol 2, and
> uses pickle instead of cPickle.
>
>
Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0.  The
default for the `pickle` module changed, but multiprocessing has always
used a binary pickle protocol to communicate between processes.  Have a
look at multiprocessing's forking.py in Python 2.7.

As some context here for folks that may not be aware, Sturla is referring
to his earlier shared memory implementation he wrote that avoids
actually pickling the data, and instead essentially pickles a pointer to an
array in shared memory.  As Sturla very nicely summed up, it saves memory
usage, but doesn't help the deeper issues.  You're far better off just
communicating between processes as opposed to using shared memory.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Feng Yu
Hi,

I've been thinking and exploring this for some time. If we are to
start some effort I'd like to help. Here are my comments, mostly
regarding to Sturla's comments.

1. If we are talking about shared memory and copy-on-write
inheritance, then we are using 'fork'. If we are free to use fork,
then a large chunk of the concerns regarding the python std library
multiprocessing is no longer relevant. Especially the limitation that
functions must be defined at module level, which tends to impose a special
requirement on the software design.

2. Pickling of an inherited shared memory array can be done minimally by
just pickling the array_interface and the pointer address. This works
because the child process and the parent share the same address space
layout, guaranteed by the fork call.
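(A sketch of what that minimal pickling amounts to; load_meta only works in a
child that shares the parent's address space, i.e. after fork:)

```python
import numpy as np

def dump_meta(a):
    # shape, typestr, strides and the raw data pointer -- a few hundred bytes
    return a.__array_interface__

def load_meta(meta):
    # valid only in a process that shares the parent's address space (fork)
    class _Wrapper:
        pass
    w = _Wrapper()
    w.__array_interface__ = meta
    return np.asarray(w)
```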

3. The RawArray and RawValue implementation in std multiprocessing has
its own memory allocator for managing small variables. It is a huge
overkill (in terms of implementation) if we only care about very large
memory chunks.

4. Hidden synchronization costs on multi-CPU (NUMA?) systems. A choice
is to defer the responsibility of avoiding races to the developer.
Simple structs for working on slices of an array in parallel can cover a
huge fraction of use cases and fully avoid this issue.

5. Whether to delegate parallelism to the underlying low-level
implementation, or to implement the parallelism in Python while
keeping the underlying low-level implementation sequential, is
probably dependent on the problem. It may be convenient as of the
current state of parallelism support in Python to delegate, but will
it forever be the case?

For example, after the MPI FFTW binding was stuck for a long time, someone
wrote a parallel Python FFT package
(https://github.com/spectralDNS/mpiFFT4py) that uses FFTW for the
sequential parts and writes all the parallel semantics in Python with mpi4py, and
it uses a more efficient domain decomposition.

6. If we are to define a set of operations I would recommend taking a
look at OpenMP as a reference -- it has been out there for decades and
is used widely. An equivalent to the 'omp parallel for' construct in
Python would be a very good starting point and immediately useful.

- Yu

On Wed, May 11, 2016 at 11:22 AM, Benjamin Root  wrote:
> Oftentimes, if one needs to share numpy arrays for multiprocessing, I would
> imagine that it is because the array is huge, right? So, the pickling
> approach would copy that array for each process, which defeats the purpose,
> right?
>
> Ben Root
>
> On Wed, May 11, 2016 at 2:01 PM, Allan Haldane 
> wrote:
>>
>> On 05/11/2016 04:29 AM, Sturla Molden wrote:
>> > 4. The reason IPC appears expensive with NumPy is because
>> > multiprocessing
>> > pickles the arrays. It is pickle that is slow, not the IPC. Some would
>> > say
>> > that the pickle overhead is an integral part of the IPC ovearhead, but i
>> > will argue that it is not. The slowness of pickle is a separate problem
>> > alltogether.
>>
>> That's interesting. I've also used multiprocessing with numpy and didn't
>> realize that. Is this true in python3 too?
>>
>> In python2 it appears that multiprocessing uses pickle protocol 0 which
>> must cause a big slowdown (a factor of 100) relative to protocol 2, and
>> uses pickle instead of cPickle.
>>
>> a = np.arange(40*40)
>>
>> %timeit pickle.dumps(a)
>> 1000 loops, best of 3: 1.63 ms per loop
>>
>> %timeit cPickle.dumps(a)
>> 1000 loops, best of 3: 1.56 ms per loop
>>
>> %timeit cPickle.dumps(a, protocol=2)
>> 10 loops, best of 3: 18.9 µs per loop
>>
>> Python 3 uses protocol 3 by default:
>>
>> %timeit pickle.dumps(a)
>> 1 loops, best of 3: 20 µs per loop
>>
>>
>> > 5. Share memory does not improve on the pickle overhead because also
>> > NumPy
>> > arrays with shared memory must be pickled. Multiprocessing can bypass
>> > pickling the RawArray object, but the rest of the NumPy array is
>> > pickled.
>> > Using shared memory arrays have no speed advantage over normal NumPy
>> > arrays
>> > when we use multiprocessing.
>> >
>> > 6. It is much easier to write concurrent code that uses queues for
>> > message
>> > passing than anything else. That is why using a Queue object has been
>> > the
>> > popular Pythonic approach to both multitreading and multiprocessing. I
>> > would like this to continue.
>> >
>> > I am therefore focusing my effort on the multiprocessing.Queue object.
>> > If
>> > you understand the six points I listed you will see where this is going:
>> > What we really need is a specialized queue that has knowledge about
>> > NumPy
>> > arrays and can bypass pickle. I am therefore focusing my efforts on
>> > creating a NumPy aware queue object.
>> >
>> > We are not doing the users a favor by encouraging the use of shared
>> > memory
>> > arrays. They help with nothing.
>> >
>> >
>> > Sturla Molden
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> 

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Benjamin Root
Oftentimes, if one needs to share numpy arrays for multiprocessing, I would
imagine that it is because the array is huge, right? So, the pickling
approach would copy that array for each process, which defeats the purpose,
right?

Ben Root

On Wed, May 11, 2016 at 2:01 PM, Allan Haldane 
wrote:

> On 05/11/2016 04:29 AM, Sturla Molden wrote:
> > 4. The reason IPC appears expensive with NumPy is because multiprocessing
> > pickles the arrays. It is pickle that is slow, not the IPC. Some would
> say
> > that the pickle overhead is an integral part of the IPC ovearhead, but i
> > will argue that it is not. The slowness of pickle is a separate problem
> > alltogether.
>
> That's interesting. I've also used multiprocessing with numpy and didn't
> realize that. Is this true in python3 too?
>
> In python2 it appears that multiprocessing uses pickle protocol 0 which
> must cause a big slowdown (a factor of 100) relative to protocol 2, and
> uses pickle instead of cPickle.
>
> a = np.arange(40*40)
>
> %timeit pickle.dumps(a)
> 1000 loops, best of 3: 1.63 ms per loop
>
> %timeit cPickle.dumps(a)
> 1000 loops, best of 3: 1.56 ms per loop
>
> %timeit cPickle.dumps(a, protocol=2)
> 10 loops, best of 3: 18.9 µs per loop
>
> Python 3 uses protocol 3 by default:
>
> %timeit pickle.dumps(a)
> 1 loops, best of 3: 20 µs per loop
>
>
> > 5. Share memory does not improve on the pickle overhead because also
> NumPy
> > arrays with shared memory must be pickled. Multiprocessing can bypass
> > pickling the RawArray object, but the rest of the NumPy array is pickled.
> > Using shared memory arrays have no speed advantage over normal NumPy
> arrays
> > when we use multiprocessing.
> >
> > 6. It is much easier to write concurrent code that uses queues for
> message
> > passing than anything else. That is why using a Queue object has been the
> > popular Pythonic approach to both multitreading and multiprocessing. I
> > would like this to continue.
> >
> > I am therefore focusing my effort on the multiprocessing.Queue object. If
> > you understand the six points I listed you will see where this is going:
> > What we really need is a specialized queue that has knowledge about NumPy
> > arrays and can bypass pickle. I am therefore focusing my efforts on
> > creating a NumPy aware queue object.
> >
> > We are not doing the users a favor by encouraging the use of shared
> memory
> > arrays. They help with nothing.
> >
> >
> > Sturla Molden
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 04:29 AM, Sturla Molden wrote:
> 4. The reason IPC appears expensive with NumPy is because multiprocessing
> pickles the arrays. It is pickle that is slow, not the IPC. Some would say
> that the pickle overhead is an integral part of the IPC ovearhead, but i
> will argue that it is not. The slowness of pickle is a separate problem
> alltogether.

That's interesting. I've also used multiprocessing with numpy and didn't
realize that. Is this true in python3 too?

In python2 it appears that multiprocessing uses pickle protocol 0 which
must cause a big slowdown (a factor of 100) relative to protocol 2, and
uses pickle instead of cPickle.

a = np.arange(40*40)

%timeit pickle.dumps(a)
1000 loops, best of 3: 1.63 ms per loop

%timeit cPickle.dumps(a)
1000 loops, best of 3: 1.56 ms per loop

%timeit cPickle.dumps(a, protocol=2)
10 loops, best of 3: 18.9 µs per loop

Python 3 uses protocol 3 by default:

%timeit pickle.dumps(a)
1 loops, best of 3: 20 µs per loop


> 5. Share memory does not improve on the pickle overhead because also NumPy
> arrays with shared memory must be pickled. Multiprocessing can bypass
> pickling the RawArray object, but the rest of the NumPy array is pickled.
> Using shared memory arrays have no speed advantage over normal NumPy arrays
> when we use multiprocessing.
> 
> 6. It is much easier to write concurrent code that uses queues for message
> passing than anything else. That is why using a Queue object has been the
> popular Pythonic approach to both multitreading and multiprocessing. I
> would like this to continue.
> 
> I am therefore focusing my effort on the multiprocessing.Queue object. If
> you understand the six points I listed you will see where this is going:
> What we really need is a specialized queue that has knowledge about NumPy
> arrays and can bypass pickle. I am therefore focusing my efforts on
> creating a NumPy aware queue object.
> 
> We are not doing the users a favor by encouraging the use of shared memory
> arrays. They help with nothing.
> 
> 
> Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Elliot Hallmark
Sturla, this sounds brilliant!  To be clear, you're talking about
serializing the numpy array and reconstructing it in a way that's faster
than pickle? Or using shared memory and signaling array creation around
that shared memory rather than using pickle?

For what it's worth, I have used shared memory with numpy arrays as IPC (no
queue), with one process writing to it and one process reading from it, and
liked it.  Your point #5 did not apply because I was reusing the shared
memory.

Do you have a public repo where you are working on this?

Thanks!
  Elliot

On Wed, May 11, 2016 at 3:29 AM, Sturla Molden 
wrote:

> I did some work on this some years ago. I have more or less concluded that
> it was a waste of effort. But first let me explain what the suggested
> approach do not work. As it uses memory mapping to create shared memory
> (i.e. shared segments are not named), they must be created ahead of
> spawning processes. But if you really want this to work smoothly, you want
> named shared memory (Sys V IPC or posix shm_open), so that shared arrays
> can be created in the spawned processes and passed back.
>
> Now for the reason I don't care about shared memory arrays anymore, and
> what I am currently working on instead:
>
> 1. I have come across very few cases where threaded code cannot be used in
> numerical computing. In fact, multithreading nearly always happens in the
> code where I write pure C or Fortran anyway. Most often it happens in
> library code that are already multithreaded (Intel MKL, Apple Accelerate
> Framework, OpenBLAS, etc.), which means using it requires no extra effort
> from my side. A multithreaded LAPACK library is not less multithreaded if I
> call it from Python.
>
> 2. Getting shared memory right can be difficult because of hierarchical
> memory and false sharing. You might not see it if you only have a multicore
> CPU with a shared cache. But your code might not scale up on computers with
> more than one physical processor. False sharing acts like the GIL, except
> it happens in hardware and affects your C code invisibly without any
> explicit locking you can pinpoint. This is also why MPI code tends to scale
> much better than OpenMP code. If nothing is shared there will be no false
> sharing.
>
> 3. Raw C level IPC is cheap – very, very cheap. Even if you use pipes or
> sockets instead of shared memory it is cheap. There are very few cases
> where the IPC tends to be a bottleneck.
>
> 4. The reason IPC appears expensive with NumPy is because multiprocessing
> pickles the arrays. It is pickle that is slow, not the IPC. Some would say
> that the pickle overhead is an integral part of the IPC ovearhead, but i
> will argue that it is not. The slowness of pickle is a separate problem
> alltogether.
>
> 5. Share memory does not improve on the pickle overhead because also NumPy
> arrays with shared memory must be pickled. Multiprocessing can bypass
> pickling the RawArray object, but the rest of the NumPy array is pickled.
> Using shared memory arrays have no speed advantage over normal NumPy arrays
> when we use multiprocessing.
>
> 6. It is much easier to write concurrent code that uses queues for message
> passing than anything else. That is why using a Queue object has been the
> popular Pythonic approach to both multitreading and multiprocessing. I
> would like this to continue.
>
> I am therefore focusing my effort on the multiprocessing.Queue object. If
> you understand the six points I listed you will see where this is going:
> What we really need is a specialized queue that has knowledge about NumPy
> arrays and can bypass pickle. I am therefore focusing my efforts on
> creating a NumPy aware queue object.
>
> We are not doing the users a favor by encouraging the use of shared memory
> arrays. They help with nothing.
>
>
> Sturla Molden
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
I did some work on this some years ago. I have more or less concluded that
it was a waste of effort. But first let me explain why the suggested
approach does not work. As it uses memory mapping to create shared memory
(i.e. shared segments are not named), they must be created ahead of
spawning processes. But if you really want this to work smoothly, you want
named shared memory (Sys V IPC or posix shm_open), so that shared arrays
can be created in the spawned processes and passed back.

Now for the reason I don't care about shared memory arrays anymore, and
what I am currently working on instead:

1. I have come across very few cases where threaded code cannot be used in
numerical computing. In fact, multithreading nearly always happens in the
code where I write pure C or Fortran anyway. Most often it happens in
library code that are already multithreaded (Intel MKL, Apple Accelerate
Framework, OpenBLAS, etc.), which means using it requires no extra effort
from my side. A multithreaded LAPACK library is not less multithreaded if I
call it from Python.

2. Getting shared memory right can be difficult because of hierarchical
memory and false sharing. You might not see it if you only have a multicore
CPU with a shared cache. But your code might not scale up on computers with
more than one physical processor. False sharing acts like the GIL, except
it happens in hardware and affects your C code invisibly without any
explicit locking you can pinpoint. This is also why MPI code tends to scale
much better than OpenMP code. If nothing is shared there will be no false
sharing.

3. Raw C level IPC is cheap – very, very cheap. Even if you use pipes or
sockets instead of shared memory it is cheap. There are very few cases
where the IPC tends to be a bottleneck. 

4. The reason IPC appears expensive with NumPy is because multiprocessing
pickles the arrays. It is pickle that is slow, not the IPC. Some would say
that the pickle overhead is an integral part of the IPC overhead, but I
will argue that it is not. The slowness of pickle is a separate problem
altogether.

5. Shared memory does not improve on the pickle overhead because NumPy
arrays with shared memory must also be pickled. Multiprocessing can bypass
pickling the RawArray object, but the rest of the NumPy array is pickled.
Using shared memory arrays has no speed advantage over normal NumPy arrays
when we use multiprocessing.

6. It is much easier to write concurrent code that uses queues for message
passing than anything else. That is why using a Queue object has been the
popular Pythonic approach to both multithreading and multiprocessing. I
would like this to continue.

I am therefore focusing my effort on the multiprocessing.Queue object. If
you understand the six points I listed you will see where this is going:
What we really need is a specialized queue that has knowledge about NumPy
arrays and can bypass pickle. I am therefore focusing my efforts on
creating a NumPy aware queue object.

We are not doing the users a favor by encouraging the use of shared memory
arrays. They help with nothing.


Sturla Molden



Matěj  Týč  wrote:
> Dear Numpy developers,
> I propose a pull request https://github.com/numpy/numpy/pull/7533 that
> features numpy arrays that can be shared among processes (with some
> effort).
> 
> Why:
> In CPython, multiprocessing is the only way of how to exploit
> multi-core CPUs if your parallel code can't avoid creating Python
> objects. In that case, CPython's GIL makes threads unusable. However,
> unlike with threading, sharing data among processes is something that
> is non-trivial and platform-dependent.
> 
> Although numpy (and certainly some other packages) implement some
> operations in a way that GIL is not a concern, consider another case:
> You have a large amount of data in a form of a numpy array and you
> want to pass it to a function of an arbitrary Python module that also
> expects numpy array (e.g. list of vertices coordinates as an input and
> array of the corresponding polygon as an output). Here, it is clear
> GIL is an issue you and since you want a numpy array on both ends, now
> you would have to copy your numpy array to a multiprocessing.Array (to
> pass the data) and then to convert it back to ndarray in the worker
> process.
> This contribution would streamline it a bit - you would create an
> array as you are used to, pass it to the subprocess as you would do
> with the multiprocessing.Array, and the process can work with a numpy
> array right away.
> 
> How:
> The idea is to create a numpy array in a buffer that can be shared
> among processes. Python has support for this in its standard library,
> so the current solution creates a multiprocessing.Array and then
> passes it as the "buffer" to the ndarray.__new__. That would be it on
> Unixes, but on Windows, there has to be a a custom pickle method,
> otherwise the array "forgets" that its buffer is that special and the
> sharing doesn't 

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-04-11 Thread Stephan Hoyer
On Mon, Apr 11, 2016 at 5:39 AM, Matěj Týč  wrote:

> * ... I do see some value in providing a canonical right way to
> construct shared memory arrays in NumPy, but I'm not very happy with
> this solution, ... terrible code organization (with the global
> variables):
> * I understand that, however this is a pattern of Python
> multiprocessing and everybody who wants to use the Pool and shared
> data either is familiar with this approach or has to become familiar
> with[2, 3]. The good compromise is to have a separate module for each
> parallel calculation, so global variables are not a problem.
>

OK, we can agree to disagree on this one. I still don't think I could get
code using this pattern checked in at my work (for good reason).


> * If there's some way to we can paper over the boilerplate such that
>
users can use it without understanding the arcana of multiprocessing,
> then yes, that would be great. But otherwise I'm not sure there's
> anything to be gained by putting it in a library rather than referring
> users to the examples on StackOverflow [1] [2].
> * What about telling users: "You can use numpy with multiprocessing.
> Remeber the multiprocessing.Value and multiprocessing.Aray classes?
> numpy.shm works exactly the same way, which means that it shares their
> limitations. Refer to an example: ." Notice that
> although those SO links contain all of the information, it is very
> difficult to get it up and running for a newcomer like me few years
> ago.
>

I guess I'm still not convinced this is the best we can do with the
multiprocessing library. If we're going to do this, then we definitely need
to have the fully canonical example.

For example, could you make the shared array a global variable and then
still pass references to functions called by the processes anyway? The
examples on stackoverflow that we're both looking at are varied enough that
it's not obvious to me that this is as good as it gets.
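(Roughly the pattern I have in mind, as a sketch -- a plain RawArray installed
as a module global by a Pool initializer, not the PR's API:)

```python
import numpy as np
from multiprocessing import Pool, RawArray

_shared = None   # set in each worker by the Pool initializer

def _init(raw, shape):
    global _shared
    _shared = np.frombuffer(raw, dtype='d').reshape(shape)

def _work(i):
    return _shared[i].sum()   # workers read the shared block; nothing big is pickled

if __name__ == '__main__':
    shape = (100, 1000)
    raw = RawArray('d', shape[0] * shape[1])
    arr = np.frombuffer(raw, dtype='d').reshape(shape)
    arr[:] = np.random.rand(*shape)

    # relies on fork-style inheritance of `raw`; Windows needs extra care
    with Pool(4, initializer=_init, initargs=(raw, shape)) as pool:
        print(sum(pool.map(_work, range(shape[0]))))
```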

> * This needs tests and justification for custom pickling methods,
> which are not used in any of the current examples. ...
> * I am sorry, but don't fully understand that point. The custom
> pickling method of shmarray has to be there on Windows, but users
> don't have to know about it at all. As noted earlier, the global
> variable is the only way of using standard Python multiprocessing.Pool
> with shared objects.
>

That sounds like a fine justification, but given that it wasn't obvious, you
need a comment saying as much in the source code :). Also, it breaks
pickle, which is another limitation that needs to be documented.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-04-11 Thread Matěj Týč
Dear Numpy developers,
I propose a pull request https://github.com/numpy/numpy/pull/7533 that
features numpy arrays that can be shared among processes (with some
effort).

Why:
In CPython, multiprocessing is the only way of how to exploit
multi-core CPUs if your parallel code can't avoid creating Python
objects. In that case, CPython's GIL makes threads unusable. However,
unlike with threading, sharing data among processes is something that
is non-trivial and platform-dependent.

Although numpy (and certainly some other packages) implement some
operations in a way that GIL is not a concern, consider another case:
You have a large amount of data in a form of a numpy array and you
want to pass it to a function of an arbitrary Python module that also
expects numpy array (e.g. list of vertices coordinates as an input and
array of the corresponding polygon as an output). Here, it is clear the
GIL is an issue for you, and since you want a numpy array on both ends,
you would have to copy your numpy array to a multiprocessing.Array (to
pass the data) and then convert it back to an ndarray in the worker
process.
This contribution would streamline it a bit - you would create an
array as you are used to, pass it to the subprocess as you would do
with the multiprocessing.Array, and the process can work with a numpy
array right away.

How:
The idea is to create a numpy array in a buffer that can be shared
among processes. Python has support for this in its standard library,
so the current solution creates a multiprocessing.Array and then
passes it as the "buffer" to the ndarray.__new__. That would be it on
Unixes, but on Windows, there has to be a custom pickle method,
otherwise the array "forgets" that its buffer is that special and the
sharing doesn't work.
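(Stripped of the pickling details, the core of it looks roughly like the
sketch below; shm_zeros is just an illustrative name, not the PR's actual code:)

```python
import numpy as np
import multiprocessing

def shm_zeros(shape):
    # allocate the data in a shared buffer and wrap an ndarray view around it
    n = int(np.prod(shape))
    buf = multiprocessing.Array('d', n, lock=False)   # flat shared block of doubles
    a = np.ndarray(shape, dtype='d', buffer=buf)      # ndarray.__new__ with our buffer
    a[:] = 0.0
    return a

# On Unix, a child created by fork sees the same underlying buffer; the custom
# pickle method mentioned above is what keeps this working on Windows.
```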

Some of what has been said in the pull request & my answer to that:

* ... I do see some value in providing a canonical right way to
construct shared memory arrays in NumPy, but I'm not very happy with
this solution, ... terrible code organization (with the global
variables):
* I understand that, however this is a pattern of Python
multiprocessing and everybody who wants to use the Pool and shared
data either is familiar with this approach or has to become familiar
with it [2, 3]. A good compromise is to have a separate module for each
parallel calculation, so global variables are not a problem.

* Can you explain why the ndarray subclass is needed? Subclasses can
be rather annoying to get right, and also for other reasons.
* The shmarray class needs the custom pickler (but only on Windows).

* If there's some way to we can paper over the boilerplate such that
users can use it without understanding the arcana of multiprocessing,
then yes, that would be great. But otherwise I'm not sure there's
anything to be gained by putting it in a library rather than referring
users to the examples on StackOverflow [1] [2].
* What about telling users: "You can use numpy with multiprocessing.
Remember the multiprocessing.Value and multiprocessing.Array classes?
numpy.shm works exactly the same way, which means that it shares their
limitations. Refer to an example: ." Notice that
although those SO links contain all of the information, it is very
difficult to get it up and running for a newcomer, like me a few years
ago.

* This needs tests and justification for custom pickling methods,
which are not used in any of the current examples. ...
* I am sorry, but don't fully understand that point. The custom
pickling method of shmarray has to be there on Windows, but users
don't have to know about it at all. As noted earlier, the global
variable is the only way of using standard Python multiprocessing.Pool
with shared objects.

[1]: 
http://stackoverflow.com/questions/10721915/shared-memory-objects-in-python-multiprocessing
[2]: 
http://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing
[3]: 
http://stackoverflow.com/questions/1675766/how-to-combine-pool-map-with-array-shared-memory-in-python-multiprocessing
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion