Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-24 Thread Sturla Molden
Antoine Pitrou wrote: > When writing C code to interact with buffer-providing objects, you > usually don't bother with memoryviews at all. You just use a Py_buffer > structure. I was talking about "typed memoryviews", which is a Cython abstraction for a Py_buffer struct. I

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-24 Thread Antoine Pitrou
On Thu, 12 May 2016 23:14:36 + (UTC) Sturla Molden wrote: > Antoine Pitrou wrote: > > > Can you define "expensive"? > > Slow enough to cause complaints on the Cython mailing list. What kind of complaints? Please be specific. > > Buffer

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-17 Thread Sturla Molden
Matěj Týč wrote: > Does it mean > that if you pass the numpy array to the child process using Queue, no > significant amount of data will flow through it? This is what my shared memory arrays do. > Or I shouldn't pass it > using Queue at all and just rely on inheritance?
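For illustration (not code from this thread): the descriptor-through-the-Queue idea Sturla describes can be sketched with today's stdlib multiprocessing.shared_memory (Python 3.8+, so not available in 2016); only a tiny (name, shape, dtype) tuple needs to travel, while the data stays in a named segment. The helper names share/attach are invented.

```python
import numpy as np
from multiprocessing import shared_memory

def share(arr):
    """Copy arr into a named segment; return it plus a tiny descriptor."""
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    np.ndarray(arr.shape, arr.dtype, buffer=shm.buf)[:] = arr
    return shm, (shm.name, arr.shape, str(arr.dtype))

def attach(descriptor):
    """Re-attach by name; only the descriptor goes through the Queue."""
    name, shape, dtype = descriptor
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype, buffer=shm.buf)

owner, desc = share(np.arange(5, dtype="float64"))
view_shm, view = attach(desc)
total = float(view.sum())   # 10.0; no array data crossed the "queue"
del view                    # release the exported buffer before closing
view_shm.close()
owner.close()
owner.unlink()
```

In a real pipeline, `desc` is what you would put on the Queue; the consumer calls `attach` and must close (and eventually unlink) the segment itself.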

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-17 Thread Matěj Týč
On 17.5.2016 14:13, Sturla Molden wrote: > Matěj Týč wrote: > >> - Parallel processing of HUGE data, and > This is mainly a Windows problem, as copy-on-write fork() will solve this > on any other platform. ... That sounds interesting, could you elaborate on it a bit? Does

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-17 Thread Sturla Molden
Matěj Týč wrote: > - Parallel processing of HUGE data, and This is mainly a Windows problem, as copy-on-write fork() will solve this on any other platform. I am more in favor of asking Microsoft to fix their broken OS. Also observe that the usefulness of shared memory

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-17 Thread Matěj Týč
On 11.5.2016 10:29, Sturla Molden wrote: > I did some work on this some years ago. ... > I am sorry, I have missed this discussion when it started. There are two cases when I had feeling that I had to use this functionality: - Parallel processing of HUGE data, and - using parallel processing

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-13 Thread Sturla Molden
Feng Yu wrote: > Also, did you checkout http://zeromq.org/blog:zero-copy ? > ZeroMQ is a dependency of Jupyter, so it is quite available. ZeroMQ is great, but it lacks some crucial features. In particular it does not support IPC on Windows. Ideally one should e.g. use

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-13 Thread Feng Yu
> > Personally I prefer a parallel programming style with queues – either to > scatter arrays to workers and collecting arrays from workers, or to chain > workers together in a pipeline (without using coroutines). But exactly how > you program is a matter of taste. I want to make it as inexpensive

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-13 Thread Sturla Molden
Niki Spahiev wrote: > Apparently next Win10 will have fork as part of bash integration. It is Interix/SUA rebranded "Subsystem for Linux". It remains to be seen how long it will stay this time. Also a Python built for this subsystem will not run on the Win32 subsystem,

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Feng Yu wrote: > In most (half?) situations the result can be written back directly via > a preallocated shared array before workers are spawned. Then there is no > need to pass data back with named segments. You can work around it in various ways, this being one of them.

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Dave
Antoine Pitrou pitrou.net> writes: > > On Thu, 12 May 2016 06:27:43 + (UTC) > Sturla Molden gmail.com> wrote: > > > Allan Haldane gmail.com> wrote: > > > > > You probably already know this, but I just wanted to note that the > > > mpi4py module has worked around pickle too. They discuss

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Niki Spahiev wrote: > Apparently next Win10 will have fork as part of bash integration. That would be great. The lack of fork on Windows is very annoying. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Antoine Pitrou wrote: > Can you define "expensive"? Slow enough to cause complaints on the Cython mailing list. > You're assuming this is the cost of "buffer acquisition", while most > likely it's the cost of creating the memoryview object itself. Constructing a typed
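For readers outside Cython: the cost under discussion is PEP 3118 buffer acquisition. The same machinery is reachable from pure Python via memoryview, which is roughly what a typed memoryview does at construction time; a stdlib sketch, not Cython:

```python
from array import array

a = array("d", range(10))  # any buffer provider; a NumPy array works the same
m = memoryview(a)          # buffer acquisition: fills a Py_buffer via bf_getbuffer
assert m.format == "d" and m.shape == (10,) and m.itemsize == 8
m[0] = 3.5                 # writes go straight to the underlying buffer
m.release()                # explicitly release the Py_buffer
```

Each `memoryview(a)` call repeats that acquisition, which is the per-construction cost being debated here.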

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Feng Yu
> Again, not everyone uses Unix. > > And on Unix it is not trivial to pass data back from the child process. I > solved that problem with Sys V IPC (pickling the name of the segment). > I wonder if it is necessary to insist on being able to pass large amounts of data back from the child to the parent

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Antoine Pitrou
On Thu, 12 May 2016 06:27:43 + (UTC) Sturla Molden wrote: > Allan Haldane wrote: > > > You probably already know this, but I just wanted to note that the > > mpi4py module has worked around pickle too. They discuss how they > > efficiently

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Niki Spahiev
On 12.05.2016 02:02, Sturla Molden wrote: Feng Yu wrote: 1. If we are talking about shared memory and copy-on-write inheritance, then we are using 'fork'. Not available on Windows. On Unix it only allows one-way communication, from parent to child. Apparently next

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-12 Thread Sturla Molden
Allan Haldane wrote: > You probably already know this, but I just wanted to note that the > mpi4py module has worked around pickle too. They discuss how they > efficiently transfer numpy arrays in mpi messages here: >

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 06:48 PM, Sturla Molden wrote: > Elliot Hallmark wrote: >> Strula, this sounds brilliant! To be clear, you're talking about >> serializing the numpy array and reconstructing it in a way that's faster >> than pickle? > > Yes. We know the binary format of

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 06:39 PM, Joe Kington wrote: > > > In python2 it appears that multiprocessing uses pickle protocol 0 which > must cause a big slowdown (a factor of 100) relative to protocol 2, and > uses pickle instead of cPickle. > > > Even on Python 2.x, multiprocessing uses

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Feng Yu wrote: > 1. If we are talking about shared memory and copy-on-write > inheritance, then we are using 'fork'. Not available on Windows. On Unix it only allows one-way communication, from parent to child. > 2. Picking of inherited shared memory array can be done

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Joe Kington wrote: > You're far better off just > communicating between processes as opposed to using shared memory. Yes.

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Benjamin Root wrote: > Oftentimes, if one needs to share numpy arrays for multiprocessing, I would > imagine that it is because the array is huge, right? That is a case for shared memory, but what I was talking about is more common than this. In order for processes to

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Allan Haldane wrote: > That's interesting. I've also used multiprocessing with numpy and didn't > realize that. Is this true in python3 too? I am not sure. As you have noticed, pickle is faster by two orders of magnitude on Python 3. But several microseconds is also a

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Elliot Hallmark wrote: > Sturla, this sounds brilliant! To be clear, you're talking about > serializing the numpy array and reconstructing it in a way that's faster > than pickle? Yes. We know the binary format of NumPy arrays. We don't need to invoke the machinery of
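The "known binary format" shortcut can be sketched with the stdlib array module (NumPy's case additionally carries dtype and shape, but the principle is identical: a fixed header plus the raw buffer, skipping pickle's general object machinery). The names fast_dumps/fast_loads are invented:

```python
import struct
from array import array

def fast_dumps(a):
    # fixed header: 1-byte typecode + 8-byte little-endian item count,
    # followed by the raw machine-format buffer
    return a.typecode.encode() + struct.pack("<Q", len(a)) + a.tobytes()

def fast_loads(buf):
    typecode = buf[0:1].decode()
    (n,) = struct.unpack("<Q", buf[1:9])
    out = array(typecode)
    out.frombytes(buf[9:9 + n * out.itemsize])
    return out

a = array("d", [1.0, 2.0, 3.0])
b = fast_loads(fast_dumps(a))
```

No type dispatch, no opcode interpretation: just a header parse and one bulk copy.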

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Joe Kington
In python2 it appears that multiprocessing uses pickle protocol 0 which > must cause a big slowdown (a factor of 100) relative to protocol 2, and > uses pickle instead of cPickle. > > Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0. The default for the `pickle` module changed,

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Feng Yu
Hi, I've been thinking and exploring this for some time. If we are to start some effort I'd like to help. Here are my comments, mostly regarding Sturla's comments. 1. If we are talking about shared memory and copy-on-write inheritance, then we are using 'fork'. If we are free to use fork,

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Benjamin Root
Oftentimes, if one needs to share numpy arrays for multiprocessing, I would imagine that it is because the array is huge, right? So, the pickling approach would copy that array for each process, which defeats the purpose, right? Ben Root On Wed, May 11, 2016 at 2:01 PM, Allan Haldane

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 04:29 AM, Sturla Molden wrote: > 4. The reason IPC appears expensive with NumPy is because multiprocessing > pickles the arrays. It is pickle that is slow, not the IPC. Some would say > that the pickle overhead is an integral part of the IPC overhead, but I > will argue that it is
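Sturla's pickle-vs-IPC distinction was later addressed directly by pickle protocol 5 (Python 3.8+, PEP 574, so not available to this thread): buffers can travel out-of-band, leaving only metadata in the pickle stream. A hedged sketch using the stdlib PickleBuffer:

```python
import pickle

data = bytearray(b"\x00" * 100_000)   # stand-in for an array's raw buffer
bufs = []
stream = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                      buffer_callback=bufs.append)
# the pickle stream itself is tiny: the 100 kB buffer travelled out-of-band
restored = pickle.loads(stream, buffers=bufs)
same = bytes(memoryview(restored)) == bytes(data)
```

A transport (pipe, shared memory, MPI) can then move the collected buffers however it likes, zero-copy where possible.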

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Elliot Hallmark
Sturla, this sounds brilliant! To be clear, you're talking about serializing the numpy array and reconstructing it in a way that's faster than pickle? Or using shared memory and signaling array creation around that shared memory rather than using pickle? For what it's worth, I have used shared

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
I did some work on this some years ago. I have more or less concluded that it was a waste of effort. But first let me explain why the suggested approach does not work. As it uses memory mapping to create shared memory (i.e. shared segments are not named), they must be created ahead of spawning
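The constraint Sturla describes, that anonymous shared memory must already exist before workers are created, is the same one multiprocessing.sharedctypes lives with. A minimal POSIX-only sketch (fork start method, so not applicable on Windows, which is exactly the thread's complaint):

```python
import multiprocessing as mp

ctx = mp.get_context("fork")   # POSIX-only; the whole point here is fork()
buf = ctx.RawArray("d", 4)     # must be allocated BEFORE the child is forked

def worker():
    for i in range(len(buf)):  # inherited mapping: same pages, no copy
        buf[i] = float(i * i)

p = ctx.Process(target=worker)
p.start()
p.join()
# the parent now sees the child's writes: [0.0, 1.0, 4.0, 9.0]
```

Any segment allocated after the fork is invisible to the other side, which is why named segments (Sys V IPC, or later multiprocessing.shared_memory) are needed for dynamic allocation.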

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-04-11 Thread Stephan Hoyer
On Mon, Apr 11, 2016 at 5:39 AM, Matěj Týč wrote: > * ... I do see some value in providing a canonical right way to > construct shared memory arrays in NumPy, but I'm not very happy with > this solution, ... terrible code organization (with the global > variables): > * I

[Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-04-11 Thread Matěj Týč
Dear Numpy developers, I propose a pull request https://github.com/numpy/numpy/pull/7533 that features numpy arrays that can be shared among processes (with some effort). Why: In CPython, multiprocessing is the only way to exploit multi-core CPUs if your parallel code can't avoid creating