Antoine Pitrou wrote:
> When writing C code to interact with buffer-providing objects, you
> usually don't bother with memoryviews at all. You just use a Py_buffer
> structure.
I was talking about "typed memoryviews", which is a Cython abstraction for a
Py_buffer struct. I
On Thu, 12 May 2016 23:14:36 +0000 (UTC)
Sturla Molden wrote:
> Antoine Pitrou wrote:
>
> > Can you define "expensive"?
>
> Slow enough to cause complaints on the Cython mailing list.
What kind of complaints? Please be specific.
> > Buffer
Matěj Týč wrote:
> Does it mean
> that if you pass the numpy array to the child process using Queue, no
> significant amount of data will flow through it?
This is what my shared memory arrays do.
> Or I shouldn't pass it
> using Queue at all and just rely on inheritance?
On 17.5.2016 14:13, Sturla Molden wrote:
> Matěj Týč wrote:
>
>> - Parallel processing of HUGE data, and
> This is mainly a Windows problem, as copy-on-write fork() will solve this
> on any other platform. ...
That sounds interesting, could you elaborate on it a bit? Does
Matěj Týč wrote:
> - Parallel processing of HUGE data, and
This is mainly a Windows problem, as copy-on-write fork() will solve this
on any other platform. I am more in favor of asking Microsoft to fix their
broken OS.
Also observe that the usefulness of shared memory
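A minimal sketch of the copy-on-write pattern being described, assuming a
POSIX system (the array name and chunking here are illustrative):

    import multiprocessing as mp
    import numpy as np

    BIG = np.random.rand(10_000_000)  # allocated before any worker exists

    def worker_sum(i):
        # With the "fork" start method the child inherits BIG through
        # copy-on-write pages: nothing is pickled or copied as long as
        # we only read from the array.
        return BIG[i::4].sum()

    if __name__ == "__main__":
        ctx = mp.get_context("fork")   # fails on Windows, as discussed
        with ctx.Pool(4) as pool:
            print(sum(pool.map(worker_sum, range(4))))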
On 11.5.2016 10:29, Sturla Molden wrote:
> I did some work on this some years ago. ...
>
I am sorry, I missed this discussion when it started.
There are two cases where I felt I had to use this functionality:
- Parallel processing of HUGE data, and
- using parallel processing
Feng Yu wrote:
> Also, did you checkout http://zeromq.org/blog:zero-copy ?
> ZeroMQ is a dependency of Jupyter, so it is quite available.
ZeroMQ is great, but it lacks some crucial features. In particular it does
not support IPC on Windows. Ideally one should e.g. use
>
> Personally I prefer a parallel programming style with queues – either to
> scatter arrays to workers and collect arrays from workers, or to chain
> workers together in a pipeline (without using coroutines). But exactly how
> you program is a matter of taste. I want to make it as inexpensive
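For the zero-copy point, a sketch of what pyzmq offers (using an inproc
socket pair for brevity; over tcp or ipc transports the same send path
applies, and as noted the ipc transport is unavailable on Windows):

    import numpy as np
    import zmq

    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    pull = ctx.socket(zmq.PULL)
    push.bind("inproc://arrays")
    pull.connect("inproc://arrays")

    a = np.arange(1_000_000, dtype=np.float64)

    # copy=False hands the array's buffer to ZeroMQ without copying it
    # on the sending side.
    push.send(a, copy=False)

    frame = pull.recv(copy=False)      # a zmq.Frame wrapping the data
    b = np.frombuffer(frame.buffer, dtype=a.dtype)
    print(b[:3])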
Niki Spahiev wrote:
> Apparently next Win10 will have fork as part of bash integration.
It is Interix/SUA rebranded as the "Subsystem for Linux". It remains to be
seen how long it will stay this time. Also, a Python built for this subsystem
will not run on the Win32 subsystem,
Feng Yu wrote:
> In most (half?) situations the result can be directly written back via a
> preallocated shared array before workers are spawned. Then there is no
> need to pass data back with named segments.
You can work around it in various ways, this being one of them.
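A sketch of that workaround: preallocate a shared block before the workers
spawn and let them write results into it in place (the names here are
illustrative):

    import multiprocessing as mp
    import numpy as np
    from multiprocessing.sharedctypes import RawArray

    N = 1_000_000
    raw = RawArray('d', N)             # allocated before workers spawn

    def init(raw_arr):
        global OUT
        OUT = np.frombuffer(raw_arr, dtype=np.float64)

    def fill(bounds):
        start, stop = bounds
        OUT[start:stop] = np.arange(start, stop)   # write back in place

    if __name__ == "__main__":
        chunks = [(i, min(i + 250_000, N)) for i in range(0, N, 250_000)]
        with mp.Pool(4, initializer=init, initargs=(raw,)) as pool:
            pool.map(fill, chunks)
        print(np.frombuffer(raw, dtype=np.float64)[:5])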
Antoine Pitrou writes:
>
> On Thu, 12 May 2016 06:27:43 +0000 (UTC)
> Sturla Molden wrote:
>
> > Allan Haldane wrote:
> >
> > > You probably already know this, but I just wanted to note that the
> > > mpi4py module has worked around pickle too. They discuss
Niki Spahiev wrote:
> Apparently next Win10 will have fork as part of bash integration.
That would be great. The lack of fork on Windows is very annoying.
Sturla
Antoine Pitrou wrote:
> Can you define "expensive"?
Slow enough to cause complaints on the Cython mailing list.
> You're assuming this is the cost of "buffer acquisition", while most
> likely it's the cost of creating the memoryview object itself.
Constructing a typed
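The two costs can at least be separated at the Python level; a rough sketch
(Cython's typed memoryviews do the analogous buffer acquisition and view
construction in C):

    import timeit
    import numpy as np

    a = np.zeros(10)
    n = 100_000

    # memoryview(a) performs a buffer acquisition plus construction of
    # the view object; an attribute lookup serves as a baseline.
    t_view = timeit.timeit(lambda: memoryview(a), number=n)
    t_attr = timeit.timeit(lambda: a.shape, number=n)
    print(f"memoryview(a): {t_view / n * 1e9:.0f} ns per call")
    print(f"a.shape:       {t_attr / n * 1e9:.0f} ns per call")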
> Again, not everyone uses Unix.
>
> And on Unix it is not trivial to pass data back from the child process. I
> solved that problem with Sys V IPC (pickling the name of the segment).
>
I wonder if it is necessary to insist on being able to pass large amounts of
data back from the child to the parent
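The "pickle the name of the segment" idea is essentially what
multiprocessing.shared_memory provides as of Python 3.8 (which postdates
this thread); only the segment name and array metadata cross the process
boundary:

    from multiprocessing import Process
    from multiprocessing.shared_memory import SharedMemory
    import numpy as np

    def child(name, shape, dtype):
        shm = SharedMemory(name=name)              # attach by name
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        arr[:] = 42.0                              # result visible to parent
        shm.close()

    if __name__ == "__main__":
        shm = SharedMemory(create=True, size=1000 * 8)
        a = np.ndarray((1000,), dtype=np.float64, buffer=shm.buf)
        p = Process(target=child, args=(shm.name, a.shape, a.dtype))
        p.start(); p.join()
        print(a[:3])
        shm.close(); shm.unlink()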
On Thu, 12 May 2016 06:27:43 +0000 (UTC)
Sturla Molden wrote:
> Allan Haldane wrote:
>
> > You probably already know this, but I just wanted to note that the
> > mpi4py module has worked around pickle too. They discuss how they
> > efficiently
On 12.05.2016 02:02, Sturla Molden wrote:
> Feng Yu wrote:
>> 1. If we are talking about shared memory and copy-on-write
>> inheritance, then we are using 'fork'.
> Not available on Windows. On Unix it only allows one-way communication,
> from parent to child.
Apparently next Win10 will have fork as part of bash integration.
Allan Haldane wrote:
> You probably already know this, but I just wanted to note that the
> mpi4py module has worked around pickle too. They discuss how they
> efficiently transfer numpy arrays in mpi messages here:
>
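The mpi4py mechanism referred to is presumably its buffer-protocol path: the
uppercase Send/Recv methods transfer the array's memory directly instead of
pickling it. A sketch, run under e.g. mpiexec -n 2:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    a = np.empty(1000, dtype=np.float64)

    if comm.Get_rank() == 0:
        a[:] = np.arange(1000)
        comm.Send([a, MPI.DOUBLE], dest=1, tag=0)  # raw buffer, no pickle
    else:
        comm.Recv([a, MPI.DOUBLE], source=0, tag=0)
        print(a[:3])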
On 05/11/2016 06:48 PM, Sturla Molden wrote:
> Elliot Hallmark wrote:
>> Sturla, this sounds brilliant! To be clear, you're talking about
>> serializing the numpy array and reconstructing it in a way that's faster
>> than pickle?
>
> Yes. We know the binary format of
On 05/11/2016 06:39 PM, Joe Kington wrote:
>
>
> In python2 it appears that multiprocessing uses pickle protocol 0 which
> must cause a big slowdown (a factor of 100) relative to protocol 2, and
> uses pickle instead of cPickle.
>
>
Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0.
Feng Yu wrote:
> 1. If we are talking about shared memory and copy-on-write
> inheritance, then we are using 'fork'.
Not available on Windows. On Unix it only allows one-way communication,
from parent to child.
> 2. Pickling of inherited shared memory array can be done
Joe Kington wrote:
> You're far better off just
> communicating between processes as opposed to using shared memory.
Yes.
Benjamin Root wrote:
> Oftentimes, if one needs to share numpy arrays for multiprocessing, I would
> imagine that it is because the array is huge, right?
That is a case for shared memory, but what I was talking about is more
common than this. In order for processes to
Allan Haldane wrote:
> That's interesting. I've also used multiprocessing with numpy and didn't
> realize that. Is this true in python3 too?
I am not sure. As you have noticed, pickle is faster by two orders of
magnitude on Python 3. But several microseconds is also a
Elliot Hallmark wrote:
> Sturla, this sounds brilliant! To be clear, you're talking about
> serializing the numpy array and reconstructing it in a way that's faster
> than pickle?
Yes. We know the binary format of NumPy arrays. We don't need to invoke the
machinery of
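A sketch of that idea: send a tiny (dtype, shape) header and then the raw
bytes, so the pickle machinery never touches the array data (the helper
names here are made up):

    import numpy as np
    from multiprocessing import Pipe, Process

    def send_array(conn, a):
        a = np.ascontiguousarray(a)
        conn.send((a.dtype.str, a.shape))   # tiny header via pickle
        conn.send_bytes(a.data)             # raw data, no pickling

    def recv_array(conn):
        dtype, shape = conn.recv()
        data = conn.recv_bytes()
        return np.frombuffer(data, dtype=dtype).reshape(shape)

    def child(conn):
        send_array(conn, np.arange(6, dtype=np.float64).reshape(2, 3))

    if __name__ == "__main__":
        parent, kid = Pipe()
        p = Process(target=child, args=(kid,))
        p.start()
        print(recv_array(parent))
        p.join()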
> In python2 it appears that multiprocessing uses pickle protocol 0 which
> must cause a big slowdown (a factor of 100) relative to protocol 2, and
> uses pickle instead of cPickle.
>
>
Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0. The
default for the `pickle` module changed,
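The gap is easy to measure; a quick sketch (exact numbers vary by machine):

    import pickle
    import timeit
    import numpy as np

    a = np.random.rand(1_000_000)

    for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
        t = timeit.timeit(lambda: pickle.dumps(a, protocol=proto), number=10)
        print(f"protocol {proto}: {t / 10 * 1e3:.1f} ms per dump")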
Hi,
I've been thinking about and exploring this for some time. If we are to
start some effort, I'd like to help. Here are my comments, mostly
regarding Sturla's comments.
1. If we are talking about shared memory and copy-on-write
inheritance, then we are using 'fork'. If we are free to use fork,
Oftentimes, if one needs to share numpy arrays for multiprocessing, I would
imagine that it is because the array is huge, right? So, the pickling
approach would copy that array for each process, which defeats the purpose,
right?
Ben Root
On Wed, May 11, 2016 at 2:01 PM, Allan Haldane
On 05/11/2016 04:29 AM, Sturla Molden wrote:
> 4. The reason IPC appears expensive with NumPy is because multiprocessing
> pickles the arrays. It is pickle that is slow, not the IPC. Some would say
> that the pickle overhead is an integral part of the IPC overhead, but I
> will argue that it is
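One way to separate the two costs, as a sketch: time the raw bytes going
through a pipe with no serialization at all, against pickling alone with no
IPC at all:

    import pickle
    import time
    import numpy as np
    from multiprocessing import Pipe, Process

    REPS = 10
    a = np.random.rand(2_000_000)
    payload = a.tobytes()

    def consume(conn):
        for _ in range(REPS):
            conn.recv_bytes()

    if __name__ == "__main__":
        parent, kid = Pipe()
        p = Process(target=consume, args=(kid,))
        p.start()
        t0 = time.perf_counter()
        for _ in range(REPS):
            parent.send_bytes(payload)     # pure IPC, no serialization
        p.join()
        print(f"raw pipe:          {time.perf_counter() - t0:.2f} s")

        t0 = time.perf_counter()
        for _ in range(REPS):
            pickle.dumps(a, protocol=0)    # pure pickling, no IPC
        print(f"pickle protocol 0: {time.perf_counter() - t0:.2f} s")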
Sturla, this sounds brilliant! To be clear, you're talking about
serializing the numpy array and reconstructing it in a way that's faster
than pickle? Or using shared memory and signaling array creation around
that shared memory rather than using pickle?
For what it's worth, I have used shared
I did some work on this some years ago. I have more or less concluded that
it was a waste of effort. But first let me explain why the suggested
approach does not work. As it uses memory mapping to create shared memory
(i.e. the shared segments are not named), the segments must be created ahead
of spawning
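To make the constraint concrete: an anonymous (unnamed) mapping can only be
shared by inheritance, so it has to exist before the child does. A
POSIX-only sketch:

    import mmap
    import os
    import numpy as np

    # Anonymous shared memory: there is no name to attach to later, so
    # it must be created ahead of forking.
    buf = mmap.mmap(-1, 1000 * 8)
    a = np.frombuffer(buf, dtype=np.float64)

    pid = os.fork()                 # POSIX only
    if pid == 0:
        a[0] = 3.14                 # child writes into the shared pages
        os._exit(0)
    os.waitpid(pid, 0)
    print(a[0])                     # parent sees 3.14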
On Mon, Apr 11, 2016 at 5:39 AM, Matěj Týč wrote:
> * ... I do see some value in providing a canonical right way to
> construct shared memory arrays in NumPy, but I'm not very happy with
> this solution, ... terrible code organization (with the global
> variables):
> * I
Dear Numpy developers,
I propose a pull request https://github.com/numpy/numpy/pull/7533 that
features numpy arrays that can be shared among processes (with some
effort).
Why:
In CPython, multiprocessing is the only way to exploit
multi-core CPUs if your parallel code can't avoid creating