Re: [Numpy-discussion] resizeable arrays using shared memory?

2016-02-09 Thread Daπid
On 6 February 2016 at 23:56, Elliot Hallmark  wrote:
> Now, I would like to have these arrays shared between processes spawned via 
> multiprocessing (for fast interprocess communication purposes, not for 
> parallelizing work on an array).  I don't care about mapping to a file on 
> disk, and I don't want disk I/O happening.  I don't care (really) about data 
> being copied in memory on resize.  I *do* want the array to be resized "in 
> place", so that the child processes can still access the arrays from the 
> object they were initialized with.

If you are only reading in parallel, and you can afford the extra
dependency, one alternative way to do this would be to use an
expandable array from HDF5:

http://www.pytables.org/usersguide/libref/homogenous_storage.html#earrayclassdescr

To avoid I/O, your file can live in RAM.

http://www.pytables.org/cookbook/inmemory_hdf5_files.html
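A minimal sketch of the combination (assuming PyTables is installed; the file name "inmem.h5" and node name "data" are arbitrary): the CORE driver with `driver_core_backing_store=0` keeps the whole "file" in RAM, and an EArray is appendable along its first axis.

```python
# Sketch: an in-memory, appendable HDF5 array via PyTables.
import numpy as np
import tables

# driver_core_backing_store=0 means nothing is ever written to disk.
f = tables.open_file("inmem.h5", mode="w",
                     driver="H5FD_CORE", driver_core_backing_store=0)

# shape=(0,) marks the first axis as the extendable one.
ea = f.create_earray(f.root, "data", atom=tables.Float64Atom(), shape=(0,))

ea.append(np.arange(5.0))   # grows the array "in place"
ea.append(np.arange(3.0))
print(ea.nrows)             # 8
```

Reads like `ea[:5]` hand back ordinary numpy arrays, so the rest of the code does not need to know the storage is HDF5.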
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] resizeable arrays using shared memory?

2016-02-09 Thread Feng Yu
Hi,

If the base address and size of the anonymous memory map are 'shared',
then one can protect them with a lock, grow the memmap with remap (or
unmap and map, or other tricks), and release the lock. During the
'resize' call, any reference to the array from Python in other
processes could just spin on the lock.

This is probably better defined than using signals, but I am not sure
how to enforce the spinning when an object is referenced.

One possibility is to insist that a 'resizable' mmap must be accessed
via a context manager, e.g.:

growable = shm.growable(initsize)

rank = ...  # do the magic to fork processes

if rank == 0:
    growable.grow(fill=0, size=10)
else:
    with growable as a:
        a += 10
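A single-process sketch of that context-manager idea (the GrowableShm class and its methods are hypothetical; growing here copies into a fresh anonymous map rather than remapping, which the thread says is acceptable, and propagating the new map's address to already-forked children is exactly the part this sketch does not solve):

```python
import mmap
from multiprocessing import Lock

import numpy as np


class GrowableShm:
    """Hypothetical sketch: an anonymous mmap guarded by a lock and
    exposed only through a context manager, as suggested above."""

    def __init__(self, nbytes):
        self._lock = Lock()
        self._mm = mmap.mmap(-1, nbytes)   # anonymous, zero-filled

    def grow(self, nbytes):
        # Copy into a fresh anonymous map; new bytes arrive as 0.
        with self._lock:
            old, self._mm = self._mm, mmap.mmap(-1, nbytes)
            self._mm.write(old[:])
            # Do not close() old here: live numpy views may still
            # export its buffer; it is reclaimed once they go away.

    def __enter__(self):
        # Readers "spin" here while a grow() holds the lock.
        self._lock.acquire()
        return np.frombuffer(self._mm, dtype=np.uint8)

    def __exit__(self, *exc):
        self._lock.release()
        return False
```

The context manager guarantees that an array view is only ever built from the current map, so a reader can never hold a stale view across a grow.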


Yu

On Sun, Feb 7, 2016 at 3:11 PM, Elliot Hallmark  wrote:
> That makes sense.  I could either send a signal to the child process letting
> it know to re-instantiate the numpy array using the same (but now resized)
> buffer, or I could have it check to see if the buffer has been resized when
> it might need it and re-instantiate then.  That's actually not too bad.  It
> would be nice if the array could be resized, but it's probably unstable to
> do so and there isn't much demand for it.
>
> Thanks,
>   Elliot
>
> On Sat, Feb 6, 2016 at 8:01 PM, Sebastian Berg wrote:
>> [quoted text trimmed]

Re: [Numpy-discussion] resizeable arrays using shared memory?

2016-02-07 Thread Elliot Hallmark
That makes sense.  I could either send a signal to the child process
letting it know to re-instantiate the numpy array using the same (but now
resized) buffer, or I could have it check to see if the buffer has been
resized when it might need it and re-instantiate then.  That's actually not
too bad.  It would be nice if the array could be resized, but it's probably
unstable to do so and there isn't much demand for it.

Thanks,
  Elliot
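
The "check and re-instantiate" option can be sketched with a shared counter (the Wrapper class and its names are hypothetical, and a plain bytearray stands in for the shared buffer):

```python
from multiprocessing import Value

import numpy as np

# Shared logical size in bytes; the process doing the resize bumps this.
nbytes = Value('l', 0)


class Wrapper:
    """Hypothetical sketch: rewrap the buffer lazily whenever the
    shared size has changed since we last looked."""

    def __init__(self, buf):
        self.buf = buf       # stand-in for the shared/mmap buffer
        self._seen = -1
        self._arr = None

    def array(self):
        if nbytes.value != self._seen:   # size changed: re-instantiate
            self._seen = nbytes.value
            self._arr = np.frombuffer(self.buf, dtype=np.uint8,
                                      count=self._seen)
        return self._arr
```

Each child checks the counter only when it is about to touch the array, so no signal handling is needed; the cost is one integer comparison per access in the common case.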

On Sat, Feb 6, 2016 at 8:01 PM, Sebastian Berg wrote:

> [quoted text trimmed]


Re: [Numpy-discussion] resizeable arrays using shared memory?

2016-02-06 Thread Sebastian Berg
On Sa, 2016-02-06 at 16:56 -0600, Elliot Hallmark wrote:
> Hi all,
> 
> I have a program that uses resize-able arrays.  I already
> over-provision the arrays and use slices, but every now and then the
> data outgrows that array and it needs to be resized.
> 
> Now, I would like to have these arrays shared between processes
> spawned via multiprocessing (for fast interprocess communication
> purposes, not for parallelizing work on an array).  I don't care
> about mapping to a file on disk, and I don't want disk I/O
> happening.  I don't care (really) about data being copied in memory
> on resize.  I *do* want the array to be resized "in place", so that
> the child processes can still access the arrays from the object they
> were initialized with.
> 
> 
> I can share arrays easily using arrays that are backed by memmap,
> i.e.:
> 
> ```
> # Source: http://github.com/rainwoodman/sharedmem
> import mmap
>
> import numpy
>
>
> class anonymousmemmap(numpy.memmap):
>     def __new__(subtype, shape, dtype=numpy.uint8, order='C'):
>         descr = numpy.dtype(dtype)
>         _dbytes = descr.itemsize
>
>         shape = numpy.atleast_1d(shape)
>         size = 1
>         for k in shape:
>             size *= k
>
>         bytes = int(size * _dbytes)
>
>         if bytes > 0:
>             mm = mmap.mmap(-1, bytes)
>         else:
>             mm = numpy.empty(0, dtype=descr)
>         self = numpy.ndarray.__new__(subtype, shape, dtype=descr,
>                                      buffer=mm, order=order)
>         self._mmap = mm
>         return self
>
>     def __array_wrap__(self, outarr, context=None):
>         return numpy.ndarray.__array_wrap__(
>             self.view(numpy.ndarray), outarr, context)
> ```
> 
> This cannot be resized because it does not own its own data
> (ValueError: cannot resize this array: it does not own its data).
> (numpy.memmap has this same issue [0], even if I set refcheck to
> False, and even though the docs say otherwise [1].)
>
> arr._mmap.resize(x) fails because the map is anonymous (error:
> [Errno 9] Bad file descriptor).  If I create a file and use its
> fileno to create the memmap, then I can resize `arr._mmap`, but the
> array itself is not resized.
> 
> Is there a way to accomplish what I want?  Or, do I just need to
> figure out a way to communicate new arrays to the child processes?
> 

I guess the answer is no, but the first question should be whether you
can create a new, larger array viewing the same data. Since you have
the mmap, that would just be a new view into it.

I.e. your "array" would be the memmap, and to use it, you always rewrap
it into a new numpy array.
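
That rewrapping pattern might look like this (illustrative names; the logical length n has to be tracked out of band, and the map is over-provisioned so "growing" just means raising n):

```python
import mmap

import numpy as np

# Over-provisioned anonymous map: room for 16 float64 values.
mm = mmap.mmap(-1, 16 * 8)
n = 4   # current logical length, tracked separately


def view():
    # Rewrap on every use: cheap, and always reflects the current n.
    return np.frombuffer(mm, dtype=np.float64, count=n)


view()[:] = 1.0   # writes go straight into the shared map
n = 6             # "grow" within the over-provisioned map
```

The array object is throwaway; the mmap is the durable handle, which is exactly the inversion suggested above.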

Other than that, you would have to mess with the internal ndarray
structure, since this kind of operation appears rather unsafe.

- Sebastian


> Thanks,
>   Elliot
> 
> [0] https://github.com/numpy/numpy/issues/4198.
> 
> [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.
> resize.html
> 
