Re: [Numpy-discussion] np.array, copy=False and memmap

2017-09-06 Thread Thomas Jollans
On 2017-08-07 23:01, Nisoli Isaia wrote:
> Dear all,
> I have a question about the behaviour of
>
> y = np.array(x, copy=False, dtype='float32')
>
> when x is a memmap. If we check the memmap attribute of mmap
>
> print "mmap attribute", y._mmap
>
> numpy tells us that y is not a memmap.

Regardless of any bugs exposed by the snippet of code below, everything
is fine here. You created y as an array, so it's an array, not a memmap.
Maybe it should be a memmap. It doesn't matter: it's still backed by a
memmap!


Python 2.7.5 (default, Aug  2 2017, 11:05:32)
Type "copyright", "credits" or "license" for more information.

IPython 5.4.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help  -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.13.0'

In [3]: with open('test_memmap', 'w+b') as fp:
   ...:     fp.write(b'\0' * 2048)
   ...:

In [4]: x = np.memmap('test_memmap', dtype='int16')

In [5]: x
Out[5]: memmap([0, 0, 0, ..., 0, 0, 0], dtype=int16)

In [6]: id(x)
Out[6]: 47365848

In [7]: y = np.array(x, copy=False)

In [8]: y
Out[8]: array([0, 0, 0, ..., 0, 0, 0], dtype=int16)

In [9]: del x

In [10]: y.base
Out[10]: memmap([0, 0, 0, ..., 0, 0, 0], dtype=int16)

In [11]: id(y.base) == Out[6]
Out[11]: True

In [12]: y[:] = 0x0102

In [13]: y
Out[13]: array([258, 258, 258, ..., 258, 258, 258], dtype=int16)

In [14]: del y

In [15]: with open('test_memmap', 'rb') as fp:
    ...:     print [ord(c) for c in fp.read(10)]
    ...:
[2, 1, 2, 1, 2, 1, 2, 1, 2, 1]
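
For anyone who wants to repeat the session above as a script on Python 3, here is a rough sketch. The helper names `find_memmap` and `demo_write_through` are mine, not part of NumPy, and I pin the dtype to little-endian `'<i2'` so the byte order in the file does not depend on the machine. I also flush explicitly before reading the file back, which the session above does not do.

```python
import os
import tempfile

import numpy as np


def find_memmap(arr):
    """Walk the .base chain and return the first np.memmap found, else None."""
    obj = arr
    while obj is not None:
        if isinstance(obj, np.memmap):
            return obj
        # Objects without a .base attribute (e.g. mmap.mmap) end the walk.
        obj = getattr(obj, "base", None)
    return None


def demo_write_through():
    """Write through a plain ndarray view and read the bytes back from disk."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "test_memmap")
    with open(path, "wb") as fp:
        fp.write(b"\0" * 2048)

    x = np.memmap(path, dtype="<i2")   # default mode is 'r+'
    y = np.array(x, copy=False)        # plain ndarray, no copy is needed here
    del x                              # y still keeps the memmap alive

    backing = find_memmap(y)           # the memmap that y is a view of
    y[:] = 0x0102
    backing.flush()                    # y itself has no flush method
    del y, backing                     # drop all references to the mapping

    with open(path, "rb") as fp:
        first_bytes = list(fp.read(10))
    os.remove(path)
    os.rmdir(tmpdir)
    return first_bytes


if __name__ == "__main__":
    print(demo_write_through())
```

The point is the same as in the session: the write made through `y` ends up in the file, even though `y` is not a memmap.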

In [16]:


> But the following code snippet crashes the python interpreter
> 
> # opens the memmap
> with open(filename,'r+b') as f:
>   mm = mmap.mmap(f.fileno(),0)
>   x = np.frombuffer(mm, dtype='float32')
>
> # builds an array from the memmap, with the option copy=False
> y = np.array(x, copy=False, dtype='float32')
> print "before", y
>
> # closes the file
> mm.close()
> print "after", y
> 
> In my code I use memmaps to share read-only objects when doing parallel
> processing
> and the behaviour of np.array, even if not consistent, it's desirable.
> I share scipy sparse matrices over many processes and if np.array would
> make a copy
> when dealing with memmaps this would force me to rewrite part of the
> sparse matrices
> code.
> Would it be possible in the future releases of numpy to have np.array
> check, 
> if copy is false, if y is a memmap and in that case return a full memmap
> object
> instead of slicing it?
> 
> Best wishes
> Isaia
> 
> P.S. A longer account of the issue may be found on my university blog
> http://www.im.ufrj.br/nisoli/blog/?p=131
> 
> -- 
> Isaia Nisoli
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


-- 
Thomas Jollans


Re: [Numpy-discussion] np.array, copy=False and memmap

2017-08-10 Thread Allan Haldane
On 08/07/2017 05:01 PM, Nisoli Isaia wrote:
> Dear all,
> I have a question about the behaviour of
> 
> y = np.array(x, copy=False, dtype='float32')
> 
> when x is a memmap. If we check the memmap attribute of mmap
> 
> print "mmap attribute", y._mmap
> 
> numpy tells us that y is not a memmap.
> But the following code snippet crashes the python interpreter
> 
> # opens the memmap
> with open(filename,'r+b') as f:
>   mm = mmap.mmap(f.fileno(),0)
>   x = np.frombuffer(mm, dtype='float32')
> 
> # builds an array from the memmap, with the option copy=False
> y = np.array(x, copy=False, dtype='float32')
> print "before", y
> 
> # closes the file
> mm.close()
> print "after", y
> 
> In my code I use memmaps to share read-only objects when doing parallel
> processing
> and the behaviour of np.array, even if not consistent, it's desirable.
> I share scipy sparse matrices over many processes and if np.array would
> make a copy
> when dealing with memmaps this would force me to rewrite part of the sparse
> matrices
> code.
> Would it be possible in the future releases of numpy to have np.array
> check,
> if copy is false, if y is a memmap and in that case return a full memmap
> object
> instead of slicing it?
> 
> Best wishes
> Isaia
> 
> P.S. A longer account of the issue may be found on my university blog
> http://www.im.ufrj.br/nisoli/blog/?p=131

I just read your blog post, as well.

To confirm your question there: yes, if you slice or "view" a numpy
array which points to memmapped data, then the slice or view will also
point to memmapped data and will not make a copy. This way you avoid
using up a lot of memory.

It is also important to realize that `np.memmap` is merely a subclass of
`np.ndarray` which just provides a few extra helper methods which
ndarrays don't have, but is otherwise identical. The most important
difference is that `np.memmap` has a `flush` method. (It also has a
_mmap private attribute). But otherwise, both ndarrays and memmaps have
an internal data pointer pointing to the underlying data, and slices or
views of ndarrays (or memmaps) will point to the same memory (no
copies). In your code when you do

y = np.array(x, copy=False)

where x is a np.memmap object, y will point to the same memory locations
as x. However, y will not be a memmap object, because of how you
constructed it, so will not have the `flush` method which can be
important if you are writing to y and expect it to be written to disk.
If you are only reading from  y, though, this shouldn't matter.

Also, note that an np.memmap object is different from an mmap.mmap
object: The former uses the latter internally.
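
To make the above concrete, here is a small sketch that checks each of those claims. The function name `describe_view` and the temporary file are only for illustration. It assumes `x` already has the requested dtype, so no copy is needed.

```python
import os
import tempfile

import numpy as np


def describe_view():
    """Compare an np.memmap with the ndarray made by np.array(x, copy=False)."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "data.bin")
    np.zeros(256, dtype="float32").tofile(path)

    x = np.memmap(path, dtype="float32", mode="r+")
    y = np.array(x, copy=False)   # same memory, but a base-class ndarray

    facts = {
        "x_is_memmap": isinstance(x, np.memmap),
        "y_is_memmap": isinstance(y, np.memmap),
        "shares_memory": np.shares_memory(x, y),
        "x_has_flush": hasattr(x, "flush"),
        "y_has_flush": hasattr(y, "flush"),
    }

    # A write through y is visible through x because both use one buffer.
    y[0] = 1.5
    facts["write_visible_in_x"] = float(x[0]) == 1.5

    x.flush()                     # flushing has to go through the memmap
    del x, y
    os.remove(path)
    os.rmdir(tmpdir)
    return facts


if __name__ == "__main__":
    for name, value in describe_view().items():
        print(name, value)
```

If you do need `flush` on the result, keep a reference to `x` around (or reach it through `y.base`) rather than relying on `y`.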

Allan




Re: [Numpy-discussion] np.array, copy=False and memmap

2017-08-10 Thread Allan Haldane
On 08/10/2017 02:24 PM, Sebastian Berg wrote:
> On Thu, 2017-08-10 at 12:27 -0400, Allan Haldane wrote:
>> On 08/07/2017 05:01 PM, Nisoli Isaia wrote:
>>> Dear all,
>>> I have a question about the behaviour of
>>>
>>> y = np.array(x, copy=False, dtype='float32')
>>>
>>> when x is a memmap. If we check the memmap attribute of mmap
>>>
>>> print "mmap attribute", y._mmap
>>>
>>> numpy tells us that y is not a memmap.
>>> But the following code snippet crashes the python interpreter
>>>
>>> # opens the memmap
>>> with open(filename,'r+b') as f:
>>>   mm = mmap.mmap(f.fileno(),0)
>>>   x = np.frombuffer(mm, dtype='float32')
>>>
>>> # builds an array from the memmap, with the option copy=False
>>> y = np.array(x, copy=False, dtype='float32')
>>> print "before", y
>>>
>>> # closes the file
>>> mm.close()
>>> print "after", y
>>>
>>> In my code I use memmaps to share read-only objects when doing
>>> parallel
>>> processing
>>> and the behaviour of np.array, even if not consistent, it's
>>> desirable.
>>> I share scipy sparse matrices over many processes and if np.array
>>> would
>>> make a copy
>>> when dealing with memmaps this would force me to rewrite part of
>>> the sparse
>>> matrices
>>> code.
>>> Would it be possible in the future releases of numpy to have
>>> np.array
>>> check,
>>> if copy is false, if y is a memmap and in that case return a full
>>> memmap
>>> object
>>> instead of slicing it?
>>
>> This does appear to be a bug in numpy or mmap.
>>
> 
> Frankly on first sight, I do not think it is a bug in either of them.
> Numpy uses view (memmap really is just a name for a memory map backed
> numpy array). The numpy array will hold a reference to the memory map
> object in its `.base` attribute (or the base of the base, etc.).
> 
> If you close a mmap object, and then keep using it, you can get
> segfaults of course, I am not sure what you can do about it. Maybe
> python can try to warn you when you exit the context/close a file
> pointer, but I suppose: Python does memory management for you, it makes
> doing IO management easy, but you need to manage the IO correctly. That
> this segfaults and not just errors may be annoying, but seems the
> nature of things on first sight.
> 
> - Sebastian

I admit I have not had time to investigate it thoroughly, but it appears
to me that the intended design of mmap was to make it impossible to
close a mmap if there were still pointers to it.

Consider the following behavior (python3):

>>> import mmap
>>> with open('test', 'r+b') as f:
...     mm = mmap.mmap(f.fileno(), 0)
...
>>> mv = memoryview(mm)
>>> mm.close()
BufferError: cannot close exported pointers exist

If memoryview behaves this way, why doesn't/can't ndarray? (Both use the
PEP3118 interface, as far as I understand).

You can see in the mmap code that it tries to carefully keep track of
any exported buffers, but numpy manages to bypass this:
https://github.com/python/cpython/blob/b879fe82e7e5c3f7673c9a7fa4aad42bd05445d8/Modules/mmapmodule.c#L727
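
Here is the same memoryview experiment as a self-contained Python 3 script, in case someone wants to reproduce it. The function name `close_with_export` and the temporary file are mine. It only demonstrates the `memoryview` case and says nothing about what any particular NumPy version does with the buffer.

```python
import mmap
import os
import tempfile


def close_with_export():
    """Try to close an mmap while a memoryview of it is still alive."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\0" * 16)      # mmap cannot map an empty file
    os.close(fd)

    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)

    mv = memoryview(mm)           # registers an exported buffer on mm
    try:
        mm.close()
        refused = False
    except BufferError:
        refused = True            # mmap refuses while the export exists

    mv.release()                  # drop the export ...
    mm.close()                    # ... and now closing is allowed
    closed = mm.closed

    os.remove(path)
    return refused, closed


if __name__ == "__main__":
    print(close_with_export())
```

So the bookkeeping works as long as the consumer holds on to the buffer export for as long as it uses the memory.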



Allan


Re: [Numpy-discussion] np.array, copy=False and memmap

2017-08-10 Thread Sebastian Berg
On Thu, 2017-08-10 at 12:27 -0400, Allan Haldane wrote:
> On 08/07/2017 05:01 PM, Nisoli Isaia wrote:
> > Dear all,
> > I have a question about the behaviour of
> > 
> > y = np.array(x, copy=False, dtype='float32')
> > 
> > when x is a memmap. If we check the memmap attribute of mmap
> > 
> > print "mmap attribute", y._mmap
> > 
> > numpy tells us that y is not a memmap.
> > But the following code snippet crashes the python interpreter
> > 
> > # opens the memmap
> > with open(filename,'r+b') as f:
> >   mm = mmap.mmap(f.fileno(),0)
> >   x = np.frombuffer(mm, dtype='float32')
> > 
> > # builds an array from the memmap, with the option copy=False
> > y = np.array(x, copy=False, dtype='float32')
> > print "before", y
> > 
> > # closes the file
> > mm.close()
> > print "after", y
> > 
> > In my code I use memmaps to share read-only objects when doing
> > parallel
> > processing
> > and the behaviour of np.array, even if not consistent, it's
> > desirable.
> > I share scipy sparse matrices over many processes and if np.array
> > would
> > make a copy
> > when dealing with memmaps this would force me to rewrite part of
> > the sparse
> > matrices
> > code.
> > Would it be possible in the future releases of numpy to have
> > np.array
> > check,
> > if copy is false, if y is a memmap and in that case return a full
> > memmap
> > object
> > instead of slicing it?
> 
> This does appear to be a bug in numpy or mmap.
> 

Frankly, at first sight I do not think it is a bug in either of them.
NumPy uses views (a memmap is really just a name for a NumPy array
backed by a memory map). The NumPy array holds a reference to the
memory map object in its `.base` attribute (or the base of the base, etc.).

If you close a mmap object and then keep using it, you can of course get
segfaults; I am not sure what can be done about that. Maybe Python could
try to warn you when you exit the context or close a file pointer, but I
suppose the point is this: Python does memory management for you and makes
IO management easy, but you still need to manage the IO correctly. That
this segfaults rather than just raising an error may be annoying, but at
first sight it seems to be the nature of things.
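
As one illustration of managing the IO yourself (only a sketch, and only useful if you can afford the memory): copy the data you need out of the mapping before closing it. The name `load_then_close` is made up for this example.

```python
import mmap
import os
import tempfile

import numpy as np


def load_then_close(path, dtype="float32"):
    """Return an array that owns its data, then close the mapping."""
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)

    view = np.frombuffer(mm, dtype=dtype)  # shares memory with the mapping
    owned = view.copy()                    # independent copy in RAM
    del view                               # nothing points into mm any more
    mm.close()                             # closing is safe at this point
    return owned


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    np.arange(8, dtype="float32").tofile(path)
    print(load_then_close(path))
    os.remove(path)
```

This gives up the shared-memory benefit, so it does not help the original use case of sharing large read-only data between processes. There the mapping simply has to stay open for as long as any array uses it.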

- Sebastian



> Probably the solution isn't to make mmaps a special case, rather we
> should fix a bug somewhere in the use of the PEP3118 interface.
> 
> I've opened an issue on github for your issue:
> https://github.com/numpy/numpy/issues/9537
> 
> It seems to me that the "correct" behavior may be for it to be
> impossible to close the mmap while pointers to it exist; this is
> the
> behavior for `memoryview`s of mmaps. That is, your line `mm.close()`
> should raise an error `BufferError: cannot close exported pointers
> exist`.
> 
> 
> > Best wishes
> > Isaia
> > 
> > P.S. A longer account of the issue may be found on my university
> > blog
> > http://www.im.ufrj.br/nisoli/blog/?p=131



Re: [Numpy-discussion] np.array, copy=False and memmap

2017-08-10 Thread Allan Haldane
On 08/07/2017 05:01 PM, Nisoli Isaia wrote:
> Dear all,
> I have a question about the behaviour of
> 
> y = np.array(x, copy=False, dtype='float32')
> 
> when x is a memmap. If we check the memmap attribute of mmap
> 
> print "mmap attribute", y._mmap
> 
> numpy tells us that y is not a memmap.
> But the following code snippet crashes the python interpreter
> 
> # opens the memmap
> with open(filename,'r+b') as f:
>   mm = mmap.mmap(f.fileno(),0)
>   x = np.frombuffer(mm, dtype='float32')
> 
> # builds an array from the memmap, with the option copy=False
> y = np.array(x, copy=False, dtype='float32')
> print "before", y
> 
> # closes the file
> mm.close()
> print "after", y
> 
> In my code I use memmaps to share read-only objects when doing parallel
> processing
> and the behaviour of np.array, even if not consistent, it's desirable.
> I share scipy sparse matrices over many processes and if np.array would
> make a copy
> when dealing with memmaps this would force me to rewrite part of the sparse
> matrices
> code.
> Would it be possible in the future releases of numpy to have np.array
> check,
> if copy is false, if y is a memmap and in that case return a full memmap
> object
> instead of slicing it?

This does appear to be a bug in numpy or mmap.

Probably the solution isn't to make mmaps a special case, rather we
should fix a bug somewhere in the use of the PEP3118 interface.

I've opened an issue on github for your issue:
https://github.com/numpy/numpy/issues/9537

It seems to me that the "correct" behavior may be for it to be
impossible to close the mmap while pointers to it exist; this is the
behavior for `memoryview`s of mmaps. That is, your line `mm.close()`
should raise an error `BufferError: cannot close exported pointers exist`.


> Best wishes
> Isaia
> 
> P.S. A longer account of the issue may be found on my university blog
> http://www.im.ufrj.br/nisoli/blog/?p=131
