[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Josh Triplett j...@joshtriplett.org added the comment: I currently use Python 2.7, and I'd like to make use of memoryview. Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory. I ran into several of the problems documented here when trying to do so. I'd really love to see a backport of this fixed version into 2.7. -- nosy: +joshtriplett ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Josh Triplett j...@joshtriplett.org added the comment: I currently use Python 2.7, and I'd like to make use of memoryview. Specifically, I work on BITS (http://biosbits.org/), which runs Python in ring 0 as part of GRUB, and I'd like to use memoryview to give Python access to data in physical memory. I ran into several of the problems documented here when trying to do so. I'd really love to see a backport of this fixed version into 2.7. More specifically, the primary functionality that I'd like to use exists in 3.3 as PyMemoryView_FromMemory. I've tried to approximate that function using the available API in 2.7, but that led me here. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 373f6cdc6d08 by Stefan Krah in branch 'default': Issue #10181: The decision was to raise a buffer error in memory_exit() http://hg.python.org/cpython/rev/373f6cdc6d08 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: It occurs to me that one thing that *could* be backported to early versions are some of the documentation improvements on how to correctly handle the lifecycle of fields in Py_buffer. That gets messy though because memoryview doesn't behave nicely in those versions, so we'd be violating our own guidelines. Perhaps the relevant sections should be updated with a reference saying that the semantics have been cleaned up in 3.3, and if in doubt, try to follow the 3.3 documentation? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: From the PEP: The buffer interface (type Py_buffer, type slots bf_getbuffer and bf_releasebuffer, etc) has been omitted from the ABI, since the stability of the Py_buffer structure is not clear at this time. Inclusion in the ABI can be considered in future releases. Great, that's exactly what I was looking for. - As far as I can see the issue is finished, so I'm closing it. Thanks again everyone for all the help! -- resolution: - fixed stage: - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: I'm busy adding the C-API changes to the docs. Regarding the stable ABI: The general mood was to *keep* the removal of the buffer interface for some time, did I get that right? In that case this removal (especially of the Py_buffer struct) should be documented also for 3.2. -- nosy: +loewis ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 9d3c7dd55c7f by Stefan Krah in branch 'default': Issue #10181: Add warning that structure layouts in memoryobject.h and http://hg.python.org/cpython/rev/9d3c7dd55c7f -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: For 3.2, there's no removal to document - we asked MvL to drop the buffer APIs from PEP 384, so they've never been part of the stable ABI. From the PEP: The buffer interface (type Py_buffer, type slots bf_getbuffer and bf_releasebuffer, etc) has been omitted from the ABI, since the stability of the Py_buffer structure is not clear at this time. Inclusion in the ABI can be considered in future releases. I agree we shouldn't be hasty in making even the updated buffer interface implementation officially part of the stable ABI. Better to leave it out for 3.3, get some real world feedback on the latest version, then commit to stability for 3.4+. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Nick Coghlan rep...@bugs.python.org wrote: PEP section makes sense - I plan to mark PEP 3118 as Final once you commit this (or you can do that yourself, for that matter). Array features are complete except for multi-dimensional indexing and slicing. I think it would be nice to add those in a separate issue; it's not a lot of additional code and it doesn't interfere with the current code. The overall array handling scheme is sound. Life would be a bit easier if contiguity flags were a mandatory part of the Py_buffer structure that the exporter has to fill in. Then there is an open issue (#3132) for expanding the struct module syntax, where the wording in some sections of the PEP led to a bit of head-scratching. :) In another issue (#13072) the question came up whether the proposed 'u' and 'w' formats still make sense after PEP-393 (I think they do, they should map to UCS-2 and UCS-4). We need to decide what to do about 2.7 and 3.2. It's pretty difficult by now to separate the bug fixes from the features. I could follow the example of CPU manufacturers: start with the whole feature set and disable as much as possible. Another problem for 2.7 and 3.2 is that the 'B' format would still need to accept bytes instead of ints. Or can we change that as a bug fix? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Aw, I guess I was too optimistic in thinking this was the last gap before we could declare it Final... +1 on creating a new feature request for NumPy style multi-dimensional indexing and slicing support on memoryview (I'm assuming that's what you meant). As far as your last question goes, I'm not sure we *can* salvage this in 2.7/3.2. We do have the option of throwing our hands up in the air and saying Sorry, we couldn't figure out how to fix it without completely rewriting it. Take a look at 3.3 if you want to see how it's *supposed* to work. I hesitate to suggest this (since I'm not volunteering to do the work) but perhaps it would be worth packaging up the 3.3. memoryview and publishing it on PyPI for the benefit of 3.2 and 2.7? May also be worth bringing up on python-dev to see if anyone else has any bright ideas. Myself, I'm not seeing any real choices beyond won't fix, backport 3.3 version and remove/disable the new features, backport 3.3 version, new features and all and publish a full featured version on PyPI. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default': - Issue #10181: New memoryview implementation fixes multiple ownership http://hg.python.org/cpython/rev/3f9b3b6f7ff0 -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Thanks, Nick. I'll try to get it done this weekend. I've uploaded Misc/NEWS and Doc/whatsnew/3.3.rst (my apologies to Antoine for plagiarizing the first sentence, I found it hard to come up with a better version). I wasn't sure whether to put the whatsnew entry into the section Other Language Changes or into the PEP section. I chose the latter, since many new features have been added that are part of the PEP. -- Added file: http://bugs.python.org/file24629/issue10181-news.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: PEP section makes sense - I plan to mark PEP 3118 as Final once you commit this (or you can do that yourself, for that matter). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Latest version looks good to me - I vote for landing it whenever you're ready :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Stefan Krah stefan-use...@bytereef.org: Added file: http://bugs.python.org/file24381/a5c4a96dc981.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: I've uploaded a new patch that should address the remaining issues: o In the documentation _testbuffer has been replaced by m.cast() + the now multi-dimensional m.tolist(). o I restored the state of the limited API. If we want to include Py_buffer again, I think this should be done in a separate patch. o Flags of the memoryview object are private. Additionally, because NumPy allows non-aligned array accesses, I changed the packing functions to use memcpy for multi-byte types. On x86/amd64 gcc is smart enough to produce almost exactly the same asm output as before, with a slowdown of 0-1%, depending on the benchmark. On other platforms the situation might be worse, but I don't have access to real hardware where alignment actually matters. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Antoine Pitrou rep...@bugs.python.org wrote: I thought the whole Py_buffer API was only temporarily removed from the limited API until the issues were sorted out: http://bugs.python.org/issue10181#msg125462 I'm talking about the memoryview access macros. It's like PyList_GET_SIZE, it isn't part of the limited API and you have to use PyList_GetSize instead. You're right: I presumed that the macros were excluded temporarily when in fact that had already happened in 62b61abd02b8. The flags are primarily for the memoryview itself to avoid repeated calls to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER to get the view and also needs to determine the contiguity type, that code could also use the flags. But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for example, PyObject_GetBuffer? You seldom have APIs which *expect* a memoryview, I think. Instead, they would expect a buffer-compatible object. That's a good question. It looks to me that the macro was present as PyMemoryView() initially. You renamed it in #3560 (with Guido's approval), and later PyMemoryView_GET_BUFFER appeared in the docs. I think 3rd-party code uses the macros mainly because they are present and, in the case of PyMemoryView_GET_BUFFER, documented. In most situations PyObject_GetBuffer() could be used indeed. Most use cases I can think of would also involve having access to the managed buffer API. As an example, here's a technique that is similar to what goes on in PyMemoryView_GetContiguous(): Suppose you have an initialized bytes object that you want to wrap as a multi-dimensional exporter. Then: - Base the memoryview on the bytes object and keep exactly one reference to it. - Adjust the shape, strides etc. to get the structure you want. - Return the view: You now have a fully compliant exporter that only needs a single Py_DECREF() to disappear and do all cleanup. Of course this could also be exposed as a function, e.g.: /* stealing a reference to bytes */ PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info); So let's make the flags private. What do you prefer? 1) Leave them in memoryview.h, but with a leading underscore: _Py_MEMORYVIEW_C [...] 2) Move them to memoryobject.c, with a leading underscore. 3) Move them to memoryobject.c, without a leading underscore (I find this more readable). 4) Move them to memoryobject.c as MV_C, MV_FORTRAN, etc. Also, I'll add a note to the docs that PyMemoryView_GET_BUFFER can probably be avoided by using PyObject_GetBuffer() directly. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: Of course this could also be exposed as a function, e.g.: /* stealing a reference to bytes */ PyMemoryView_FromBytesAndInfo(PyObject *bytes, Py_buffer *info); I think we should minimize the number of reference-stealing functions. So let's make the flags private. What do you prefer? I don't really mind, whatever you think is best :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: I've been busy fixing the memoryview.h issues first. Could you take a look at: http://hg.python.org/features/pep-3118/file/f020716704d4/Include/memoryobject.h Changes: a) Make all functions and the two buffer access macros part of the limited API again. b) Make the managed buffer API definitely private. c) Make PyBUF_MAX_NDIM=64 part of the official buffer API. I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc., since the comment already says ... Don't access their fields directly.. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Nick Coghlan rep...@bugs.python.org wrote: Radical suggestion: make it public as collections.simple_ndarray? Heh, that's a nice one. :) While I think that the code in _testbuffer.c is sound and very well tested, the API is low-level and optimized for testing all sorts of quirks. Examples: _testbuffer.ndarray has the questionable capability of changing views while buffers are exported. This is to test the pathological case that came up in the discussion. Then, similar to NumPy's ndarray, _testbuffer.ndarray constructs arrays from a flat list. This is a bit clumsy but quite suitable for testing. NumPy's ndarray is also the low level API. I think most people use the high level array: a = array([[1,2], [3,4]], dtype=L) a array([[1, 2], [3, 4]], dtype=uint64) So it would take some effort to polish the API. Meanwhile, to eliminate the use of _testbuffer in the documentation, I think the following might work: I've just added complete support for displaying multi-dimensional lists and multi-dimensional comparisons to memoryobject.c in my private repo. The code is quite succinct and follows exactly the same pattern as copy_base/copy_rec. Then, in the documentation we can use cast() and memoryview.tolist(), but add a comment that cast() is just used for demonstration purposes. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: Nick Coghlan rep...@bugs.python.org wrote: Radical suggestion: make it public as collections.simple_ndarray? Heh, that's a nice one. :) Well, the question is: does it have a point? Is this ndarray useful outside of any third-party infrastructure such as the APIs offered by numpy? You might answer that we already have array.array but ISTM that array.array is quite useless in general :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Antoine Pitrou rep...@bugs.python.org wrote: Nick Coghlan rep...@bugs.python.org wrote: Radical suggestion: make it public as collections.simple_ndarray? Heh, that's a nice one. :) Well, the question is: does it have a point? Is this ndarray useful outside of any third-party infrastructure such as the APIs offered by numpy? I think it would be mainly educational. It's easier to understand the whole (multi-dimensional) point of PEP-3118 when there's a concrete object that you can play with. Of course serious users would go straight to NumPy. The other issue is that's it's slightly strange to have a fully featured memoryview with no object in the stdlib that can utilize all features (at least the testing side is now complete). Anyway, right now I don't know myself if I'm for or against it. :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: a) Make all functions and the two buffer access macros part of the limited API again. Hmm, I don't think buffer access macros should be part of the limited API. If they are truely important (which I doubt), we should have equivalent functions for them. I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc., since the comment already says ... Don't access their fields directly.. My question is whether there is any point in making these flags. Does 3rd-party code want to manipulate memoryview internals, instead of querying the Py_buffer? (and of course the memoryview object is becoming more and more like another Py_buffer :-)) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Antoine Pitrou rep...@bugs.python.org wrote: a) Make all functions and the two buffer access macros part of the limited API again. Hmm, I don't think buffer access macros should be part of the limited API. If they are truely important (which I doubt), we should have equivalent functions for them. I thought the whole Py_buffer API was only temporarily removed from the limited API until the issues were sorted out: http://bugs.python.org/issue10181#msg125462 For instance, here the macros are not excluded: http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h Since the issues seem resolved in general, I thought it was time to restore the previous state. [I just noticed that I didn't revert all of Martin's changes, so xxlimited currently does not build in the pep-3118 repo.] As for the two macros specifically, I know Pauli was using them: http://bugs.python.org/issue10181#msg139775 I think it might be OK to defer the decision about Py_MEMORYVIEW_C etc., since the comment already says ... Don't access their fields directly.. My question is whether there is any point in making these flags. Does 3rd-party code want to manipulate memoryview internals, instead of querying the Py_buffer? The flags are primarily for the memoryview itself to avoid repeated calls to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER to get the view and also needs to determine the contiguity type, that code could also use the flags. Pauli: If you are still following the issue, do you think having access to contiguity flags is useful for 3rd-party code or not? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: Antoine Pitrou rep...@bugs.python.org wrote: a) Make all functions and the two buffer access macros part of the limited API again. Hmm, I don't think buffer access macros should be part of the limited API. If they are truely important (which I doubt), we should have equivalent functions for them. I thought the whole Py_buffer API was only temporarily removed from the limited API until the issues were sorted out: http://bugs.python.org/issue10181#msg125462 I'm talking about the memoryview access macros. It's like PyList_GET_SIZE, it isn't part of the limited API and you have to use PyList_GetSize instead. The limited API promises binary (ABI) compatibility accross releases. It's a very strong promise to make and we must be careful not to put too much stuff in the limited API. I don't think the rule for inclusion is we should put everything useful in the limited API. For instance, here the macros are not excluded: http://hg.python.org/cpython/file/3292cc222d5c/Include/memoryobject.h It might be a mistake, then. As for the two macros specifically, I know Pauli was using them: http://bugs.python.org/issue10181#msg139775 Well, Pauli might use them, but it just means his code isn't compatible with the limited API, then. The flags are primarily for the memoryview itself to avoid repeated calls to PyBuffer_IsContiguous(). So when 3rd-party uses PyMemoryView_GET_BUFFER to get the view and also needs to determine the contiguity type, that code could also use the flags. But why would 3rd-party code use PyMemoryView_GET_BUFFER instead of, for example, PyObject_GetBuffer? You seldom have APIs which *expect* a memoryview, I think. Instead, they would expect a buffer-compatible object. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: I'm with Antoine here - we want to be *very* conservative with what we expose through the limited API. Due to the ABI compatibility promise, anything exposed that way is very hard to change. Keeping things out of the limited API isn't really an issue - it just means that anyone that wants to use them needs to rebuild their extensions for each new version of the C API (just as they do now). Addings things to the stable ABI later is easy though - that's why we had Martin take all the Py_buffer related stuff out of the initial version of the limited API. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Also, +1 to replacing _testbuffer with .cast() to demonstrate the multi-dimensional support. Without an actual multi-dimensional array object in the standard library, demonstrating those aspects is always going to be a little tricky. However, it's worth doing so that the likes of NumPy and PIL don't need to explain those interactions. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Aside from some minor comments that I included in my review, the latest patch gets a +1 from me. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Antoine's review picked up on several issues I missed or glossed over - I actually agree with his point about making most of the new APIs private rather than public. With regards to exposing _testbuffer in the documentation of memoryview's hash support, perhaps it would be better to use a 1D bytes object + memoryview.cast() to get an officially supported multi-dimensional view of a region of memory? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: With regards to exposing _testbuffer in the documentation of memoryview's hash support, perhaps it would be better to use a 1D bytes object + memoryview.cast() to get an officially supported multi-dimensional view of a region of memory? By the way, I didn't point them, but there are other places in the memoryview doc where _testbuffer is mentioned. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: I've added some comments on Rietveld. This time, I pasted the email addresses manually into the CC field, apparently without success (I didn't receive mail). Regarding the use of _testbuffer in the docs: I agree that it's strange, on the other hand it illustrates several examples much better (length of a multi-dimensional view, result of cast to ND, etc). So I'm not sure what to do. Of course I'll take out the examples if you really don't like it. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Radical suggestion: make it public as collections.simple_ndarray? (prefixing with simple_ to be explicit that this is not even *close* to being the all-singing, all-dancing NumPy.ndarray) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Stefan Krah stefan-use...@bytereef.org: Added file: http://bugs.python.org/file24323/8dd9f0ea4876.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Revisiting memoryview.size: I foresee problems for NumPy users, since array.size has a different meaning there: x = array([[1,2,3], [4,5,6]], dtype='q') x.shape (2, 3) x.itemsize 8 len(x) 2 x.size 6 x.nbytes 48 So here we have: x.nbytes == product(shape) * itemsize == Py_buffer.len == (virtual!) byte length x.size == product(shape) == number of elements My suggestion is to use memoryview.nbytes as well. memoryview.size would have the additional problem that Py_buffer.len is always the byte size of the logical structure (e.g. after slicing) and not necessarily the byte size of the physical memory area. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: nbytes sounds reasonable to me, given the unfortunate ambiguity of both size and len. As far as #12834 goes, I'm happy to go along with whatever you think is best. You've spent a lot more time down in the guts of the implementation than I have, and the approach you describe seems to make sense :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Err, make that #12384 (oops) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Nick Coghlan ncogh...@gmail.com: -- Removed message: http://bugs.python.org/msg151539 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Kristján Valur Jónsson krist...@ccpgames.com added the comment: Thanks Nick. Looking through this discussion it looks as though you encountered all the confusing bits that I ran into (dup_buffer, ownership issues, reference counting and so forth.) Good work. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Thanks for the comments. Most of them should be easy to fix. Nick Coghlan rep...@bugs.python.org wrote: [...] expose the Py_buffer len field as memoryview.size instead of memoryview.len (to reduce confusion with len(memoryview) and to encourage a conceptual link with sys.getsizeof()). I agree with this. The best thing is probably to have both versions work in 3.3 and remove memoryview.len in 3.4 (or later). As noted in the review, I also think fixing #12384 should be fairly straightforward and it would be nice to have that working when the change is applied. tobytes() currently calls PyBuffer_ToContiguous(..., 'C') from Objects/abstract.c. Since the function repeatedly calls PyBuffer_GetPointer(), it has this comment: /* XXX : This is not going to be the fastest code in the world several optimizations are possible. */ So if the fix for tobytes() should go in, I'd like to use the copy_buffer() function from _testbuffer.c. PyBuffer_ToContiguous() would need to be fixed separately. test_buffer.py already asserts that for each result, result.tolist() and result.tobytes() match the results obtained from PyBuffer_GetPointer(indices) (See: test_buffer.py:779), so I'm not worried about using the same implementation in the tests as in the production code. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Finally reviewed Stefan's latest patch. It mostly looks great to me. Aside from a few minor cosmetic quibbles, the only substantive change I would like is to expose the Py_buffer len field as memoryview.size instead of memoryview.len (to reduce confusion with len(memoryview) and to encourage a conceptual link with sys.getsizeof()). As noted in the review, I also think fixing #12384 should be fairly straightforward and it would be nice to have that working when the change is applied. (Kristjan: I added you to the nosy list, since you commented on another issue that you had been poking around in the memoryview internals and noticed a few dodgy things. Stefan's work on this issue is aimed at comprehensively addressing those problems). -- nosy: +krisvale versions: +Python 3.3 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Paul Moore p.f.mo...@gmail.com: -- nosy: +pmoore ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Stefan Krah stefan-use...@bytereef.org: Added file: http://bugs.python.org/file23185/4492afe05a07.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Revision 4492afe05a07 allows memoryview to handle objects with an __index__() method. This is for compatibility with the struct module (See also #8300). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Stefan Krah stefan-use...@bytereef.org: Added file: http://bugs.python.org/file23119/51cedae5acfc.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Here's a completely restructured memoryview implementation that I believe is quite robust. Both memoryobject.c (the parts I worked on, which is 91%) and _testbuffer.c have 100% code coverage, including all error conditions [1]. memoryview is tested against the struct module (via _testbuffer's ndarray), array.array and bytearray. To deal with the case explosions inherent to the specification test_buffer.py resorts to brute force in some cases (most notably in testing getbuffer() flags). PyMemoryViewObject is now a PyVarObject with private arrays. Unless ndim = 0, shape and strides are always present after initialization. Memoryview now has support for all native single-value struct module specifiers. I abandoned the idea of using the struct module for packing/unpacking. New benchmarks (See #10227) indicate that _testbuffer's tolist() is over 1000x slower that the optimized versions of memoryview. The cast function from #5231 is completely implemented. Review would be much appreciated. Perhaps it would be possible to do a mini PEP-3118 sprint? Usually I'm never on IRC, but I could dust off irssi for the occasion. These are the main changes in detail: - o Restructure memoryobject.c into sections. o Use struct hack in PyMemoryViewObject for the dynamic arrays. o Rework initialization. o Add a couple of invariants: A new memoryview will always have complete shape/strides/format information, unless ndim = 0. o Add buffer flags: A new memoryview will always have flag information that determines whether it is a scalar (ndim = 0), C/Fortran contiguous, NumPY or PIL style, released or active. This eliminates the need for expensive re-computations during a getbuffer() request. o Add support for all native struct module formats in indexing, assigning and tolist(). o Add memoryview.cast(format=x, shape=y), where x is a native format specifier and y is any multidimensional shape. o Add PEP-3118 compliant getbuffer() method. o Slice assignments now support non-contiguous 1-D slices. o Comparisons now support non-contiguous 1-D buffers. o Representation of empty shape etc. is now an empty tuple (O.K. with Mark Dickinson and Travis Oliphant). [1] 100% coverage requires a patch for systematically triggering failures of Python API functions. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: is over 1000x slower that the optimized versions of memoryview. Oh dear, scratch that. Lets make that 15x. ;) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Stefan Krah stefan-use...@bytereef.org: Added file: http://bugs.python.org/file22909/70a8ccd53ade.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: 70a8ccd53ade fixes conversion of NumPy-style arrays to a suboffset representation. The previous representation did not work for slicing non-contiguous arrays with backward strides. Also, I've added long comments that hopefully explain how slicing with arbitrary strides and suboffsets is supposed to work. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Stefan Krah stefan-use...@bytereef.org: Added file: http://bugs.python.org/file22873/78fb1181787a.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: I thought it might be productive to switch to documentation/test driven development for PEP-3118 in general. So I updated the documentation, trying to spell out the responsibilities of both exporter and consumer as clearly as possible. In order to have a PEP-3118 reference implementation, I wrote Modules/_testbuffer.c and Lib/test/test_buffer.py. The test module contains an ndarray object (independent from NumPy's ndarray) with these features: o Full base object capabilities, including responding to flag specific requests. o Full re-exporter capability: The object obtains a buffer from another exporter and poses as a base object. o Optional capability to change layout while buffers are exported. o Full support for arbitrary format strings using the struct module. o Fortran style arrays. o Arbitrary multidimensional structures, including offsets and negative strides. o Support for converting arrays to suboffset representations. o Support for multidimensional indexing, slicing and tolist(). o Optional support for testing against NumPy. In memoryobject.c I only fixed the buffer release issues that came up. Before proceeding with allocating private arrays for each memoryview, it would be great to have agreement on the revised documentation and a couple of other issues. Documentation - I'll highlight some changes here: 1) view-obj: Must be a new reference to the provider if the buffer is obtained via getbuffer(), otherwise NULL. In case of failure the field MUST be set to NULL (was: undefined). So, logically this should be seen the same way as returning a new reference from a standard Python function. 2) view-buf: This can (and _does_ for Numpy arrays) point to any location in the underlying memory block. If a consumer doesn't request one of the simple or contiguous buffers, all bets are off. 3) view-len: Make it clear that the PEP defines it as product(shape) * itemsize. 4) Ownership for shape, strides, internal: read-only for the consumer. 5) Flag explanations: The new section puts emphasis on which fields a consumer can expect in response to a buffer request, starting with the fields that will always be set. 6) Rule for writable/read-only requests. Only raise if the buffer is read-only and the request is 'writable'. This seems to be the most practical solution. 7) Add NumPy-style and PIL-style sections to make clear what a consumer must be able to handle if complicated structures are requested (thanks Pauli Virtanen for clarifying what valid NumPy arrays may look like). What I would like to spell out clearly: 8) A PEP-3118 compliant provider MUST always be able to respond to a PyBUF_FULL_RO request (i.e. fill in shape AND strides). ctypes doesn't currently do that, but could be fixed easily. This is easy to implement for the exporter (see how PyBuffer_FillInfo() does it in a few lines). Memoryview -- 8) Add PyMemoryView_FromBytes(). This could potentially replace PyMemoryView_FromBuffer(). 9) I've come to think that memoryviews should only be created from PyBUF_FULL_RO requests (this also automatically succeeds if an object is writable, see above). One reason is that we don't constantly have to check for the presence of shape and strides and format (unless ndim = 0). Currently it is possible to create any kind of view via PyMemoryView_FromBuffer(). It would be possible to deprecate it or make it a rule that the buffer argument must have full information. 10) NumPy has a limit of ndim = 64. It would be possible to use that limit as well and make shape, strides and suboffsets static arrays of size 64. Perhaps this is a bit wasteful, it depends on how many views are typically created. 11) test_buffer.py contains an example (see: test_memoryview_release()) that Antoine's test case will not work if a re-exporter is involved. Should we leave that or fix it, too? My apologies for the long list, but it'll be easier to proceed if some rules are written in stone. :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Lenard Lindstrom le...@telus.net: -- nosy: -kermode ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Antoine is right, this needs to be fixed. I think that for *practical* purposes, the existing release() method already behaves like a tryrelease() method: Traceback (most recent call last): File stdin, line 1, in module BufferError: Existing exports of data: object cannot be re-sized So while m1.release() in fact *does* release a buffer, the desired effect (freeing up 'b' for subsequent operations) only happens after also calling m2.release(). This applies to Python and NumPy objects. With this amount of complexity, it might be a good idea to separate refcounting and exports, like so: typedef struct { PyObject_HEAD int released; Py_ssize_t exports; /* total number of exports */ Py_buffer master; } PyManagedBufferObject; typedef struct { PyObject_HEAD PyManagedBufferObject *mbuf; Py_ssize_t shape[Py_MEMORYVIEW_MAXSTATIC]; Py_ssize_t strides[Py_MEMORYVIEW_MAXSTATIC]; Py_ssize_t suboffsets[Py_MEMORYVIEW_MAXSTATIC]; int released; /* for clarity in the code */ Py_ssize_t exports; /* buffer exports */ Py_buffer view; } PyMemoryViewObject; Then it is possible for a MemoryView to call mbuf_release() from memory_release() if self-exports == 0 and --self-mbuf-exports == 0. So, if I understand Antoine correctly, in following graph m1, m2 and m4 could mark themselves as 'released' and decrement self-mbuf-exports. This could happen in any order. What should happen with m2.release()? Raising would be odd here as well, since Pauli stated earlier that some people might actually construct new objects from an exported buffer. m2 could mark itself as 'release_pending', and henceforth only accept releasebuffer requests from b1, b2 and x1. Once those are all in, it releases itself properly and sends the message to mbuf. 'release_pending' could be expressed as (self-released self-exports 0). ManagedBuffer (mbuf) | --- | | m1 (private arrays) m3 (private arrays) | | m2 m4 (private arrays) | - | | | | b1 b2 (regular temporary buffer exports) | x1 (new object constructed from a buffer export) Antoine, was this roughly your suggestion? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: [The first part of the message again, this time via the web interface.] Antoine is right, this needs to be fixed. I think that for *practical* purposes, the existing release() method already behaves like a tryrelease() method: b = bytearray(b'123456789') m1 = memoryview(b) m2 = memoryview(m1) m1.release() b.append(1) Traceback (most recent call last): File stdin, line 1, in module BufferError: Existing exports of data: object cannot be re-sized So while m1.release() in fact *does* release a buffer, the desired effect (freeing up 'b' for subsequent operations) only happens after also calling m2.release(). This applies to Python and NumPy objects. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: Le mercredi 06 juillet 2011 à 12:39 +, Stefan Krah a écrit : Antoine, was this roughly your suggestion? I think so (!), but I also agree with Nick that raising BufferError when calling release() on a memoryview with exports is a reasonable alternative. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Yeah, the reason my originally proposed semantics were wrong is because copying (or slicing) a memoryview object and then explicitly releasing that object would always fail through no fault of that code. That would be broken and the only way to fix it is to allow the release() call but not actually call the ReleaseBuffer API until all the buffer references are gone. Exports are different - whenever you copy or slice a memoryview, you get a new memoryview object with the export count set to zero, so it is reasonable to require that anyone that wants to explicitly release the memoryview's buffer reference make sure that there aren't any exports hanging around. Keep in mind that memoryview copies and slices don't increase the export count, as those will refer directly back to the original managed buffer. +1 on tracking buffer exports explicitly rather than relying solely on playing games with refcounts though - while technically redundant in the ManagedBuffer case, I agree it will make the code far more self-explanatory. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Nick, Pauli, thanks for all the comments. I'm busy implementing the easy changes; then it'll be easier to deal with the flags issues. Pauli: Does numpy use the (undocumented) smalltable array in the Py_buffer structure? We would like to drop it. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Stefan Krah stefan-use...@bytereef.org: Added file: http://bugs.python.org/file22577/718801740bde.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: I've uploaded a revised version that addresses several suggestions. I think we have agreement on those now: - Officially ditch smalltable. - Comment static storage fields inside PyMemoryViewObject. - Improve refcounting in PyMemoryView_FromBuffer()/PyMemoryView_FromObject(). - Increment mbuf refcount in memory_getbuf(). - Create separate sections for managedbuffer and memoryview. Still open: - Update documentation. - Should PyManagedBuffer be private to this file? Do we need mbuf_new()? - Add test to _testcapimodule.c. I wrote a small test for the problematic case in PyMemoryView_GetContiguous(), and it indeed returns an unaltered view. I suggest that we leave the NotImplementedError for now and handle that in a separate issue. - Flag handling. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: I'm slightly confused about the implication chain in the flags. PyBUF_STRIDES seem to allow for discontiguous arrays, yet STRIDES - ND - C_CONTIGUOUS. PyBUF_FULL[_RO] | PyBUF_INDIRECT -- PyBUF_FORMAT --[PyBUF_WRITABLE] | PyBUF_STRIDES (This would be used when the consumer can handle strided, discontiguous arrays ...) | PyBUF_ND - PyBUF_CONTIG (why?) | PyBUF_C_CONTIGUOUS (... but the implication chain leads us to a contiguous buffer) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: I'm slightly confused about the implication chain in the flags. PyBUF_STRIDES seem to allow for discontiguous arrays, yet STRIDES - ND - C_CONTIGUOUS. To be honest I have never understood anything about these flags, and I doubt anyone without a numpy background would. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Pauli Virtanen p...@iki.fi added the comment: The flags don't seem to be meant to describe the properties of the buffer, only what the exporter is required to fill in. STRIDES does not imply necessarily discontinuous, only that the `strides` field is present. The C_/F_/ANY_CONTIGUOUS flags imply that the memory layout of an n-dim array is C/Fortran/either contiguous. Why these flags imply STRIDES is probably to make the result unambiguous, and because typically when dealing with n-d arrays you usually need to know the strides anyway. `NULL` `strides` implies C-contiguous, so the CONTIG flag does not imply STRIDES (no idea why it's different from PyBUF_C_CONTIGUOUS). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: It took me a bit of thinking, but I figured out why the contiguous flags imply STRIDES. A quick recap of all the flags: WRITABLE - error if can't support write access FORMAT - request format info in Py_buffer struct. Should never error, but report unsigned bytes if not requested ND - requests shape info in Py_buffer struct. Report 1 dimensional if not requested. Error if data is not C contiguous (as STRIDES is required to handle any non-C contiguous cases). STRIDES - requests shape and stride info. Error if correct buffer access requires stride support and this flag is not passed. C_CONTIGUOUS/F_CONTIGUOUS/ANY_CONTIGUOUS - variants that also request shape and stride info but are limited to handling C contiguous memory, Fortran contiguous memory or either. INDIRECT - requests shape and suboffset info. Error if correct buffer access requires suboffset support and this flag is not passed. So, to address the specific confusion, the basic STRIDES request just says give me the strides info and I can deal with whatever you give me. The CONTIGUOUS variants say give me the strides info, but I can only cope with certain layouts, so error if you can't provide them. ND is a way to say I can copy with multiple dimensions, but only the C version without using strides info Suppose we have a 3x4 array of unsigned bytes (i.e. 12 bytes of data). In C format, the strides info would be [4, 1] (buf[0][0] and buf[0][1] are adjacent in memory, while buf[0][0] and buf[1][0] are 4 bytes apart). In FORTRAN format that layout is different, so the strides info would be [1, 3] (and now buf[0][0] and buf[1][0] are adjacent while buf[0][0] and buf[0][1] are 3 bytes apart). The difference between ND and C_CONTIGUOUS is that the latter asks for both the shape and strides fields in the Py_buffer object to be populated while the former only requests shape information. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: At least, that's the explanation based on the PEP - not sure where CONTIG as an alias for ND (N-dimensional) comes from. But then, smalltable was an undocumented novelty, too :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: To address the should PyManagedBuffer be public? question: yes, I think so. Given the amount of grief the raw PEP 3118 API has caused the memoryview implementation, I expect the easier lifecycle management provided by the PyObject based API may also help 3rd parties. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Regarding the Reitveld cc field: I tend not to add anyone to that and instead post comments to the tracker item to say that I've finished a review in Reitveld. If people want to see details they can go look at the review itself (or remove themselves from the bug nosy list if they have genuinely lost interest). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: Regarding the Reitveld cc field: I tend not to add anyone to that and instead post comments to the tracker item to say that I've finished a review in Reitveld. If people want to see details they can go look at the review itself (or remove themselves from the bug nosy list if they have genuinely lost interest). Be aware the Rietveld integration is buggy: for example, I got no notification of the current reviews. So it's better to post a message mentioning the review anyway. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: I don't think that's a bug, it's a missing feature in the integration (there's a request on the metatracker to add automatic notifications of new reviews on the bug itself). I did mention the review above but it would have been easy to miss amongst the other comments. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: I don't think that's a bug, it's a missing feature in the integration (there's a request on the metatracker to add automatic notifications of new reviews on the bug itself). It is a bug, actually. People on the nosy list are also on the Rietveld cc list, but in the wrong form. See http://psf.upfronthosting.co.za/roundup/meta/issue382 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: (and so, for the record, I've added my own small review :)) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Moving this discussion out of the review comments: Antoine is wanting to make release() nondeterministic by having the underlying buffer only released when all views using it either have release() called or are no longer referenced. I contend that release() needs to mean release the underlying memory *right now* or it is completely pointless. The I don't want to care about lifecycle issues approach is already handled quite adequately by the ordinary refcounting semantics. If ensuring that all references have been eliminated before release() is called is too much work for a user then the answer is simple: don't call release() and let the refcounting do the work. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: Antoine is wanting to make release() nondeterministic by having the underlying buffer only released when all views using it either have release() called or are no longer referenced. That's not nondeterministic if everyone calls release(). Less so than garbage collection anyway. I contend that release() needs to mean release the underlying memory *right now* or it is completely pointless. The I don't want to care about lifecycle issues approach is already handled quite adequately by the ordinary refcounting semantics. Well, if you assume refcounting and no reference cycles, then release() is AFAICT already useless. See issue9757 for the argument we had with Guido ;) My issue is that until now sliced memoryviews are independent objects and are not affected by the releasing of the original memoryview. With this patch, they are, and that's why I'm advocating for a subtler approach (which would really mirror the current slicing semantics, and wouldn't break compatibility ;)). release() is supposed to mean you can dispose of this memoryview, not you can dispose of any underlying memory area, even if there's some sharing that I as an user don't know anything about (*). By making release() affect related memoryviews we are exposing an internal implementation detail (the PyManagedBuffer sharing) as part of the API. (*) for something similar, if you close() a file-like object obtained through socket.makefile(), it doesn't close the underlying fd until all other file-like objects are closed too -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Antoine Pitrou rep...@bugs.python.org wrote: My issue is that until now sliced memoryviews are independent objects and are not affected by the releasing of the original memoryview. With this patch, they are, and that's why I'm advocating for a subtler approach (which would really mirror the current slicing semantics, and wouldn't break compatibility ;)). I wrote a comment on rietveld (which failed to get mailed again). My plan is to make the sliced views more independent by copying shape, strides, and suboffsets unconditionally on construction. Then it should always be possible to delete views independently. With respect to releasing, the views are of course still dependent. release() is supposed to mean you can dispose of this memoryview, not you can dispose of any underlying memory area, even if there's some sharing that I as an user don't know anything about (*). By making release() affect related memoryviews we are exposing an internal implementation detail (the PyManagedBuffer sharing) as part of the API. I thought the rationale for the release() method was to allow sequences like: b = bytearray() m1 = memoryview(b) m1.release() - must call releasebuffer instantly. b.resize(10) - this might fail otherwise if the garbage collection is too slow. So I think releasebuffer must be called on the original base object, and only the ManagedBuffer can do that. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: I thought the rationale for the release() method was to allow sequences like: b = bytearray() m1 = memoryview(b) m1.release() - must call releasebuffer instantly. b.resize(10) - this might fail otherwise if the garbage collection is too slow. Well, that would still work with my proposal. Now consider: def some_library_function(byteslike): with memoryview(byteslike) as m2: # do something with m2 with memoryview(some_object) as m1: some_library_function(m1) ... print(m1[0]) That m1 becomes unusable after m2 is released in the library function is completely counter-intuitive, and will make memoryviews a pain to use in real life. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: If someone is calling release() on all of their views (including slices) than they won't have any problems. The only way they can get into trouble is if they have a slice or copy that they *aren't* explicitly releasing, and in that case they *already* have a problem and we're just picking it up sooner (although, in the case of CPython, refcounting will likely take care of it, just as it does for similar problems with files). This is why I suggested that we should start exposing the source object at the Python level as an attribute of memoryview objects. Then people can decide for themselves if they want a view-of-a-view by slicing/copying the memoryview directly or a new, independent view by going back to the original object. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Oops, Antoine's right, the release() semantics in the patch are broken, albeit for the precisely opposite reasons: that example will actually blow up with BufferError inside some_library_function(). I withdraw my objection - Antoine's right that release() on a memoryview object has to just mean drop the reference to the ManagedBuffer instance. Calling release() when there are actual *exported* buffers outstanding should still trigger BufferError though (slices and copies don't count in that case, as they'll have their own independent references to the ManagedBuffer instance). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Sorry, I mischaracterised Antoine's suggestion in my last comment. It's more in the nature of ManagedBuffer exposing two APIs: 1. request_release(): tells ManagedBuffer if I have the last reference, release the buffer now. 2. release(): immediately releases the buffer, or triggers SystemError if there are additional references to the buffer. memoryview.release() would call the first method before dropping the reference to the buffer. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: In order to have a basis for discussion, I've set up a repo at http://hg.python.org/features/pep-3118#memoryview with an implementation of PyManagedBuffer. The whole test suite passes, also with refleak counting and Valgrind. Review is most welcome. If you think that this is roughly what PyManagedBuffer is supposed to look like, I can add some tests and documentation. A couple of remarks: o I have used refcounting assumptions that appear to be valid for the Python codebase, but that need to be documented. o The docs state that if an underlying object supports writable views, the memoryview will be readable and writable. I use this in PyManagedBuffer_FromObject(). In fact there are tests that write to the original exporting object. o Releasing a view raises if more than one view is active. o get_shape0() is used in memory_length(). This does not seem correct if ndim 1. o memory_getbuf() is still not a real getbufferproc, since it disregards almost all flags. o PyMemoryView_GetContiguous() needs to create a view with an underlying PyManagedBuffer. If the obj parameter is already a memoryview, the existing PyManagedBuffer has to be used, so I just do a check if the requested buffertype agrees with view-readonly. It would be possible to complicate the code further by having a PyMemoryView_FromObjectAndFlags(). o In the non-contiguous case, PyMemoryView_GetContiguous() creates a new buffer, but the returned view does not use that buffer. -- hgrepos: +36 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by Stefan Krah stefan-use...@bytereef.org: Added file: http://bugs.python.org/file22564/bbe70ca4e0e5.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Changes by STINNER Victor victor.stin...@haypocalc.com: -- nosy: +haypo ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Nick Coghlan rep...@bugs.python.org wrote: The reason redirecting all requests to the underlying object doesn't work is because repeated calls to getbuffer on mutable objects are allowed to return *different* answers. Assume we have a mutable array type that allows changes while memory is exported by keeping the old buffer around until all references are released. Then we want the following behaviour: The builtin bytearray avoids this issue by simply disallowing mutation while a buffer is exported, but memoryview needs to cope with arbitrary third party objects which may behave like the hypothetical mutable_byte_array(). I find this behavior quite sane. Would it not be an option to make this a requirement in the PEP? As far as I can see, Numpy also disallows mutations on exported buffers. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Pauli Virtanen p...@iki.fi added the comment: @skrah: Yes, Numpy exposes only a single buffer per object. Making this a requirement in the PEP would probably be a sane change, as there is probably little real-world need to allow it behave otherwise. Comment on the patch: it seems you do not track the re-export count in memory_getbuf: a = memoryview(obj) b = numpy.asarray(a) a.release() b[0] = 123 # -- BOOM: the buffer was already released Could be fixed by Py_INCREF(self-mbuf) in getbuffer and DECREF in releasebuffer. In this design, the only choice is to make the `release()` call to fail. (I had some code for n-dim slicing etc. in my first patch that could be useful to have too; I'll see if I find time to dig them out here.) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Pauli Virtanen rep...@bugs.python.org wrote: Comment on the patch: it seems you do not track the re-export count in memory_getbuf: a = memoryview(obj) b = numpy.asarray(a) a.release() b[0] = 123 # -- BOOM: the buffer was already released Could you give the exact sequence of events (including the creation of obj)? For me this works: Python 3.3a0 (memoryview:bbe70ca4e0e5+, Jul 4 2011, 13:55:55) [GCC 4.4.3] on linux2 Type help, copyright, credits or license for more information. import numpy obj = bytearray(b'123456789') a = memoryview(obj) b = numpy.asarray(a) a.release() Traceback (most recent call last): File stdin, line 1, in module BufferError: several memoryviews are active b[0] = 123 b array([123, 50, 51, 52, 53, 54, 55, 56, 57], dtype=uint8) del a b[0] = 224 b array([224, 50, 51, 52, 53, 54, 55, 56, 57], dtype=uint8) (I had some code for n-dim slicing etc. in my first patch that could be useful to have too; I'll see if I find time to dig them out here.) That would be nice. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: It might be worth postponing the multi-dimensional support to a second patch, though. If we can get the buffer lifecycle solid first then that provides a better foundation for any further development. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: As far as the rule of disallowing shape changes while a buffer is exported, I actually think that's a more sane approach as well. However, I've been burned enough times by going nobody would be insane enough to rely on that, would they? that I'm seriously reluctant to impose additional backwards incompatible rules on a published spec when it isn't *too* difficult to adjust the infrastructure to better handle the existing complexity. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: [I agree that multi-dimensional support should not be part of this patch. I was thinking about creating a separate branch for that.] Nick Coghlan rep...@bugs.python.org wrote: As far as the rule of disallowing shape changes while a buffer is exported, I actually think that's a more sane approach as well. However, I've been burned enough times by going nobody would be insane enough to rely on that, would they? that I'm seriously reluctant to impose additional backwards incompatible rules on a published spec when it isn't *too* difficult to adjust the infrastructure to better handle the existing complexity. Yes, I understand. However, Pauli said earlier: I don't see many use cases for the same object exporting multiple different buffers. It's not needed by Numpy, and I suspect there is no existing 3rd party code that relies on this (because it doesn't work with the current implementation of memoryview :) That's why I thought that no one could be using this feature at present. The main problem that have with PyManagedBuffer is that the capabilities of the original underlying buffer (flags) *at the time of the export* aren't stored anywhere. This makes it hard to respond to specific queries in memory_getbuf() and PyMemoryObject_GetContiguous(). You can't query the original object again since it might have changed. I suppose that an existing memoryview could recompute its own capabilities by using PyBuffer_IsContiguous() and friends. It would be much easier though if the original flags were stored in the buffer by PyObject_GetBuffer(). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Nice work with the patch Stefan - I've added a few review comments (including a suggestion on how to deal with the GetContiguous problem). One idea that review did prompt is that if we aren't going back to the original object for fresh buffer requests, perhaps we should expose the underlying object as a read-only property of the memoryview objects. That way if the original view is unsuitable (e.g. read-only when a writable buffer is needed) it should be straightforward to go back to the underlying object to request something different. Another question Stefan raised is whether or not PyMemoryView_FromObject should accept a flags argument that it passes on to the underlying getbuffer call. I'm inclined to say yes - ideally I'd like to *only* expose PyMemoryView and PyManagedBuffer (rather than GetBuffer/ReleaseBuffer) for the limited API, which means accepting the flags argument for the two higher level interfaces. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: Is there anything stopping us just storing the flags on PyManagedBuffer? It's OK if the construction API requires the flag information in addition to the Py_buffer struct. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Pauli Virtanen p...@iki.fi added the comment: @skrah: Ahh, this always happens when I don't run it :) But my point stands -- the reason why it does not crash with Numpy is that Numpy calls PyMemoryView_FromObject to obtain a new memoryview and then uses PyMemoryView_GET_BUFFER. Along this code path the refcount of self-mbuf gets properly incremented, as it's explicitly done in PyMemoryView_FromObject. However, if you have a custom object Foo which just calls `PyObject_GetBuffer`, and then do the same sequence a = memoryview(whatever) b = Foo(a) # -- only grabs the buffer with PyObject_GetBuffer a.release() # -- will invalidate the buffer b.mogrify_buffer_contents() # -- boom Here, the buffer is released too early, as the refcount of `self-mbuf` is not incremented during the `PyObject_GetBuffer` call. Is there anything stopping us just storing the flags on PyManagedBuffer? Slicing memoryviews can invalidate the contiguity flags, and no-strides flags, so some checks are still probably needed in `memory_getbuf`. It's OK if the construction API requires the flag information in addition to the Py_buffer struct. This would be useful, yes. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: On Tue, Jul 5, 2011 at 12:36 AM, Pauli Virtanen rep...@bugs.python.org wrote: Slicing memoryviews can invalidate the contiguity flags, and no-strides flags, so some checks are still probably needed in `memory_getbuf`. That makes sense, so just as memoryview has its own shape and strides data, it will also need its own flags data that *starts* as a copy of that in the original buffer, but may be subsequently modified. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Nick, you know a lot about this issue and I'm probably missing many things here. I misunderstood your concept of PyManagedBuffer, so my previous posting might have been hard to understand. I'd appreciate if you (or anyone in this thread) could comment if the following would work *in theory*, even if you are against an additional getslicedbufferproc: As I understand, there are two major issues that complicate the code: 1) the copying in PyMemoryView_FromBuffer() 2) slicing To address 1), I wanted to create new memoryview objects exclusively from proper base objects that implement the buffer protocol. So the plan was to create small wrapper object inside PyMemoryView_FromBuffer() that handles enough of the buffer protocol to be usable inside the stdlib for one-dimensional objects. The actual memoryview would then be created by calling PyMemoryView_FromObject() on that wrapper. [PyMemoryView_FromObject() would then obviously not call PyMemoryView_FromBuffer(), but would create the view directly.] To address 2), buffers would *always* have to be filled in by the original exporting object, hence the proposal to add a getslicedbufferproc. Then memoryview would always have a proper base object and could always call getbuffer()/INCREF(base) and releasebuffer()/DECREF(base). I thought this would make the code much cleaner. Direct: the view is directly accessing an underlying object via the PEP 3118 API Indirect: the view has a reference to another memoryview object that it is using as a data source Is there still a difference if only the original base object manages buffers and they are never copied? This is better than requiring that every implementor of the buffer API worry about the slicing logic - we can do it right in memoryview and then implementers of producer objects don't have to worry about it. I'm not sure, but my reasoning was that e.g. in numpy the slicing logic is already in place. Then again, I don't know if it is a legitimate use of buf, shapes and strides to implement slicing. According to this mail, slicing information was supposed to be part of the memoryview struct: http://mail.python.org/pipermail/python-dev/2007-April/072584.html -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Pauli Virtanen p...@iki.fi added the comment: skrah writes: I think slicing (esp. multidimensional slicing) would be greatly simplified if we added a requirement for the *exporting* object to provide a sliced view. (The same applies to sub-views, also see source comments below [1]). For example, an exporting object could provide a sliced view by adding a getslicedbufferproc to PyBufferProcs: int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view, int flags, PyObject *key); The same thing can be done via PyObject_GetBuffer(obj, view, flags); PyBuffer_Slice(view, sliced_view, flags, key); given an implementation of PyBuffer_Slice. The logic in PyBuffer_Slice does not depend on where the buffer comes from, and every buffer can be sliced. As far as I see, the advantage of `getslicedbufferproc` would be to make the implementation of PyMemoryView simpler, but not much else. In my view, having each exporter implement the same logic by itself would only be an unnecessary burden. o The invariant that all allocated memory in the buffer belongs to the exporting object remains intact. Numpy arrays do not have this invariant, and they happily re-export memory owned by someone else. This is one root of problems here: the PEP implicitly assumes that re-exporting buffers (e.g. memoryview's implementation of `getbuffer`) is done in the way Numpy does it. Because of this, there is no mechanism for incrementing the refcount of an existing buffer export. Maintaining the above invariant then unavoidably leads to strange behavior in corner cases (which probably are very rare, as mentioned above), and as happened here, make the implementation messy and lead to bugs. The invariant *is* required for guaranteeing that `memoryview.release()` always succeeds. Such a method probably wasn't foreseen in the PEP (and I did not remember that it existed in my first patch), as Numpy arrays don't have any equivalent. The alternatives here are (i) do as Numpy does and give up the invariant and allow `.release()` to fail in some cases, or (ii) document the corner cases in the interface spec and try to detect them and fail if they occur. Which of these is chosen probably does not matter much in practice, but having PyManagedBuffer will make implementing either choice easier. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: I'll try to do a summary of the conversation so far, since it's quite long and hard to follow. The basic issue is that memoryview needs to support copying and slicing that creates a new memoryview object. The major problem with that is that the PEP 3118 semantics as implemented operate in such a way that neither copying the Py_buffer struct *nor* requesting a new copy of the struct from the underlying object will do the right thing in all cases. (According to the PEP *as written* copying probably should have been OK, but the implementation doesn't match the PEP in several important respects such that copying is definitely wrong in the absence of tight control of the lifecycles of copies relative to the original). Therefore, we either need to redesign the buffer export from memoryview to use daisy chaining (such that in m = memoryview(obj); m2 = m[:]; m3 = m2[:] m3 references m2 which references m which in turn references obj) or else we need to introduce an internal reference counted object (PyManagedBuffer) which allows a single view of an underlying object to be safely shared amongst multiple clients (such that m, m2 and m3 would all reference the same managed buffer instance which holds the reference to obj). My preference is strongly for the latter approach as it prevents unbounded and wasteful daisy chaining while also providing a clean, easy to use interface that will make it easier for 3rd parties to write PEP 3118 API consumers (by using PyManagedBuffer instead of the raw Py_buffer struct). Once that basic lifecycle problem for the underlying buffers is dealt with then we can start worrying about other problems like exporting Py_buffer objects from memoryview instances correctly. The lifecycle problem is unrelated to the details of the buffer *contents* though - it's entirely about the fact that clients can't safely copy all those pointers (as some may refer to addresses inside the struct) and asking the original object for a fresh copy is permitted to return a different answer each time. The actual *slicing* code in memoryview isn't too bad - it just needs to use dedicated storage rather than messing with the contents of the Py_buffer struct it received from the underlying object. Probably the easiest way to handle that is by having the PyManagedBuffer reference be in *addition* to the current Py_buffer struct in the internal state - then the latter can be used to record the effects of the slicing, if any. Because we know the original Py_buffer struct is guaranteed to remain alive and unmodified, we don't need to worry about fiddling with any copied pointers - we can just leave them pointing into the original structure. When accessed via the PEP 3118 API, memoryview objects would then export that modified Py_buffer struct rather than the original one (so daisychaining would be possible, but we wouldn't make it easy to do from pure Python code, as both the memoryview constructor and slicing would give each new memoryview object a reference to the original managed buffer and just update the internal view details as appropriate. Here's the current MemoryView definition: typedef struct { PyObject_HEAD Py_buffer view; } PyMemoryViewObject; The TL;DR version of the above is that I would like to see it become: typedef struct { PyObject_HEAD PyManagedBuffer source_data; // shared read-only Py_buffer access Py_buffer view; // shape, strides, etc potentially modified } PyMemoryViewObject; Once the internal Py_buffer had been initialised, the memoryview code actually wouldn't *use* the source data reference all that much (aside from eventually releasing the buffer, it wouldn't use it at all). Instead, that reference would be retained solely to control the lifecycle of the original Py_buffer object relative to the modified copies in the various memoryview instances. Does all that make my perspective any clearer? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Pauli Virtanen rep...@bugs.python.org wrote: skrah writes: For example, an exporting object could provide a sliced view by adding a getslicedbufferproc to PyBufferProcs: int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view, int flags, PyObject *key); The same thing can be done via PyObject_GetBuffer(obj, view, flags); PyBuffer_Slice(view, sliced_view, flags, key); given an implementation of PyBuffer_Slice. The logic in PyBuffer_Slice does not depend on where the buffer comes from, and every buffer can be sliced. Ok, that sounds good. I came across a comment that base objects can change their memory layout: http://mail.python.org/pipermail/python-dev/2007-April/072606.html Is that something that must be taken care of? o The invariant that all allocated memory in the buffer belongs to the exporting object remains intact. Numpy arrays do not have this invariant, and they happily re-export memory owned by someone else. I'm not sure if we use the same terminology. By exporting object I meant the original base object and this is the invariant I wanted: m1 = memoryview(base) # directly from base m2 = memoryview(m1) # redirects getbuffer/releasebuffer to base m3 = memoryview(m2) # redirects getbuffer/releasebuffer to base s1 = m3[1::2, 1::2] # redirects getslicedbuffer/releasebuffer to base That's also what you mean by re-exporting, right? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: Nick Coghlan rep...@bugs.python.org wrote: [Snip liberally] The lifecycle problem is unrelated to the details of the buffer *contents* though - it's entirely about the fact that clients can't safely copy all those pointers (as some may refer to addresses inside the struct) and asking the original object for a fresh copy is permitted to return a different answer each time. The actual *slicing* code in memoryview isn't too bad I promise that I'll keep quiet about the getslicedbufferproc from now on, since there isn't much enthusiasm. :) The reason I kept mentioning it was that I thought it would eliminate the need to copy anything at all. All buffers would come from a single, memory owning base object. Does all that make my perspective any clearer? Yes, thank you. The tricky part is to understand why always redirecting getbuffer/releasebuffer to the underlying *original base object* is not sufficient, but as I understood Pauli's last posting that is due to the addition of the release() method. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Antoine Pitrou pit...@free.fr added the comment: Le Mon, 27 Jun 2011 13:17:57 +, Nick Coghlan rep...@bugs.python.org a écrit : The TL;DR version of the above is that I would like to see it become: typedef struct { PyObject_HEAD PyManagedBuffer source_data; // shared read-only Py_buffer access Py_buffer view; // shape, strides, etc potentially modified } PyMemoryViewObject; Looks ok to me. (well, acceptable anyway) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Lenard Lindstrom le...@telus.net added the comment: Using Python reference counting and the garbage collector to control when PyBuffer_Release is called has problems. First, it assumes the CPython interpreter will always use reference counting. Second, PyBuffer_Release may never get called if circular references exist. Third, an exception could keep a memoryview object alive in a traceback, delaying the PyBuffer_Release call. Finally, it does not play well with the memoryview __enter__, __exit__, and release methods. It makes Python level context management pointless; instead, just del the memoryview instance and hope the garbage collector cooperates. For the old buffer protocol and the Numpy array interface, resources have to be released in an object's destructor at garbage collection time. There is no other choice. If that also becomes the case for the new buffer protocol, then there is little incentive to migrate to it. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Pauli Virtanen p...@iki.fi added the comment: Lenard Lindstrom writes: Using Python reference counting and the garbage collector to control when PyBuffer_Release is called has problems. That's not what's being suggested. The refcounting discussed here is an implementation detail internal to memoryview, and does not affect the possibility to implement `release()` and context management reliably. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Lenard Lindstrom le...@telus.net added the comment: It would if the proposed PyManagedBuffer only releases the Py_buffer struct - calls PyBuffer_Release - when its Python reference count goes to zero. So a separate reference count will be maintained instead. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Nick Coghlan ncogh...@gmail.com added the comment: memoryview.release() will raise an exception if you call it when the underlying PyManagedBuffer instance has a refcount 1. So the explicit memory management still works, and if you mess it up by leaving dangling references around you'll run the risk of getting an exception (CPython will mostly cleanly things up for you due to the refcounting, but we're well used to CPython users getting away with that kind of bug and being surprised when they port to a fully garbage collected implementation like PyPy, IronPython or Jython). It isn't pretty, but it's the most practical way to cope with the lifecycle issues. The other alternative is to obediently force release of the buffer and then throw exceptions on subsequent *access*, the way we do with files. I'm less in favour of that approach, since it results in additional runtime overhead as you have to keep checking the state validity and I think it generates the exception in the wrong place - the exception should be triggered where the implicit assertion that all other references are gone and this object should be released right now has been violated, not at some later random location where the object happens to be used. (I'm also swayed by the backwards compatibility argument: we can easily move from refcount must be 1 enforcement to the laissez-faire approach used by files. You can't go in the other direction without the potential to break working code) The reason redirecting all requests to the underlying object doesn't work is because repeated calls to getbuffer on mutable objects are allowed to return *different* answers. Assume we have a mutable array type that allows changes while memory is exported by keeping the old buffer around until all references are released. Then we want the following behaviour: a = mutable_byte_array(100) # 100 byte array m1 = memoryview(a) # View of original 100 byte array a.resize(0) # Array is now zero length, but old buffer is still valid m2 = m1[:] m3 = memoryview(m1) # Both m2 and m3 should provide access to the *original* 100 byte # array, not to the now empty *current* array # Having len(m2) and len(m3) differ from len(m1) because the # original source array had changed would be incredibly confusing # When m1, m2 and m3 all go away then the original array will be # released The builtin bytearray avoids this issue by simply disallowing mutation while a buffer is exported, but memoryview needs to cope with arbitrary third party objects which may behave like the hypothetical mutable_byte_array(). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)
Stefan Krah stefan-use...@bytereef.org added the comment: [Nick] Another idea we may want to revisit is the PyManagedBuffer concept, which would be a true PyObject that existed solely to simplify sharing of Py_buffer structs. If memoryview used such an object internally, then copying and slicing would be quite simple - just INCREF the managed buffer instance without letting the original source object know anything was going on. I think this is the nicest solution since memoryview would then always have a proper base object. Do I understand correctly that PyManagedBuffer should only handle 1-dimensional objects? There is an additional point about slicing and sub-views: I think slicing (esp. multidimensional slicing) would be greatly simplified if we added a requirement for the *exporting* object to provide a sliced view. (The same applies to sub-views, also see source comments below [1]). For example, an exporting object could provide a sliced view by adding a getslicedbufferproc to PyBufferProcs: int PyObject_GetSlicedBuffer(PyObject *obj, Py_buffer *view, int flags, PyObject *key); By sliced view I mean that the exporting object changes buf, shape and strides. There are several advantages: o The invariant that all allocated memory in the buffer belongs to the exporting object remains intact. o The responsibility for creating correct multidimensional sliced views is shifted to the implementor of the exporting object. [1] memoryobject.c: /* XXX There should be an API to create a subbuffer */ /* XXX: This needs to be fixed so it actually returns a sub-view */ -- nosy: +skrah ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10181 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com