On 2/25/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Travis Oliphant wrote:
>
> > 2. There is no way for a consumer to tell the protocol-exporting
> > object it is "finished" with its view of the memory and therefore no way
> > for the object to be sure that it can reallocate the pointer to the
> > memory that it owns (the array object reallocating its memory after
> > sharing it with the buffer object led to the infamous buffer-object
> > problem).
>
> I'm not sure I'd categorise this problem that way -- it was
> more the buffer object's fault for assuming that it could
> hold on to a C pointer to the memory long-term.
>
> I'm a bit worried about having a get/release kind of thing
> in the protocol, because it risks forcing all objects which
> implement the protocol to provide some kind of refcounting
> and locking mechanism for their data. Some objects may not
> be able to do that easily or efficiently, especially if
> they're wrapping some external library that has no such
> notion.

Only if their buffer can actually move; if the buffer can't be moved
or resized once the object is created, the acquire and release can be
no-ops.

Another problem that would be solved by this is the current unsafety
of blocking I/O operations like file.readinto() and
socket.recv_into(). These operations do roughly the following:

(1) get the pointer and length from the buffer API
(2) release the GIL
(3) call the blocking read() or recv() system call with the pointer
    and length
(4) reacquire the GIL

The problem is that while the GIL is released, another thread with
access to the object whose buffer is being read into could modify it,
causing the buffer to be moved in memory. The read() or recv()
operation will then be overwriting freed memory (or worse, memory
allocated for a different purpose).

I realized this while thinking about the 3.0 bytes object, but the 2.x
array object has the same problem, and so probably does every other
object that uses the buffer API and has a mutable size (if there are
any).

> > All that is needed is to create a Python "memory_view" object that can
> > contain all the information needed and be returned when the buffer
> > protocol is called --- when it is garbage-collected, the
> > "bp_release_view" function is called on the exporting object.
>
> That sounds too heavyweight. Getting a memory view through
> this protocol should be a very lightweight operation -- ideally
> it shouldn't require allocating any memory at all, and it
> certainly shouldn't require creating a Python object.

I agree that getting the pointer and length should be separated from
finding out how the bytes should be interpreted.

I'd like to propose a simple stack or hierarchy of classes to address
(what I think are) Travis's needs:

- At the bottom is a redesigned buffer API: add locking, remove
  segcount and char buffers.

- This API is implemented by things like mmap, and also by a "raw
  bytes" object which allocates a buffer from the heap; other
  libraries may have their own objects that implement this (e.g.
  numpy, PIL).

- There is a mixin class (at least conceptually it's a mixin) which
  takes anything implementing the redesigned buffer API and adds the
  bytes API (see recently updated PEP 358); operations like .strip()
  or slicing should return copies (of the same or a different type) or
  views at the discretion of the underlying object. (Maybe there
  should be a read-only and a read-write version of this; note that
  read-only is not the same as immutable, since the underlying buffer
  may be modified by other APIs, if it allows this.)

- *Another* API built on top of the redesigned buffer API would be
  something more aligned with numpy's needs, adding (a) a shape
  descriptor indicating the size, offset and stride of each dimension,
  and (b) a record descriptor indicating the interpretation of one
  element of the array. For (a), a list of 3-tuples of ints would
  probably be sufficient (constrained so that no valid combination of
  indexes points outside the buffer); for (b), I propose (with Jim
  Hugunin, who first suggested this at PyCon) to use the same concise
  but expressive format-string-like notation used by the struct
  module. (The bytes API is not quite a special case of this, since it
  provides more string-like operations.)

The crucial idea here (like so often :-) is not to use inheritance but
composition. This means that we can separate management of the buffer
(e.g. malloc, mmap, whatever) from providing APIs on top of this
(either the bytes API or the multi-dimensional array API).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
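
To make the layering above a bit more concrete, here is a rough Python
sketch of composition rather than inheritance. Everything in it is made
up for illustration: RawBytes, ArrayView, and the acquire(), release()
and resize() method names are hypothetical and not part of any existing
or proposed API. It assumes the shape descriptor is a list of
(size, offset, stride) 3-tuples measured in bytes, and that one element
is described by a struct-module format string.

```python
import struct
from contextlib import contextmanager


class RawBytes:
    """Hypothetical heap-backed buffer with acquire/release locking."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._locks = 0  # number of outstanding acquire() calls

    def acquire(self):
        # Pin the memory: while locked, resize() refuses to replace it.
        self._locks += 1
        return self._data

    def release(self):
        if self._locks == 0:
            raise RuntimeError("release() without matching acquire()")
        self._locks -= 1

    def resize(self, size):
        if self._locks:
            raise BufferError("buffer is locked by a consumer")
        new = bytearray(size)
        keep = min(size, len(self._data))
        new[:keep] = self._data[:keep]
        self._data = new  # the storage "moves" here


class ArrayView:
    """Hypothetical array API layered on any acquire/release buffer."""

    def __init__(self, buffer, shape, fmt):
        self._buffer = buffer            # composition, not inheritance
        self._shape = list(shape)        # [(size, offset, stride), ...]
        self._item = struct.Struct(fmt)  # interpretation of one element
        with self._pinned():             # validate the descriptor early
            pass

    def _check_bounds(self, nbytes):
        # Lowest and highest byte position any valid index can reach.
        lo = hi = 0
        for size, offset, stride in self._shape:
            if size <= 0:
                raise ValueError("each dimension needs a positive size")
            span = (size - 1) * stride
            lo += offset + min(0, span)
            hi += offset + max(0, span)
        if lo < 0 or hi + self._item.size > nbytes:
            raise ValueError("shape descriptor reaches outside the buffer")

    @contextmanager
    def _pinned(self):
        # Hold the lock only for the duration of one access.
        mem = self._buffer.acquire()
        try:
            self._check_bounds(len(mem))
            yield mem
        finally:
            self._buffer.release()

    def _position(self, index):
        if len(index) != len(self._shape):
            raise IndexError("wrong number of indexes")
        pos = 0
        for i, (size, offset, stride) in zip(index, self._shape):
            if not 0 <= i < size:
                raise IndexError("index out of range")
            pos += offset + i * stride
        return pos

    def get(self, *index):
        pos = self._position(index)
        with self._pinned() as mem:
            return self._item.unpack_from(mem, pos)

    def set(self, index, *values):
        pos = self._position(index)
        with self._pinned() as mem:
            self._item.pack_into(mem, pos, *values)


raw = RawBytes(24)
# A 2x3 array of little-endian 32-bit ints, stored row by row.
view = ArrayView(raw, [(2, 0, 12), (3, 0, 4)], "<i")
view.set((1, 2), 42)
print(view.get(1, 2))  # (42,)
```

The point of the split is that ArrayView knows nothing about how the
memory is managed; any object offering the same acquire/release pair
could be swapped in for RawBytes. An object whose buffer can never move
could make both calls no-ops.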