On 12/25/18, Steven D'Aprano <st...@pearwood.info> wrote:
> On Tue, Dec 25, 2018 at 04:51:18PM -0600, eryk sun wrote:
>>
>> Alternatively, we can memory-map the file via mmap. An important
>> difference is that the mmap buffer interface is low-level (e.g. no
>> file pointer and the offset has to be page aligned), so we have to
>> slice out bytes for the given offset and size. We can avoid copying
>> via memoryview slices.
>
> Seems awfully complicated. How do we do all these things, and what
> advantage does it give?

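As a rough sketch of the mmap approach quoted above: the file layout,
HEADER_SIZE, RECORD, make_demo_file and read_record are all made up for
illustration, so treat this as one possible shape rather than a recipe.
It maps only the aligned window that contains a record and slices the
record out through a memoryview.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a 16-byte header followed by 8-byte records.
HEADER_SIZE = 16
RECORD = struct.Struct('<i4s')  # little-endian int32 plus 4 raw bytes


def make_demo_file(count=10000):
    """Write a zeroed header plus `count` records to a temp file; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(bytes(HEADER_SIZE))
        for i in range(count):
            f.write(RECORD.pack(i, b'spam'))
    return path


def read_record(path, index):
    """Unpack record number `index` straight from the mapped pages."""
    start = HEADER_SIZE + index * RECORD.size
    # The mmap offset must be a multiple of mmap.ALLOCATIONGRANULARITY,
    # so map from the aligned boundary at or below `start`.
    aligned = start - start % mmap.ALLOCATIONGRANULARITY
    delta = start - aligned
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), delta + RECORD.size,
                       access=mmap.ACCESS_READ, offset=aligned) as mm:
            # Slicing a memoryview gives another view, not a bytes copy.
            # Both views are released before the mmap is closed.
            with memoryview(mm) as view, \
                    view[delta:delta + RECORD.size] as chunk:
                return RECORD.unpack(chunk)
```

Calling read_record(make_demo_file(), 9000) reads a record that lies
past the first allocation-granularity boundary on common platforms, so
the aligned offset is non-zero there.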
Refer to the mmap and memoryview docs. It is more complex, though not
significantly so, and it's not something I'd suggest to a novice.
Another disadvantage is that this requires a real OS file, not just a
file-like interface.

One possible advantage is that we can work naively and rely on the OS
to move pages of the file to and from memory on demand. However, making
this really convenient requires the ability to access memory directly
with on-demand conversion, as is possible with ctypes (records &
arrays) or numpy (arrays).

Out of the box, multiprocessing works like this for shared-memory
access. For example:

    import ctypes
    import multiprocessing

    class Record(ctypes.LittleEndianStructure):
        _pack_ = 1
        _fields_ = (('a', ctypes.c_int),
                    ('b', ctypes.c_char * 4))

    a = multiprocessing.Array(Record, 2)
    a[0].a = 1
    a[0].b = b'spam'
    a[1].a = 2
    a[1].b = b'eggs'

    >>> a._obj
    <multiprocessing.sharedctypes.Record_Array_2 object at 0x7f96974c9f28>

Shared values and arrays are allocated from a heap that uses arenas
backed by mmap instances:

    >>> a._obj._wrapper._state
    ((<multiprocessing.heap.Arena object at 0x7f96991faf28>, 0, 16), 16)
    >>> a._obj._wrapper._state[0][0].buffer
    <mmap.mmap object at 0x7f96974c4d68>

The two records are stored in this shared memory:

    >>> a._obj._wrapper._state[0][0].buffer[:16]
    b'\x01\x00\x00\x00spam\x02\x00\x00\x00eggs'

>> We can also use ctypes instead of
>> memoryview/struct.
>
> Only if you want non-portable code.

ctypes has good support for at least Linux and Windows, but it's an
optional package in CPython's standard library and not necessarily
available with other implementations.

> What advantage over struct is ctypes?

If it's available, I find that ctypes is often more convenient than the
manual pack/unpack approach of struct. If we're writing to the file,
ctypes lets us directly assign data to arrays and the fields of records
on disk (the ctypes instance knows the address, and its data
descriptors handle converting values implicitly).

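To illustrate assigning to records on disk, here is a minimal sketch
that overlays a ctypes array on a writable mmap. The write_records
helper and the file path are hypothetical. The Record class follows the
one in the multiprocessing example but leaves out _pack_, since an int
followed by four chars has no padding to remove. It assumes a 4-byte
c_int and CPython's reference counting.

```python
import ctypes
import mmap
import os
import tempfile


class Record(ctypes.LittleEndianStructure):
    # int32 followed by char[4]: naturally aligned, 8 bytes in total.
    _fields_ = (('a', ctypes.c_int),
                ('b', ctypes.c_char * 4))


def write_records(path, values):
    """Create `path` and fill it by assigning to ctypes fields mapped onto it.

    `values` is a non-empty list of (int, 4-byte bytes) pairs.
    """
    RecordArray = Record * len(values)
    with open(path, 'wb+') as f:
        # The file must already be as large as the region we map.
        f.truncate(ctypes.sizeof(RecordArray))
        with mmap.mmap(f.fileno(), 0) as mm:
            # from_buffer shares the mmap's memory instead of copying it.
            records = RecordArray.from_buffer(mm)
            for i, (a, b) in enumerate(values):
                records[i].a = a  # the field descriptors convert the values
                records[i].b = b
            # Drop the ctypes reference before the mmap closes; otherwise
            # close() raises BufferError because the buffer is still exported.
            del records
            mm.flush()
```

After write_records(path, [(1, b'spam'), (2, b'eggs')]), the file holds
the same 16 bytes as the shared-memory buffer shown earlier.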
The tradeoff is that defining structures in ctypes can be tedious
(_pack_, _fields_) compared to the simple format strings of the struct
module. With ctypes it helps to already be fluent in C.