On 9/26/18 3:26 PM, Laszlo Ersek wrote:
(+Eric)
I see shm_open() is used heavily in ivshmem-related tests. I haven't looked much at shm_open() before. (I've always known it existed in POSIX, but I've never cared.)
I've never actually played with shm_open() myself, but I understand the theory of it well enough to reply.
So now I first checked what shm_open() would give me over a regular temporary file created with open(); after all, the file descriptor returned by either would have to be mmap()'d. From the rationale in POSIX: <http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_08_14>, it seems like the {shm_open(), mmap()} combo has two significant guarantees over {open(), mmap()}:

- the namespace may be distinct (there need not be a writeable filesystem at all),

- the shared object will *always* be locked in RAM ("Shared memory is not just simply providing common access to data, it is providing the fastest possible communication between the processes").

The rationale seems to permit, on purpose, an shm_open() implementation that is actually based on open(), using a special file system -- and AIUI, /dev/shm is just that, on Linux.

Eric, does the above sound more or less correct?
You're right that the namespace is permitted to be distinct; on the other hand, it doesn't even have to be a special file system. An implementation could even use a compile-time fixed directory name visible to the rest of the system (although of course you shouldn't rely on being able to use the file system to poke at the shmem objects, nor should you manipulate the file system underneath the reserved directory behind shmem's back, if that is what the implementation is using). I'm less certain that you are guaranteed the shared memory is locked in place (where it can never be paged out), since an implementation on top of the filesystem does not have to do such locking. But you are also right that a high-quality implementation will strive to keep the memory resident rather than paging it out, precisely because it is used for interprocess communication that would be penalized if it could be paged out.
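For concreteness, here is a minimal sketch of the {shm_open(), mmap()} combo being discussed; the object name "/ivshmem-test" and the 1 MiB size are made up for illustration (and on older glibc you may need to link with -lrt):

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
      const char *name = "/ivshmem-test";   /* illustrative object name */
      size_t size = 1024 * 1024;

      /* Create the object in the shared memory namespace (on Linux this
       * surfaces under /dev/shm); no writeable filesystem is required. */
      int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
      if (fd < 0) { perror("shm_open"); return 1; }

      /* Size it and map it shared, exactly as with a regular file. */
      if (ftruncate(fd, size) < 0) { perror("ftruncate"); return 1; }
      void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (p == MAP_FAILED) { perror("mmap"); return 1; }

      memset(p, 0, size);                   /* touch the shared pages */

      munmap(p, size);
      close(fd);
      shm_unlink(name);                     /* drop the name */
      return 0;
  }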
If it is correct, then I think shm_open() is exactly what I *don't* want for this use case: while I do need a pathname for an mmap()-able object (regular file or otherwise), just so I can do:

  -object memory-backend-file,id=mem-obj,...,mem-path=... \
  -device ivshmem-plain,memdev=mem-obj,...

I want the underlying object to put as little pressure as possible on the system that runs the test suite. This means I should specifically ask for a regular file, to be mmap()'d (with MAP_SHARED). Then the kernel knows in advance that it can always page out the dirty stuff, and the mapping shouldn't clash with cgroups or disabled memory overcommit.
Indeed, shmem CAN be a thin veneer on top of the file system, and support being paged out; but since an implementation that pins the memory so that it cannot be paged is permitted (and in fact maybe desirable), you are right that using shmem can put pressure on different resources than what you can accomplish by using the file system yourself.
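By contrast, here is a minimal sketch of the {open(), mmap()} route on a regular temporary file, mapped with MAP_SHARED so that dirty pages always remain pageable; the path template is illustrative:

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
      char path[] = "/tmp/ivshmem-XXXXXX";  /* illustrative template */
      size_t size = 1024 * 1024;

      int fd = mkstemp(path);               /* regular file on a real fs */
      if (fd < 0) { perror("mkstemp"); return 1; }

      if (ftruncate(fd, size) < 0) { perror("ftruncate"); return 1; }

      /* MAP_SHARED: dirty pages can always be written back to the file,
       * so the kernel is free to reclaim them under memory pressure. */
      void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (p == MAP_FAILED) { perror("mmap"); return 1; }

      /* "path" is what would go into mem-path=... above. */

      munmap(p, size);
      close(fd);
      unlink(path);
      return 0;
  }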
Now, in order to make that actually safe, I should in theory ask for preallocation on the filesystem (otherwise, if the filesystem runs out of space while the kernel is allocating fs extents to flush the dirty pages to, the process gets a SIGBUS, IIRC). However, because I know that nothing will in fact be dirtied, I can minimize the footprint on the filesystem as well, and forego preallocation too. This suggests that, in my test case, I should (see the sketch below):

- call g_file_open_tmp() for creating the temporary file,
- pass the returned fd to ftruncate() for resizing the temporary file,
- pass the returned pathname to the "memory-backend-file" objects, in the "mem-path" property,
- set "share=on",
- set "prealloc=off",
- leave "discard-data" alone -- it is irrelevant (there won't be any dirty pages).

Thanks
Laszlo
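In GLib terms, the above steps might look like the following sketch; the helper name, the file name template, and the size parameter are all made up for illustration, and error handling is abbreviated to assertions:

  #include <glib.h>
  #include <unistd.h>

  /* Hypothetical helper: create a minimal-footprint backing file for
   * "memory-backend-file"; the caller passes the returned path in the
   * "mem-path" property, together with share=on,prealloc=off. */
  static gchar *create_ivshmem_backing_file(gsize size)
  {
      gchar *path = NULL;
      GError *err = NULL;

      /* Create the temporary regular file. */
      int fd = g_file_open_tmp("ivshmem-test-XXXXXX", &path, &err);
      g_assert_no_error(err);

      /* Resize it via the returned fd; no preallocation, so the
       * footprint on the filesystem stays minimal. */
      g_assert_cmpint(ftruncate(fd, size), ==, 0);
      close(fd);

      return path;
  }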
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org