On Thu, May 30, 2024 at 01:12:40PM -0400, Steven Sistare wrote:
> On 5/29/2024 3:25 PM, Peter Xu wrote:
> > On Wed, May 29, 2024 at 01:31:53PM -0400, Steven Sistare wrote:
> > > On 5/28/2024 5:44 PM, Peter Xu wrote:
> > > > On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:
> > > > > Preserve fields of RAMBlocks that allocate their host memory during
> > > > > CPR so the RAM allocation can be recovered.
> > > >
> > > > This sentence itself did not explain much, IMHO.  QEMU can share
> > > > memory using fd based memory already of all kinds, as long as the
> > > > memory backend is path-based it can be shared by sharing the same
> > > > paths to dst.
> > > >
> > > > This reads very confusing as a generic concept.  I mean, QEMU
> > > > migration relies on so many things to work right.  We mostly ask the
> > > > users to "use exactly the same cmdline for src/dst QEMU unless you
> > > > know what you're doing", otherwise many things can break.  That
> > > > should also include ramblocks being matched between src/dst due to
> > > > the same cmdlines provided on both sides.  It'll be confusing to
> > > > mention this when we thought the ramblocks also rely on that fact.
> > > >
> > > > So IIUC this sentence should be dropped in the real patch, and I'll
> > > > try to guess the real reason with below..
> > >
> > > The properties of the implicitly created ramblocks must be preserved.
> > > The defaults can and do change between qemu releases, even when the
> > > command-line parameters do not change for the explicit objects that
> > > cause these implicit ramblocks to be created.
> >
> > AFAIU, QEMU relies on ramblocks to be the same before this series.  Do
> > you have an example?  Would that already cause an issue when migrating?
>
> Alignment has changed, and used_length vs max_length changed when
> resizeable ramblocks were introduced.
> I have dealt with these issues while supporting cpr for our internal
> use, and the lesson learned is to explicitly communicate the
> creation-time parameters to new qemu.
Why can used_length change?  I'm looking at ram_mig_ram_block_resized():

    if (!migration_is_idle()) {
        /*
         * Precopy code on the source cannot deal with the size of RAM blocks
         * changing at random points in time - especially after sending the
         * RAM block sizes in the migration stream, they must no longer change.
         * Abort and indicate a proper reason.
         */
        error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
        migration_cancel(err);
        error_free(err);
    }

We send used_length upfront during the migration SETUP phase.  Looks like
what you're describing can be something different, though?

Regarding rb->align: isn't that mostly a constant, reflecting the MR's
alignment?  It's set when the ramblock is created IIUC:

    rb->align = mr->align;

When will the alignment change?

> These are not an issue for migration because the ramblock is re-created
> and the data copied into the new memory.
>
> > > > > Mirror the mr->align field in the RAMBlock to simplify the vmstate.
> > > > > Preserve the old host address, even though it is immediately
> > > > > discarded, as it will be needed in the future for CPR with iommufd.
> > > > > Preserve guest_memfd, even though CPR does not yet support it, to
> > > > > maintain vmstate compatibility when it becomes supported.
> > > >
> > > > .. It could be about the vfio vaddr update feature that you mentioned
> > > > and only for iommufd (as IIUC vfio still relies on iova ranges, then
> > > > it won't help here)?
> > > >
> > > > If so, IMHO we should have this patch (or some variant of it) be
> > > > there for your upcoming vfio support.  Keeping this around like this
> > > > will make the series harder to review.  Or is it needed even before
> > > > VFIO?
> > >
> > > This patch is needed independently of vfio or iommufd.
> > >
> > > guest_memfd is independent of vfio or iommufd.
> > > It is a recent addition which I have not tried to support, but I
> > > added this placeholder field so it can be supported in the future
> > > without adding a new field later and maintaining backwards
> > > compatibility.
> >
> > Is guest_memfd the only user so far, then?  If so, would it be possible
> > we split it as a separate effort on top of the base cpr-exec support?
>
> I don't understand the question.  I am indeed deferring support for
> guest_memfd to a future time.  For now, I am adding a blocker, and
> reserving a field for it in the preserved ramblock attributes, to avoid
> adding a subsection later.

I meant that I'm wondering whether the new ramblock vmsd may not be
required for the initial implementation.  E.g., IIUC vaddr is required by
iommufd, and so far that's not part of the initial support.

Then I think a major remaining thing is the fds to be managed that will
need to be shared.  If we put guest_memfd aside, it can be really, mostly,
about VFIO fds.  For that, I'm wondering whether you looked into something
like this:

    commit da3e04b26fd8d15b344944504d5ffa9c5f20b54b
    Author: Zhenzhong Duan <zhenzhong.d...@intel.com>
    Date:   Tue Nov 21 16:44:10 2023 +0800

        vfio/pci: Make vfio cdev pre-openable by passing a file handle

I just noticed this when I was thinking of a way where it might be possible
to avoid QEMU vfio-pci opening the device at all, and found we have
something like that already..  Then, if the mgmt wants, IIUC that fd can be
passed down from Libvirt cleanly to dest qemu in a no-exec context.

Would this work too, being cleaner and reusing existing infrastructure?  I
think it's nice to always have libvirt managing most, or possibly all, of
the fds that qemu uses; then we don't even need scm_rights.  But I didn't
look deeper into this, just a thought.

When thinking about this, I also wonder how cpr-exec handles limited
environments like cgroups and especially seccomp.
I'm not sure what the status of seccomp filtering is in most cloud
environments, but I think exec() / fork() is definitely not always on the
seccomp whitelist, and I think that's also another reason why we can think
about avoiding them.

-- 
Peter Xu