Thanks for looking into this! On 11/11/2025 10:46, Fabian Grünbichler wrote: > On November 10, 2025 6:06 pm, Friedrich Weber wrote: >> On 07/11/2025 18:19, Friedrich Weber wrote: >>> Currently, cloning an EFI disk off a snapshot-as-volume-chain snapshot >>> fails with "qemu-img: Failed to load snapshot: Can't find snapshot". >>> The reason is that the special case for EFI disks calls `qemu-img dd` >>> with the additional `-l snapname` option, which is only valid for >>> qcow2-internal snapshots. For snapshot-as-volume-chain snapshots, the >>> source volume is already the volume corresponding to the snapshot. >>> >>> Fix this by checking whether the snapshot in question is an external >>> snapshot, and if it is, omitting the `-l` option. >>> >>> Reported in enterprise support. >> >> Turns out this only fixes the issue in case of snapshot-as-volume-chain >> on file-level storages, not on LVM. The reason is that >> LVMPlugin::volume_snapshot_info does not set the `ext` key to 1 [1], so >> the code doesn't recognize the external snapshot. >> >> LVMPlugin::volume_snapshot_info should probably set `ext` (assuming that >> "ext" means "external")? >> >>> >>> Signed-off-by: Friedrich Weber <[email protected]> >>> --- >>> >>> Notes: >>> I noticed that two oddities about volume_snapshot_info: >>> >>> - it doesn't seem to work for qcow2-internal snapshots: It assumes >>> that if `qemu img info` returns a JSON object, the snapshots are >>> internal, and if it returns a JSON array, they are "external", but >>> this doesn't seem correct -- I think because we pass >>> --backing-chain, qemu-img will return an one-element JSON array even >>> for internal snapshots. Hence, volume_snapshot_info incorrectly >>> returns a hash ref with only a single member "current" if the >>> snapshots are internal. But since I only found callsites guarded >>> with snapshot-as-volume-chain, I guess this didn't hurt until now? >>> >>> It doesn't hurt for my patch either because it only checks for >>> existence of $info->{$snapshot}->{ext} (which also works for >>> qcow2-internal snapshots, because alreay $info->{$snapshot} will not >>> exist then), but that's admittedly not nice and it's probably a good >>> idea to fix volume_snapshot_info in addition. >>> >>> - but in which way should we fix it? Right now, for volume-chain >>> snapshots, it seems like volume_snapshot_info will always have a >>> member "current" pointing to the "actual" volume, even if the volume >>> doesn't have any snapshots. Is this intended and do we want this >>> also in case of qcow2-internal snapshots? From reading >>> ZFSPoolPlugin::volume_snapshot_info, it looks like that one will >>> return an empty hash (without "current") if the volume has no >>> snapshots. >> >> Looks like >> >> - ZFSPoolPlugin::volume_snapshot_info indeed doesn't return "current" >> - but there are several callsites of {LVM,}Plugin::volume_snapshot_info >> that rely on 'current' being present for snapshot-as-volume chain [2] >> >> So the implementations differ w.r.t. returning a "current" entry or not, >> which is IMO a bit awkward API-wise. But fixing this in either way >> (removing it for {LVM,}Plugin or adding it for ZFSPoolPlugin) would >> probably require bigger changes in either the snapshot-as-volume-chain >> or the ZFS replication code? >> >> For fixing this particular bug, I guess we could instead call >> `volume_qemu_snapshot_method` and only add the `-l snapname` parameter >> if that returns 'qemu'. But I think this is only safe because we don't >> allow enabling/disabling snapshot-as-volume-chain for file-level >> storages on-the-fly. >> >> What do you think? > > that sounds like a good approach - changing snapshot-as-volume-chain > already breaks all sorts of things anyway, so that is not supported. >
Sounds good, I sent a v2: https://lore.proxmox.com/pve-devel/[email protected]/T/ > cleaning up the rest would be good as well.. Yes. I didn't have time to look into it more, but what sounds somewhat sensible to me is to - specify that volume_snapshot_info also returns a 'current' entry - add 'current' entry to ZFSPoolPlugin implementation - skip over 'current' entries in the replication callsites of volume_snapshot_info _______________________________________________ pve-devel mailing list [email protected] https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
