Do you mean "reservation" rather than "ref-counting"? Am 03.04.24 um 17:07 schrieb Hannes Duerr: > As Fabian has already mentioned here[0], there can be a race between two > parallel imports. More specifically, if both imports have --allow-rename > set and the desired name already exists, then it can happen that both > imports get the same name. The reason for this is that we currently only > check which names have already been assigned in the vm config and then > use the next free one, and do not check whether a running process has > already been assigned the same name. However, while writing and testing > the patch, I found that this is often not a problem, as > - in most cases the source VM config is locked and therefore only one > process can generate a new name (migrate, clone, move disk, update > config) > - a name must be specified and therefore no new name is generated (pvesm > alloc) > - the timeframe for the race is very short. > > At the same time, it is possible that I have not considered all edge > cases and that there are other cases where the race can occur, > especially with regard to remote_migrations. You can provoke the problem > with two parallel imports on a host where local-lvm:vm-100-disk-0 > already exists: > > pvesm import local-lvm:vm-100-disk-0 raw+size <exported_volume> > --allow-rename 1 >
From a quick glance, it seems remote migration does one disk-import at a
time, i.e. only starts the next one once the previous one is finished. In
fact, even a malicious client could only do one import at a time, because
there is only one socket (per VM):

> $params->{unix} = "/run/qemu-server/$state->{vmid}.storage";
>
> return PVE::StorageTunnel::handle_disk_import($state, $params);

> Now that I've looked into the problem a bit, I'm not sure this patch is
> even necessary as it adds more complexity. So I wanted to ask for your
> opinion, whether you think it makes sense to add this change or not.

Please either send the patch as an RFC then and/or put this paragraph
below the --- so it won't appear in the commit message.

IMHO, it would be nice to fix, but yes, it is a bunch of new code. I
think you could make it slightly simpler though, by doing the following
in the locked section (see the sketch further down):

1. iterate over reservations -> if stale, remove -> if not stale, add to
   $disk_list
2. get the next free name using the larger $disk_list
3. print the updated list of reservations including the new name

> The patch introduces a tmp file which stores the newly assigned disk
> names and the pid of the process which requested the disk name. If a
> second process is assigned the same name, it will see from the file that
> the name has already been assigned to another process, and will take the
> next available name. Reading and writing to the tmp file requires a lock
> to prevent races.
>
> [0] https://lists.proxmox.com/pipermail/pve-devel/2024-January/061526.html
>
> Signed-off-by: Hannes Duerr <h.du...@proxmox.com>
> ---
>  src/PVE/Storage/Plugin.pm | 99 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 86 insertions(+), 13 deletions(-)
>
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index 7456c8e..f76550a 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -10,8 +10,10 @@ use File::chdir;
>  use File::Path;
>  use File::Basename;
>  use File::stat qw();
> +use File::Copy;

So this is used for the single move() below? Any reason not to use
file_set_contents() like in other places? You are using locking after all
and then you would also not need the tmp file.
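Rough, untested sketch of what I mean, to live inside
get_next_vm_diskname() and reusing the surrounding sub's variables as
well as the existing $get_vm_disk_number/$MAX_VOLUMES_PER_GUEST - the
entry format ("$storeid $volname $timestamp") and the 5-minute expiry
are made up for the example, and it already uses a timestamp instead of
the PID (more on that further down):

my $reserved_names_file = "/var/tmp/pve-reserved-volnames";
my $expiry = 5 * 60; # made-up value

my $code = sub {
    my $now = time();
    my $kept = ''; # reservations to write back

    # 1. iterate over reservations: drop stale entries, keep live ones
    # and add their names to $disk_list
    if (-e $reserved_names_file) {
        my $raw = PVE::Tools::file_get_contents($reserved_names_file);
        for my $line (split(/\n/, $raw)) {
            my ($res_storeid, $res_name, $res_time) = split(/\s+/, $line);
            next if !defined($res_time) || $now - $res_time > $expiry; # stale
            push @$disk_list, $res_name if $res_storeid eq $storeid;
            $kept .= "$line\n";
        }
    }

    # 2. get the next free name using the (now larger) $disk_list - this
    # is just the pre-existing loop from the current code
    my $disk_ids = {};
    for my $disk (@$disk_list) {
        my $disknum = $get_vm_disk_number->($disk, $scfg, $vmid, $suffix);
        $disk_ids->{$disknum} = 1 if defined($disknum);
    }
    my $name;
    for (my $i = 0; $i < $MAX_VOLUMES_PER_GUEST; $i++) {
        if (!$disk_ids->{$i}) {
            $name = "$prefix-$vmid-disk-$i$suffix";
            last;
        }
    }
    die "unable to allocate an image name for VM $vmid in storage '$storeid'\n"
        if !defined($name);

    # 3. write back the kept reservations plus the new name
    PVE::Tools::file_set_contents(
        $reserved_names_file, $kept . "$storeid $name $now\n");

    return $name;
};

my $name = PVE::Tools::lock_file('/var/lock/pve-disk-names.lck', 10, $code);
die $@ if $@;

file_set_contents() already writes to a temporary file and renames it
into place, so the write-back is atomic and the explicit $tmp_file +
move() dance goes away.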
".$fmt" : ''; > + my $disk_id; > + my $disk_ids = {}; > > - for (my $i = 0; $i < $MAX_VOLUMES_PER_GUEST; $i++) { > - if (!$disk_ids->{$i}) { > - return "$prefix-$vmid-disk-$i$suffix"; > + foreach my $disk (@$disk_list) { > + my $disknum = $get_vm_disk_number->($disk, $scfg, $vmid, $suffix); > + $disk_ids->{$disknum} = 1 if defined($disknum); > } > - } > > - die "unable to allocate an image name for VM $vmid in storage > '$storeid'\n" > + for (my $i = 0; $i < $MAX_VOLUMES_PER_GUEST; $i++) { > + if (!$disk_ids->{$i}) { > + $disk_id = $i; > + last; > + } > + } > + > + if (! -e $reserved_names_file) { > + my $create_h = IO::File->new($reserved_names_file, "w") || > + die "can't open or create'$reserved_names_file' - $!\n"; > + print $create_h "$storeid $vmid $disk_id $$"; Not sure about using PIDs here. PIDs are re-used and can't this also be the PID of some long-running daemon? Maybe we can use some expire time like the next_unused_port() helper in PVE::Tools does? Can also be a few minutes in this case, as we don't expect reservations to saturate. [...] > + my $new_disk_name = > PVE::Tools::lock_file('/var/lock/pve-disk-names.lck', 10, $code); The lock file is local to the node, so this won't work for shared storages. For those, you would need track the reservations in a file on the cluster file system. > + die $@ if $@; > + > + if (!$new_disk_name) { > + die "unable to allocate an image name for VM $vmid in storage > '$storeid'\n"; > + } > + return $new_disk_name; > } > > sub find_free_diskname { _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel