Situation: two PVE nodes, with a direct link between them for replication and migration traffic. Due to a hardware failure, the NIC on one of the servers (cnpve2) died and the server had to be powered off.
After cnpve2 rebooted, all replication jobs recovered except one (running on cnpve2):

2026-02-13 09:26:01 121-0: start replication job
2026-02-13 09:26:01 121-0: guest => VM 121, running => 4345
2026-02-13 09:26:01 121-0: volumes => local-zfs:vm-121-disk-0,rpool-data:vm-121-disk-0,rpool-data:vm-121-disk-1
2026-02-13 09:26:04 121-0: freeze guest filesystem
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on local-zfs:vm-121-disk-0
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-0
2026-02-13 09:26:06 121-0: create snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-1
2026-02-13 09:26:06 121-0: thaw guest filesystem
2026-02-13 09:26:06 121-0: using insecure transmission, rate limit: 10 MByte/s
2026-02-13 09:26:06 121-0: incremental sync 'local-zfs:vm-121-disk-0' (__replicate_121-0_1770876001__ => __replicate_121-0_1770971161__)
2026-02-13 09:26:06 121-0: using a bandwidth limit of 10000000 bytes per second for transferring 'local-zfs:vm-121-disk-0'
2026-02-13 09:26:08 121-0: send from @__replicate_121-0_1770876001__ to rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__ estimated size is 2.76G
2026-02-13 09:26:08 121-0: total estimated size is 2.76G
2026-02-13 09:26:08 121-0: TIME SENT SNAPSHOT rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__
2026-02-13 09:26:08 121-0: 663540 B 648.0 KB 0.69 s 964531 B/s 941.92 KB/s
2026-02-13 09:26:08 121-0: write: Broken pipe
2026-02-13 09:26:08 121-0: warning: cannot send 'rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__': signal received
2026-02-13 09:26:08 121-0: cannot send 'rpool/data/vm-121-disk-0': I/O error
2026-02-13 09:26:08 121-0: command 'zfs send -Rpv -I __replicate_121-0_1770876001__ -- rpool/data/vm-121-disk-0@__replicate_121-0_1770971161__' failed: exit code 1
2026-02-13 09:26:08 121-0: [cnpve1] cannot receive incremental stream: dataset is busy
2026-02-13 09:26:08 121-0: [cnpve1] command 'zfs recv -F -- rpool/data/vm-121-disk-0' failed: exit code 1
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on local-zfs:vm-121-disk-0
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-0
2026-02-13 09:26:08 121-0: delete previous replication snapshot '__replicate_121-0_1770971161__' on rpool-data:vm-121-disk-1
2026-02-13 09:26:08 121-0: end replication job with error: failed to run insecure migration: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=cnpve1' -o 'UserKnownHostsFile=/etc/pve/nodes/cnpve1/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' [email protected] -- pvesm import local-zfs:vm-121-disk-0 zfs tcp://10.10.251.0/24 -with-snapshots 1 -snapshot __replicate_121-0_1770971161__ -allow-rename 0 -base __replicate_121-0_1770876001__' failed: exit code 255
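The line that stands out is "[cnpve1] cannot receive incremental stream: dataset is busy", so I first checked for leftover ZFS holds on the replication snapshots on both nodes. For reference, an equivalent one-shot check (assuming "121" only matches this VM's snapshots) would be:

zfs list -H -o name -t snapshot | grep 121 | xargs -n1 zfs holds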
On the rebooted node there are no holds:

root@cnpve2:~# zfs list -t snapshot | grep 121
rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__  5.99G  -  2.49T  -
rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__   115M  -  22.1G  -
rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__  1.57G  -  35.4G  -
root@cnpve2:~# zfs holds rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP
root@cnpve2:~# zfs holds rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP
root@cnpve2:~# zfs holds rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP

And the same on the opposite node:

root@cnpve1:~# zfs list -t snapshot | grep 121
rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__  0B  -  2.49T  -
rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__  0B  -  22.1G  -
rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__  0B  -  35.4G  -
root@cnpve1:~# zfs holds rpool-data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP
root@cnpve1:~# zfs holds rpool-data/vm-121-disk-1@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP
root@cnpve1:~# zfs holds rpool/data/vm-121-disk-0@__replicate_121-0_1770876001__
NAME  TAG  TIMESTAMP

It seems clear that something is still 'locked' on the non-rebooted node (cnpve1), but how can I identify and unlock it? Thanks.
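For what it's worth, my working guess is that the 'zfs recv' (started via 'pvesm import') that was running on cnpve1 when the NIC failed never exited and is still keeping rpool/data/vm-121-disk-0 busy. This is what I was planning to check on cnpve1, just a sketch on my part; the commands are standard OpenZFS, but whether PVE replication ever leaves a resumable receive state behind is only my assumption:

# look for a receive/import process left over from the interrupted replication
ps aux | grep -E 'zfs recv|pvesm import' | grep -v grep
# killing such a process (kill <PID>) should release the dataset

# check whether an interrupted, resumable receive left saved state behind
zfs get receive_resume_token rpool/data/vm-121-disk-0
# if the value is a token rather than '-', abort the partial receive
zfs receive -A rpool/data/vm-121-disk-0

Is this the right direction, or is there a safer way to clear whatever is keeping the dataset busy?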
