I would advise trying to reproduce it. Start a migration, then either:
- configure the timeout so that it's way too low, so that the migration fails due to timeouts, or
- restart the mgmt server in the middle of the migration.

This should cause the migration to fail, and you can then observe whether you have reproduced the problem (a small sketch for checking both hosts follows below). Keep in mind that there might be some garbage left over, due to the failed migration not being handled properly. But from the QEMU point of view, if the migration fails, the new VM should by all means be destroyed...
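To make the "observe" step concrete, here is a minimal sketch, assuming the libvirt Python bindings and SSH access to both hypervisors; the host URIs and the VM's libvirt name are placeholders, not values from your setup. It simply reports whether the same domain is RUNNING on both the old and the new host after a failed migration:

#!/usr/bin/env python
# Minimal sketch: after a (failed) migration, check whether the same domain
# is reported as RUNNING on both the source and the destination hypervisor.
# The host URIs and the VM name below are placeholders - adjust to your setup.
import libvirt

SOURCE_URI = "qemu+ssh://kvm-host-01/system"   # old host (placeholder)
DEST_URI   = "qemu+ssh://kvm-host-02/system"   # new host (placeholder)
VM_NAME    = "i-2-345-VM"                      # libvirt name of the migrated VM (placeholder)

def running_here(uri, vm_name):
    """Return True if the named domain exists and is RUNNING on this host."""
    conn = libvirt.openReadOnly(uri)
    try:
        for dom in conn.listAllDomains(0):
            if dom.name() == vm_name:
                state, _reason = dom.state()
                return state == libvirt.VIR_DOMAIN_RUNNING
        return False
    finally:
        conn.close()

if __name__ == "__main__":
    on_src = running_here(SOURCE_URI, VM_NAME)
    on_dst = running_here(DEST_URI, VM_NAME)
    print("running on source: %s, running on destination: %s" % (on_src, on_dst))
    if on_src and on_dst:
        print("PROBLEM REPRODUCED: %s is RUNNING on both hypervisors" % VM_NAME)

If both checks come back True, you have hit the duplicate-VM situation described further down in the thread.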
On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh <www.rakeshv....@gmail.com> wrote:

> Hi Andrija
>
> Sorry for the late reply.
>
> I'm using version 4.7 of ACS. Qemu version 1:2.5+dfsg-5ubuntu10.40.
>
> I'm not sure whether the ACS job or the libvirt job failed, as I didn't look
> into the logs. Yes, the VM will be in the paused state during migration, but
> after the failed migration the same VM was in the "running" state on two
> different hypervisors. We wrote a script to find out how many duplicated VMs
> are running and found that more than 5 VMs had this issue.
>
> On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic <andrija.pa...@gmail.com> wrote:
>
> > I've been running a KVM public cloud until recently and have never seen
> > such behaviour.
> >
> > What versions (ACS, qemu, libvirt) are you running?
> >
> > How does the migration fail - the ACS job, or the libvirt job?
> > The destination VM is by default always in the PAUSED state until the
> > migration is finished - only then does the destination VM (on the new host)
> > become RUNNING, after the original VM (on the old host) has been paused.
> >
> > i.e.
> > phase 1: source VM RUNNING, destination VM PAUSED (RAM content being
> > copied over... takes time...)
> > phase 2: source VM PAUSED, destination VM PAUSED (last bits of RAM
> > content are migrated)
> > phase 3: source VM destroyed, destination VM RUNNING.
> >
> > Andrija
> >
> > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <www.rakeshv....@gmail.com>
> > wrote:
> >
> > > Hello Users
> > >
> > > Recently we have seen cases where, when VM migration fails, CloudStack
> > > ends up running two instances of the same VM on different hypervisors.
> > > The state will be "running" and not any other transition state. This
> > > will of course lead to corruption of the disk. Does CloudStack have any
> > > option for volume locking so that two instances of the same VM won't be
> > > running? Has anyone else faced this issue and found a solution for it?
> > >
> > > We are thinking of using libvirt's "virtlockd" or implementing a custom
> > > lock mechanism. There are pros and cons to both solutions, and I want
> > > your feedback before proceeding further.
> > >
> > > --
> > > Thanks and regards
> > > Rakesh venkatesh
> >
> >
> > --
> >
> > Andrija Panić
>
>
> --
> Thanks and regards
> Rakesh venkatesh


--
Andrija Panić
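Regarding the "custom lock mechanism" idea raised in the thread above: a minimal sketch of one possible approach, assuming the volume is reachable as a local file path on the hypervisor and using Python's fcntl module. The path and helper name are illustrative only; this is not how CloudStack or libvirt implement locking, and flock semantics on shared/NFS storage would need verifying for your setup.

#!/usr/bin/env python
# Sketch of a custom volume lock, as one possible alternative to virtlockd.
# Assumption: the volume is a file visible on the hypervisor (e.g. NFS primary
# storage); the path below is a placeholder. Note that flock behaviour across
# NFS clients depends on the NFS version and mount options - verify before use.
import fcntl
import sys

VOLUME_PATH = "/mnt/primary/volume-uuid.qcow2"  # placeholder volume path

def acquire_volume_lock(path):
    """Try to take an exclusive, non-blocking lock on the volume file.

    Returns the open file object (keep it open for as long as the VM runs),
    or None if another process/host already holds the lock.
    """
    f = open(path, "rb")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except (IOError, OSError):
        f.close()
        return None

if __name__ == "__main__":
    lock = acquire_volume_lock(VOLUME_PATH)
    if lock is None:
        print("volume is already locked elsewhere - refusing to start the VM")
        sys.exit(1)
    print("lock acquired - safe to start the VM; keep the file descriptor open")

For comparison, virtlockd (enabled via lock_manager = "lockd" in qemu.conf, as far as I recall) provides essentially this protection at the libvirt level, so weigh that against maintaining a custom mechanism yourself.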