I would advise trying to reproduce it.

Start a migration, then either:
- configure the timeout so that it's way too low, so that the migration fails
due to timeouts, or
- restart the mgmt server in the middle of the migration.
This should cause the migration to fail - and you can then observe whether you
have reproduced the problem.
Keep in mind that there might be some garbage left behind, due to the failed
migration not being handled properly.
But from the QEMU point of view - if the migration fails, the new VM should by
all means be destroyed...
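
If it helps, below is a minimal sketch of the kind of script you mentioned, to
spot a VM that is RUNNING on more than one hypervisor at the same time. It
assumes the libvirt Python bindings and SSH access from the machine running it
to each host; the host names and connection URIs are placeholders, not anything
specific to your setup.

#!/usr/bin/env python3
# Minimal sketch: flag VMs that are RUNNING on more than one hypervisor at
# once. Assumes the libvirt Python bindings (python3-libvirt) and SSH access
# from this machine to each host; the host names below are placeholders.
from collections import defaultdict

import libvirt

HOSTS = ["kvm-host-01", "kvm-host-02"]  # placeholder hypervisor names


def running_domains(host):
    """Return the names of domains currently in the RUNNING state on host."""
    conn = libvirt.openReadOnly("qemu+ssh://%s/system" % host)
    try:
        return [dom.name() for dom in conn.listAllDomains()
                if dom.state()[0] == libvirt.VIR_DOMAIN_RUNNING]
    finally:
        conn.close()


seen = defaultdict(list)  # VM name -> hosts where it is RUNNING
for host in HOSTS:
    for name in running_domains(host):
        seen[name].append(host)

for name, hosts in sorted(seen.items()):
    if len(hosts) > 1:
        print("DUPLICATE: %s is RUNNING on %s" % (name, ", ".join(hosts)))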



On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh <www.rakeshv....@gmail.com>
wrote:

> Hi Andrija
>
>
> Sorry for the late reply.
>
> I'm using ACS version 4.7. QEMU version 1:2.5+dfsg-5ubuntu10.40.
>
> I'm not sure whether the ACS job or the libvirt job failed, as I didn't look
> into the logs. Yes, the VM will be in the paused state during migration, but
> after the failed migration the same VM was in the "running" state on two
> different hypervisors. We wrote a script to find out how many duplicated VMs
> are running and found that more than 5 VMs had this issue.
>
>
> On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic <andrija.pa...@gmail.com>
> wrote:
>
> > I've been running a KVM public cloud until recently and have never seen
> > such behaviour.
> >
> > What versions (ACS, qemu, libvirt) are you running?
> >
> > How does the migration fail - the ACS job or the libvirt job?
> > The destination VM is by default always in the PAUSED state until the
> > migration is finished - only then will the destination VM (on the new
> > host) become RUNNING, after the original VM (on the old host) has been
> > paused.
> >
> > i.e.
> > phase 1:  source VM RUNNING, destination VM PAUSED (RAM content being
> > copied over... takes time...)
> > phase 2:  source VM PAUSED, destination VM PAUSED (the last bits of RAM
> > content are migrated)
> > phase 3:  source VM destroyed, destination VM RUNNING.
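
FWIW, a rough way to watch those phase transitions from the outside is to poll
the VM's state on both hosts while the migration runs - a hedged sketch below,
again assuming the libvirt Python bindings and SSH access; the host names and
the instance name are placeholders.

#!/usr/bin/env python3
# Rough sketch: poll one VM's state on the source and destination hosts while
# a migration is running, to watch the phases described above. The host names
# and the instance name are placeholders; assumes python3-libvirt + SSH access.
import time

import libvirt

SRC_HOST = "kvm-host-01"   # placeholder: migration source
DST_HOST = "kvm-host-02"   # placeholder: migration destination
VM_NAME = "i-2-345-VM"     # placeholder instance name

STATE_NAMES = {
    libvirt.VIR_DOMAIN_RUNNING: "RUNNING",
    libvirt.VIR_DOMAIN_PAUSED: "PAUSED",
    libvirt.VIR_DOMAIN_SHUTOFF: "SHUTOFF",
}


def state_on(host):
    """Return the VM's state on the given host, or '-' if it is not defined."""
    conn = libvirt.openReadOnly("qemu+ssh://%s/system" % host)
    try:
        return STATE_NAMES.get(conn.lookupByName(VM_NAME).state()[0], "OTHER")
    except libvirt.libvirtError:
        return "-"  # domain does not exist (yet / any more) on this host
    finally:
        conn.close()


while True:
    print("source=%-8s destination=%s" % (state_on(SRC_HOST), state_on(DST_HOST)))
    time.sleep(2)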
> >
> > Andrija
> >
> > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <www.rakeshv....@gmail.com>
> > wrote:
> >
> > > Hello Users
> > >
> > >
> > > Recently we have seen cases where, when the VM migration fails,
> > > CloudStack ends up running two instances of the same VM on different
> > > hypervisors. The state will be "running" and not any other transition
> > > state. This will of course lead to corruption of the disk. Does
> > > CloudStack have any option of volume locking so that two instances of
> > > the same VM won't be running?
> > > Has anyone else faced this issue and found some solution to fix it?
> > >
> > > We are thinking of using "virtlockd" of libvirt or implementing custom
> > > lock mechanisms. There are some pros and cons to both solutions and I
> > > want your feedback before proceeding further.
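
On the custom-lock idea: just to illustrate the concept, a sketch along the
lines below takes an exclusive advisory lock on the volume file before a VM is
allowed to start. This is only an illustration, not how virtlockd itself works
internally (virtlockd would be the supported route), and the volume path is a
placeholder.

#!/usr/bin/env python3
# Purely illustrative sketch of a "custom lock" approach: take an exclusive,
# non-blocking advisory lock on the volume file before allowing a VM to start,
# and refuse if someone else already holds it. This is NOT how virtlockd works
# internally, and advisory locks only help if every party cooperates (and need
# careful testing on shared/NFS storage). The volume path is a placeholder.
import fcntl
import sys

VOLUME_PATH = "/mnt/primary/volumes/<volume-uuid>.qcow2"  # placeholder path


def acquire_volume_lock(path):
    """Return an open file handle that holds the lock, or None if it's busy."""
    handle = open(path, "rb")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return handle  # the lock is held as long as this handle stays open
    except BlockingIOError:
        handle.close()
        return None


lock = acquire_volume_lock(VOLUME_PATH)
if lock is None:
    sys.exit("Refusing to start: %s is already locked elsewhere" % VOLUME_PATH)
print("Lock acquired on %s - safe to start the VM" % VOLUME_PATH)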
> > >
> > > --
> > > Thanks and regards
> > > Rakesh venkatesh
> > >
> >
> >
> > --
> >
> > Andrija Panić
> >
>
>
> --
> Thanks and regards
> Rakesh venkatesh
>


-- 

Andrija Panić
